Normal Equation


  • An analytical way to find the best function
numpy.linalg.pinv(x.transpose * x) * x.transpose * y

  • Gradient Descent vs. Normal Equation
  • The latter migh work faster but only if the number of features is small. n = 10,000 might be the limit, depending on the computer power.
  • Noninvertibility
  • Redundant features: If two features are linearly dependent then the matrix is noninvertable (e.g. area in square mater and square feet)
  • Too many features (m <= n) - delete some features or use regularization