Gradient descent


  • Start at a random location.
  • Make a small step downhill.
  • Stop when everything around you is higher than where you are.
  • The problem: depending on the starting point, this can lead us to a different local(!) minimum.
  • Learning rate (alpha) - the size of the steps we take on every iteration.
  • Derivative term - the partial derivative of the cost function with respect to each parameter (a and b); it gives the direction and relative size of each step.
  • If the learning rate is too large, the algorithm might overshoot the minimum and diverge.
  • If the learning rate is too small, it might take a lot of steps to converge.
  • The above cost function of linear regression is convex, so there is only one local minimum, which is also the global minimum.
  • "Batch" Gradient Descent - means that at every step we use all the training examples.
  • There are other versions of Gradient descent.
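As a concrete illustration, here is a minimal sketch of batch gradient descent for a simple linear model y ≈ a*x + b, assuming the usual mean-squared-error cost. The variable names (a, b, alpha) and the toy data are made up for the example; every iteration uses all the training examples, as described above.

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.05, iterations=2000):
    """Batch gradient descent for simple linear regression y ≈ a*x + b.

    Assumes the mean-squared-error cost J(a, b) = 1/(2m) * sum((a*x + b - y)^2).
    alpha is the learning rate; every iteration uses all m training examples.
    """
    m = len(x)
    a, b = np.random.randn(2)          # start at a random location
    for _ in range(iterations):
        error = (a * x + b) - y        # prediction error on all examples
        # Derivative terms: partial derivatives of the cost w.r.t. a and b
        grad_a = (1 / m) * np.dot(error, x)
        grad_b = (1 / m) * np.sum(error)
        # Simultaneous update: step "downhill", scaled by the learning rate
        a -= alpha * grad_a
        b -= alpha * grad_b
    return a, b

# Toy usage: recover a line with slope ~2 and intercept ~1 from noisy data
x = np.linspace(0, 5, 50)
y = 2 * x + 1 + np.random.normal(scale=0.1, size=x.shape)
a, b = batch_gradient_descent(x, y)
print(a, b)   # should be close to 2 and 1
```

If alpha is set much larger here, the updates overshoot and the parameters blow up; if it is much smaller, many more iterations are needed, matching the points about the learning rate above.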