Gradient descent
- Start at a random location.
- Make a small step downhill.
- Stop when every direction around you leads uphill, i.e. you are at a (local) minimum.
- The problem: depending on the starting point, this can lead us to different local(!) minima.
- Learning rate (alpha) - the size of the steps we take on every iteration.
- Derivative term - the partial derivative of the cost function with respect to each parameter (here a and b); it gives the direction and steepness of the slope.
- If the learning rate is too large, the algorithm might diverge.
- If the learning rate is too small, it might take a lot of steps to converge.
- The squared-error cost function of linear regression is a convex function, so there is only one local minimum, which is also the global minimum.
- "Batch" Gradient Descent - means that at every step we use all the training examples.
- There are other versions of gradient descent, e.g. stochastic and mini-batch, which use a single example or a small subset of examples per step instead of the whole training set.
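The steps above can be sketched as a minimal batch gradient descent for linear regression with model h(x) = a*x + b and the squared-error cost. The names `a`, `b`, and `alpha` match the notes; the data and default values are illustrative.

```python
# Minimal sketch of batch gradient descent for linear regression.
# Model: h(x) = a*x + b, cost: mean squared error over all examples.

def batch_gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    a, b = 0.0, 0.0          # start at some location (here: the origin)
    n = len(xs)
    for _ in range(iterations):
        # Derivative terms of the cost with respect to a and b,
        # computed over ALL training examples ("batch").
        errors = [a * x + b - y for x, y in zip(xs, ys)]
        grad_a = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Simultaneous update: small step downhill, scaled by alpha.
        a -= alpha * grad_a
        b -= alpha * grad_b
    return a, b

# Data generated from y = 2x + 1; the fit should recover a ~ 2, b ~ 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = batch_gradient_descent(xs, ys)
```

Because this cost is convex, any sensible starting point converges to the same (global) minimum.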
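The effect of the learning rate can be seen on a toy 1-D cost J(w) = w^2, whose gradient is 2w and whose minimum is at w = 0. The alpha values below are illustrative, chosen to show the three regimes from the notes.

```python
# Sketch: how the learning rate alpha affects convergence on J(w) = w**2.

def run(alpha, steps=50, w0=1.0):
    w = w0
    history = [w]
    for _ in range(steps):
        w -= alpha * 2 * w   # gradient step: dJ/dw = 2w
        history.append(w)
    return history

small = run(alpha=0.01)  # too small: still far from 0 after 50 steps
good  = run(alpha=0.1)   # reasonable: converges quickly
large = run(alpha=1.1)   # too large: each step overshoots, |w| diverges
```

With alpha = 1.1 each update multiplies w by (1 - 2.2) = -1.2, so the iterate jumps back and forth across the minimum with growing magnitude, which is the divergence case from the notes.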