Gradient descent


  • Start at a random location.
  • Make a small step downhill.
  • Stop when everything around you is higher than where you are.
  • The problem: depending on the starting point, this can lead us to a different local(!) minimum.
  • Learning rate (alpha) - the size of the steps we take on every iteration.
  • Derivative term - the partial derivative of the cost function with respect to each parameter (a and b); it gives the direction and relative size of each step.
  • If the learning rate is too large, the algorithm might overshoot the minimum and diverge.
  • If the learning rate is too small, it might take a lot of steps to converge.
  • The above cost function of linear regression is convex, so there is only one local minimum, which is also the global minimum.
  • "Batch" Gradient Descent - means that at every step we use all the training examples.
  • There are other versions of Gradient descent.
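As a concrete illustration, here is a minimal sketch of batch gradient descent for a simple linear model y ≈ a*x + b, assuming the usual mean-squared-error cost. The variable names (a, b, alpha) and the toy data are made up for the example; every iteration uses all the training examples, as described above.

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.05, iterations=2000):
    """Batch gradient descent for simple linear regression y ≈ a*x + b.

    Assumes the mean-squared-error cost J(a, b) = 1/(2m) * sum((a*x + b - y)^2).
    alpha is the learning rate; every iteration uses all m training examples.
    """
    m = len(x)
    a, b = np.random.randn(2)          # start at a random location
    for _ in range(iterations):
        error = (a * x + b) - y        # prediction error on all examples
        # Derivative terms: partial derivatives of the cost w.r.t. a and b
        grad_a = (1 / m) * np.dot(error, x)
        grad_b = (1 / m) * np.sum(error)
        # Simultaneous update: step "downhill", scaled by the learning rate
        a -= alpha * grad_a
        b -= alpha * grad_b
    return a, b

# Toy usage: recover a line with slope ~2 and intercept ~1 from noisy data
x = np.linspace(0, 5, 50)
y = 2 * x + 1 + np.random.normal(scale=0.1, size=x.shape)
a, b = batch_gradient_descent(x, y)
print(a, b)   # should be close to 2 and 1
```

If alpha is set much larger here, the updates overshoot and the parameters blow up; if it is much smaller, many more iterations are needed, matching the points about the learning rate above.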