Gradient Descent Introduction

Gradient descent is an optimization algorithm for finding a minimum (or, with the sign of the update flipped, a maximum) of a function. It is an iterative approach: the slope of the function, given by its derivative, is evaluated at the current position and used to move step by step towards the extreme value. Take the following example function and its derivative:


    \[f(x) = x^2 + 3\]

    \[\frac{d}{dx}f(x) = 2x\]

In this example the slope at any point x is 2x, so for instance at x = 4 the slope is 8. By decreasing or increasing x step by step and comparing each result with the previous one, the algorithm moves towards the minimum.

The step size is also commonly known as the learning rate. A smaller learning rate gives a more precise result, but as a drawback it requires more iterations and therefore has a longer run time.
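As a rough illustration of this trade-off, the sketch below (the helper function and the specific rates are my own choices, not from the article) counts how many iterations the update rule needs on f(x) = x² for two different learning rates:

```python
# Count gradient descent iterations on f(x) = x**2 (derivative 2x)
# until the step size drops below a precision threshold.
def iterations_to_converge(rate, start=4.0, precision=1e-6, max_iters=100000):
    x = start
    step = 1.0
    iters = 0
    while step > precision and iters < max_iters:
        new_x = x - rate * 2 * x      # one gradient descent update
        step = abs(new_x - x)
        x = new_x
        iters += 1
    return iters

fast = iterations_to_converge(0.1)    # larger learning rate
slow = iterations_to_converge(0.001)  # smaller learning rate
print(fast, slow)  # the smaller rate needs far more iterations
```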

Gradient descent algorithm

If no further background information is available, a random position can be used for initialization. Putting it all together, one iteration is defined by the formula:

    \[x_{new} = x_{old} - \text{(learning rate)} \cdot \frac{d}{dx}f(x_{old})\]
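Plugging in the starting values used in the code below (x = 4, learning rate 0.001), a single update step works out to:

    \[x_{new} = 4 - 0.001 \cdot 2 \cdot 4 = 3.992\]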

Implementing the gradient descent algorithm for finding the minimum of the example function is straightforward. It basically needs a while loop with two exit conditions: one that stops when the step size falls below a chosen precision (the position barely changes anymore), and a second that stops after a maximum number of iterations. The second condition prevents infinite loops.

cur_x = 4                 # starting position
rate = 0.001              # learning rate
precision = 0.000001      # stop once the step size falls below this
previous_step_size = 1
max_iters = 10000         # stop after this many iterations at the latest
iters = 0
df = lambda x: 2*x        # derivative of f(x) = x**2 + 3

while previous_step_size > precision and iters < max_iters:
    prev_x = cur_x
    cur_x = cur_x - rate * df(prev_x)
    previous_step_size = abs(cur_x - prev_x)
    iters = iters + 1
    print("Iteration", iters, "\nX value is", cur_x)

print("The local minimum occurs at", cur_x)

The final result should be a value close to 0, the x position at which f(x) = x^2 + 3 has its minimum!
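The same loop works for any differentiable function once its derivative is plugged in. As a sketch (the function choice here is mine, not from the article), minimizing f(x) = (x - 3)², whose derivative is 2(x - 3), should land close to x = 3:

```python
# Gradient descent on f(x) = (x - 3)**2; the loop mirrors the code above,
# only the derivative, starting point, and learning rate differ.
df = lambda x: 2 * (x - 3)

cur_x = 10.0              # starting position
rate = 0.01               # learning rate
precision = 0.000001      # stop once the step size falls below this
previous_step_size = 1.0
max_iters = 10000
iters = 0

while previous_step_size > precision and iters < max_iters:
    prev_x = cur_x
    cur_x = cur_x - rate * df(prev_x)
    previous_step_size = abs(cur_x - prev_x)
    iters = iters + 1

print("The local minimum occurs at", cur_x)  # close to 3
```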
