You have a function cleverly designed so that being zero is optimal: the closer to zero, the better. It has 1000 dials to control it, but otherwise it's just input and output.
So, like an AWS Lambda with 1000 env vars!
Some clever math gal designed it so that if you do this gradient descent thing, it learns! But let's not worry about that for now. We just want to understand gradient descent.
So you have an input you like, a desired output, this function that makes an actual output, and a way to turn that into a score of closeness. Closer to zero is better.
So you put in the input and the env vars, get the output, and the score comes out as, say, 0.3.
Not bad. But then you decide to wiggle an env var just a bit to see if it makes things better. 0.31, doh! Ok, the other way. 0.29, yay! So leave it there, do the next one, and so on.
Now repeat with the next input and output pair.
And again with another.
Then do the whole set again!
You will find the average amount you're wrong by gets smaller!
This is sort of gradient descent.
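The whole wiggle-every-dial loop above can be sketched in a few lines of Python. This is a toy: 3 dials instead of 1000, and the function and data are made up for illustration (a weighted sum scored by squared error, chosen so a perfect setting of the dials exists).

```python
import random

random.seed(0)  # so the run is repeatable

# Toy stand-in for the 1000-dial function: output is a weighted sum
# of the input, and the score is squared distance from the target.
def score(dials, x, target):
    actual = sum(d * xi for d, xi in zip(dials, x))
    return (actual - target) ** 2  # closer to zero is better

# made-up input/output pairs (a perfect dial setting exists: [1, 2, 3])
data = [([1.0, 2.0, 3.0], 14.0),
        ([0.0, 1.0, 1.0], 5.0),
        ([2.0, 0.0, 1.0], 5.0)]

dials = [random.uniform(-1, 1) for _ in range(3)]
step = 0.01

avg_before = sum(score(dials, x, t) for x, t in data) / len(data)

for epoch in range(200):           # "then do the whole set again!"
    for x, target in data:         # repeat with each input/output pair
        for i in range(len(dials)):
            before = score(dials, x, target)
            dials[i] += step                       # wiggle one dial up
            if score(dials, x, target) < before:
                continue                           # better? keep it
            dials[i] -= 2 * step                   # worse? try the other way
            if score(dials, x, target) < before:
                continue                           # better? keep that
            dials[i] += step                       # neither helped? put it back

avg_after = sum(score(dials, x, t) for x, t in data) / len(data)
print(avg_before, "->", avg_after)  # the average wrongness goes down
```

Note this guesses every dial one at a time, per example, per pass, which is exactly why it's slow: with 1000 dials you'd be evaluating the function thousands of times per example just to decide which way to nudge things.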
One extra trick: using maths and calculus you can figure out how to adjust the env vars, so you don't need to guess, and the amount you adjust them by will be closer to optimal.
Calculus is about the rate things change: if you compute, say, A + B, then a small change in A becomes the same small change in A + B, and you can run this reasoning in reverse through a whole chain of operations (the chain rule). This lets you calculate, not guess, the changes needed to the env vars.
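Here's the difference between guessing and calculating, on a made-up one-dial example. The score is (w*x - target)**2, so walking the chain rule backwards gives d(score)/dw = 2*(actual - target) * x, and it matches what wiggling estimates:

```python
# Toy one-dial function for illustration: score = (w*x - target)**2
def score(w, x, target):
    return (w * x - target) ** 2

w, x, target = 0.5, 3.0, 6.0

# the "guess by wiggling" estimate of the slope:
eps = 1e-6
wiggle = (score(w + eps, x, target) - score(w - eps, x, target)) / (2 * eps)

# the calculus answer, no wiggling needed:
# d(score)/d(actual) = 2*(actual - target), d(actual)/dw = x
exact = 2 * (w * x - target) * x

print(wiggle, exact)  # both come out around -27
```

The negative sign tells you which way to move w (here: increase it), and the size tells you how hard to push, all from one pass instead of two extra evaluations per dial.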
Imagine you're in a hilly landscape and want to go down as far as possible, but you can only see a small area around you, so you can't just directly go to the lowest point, because you don't know in which direction it is. Gradient descent is based on the idea of looking at which direction your local area is sloping upward (the gradient) and jumping in the opposite direction (i.e. down) a distance proportional to the strength of the slope.
This works well when the slope near the lowest point gets flatter and flatter, so that gradient descent makes smaller and smaller jumps, until you reach the bottom and stop. But if you end up on a very steep wall, you would make a very large jump, maybe so large that you overshoot the target and end up on an even steeper wall, make an even larger jump in the opposite direction and so on, getting farther and farther from the goal.
So one idea is to make sure that your jumps are always small enough that even the steepest wall you could possibly encounter won't throw you into a vicious cycle of increasing step sizes. For example, if you pour water on the ground, the water molecules make truly tiny jumps flowing down to the bottom, and in the article they call this path the gradient flow.
But what they show is that gradient descent typically splits off from this smooth gradient flow and instead gets into an area where the jumps get bigger and bigger for a while, but then the cycle is broken and they get smaller again. That is surprising! Even though it seems like the jumps can only keep getting farther, somehow they must've gotten close enough to the goal to calm down again.
So what the authors did is to remove the direction in which the jumps get bigger and look at what happens in the middle. You can imagine this as a valley with steep walls and a river smoothly flowing down at the bottom, and the jumps go back and forth across this river.
They call this river the central flow and show that it doesn't only flow along the direction of the gradient, but also a little bit in a direction where the steepness decreases. So when the jumps cross the river, they're also moving downriver a little, until they get to a point where the valley isn't so steep anymore and the jumps get smaller again.