
TBF, backpropagation was only introduced in the 1970s, although in hindsight it's a fairly straightforward application of the chain rule.
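To illustrate the "just the chain rule" point: here's a minimal sketch (names and the toy loss are made up for illustration) of backprop on a two-parameter scalar network, checked against a finite-difference approximation.

```python
import math

def forward(x, w1, w2):
    h = math.tanh(w1 * x)  # hidden unit
    y = w2 * h             # output unit
    return h, y

def backward(x, w1, w2):
    # Loss L = 0.5 * y^2; gradients fall out of repeated chain rule steps.
    h, y = forward(x, w1, w2)
    dL_dy = y                          # dL/dy
    dL_dw2 = dL_dy * h                 # chain through y = w2 * h
    dL_dh = dL_dy * w2
    dL_dw1 = dL_dh * (1 - h * h) * x   # chain through tanh and w1 * x
    return dL_dw1, dL_dw2

x, w1, w2 = 0.5, 0.3, -0.8
g1, g2 = backward(x, w1, w2)

# Central finite difference on w1 as a sanity check
eps = 1e-6
num_g1 = (0.5 * forward(x, w1 + eps, w2)[1] ** 2
          - 0.5 * forward(x, w1 - eps, w2)[1] ** 2) / (2 * eps)
print(abs(g1 - num_g1) < 1e-8)
```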

There were also plenty of "hacks" involved in making the networks scale, such as dropout regularization, batch normalization, semi-linear activation functions (e.g. ReLU), and adaptive stochastic gradient descent methods.
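Two of those "hacks" fit in a few lines each. A sketch (in NumPy; the shapes and probabilities are illustrative) of ReLU and inverted dropout, the variant that rescales at training time so inference needs no change:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    # Semi-linear activation: identity for positives, zero otherwise
    return np.maximum(0.0, x)

def dropout(x, p_drop=0.5, training=True):
    # Inverted dropout: zero each unit with probability p_drop during
    # training, rescale the survivors by 1/(1-p_drop); identity at inference.
    if not training:
        return x
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)

h = relu(np.array([-1.0, 0.5, 2.0]))
print(h)                                        # [0.  0.5 2. ]
out = dropout(h, p_drop=0.5, training=False)
print(np.allclose(out, h))                      # identity at inference
```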

The maths behind basic NNs is really simple, but the practice of training them is really messy.



Residual connections are also worth mentioning as an extremely ubiquitous adaptation. You'll be hard-pressed to find a modern architecture that doesn't use them at least to some extent, to the point where the original ResNet paper sits at over 200k citations according to Google Scholar [1].
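For anyone unfamiliar: a residual block learns a residual F(x) and adds the input back, y = x + F(x). A minimal NumPy sketch (the two-layer F and its shapes are made up for illustration):

```python
import numpy as np

def residual_block(x, w1, w2):
    f = np.maximum(0.0, x @ w1) @ w2   # F(x): linear -> ReLU -> linear
    return x + f                       # skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))

# With zero weights F(x) == 0, so the block is exactly the identity --
# the property that makes very deep stacks easy to optimize.
y = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
print(np.allclose(y, x))
```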

[1] https://scholar.google.com/citations?view_op=view_citation&h...


Highway nets introduced them shortly before ResNet, in 2015, building on gating ideas from 1990s-era LSTMs.
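The highway layer differs from a plain residual connection in that a learned gate T(x) interpolates between the transform H(x) and the raw input: y = T(x)·H(x) + (1 − T(x))·x. A sketch (shapes and the extreme gate bias are illustrative, chosen to show the near-identity regime that made very deep highway nets trainable):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, w_h, w_t, b_t):
    h = np.tanh(x @ w_h)        # candidate transform H(x)
    t = sigmoid(x @ w_t + b_t)  # transform gate T(x) in (0, 1)
    return t * h + (1.0 - t) * x

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))

# A strongly negative gate bias pushes T(x) toward 0, so the layer
# passes its input through almost unchanged.
y = highway_layer(x, rng.normal(size=(4, 4)), np.zeros((4, 4)), -20.0)
print(np.allclose(y, x, atol=1e-6))
```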



