Answer by Yoshua Bengio:

Lots of apparently obvious ideas only became obvious after the fact...

The use of the chain rule through many non-linearities was known and used in control theory many years before it was used in neural nets. When back-prop was invented (in the early 1980s), neural nets had discrete (binary) outputs, which precluded gradient-based optimization. But David Rumelhart, Geoff Hinton et al. (and my friend Yann LeCun, independently) realized that if you had smooth (sigmoidal) outputs, you could apply the chain rule and train multi-layer networks. So it was not just about using the chain rule, but about being willing to let go of the established binary-output neurons. It is an interesting lesson.
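The key point can be made concrete in a few lines: with smooth sigmoid units, the gradient of the loss with respect to an early weight is just a product of local derivatives via the chain rule, something impossible with binary-output neurons. Below is a minimal illustrative sketch (all values and function names are my own, not from any historical code) of backpropagating through a two-layer sigmoid chain, checked against a numerical gradient:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    # Two-layer chain: x -> sigmoid(w1*x) = h -> sigmoid(w2*h) = y
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    return h, y

def grad_w1(x, w1, w2, target):
    # Chain rule: dL/dw1 = dL/dy * dy/dh * dh/dw1
    h, y = forward(x, w1, w2)
    dL_dy = 2.0 * (y - target)        # squared-error loss L = (y - t)^2
    dy_dh = y * (1.0 - y) * w2        # sigmoid'(z) = y * (1 - y)
    dh_dw1 = h * (1.0 - h) * x
    return dL_dy * dy_dh * dh_dw1

# Sanity check against a central finite difference.
x, w1, w2, t = 0.5, 0.3, -0.7, 1.0
eps = 1e-6

def loss(w1_):
    _, y = forward(x, w1_, w2)
    return (y - t) ** 2

analytic = grad_w1(x, w1, w2, t)
numeric = (loss(w1 + eps) - loss(w1 - eps)) / (2 * eps)
```

Each factor in `grad_w1` exists only because the sigmoid is differentiable; with a step-function (binary) output, `dy_dh` and `dh_dw1` would be zero almost everywhere and the chain rule would give no training signal.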
