Answer by Yoshua Bengio:

Lots of apparently obvious ideas only became obvious after the fact...

The use of the chain rule through many non-linearities was known and used in control theory many years before it was used in neural nets. When back-prop was invented (in the early 1980s), neural nets had discrete (binary) outputs, which precluded gradient-based optimization. But David Rumelhart, Geoff Hinton et al. (and my friend Yann LeCun, independently) realized that if you had smooth (sigmoidal) outputs, you could apply the chain rule and train multi-layer networks. So it was not just about using the chain rule, but about being willing to let go of the established binary-output neurons. It is an interesting lesson.
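The key point can be made concrete in a few lines: with smooth sigmoid units, the gradient of the loss with respect to an early weight is just a product of local derivatives via the chain rule, something impossible with binary-output neurons. Below is a minimal illustrative sketch (all values and function names are my own, not from any historical code) of backpropagating through a two-layer sigmoid chain, checked against a numerical gradient:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    # Two-layer chain: x -> sigmoid(w1*x) = h -> sigmoid(w2*h) = y
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    return h, y

def grad_w1(x, w1, w2, target):
    # Chain rule: dL/dw1 = dL/dy * dy/dh * dh/dw1
    h, y = forward(x, w1, w2)
    dL_dy = 2.0 * (y - target)        # squared-error loss L = (y - t)^2
    dy_dh = y * (1.0 - y) * w2        # sigmoid'(z) = y * (1 - y)
    dh_dw1 = h * (1.0 - h) * x
    return dL_dy * dy_dh * dh_dw1

# Sanity check against a central finite difference.
x, w1, w2, t = 0.5, 0.3, -0.7, 1.0
eps = 1e-6

def loss(w1_):
    _, y = forward(x, w1_, w2)
    return (y - t) ** 2

analytic = grad_w1(x, w1, w2, t)
numeric = (loss(w1 + eps) - loss(w1 - eps)) / (2 * eps)
```

Each factor in `grad_w1` exists only because the sigmoid is differentiable; with a step-function (binary) output, `dy_dh` and `dh_dw1` would be zero almost everywhere and the chain rule would give no training signal.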
