My lab was one of the three that started the deep learning approach back in 2006, along with Hinton's at Toronto and LeCun's at NYU, followed soon after by Ng's at Stanford (NIPS 2007), and many more since.
Lots of apparently obvious ideas only became obvious after the fact...
The use of the chain rule through many non-linearities was known and used in control theory many years before it was used in neural nets. In those days (the early '80s, when back-prop was invented), neural nets had discrete (binary) outputs, which ruled out gradient-based optimization. But David Rumelhart, Geoff Hinton et al. (and my friend Yann LeCun, independently) figured out that if you had smooth (sigmoidal) outputs, you could use the chain rule and train multi-layer networks. So it was not just about using the chain rule, but about being willing to let go of the established binary-output neurons. It is an interesting lesson.
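To make the point concrete, here is a minimal sketch (my illustration, not anything from the answer) of why smooth units matter: with sigmoids, the chain rule gives a gradient through every layer, so a small two-layer network can learn XOR by plain gradient descent. The NumPy code, the XOR task, and the network sizes are all assumptions chosen for brevity, not the original authors' setup.

```
# Minimal backprop sketch: chain rule through sigmoid units on XOR.
# A binary threshold unit has no useful derivative; the smooth sigmoid does,
# which is exactly what unlocks gradient-based training of multiple layers.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: the classic task a single layer cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Two layers: 2 inputs -> 4 hidden sigmoids -> 1 output sigmoid.
W1 = rng.normal(scale=1.0, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=1.0, size=(4, 1))
b2 = np.zeros((1, 1))
lr = 1.0

for step in range(10000):
    # Forward pass through smooth non-linearities.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Squared-error gradient, propagated back via the chain rule.
    # d(sigmoid)/dz = sigmoid * (1 - sigmoid) -- defined only because
    # the unit is smooth.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent updates for both layers.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))  # approaches [[0], [1], [1], [0]]
```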