Sridhar Mahadevan

Deep learning will eventually “die out” when the AI/ML community realizes two facts: first, minimizing error over a training set, no matter how large, is not enough to solve the AI problem; second, the true test of a scientific theory is not its accuracy at making predictions on some fixed dataset, but the level of insight it gives us into a problem.

As Thomas Kuhn, former MIT professor and author of the classic “The Structure of Scientific Revolutions,” astutely noted, science is like many other human professions: it is driven not by progress toward some absolute measure of truth, but is shaped by a series of ‘paradigms’ that workers in the field tacitly and unquestioningly believe in.

Deep learning is a paradigm that demands absolute loyalty to a set of core beliefs: what matters above all else is performance at minimizing error over some fixed dataset, and the explainability of the resulting solution matters not one whit.

Deep learning will die out when AI researchers realize that these tenets are neither necessary nor sufficient and, far from furthering AI’s progress as a science, are actually hindering it.

Let’s examine both of these core beliefs. Take the widely influential ImageNet vision dataset that popularized deep learning over the past few years. The computer vision community has accepted, lock, stock, and barrel, the assumption that any network that produces lower error on ImageNet, no matter how incomprehensible it is, even if it comprises thousands of layers, is a sign of ‘forward progress’ in the field. Here’s a (somewhat dated) graph measuring “progress” at this problem:

[Chart: ImageNet classification error by year, falling below the human-error baseline.]

Golly, the performance is now “better” than humans can do. If you believe that this result means we have truly made progress in understanding human vision or that computer vision systems are better than human vision in general, you are a true believer in deep learning.

It might be time to step back and ask yourself some basic questions: can our ability to perceive be quantified by some dataset like ImageNet? Do we have any idea what these massively overparameterized, billion-parameter nets are doing?

As Richard Feynman, the iconoclastic Manhattan Project safecracker and Nobel Prize-winning physicist, noted in his appendix to the report on the space shuttle Challenger disaster: “For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.”

Any one of you can do a simple experiment to test whether these ImageNet-derived networks truly work well in the real world. Download the latest version of MATLAB (free for a month) and the computer vision/deep learning toolboxes, hook up a webcam to your laptop, walk around your house, and run the test program. You will discover, as I did, that performance is woefully abysmal, far poorer than that of a two-year-old child, or even one of my dogs. These networks give the illusion of progress, but it is a false hope.
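For the curious, here is a rough Python sketch of the same experiment, using OpenCV for webcam capture and a pretrained ImageNet classifier from torchvision rather than the MATLAB toolboxes described above. The choice of ResNet-50 and the display details are my own illustrative assumptions:

```python
# Point your webcam around the house and watch what a pretrained
# ImageNet classifier thinks it is looking at.
import cv2                                  # pip install opencv-python
import torch
from torchvision import models
from torchvision.models import ResNet50_Weights

weights = ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()           # resize/crop/normalize for this model
categories = weights.meta["categories"]     # the 1,000 ImageNet class names

cap = cv2.VideoCapture(0)                   # default webcam
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV yields BGR uint8; convert to an RGB CHW tensor for torchvision.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        tensor = torch.from_numpy(rgb).permute(2, 0, 1)
        batch = preprocess(tensor).unsqueeze(0)
        with torch.no_grad():
            probs = model(batch).softmax(dim=1)[0]
        idx = int(probs.argmax())
        label = f"{categories[idx]}: {probs[idx].item():.2f}"
        cv2.putText(frame, label, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("ImageNet classifier", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):   # press q to quit
            break
finally:
    cap.release()
    cv2.destroyAllWindows()
```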

The first test I did was to point it at my living room; the network classified it as a ‘barbershop’. Repeated tests showed accuracy below 20%. Only with great difficulty were even simple objects like cups or plants recognized for what they were. Most often, the classifications produced were hilarious.

I’m not the first to point out that the emperor has no clothes. Many others have, including Alan Yuille, Bloomberg Distinguished Professor of Cognitive Science and Computer Science at Johns Hopkins. Is anyone listening?

Limitations of Deep Learning for Vision, and How We Might Fix Them

Professor Yuille notes in his article that “Deep Nets perform well on benchmarked datasets, but can fail badly on real world images outside the dataset.” He also notes, and gives simple examples to illustrate the point, that “Deep Nets are overly sensitive to changes in the image which would not fool a human observer.”
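To make that fragility concrete, here is a minimal Python sketch of the general phenomenon. Yuille’s own examples involve occluders and unusual contexts; this sketch instead uses the well-known fast gradient sign method (FGSM), a tiny pixel-level perturbation a human would not notice. The image path, model choice, and epsilon are illustrative assumptions, not taken from his article:

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models
from torchvision.models import ResNet50_Weights

weights = ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()
categories = weights.meta["categories"]

# Any photo of your own; the filename here is just a placeholder.
img = Image.open("living_room.jpg").convert("RGB")
x = preprocess(img).unsqueeze(0)
x.requires_grad_(True)

logits = model(x)
orig = logits.argmax(dim=1)
print("before:", categories[orig.item()])

# One FGSM step: nudge every pixel a tiny amount along the loss gradient.
loss = F.cross_entropy(logits, orig)
loss.backward()
eps = 0.03                                  # perturbation size in normalized units (illustrative)
x_adv = x + eps * x.grad.sign()

with torch.no_grad():
    adv = model(x_adv).argmax(dim=1)
print("after: ", categories[adv.item()])    # often a different, unrelated label
```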

One might argue in deep learning’s favor that perhaps this is to be expected: ImageNet is not large enough, just a few million images. Perhaps if we used a billion or a trillion images, then at some point, surely, we would achieve true success. But, as Yuille notes once again, “the set of real world images is combinatorially large, and so it is hard for any dataset, no matter how big, to be representative of the complexity of the real world.”

In short, the entire paradigm is based on the assumption that capturing any real-world human ability, be it perception or language or behavior, is just a matter of building a black box that achieves superhuman performance on some test suite, be it ImageNet or COCO or Atari video games or Go. After all, surely this gives us a concrete, quantifiable measure of progress, so we can plot “progress” over time.

Deep learning will die when the AI/ML community realizes that this emperor has no clothes. As Ronald Coase, the Nobel Prize-winning economist at the University of Chicago, noted, “a scientific theory is not like a bus timetable”: the accuracy of its predictions is not the main metric of its success; rather, it is the insight it provides. Coase argued that he would prefer a theory that predicts more poorly if it yields greater insight.

Ultimately, deep learning is fed by the belief that humans are tabula rasa learning machines: that the brain is a “blank slate”; that the roughly 100 billion neurons in the brain, each connecting to as many as a thousand others, yield far more connections to be tuned than there are seconds in an average human life; and that gradient descent is enough to set all of these parameters, even though it is biologically implausible and could not possibly explain the behavior of millions of biological species that are capable of remarkable feats a few seconds or minutes after birth.

Recently, a fascinating article was published in Nature Communications, arguing that this entire enterprise is based on the false assumption that biological systems work because of some magical unsupervised, supervised, or reinforcement learning algorithm, when in fact the behavior of many, if not most, animals is almost completely hardwired at birth, as it must be if the animal is to have any hope of surviving in a highly hostile environment:

A critique of pure learning and what artificial neural networks can learn from animal brains

As the article argues, we are like the proverbial physicist looking under the streetlight for a lost key, not because we know we lost it there, but because that is where the light is. Instead, the article poses a fascinating challenge that the AI/ML community would be well advised to heed. The challenge is neatly summarized by the abstract, which I will quote in full to highlight its remarkable message:

“Artificial neural networks (ANNs) have undergone a revolution, catalyzed by better supervised learning algorithms. However, in stark contrast to young animals (including humans), training such networks requires enormous numbers of labeled examples, leading to the belief that animals must rely instead mainly on unsupervised learning. Here we argue that most animal behavior is not the result of clever learning algorithms—supervised or unsupervised—but is encoded in the genome. Specifically, animals are born with highly structured brain connectivity, which enables them to learn very rapidly. Because the wiring diagram is far too complex to be specified explicitly in the genome, it must be compressed through a “genomic bottleneck”. The genomic bottleneck suggests a path toward ANNs capable of rapid learning.”
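To make the “genomic bottleneck” idea a little more concrete, here is a toy Python caricature, not the paper’s actual model: a large layer’s weight matrix is never stored directly; instead, each neuron carries a short identity code, and a tiny shared “developmental rule” network decodes each pair of codes into a synaptic weight, so the description of the wiring is heavily compressed. All sizes, the MNIST-like input shape, and the decoding rule are my own illustrative choices:

```python
import torch
import torch.nn as nn

N_IN, N_HIDDEN, N_CLASSES, CODE = 784, 512, 10, 8   # illustrative sizes only

class GenomicWiring(nn.Module):
    """Generates an (N_HIDDEN x N_IN) weight matrix from compact neuron codes."""
    def __init__(self):
        super().__init__()
        self.pre_codes = nn.Parameter(0.1 * torch.randn(N_IN, CODE))
        self.post_codes = nn.Parameter(0.1 * torch.randn(N_HIDDEN, CODE))
        # The small shared "rule" stands in for the compressed genomic program.
        self.rule = nn.Sequential(nn.Linear(2 * CODE, 16), nn.Tanh(), nn.Linear(16, 1))

    def forward(self):
        post = self.post_codes[:, None, :].expand(-1, N_IN, -1)     # (N_HIDDEN, N_IN, CODE)
        pre = self.pre_codes[None, :, :].expand(N_HIDDEN, -1, -1)   # (N_HIDDEN, N_IN, CODE)
        pairs = torch.cat([post, pre], dim=-1)                      # all (post, pre) code pairs
        return self.rule(pairs).squeeze(-1)                         # decoded weight matrix

class BottleneckedNet(nn.Module):
    """A 784 -> 512 -> 10 classifier whose first-layer wiring is decoded, not stored."""
    def __init__(self):
        super().__init__()
        self.wiring = GenomicWiring()
        self.readout = nn.Linear(N_HIDDEN, N_CLASSES)

    def forward(self, x):
        w = self.wiring()                        # "develop" the connectivity on the fly
        return self.readout(torch.relu(x @ w.t()))

net = BottleneckedNet()
direct = N_HIDDEN * N_IN                                      # 401,408 weights stored explicitly
compressed = sum(p.numel() for p in net.wiring.parameters())  # roughly 10,700
print(f"explicit wiring: {direct:,} parameters; genomic description: {compressed:,}")
print(net(torch.randn(8, N_IN)).shape)                        # torch.Size([8, 10])
```

In this toy, only the neuron codes and the small rule network would be trained (or “evolved”), a crude stand-in for the compressed specification the abstract describes as enabling structured connectivity and rapid learning.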

Is anyone listening?
