I think it’s safe to say that nothing in the current arsenal of ML methods surpasses deep learning *overall*, which is to say, in its ability to handle very large amounts of high-dimensional data and extract meaningful structure from them. That doesn’t mean no such method will emerge, and it is possible that an improved framework will grow out of the current effort to understand deep learning “more deeply” (pardon the pun!).
Part of the mystery is understanding precisely what it is that makes deep learning so successful. The world’s best theorists are currently engaged with this question. Sanjeev Arora’s group at Princeton, for example, is one of the most active in this area, and his web page is a good place to begin to understand the issues. He also has a YouTube video of a recent tutorial. No, this is not for the faint-hearted. It’s not a 20-second TikTok sort of thing. It’s over two hours long.
So, what’s the mystery here? Deep learning flies in the face of logic. Training a deep network is a highly non-convex optimization problem, and all existing theory suggests it should not be possible to optimize well in such a landscape. However, not only are deep learning approaches successful, a remarkably simple algorithm, gradient descent, whose cost per step is linear in the number of parameters, finds seemingly close-to-optimal solutions. For a theorist like Sanjeev, this is like the “dark matter” mystery in physics. Dark matter is the strangest thing in the physical universe. It begs for an explanation. The world’s top theorists are drawn to it. Deep learning is, in essence, the dark matter mystery of CS and math: something that, by all existing theory, shouldn’t work, but does.
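To make the puzzle concrete, here is a minimal sketch of plain gradient descent on a tiny two-layer network fitting XOR, a non-convex problem. The network width, learning rate, and iteration count are my own illustrative choices, not anything from the theory literature.

```python
# A minimal sketch: full-batch gradient descent on a small two-layer tanh network
# fitting XOR (a non-convex problem). All hyperparameters are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])       # XOR: not linearly separable

W1 = rng.normal(0.0, 1.0, (2, 8)); b1 = np.zeros(8)   # hidden layer
W2 = rng.normal(0.0, 1.0, (8, 1)); b2 = np.zeros(1)   # output layer
lr = 0.5

for step in range(5000):
    # forward pass
    h = np.tanh(X @ W1 + b1)
    out = h @ W2 + b2
    loss = np.mean((out - y) ** 2)

    # backward pass (hand-derived gradients of the squared loss)
    d_out = 2 * (out - y) / len(X)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1 - h ** 2)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # one gradient-descent step: cost is linear in the number of parameters
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final training loss: {loss:.4f}")    # typically very close to 0
```

Nothing about the loss surface guarantees that this should end up near a global minimum, and yet, for this toy problem and for networks with billions of parameters alike, it routinely does.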
Another puzzle is that deep networks can memorize noise. You can feed them random noise images with random labels, and they will happily drive the training error to 0. Conventional ML theory (Vapnik–Chervonenkis-style analyses) says that when a function approximator has this much “capacity,” it is essentially memorizing and cannot generalize. Yet this turns out to be false as well. Fed clean, sensible data (e.g., ImageNet), deep learning generalizes well. No, by no means does it solve the *real-world* computer vision problem, as some have claimed, but it does as well as one could reasonably expect. So deep learning requires rethinking not just traditional optimization, but traditional models of generalization.
Understanding deep learning requires rethinking generalization
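For the curious, here is a toy version of that random-label experiment, assuming PyTorch is available. The data sizes and architecture are my own and far smaller than the setups in the paper above, but the qualitative outcome is the same: the network drives the training error on pure noise to zero.

```python
# A small-scale sketch of the random-label experiment (assumes PyTorch);
# sizes and architecture are illustrative, much smaller than in the paper above.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, classes = 256, 100, 10
X = torch.randn(n, d)                      # pure noise "images"
y = torch.randint(0, classes, (n,))        # labels assigned at random

model = nn.Sequential(                     # deliberately overparameterized MLP
    nn.Linear(d, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, classes),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(1000):                  # full-batch training on the noise
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

train_err = (model(X).argmax(dim=1) != y).float().mean().item()
print(f"training error on random labels: {train_err:.3f}")   # typically 0.000
```

By VC-style reasoning, a model that can do this should be hopeless on real data; empirically, the very same architecture generalizes fine when the labels actually mean something.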
One way to chart a future for AI/ML beyond deep learning is to redefine the question. Deep learning is essentially good at learning from an existing data set. Unfortunately, that is not good enough for many of the world’s challenges. Take climate change, for example. It is not enough to simply fit a deep learning model to years of climate data. Of course, folks are working on exactly that, fitting exascale-level deep learning models to climate data.
Exascale Deep Learning for Climate Analytics
The paper above describes training on 5,000 P100-class GPUs, scaling to 24,000 GPUs, to fit tens of terabytes of climate data. It shows the power of deep learning to scale to very large data sets, given enough compute resources. But I’m not convinced that this is the right approach.
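For readers unfamiliar with how such runs are organized, here is a rough mental model of synchronous data-parallel training, my own simplification rather than anything from the paper: each “worker” computes a gradient on its shard of the data, the gradients are averaged across workers, and every worker applies the same update.

```python
# A conceptual sketch of synchronous data-parallel training (my simplification,
# not the paper's code): workers compute gradients on their shards, the gradients
# are averaged (an "all-reduce"), and all workers apply the identical update.
import numpy as np

rng = np.random.default_rng(0)
n_workers, n_per_worker, d = 8, 1000, 20      # stand-ins for thousands of GPUs
w_true = rng.normal(size=d)

shards = []                                    # each worker holds its own data shard
for _ in range(n_workers):
    X = rng.normal(size=(n_per_worker, d))
    y = X @ w_true + 0.1 * rng.normal(size=n_per_worker)
    shards.append((X, y))

w, lr = np.zeros(d), 0.1
for step in range(200):
    grads = [2 * X.T @ (X @ w - y) / len(y) for X, y in shards]  # in parallel, in reality
    w -= lr * np.mean(grads, axis=0)           # the "all-reduce", then one shared step

print(f"parameter error: {np.linalg.norm(w - w_true):.4f}")
```

Scaling this pattern to 24,000 GPUs is an impressive systems achievement, but it is still the same statistical exercise: fitting a model to data the world has already produced.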
Climate science requires understanding how the world is changing, not just how it is now. It requires more than building a descriptive statistical model of the world as it was and as it is; it requires projecting to an uncertain future 50 or 100 years from now. That is what analytical climate models try to do. What you need is causal analysis, and that is currently beyond deep learning’s capability. Judea Pearl, a critic of ML and of deep learning, has argued in a recent paper that ML ignores causal analysis at its own peril.
Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution
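To see the point in miniature, here is a small constructed example, mine rather than Pearl’s: a hidden confounder drives both a “treatment” and an outcome, so naive curve fitting badly overestimates the treatment’s effect, while an analysis that accounts for the confounder recovers it.

```python
# An illustration (my construction, not from Pearl's paper) of why curve fitting
# on observational data can mislead about interventions: a confounder Z drives
# both the "treatment" T and the outcome Y. The true causal effect is 1.0.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
Z = rng.normal(size=n)                          # confounder
T = Z + rng.normal(size=n)                      # treatment partly driven by Z
Y = 1.0 * T + 2.0 * Z + rng.normal(size=n)      # true effect of T on Y is 1.0

# Naive "curve fitting": regress Y on T alone -> biased estimate
naive = np.cov(T, Y)[0, 1] / np.var(T)

# Adjusting for the confounder (regress Y on both T and Z) recovers the effect
A = np.column_stack([T, Z, np.ones(n)])
adjusted = np.linalg.lstsq(A, Y, rcond=None)[0][0]

print(f"naive slope:    {naive:.2f}")           # ~2.0, badly biased
print(f"adjusted slope: {adjusted:.2f}")        # ~1.0, the true causal effect
```

A deep network trained purely to predict Y from T would learn the biased relationship just as faithfully; the problem is not model capacity but the question being asked.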
This year’s Nobel Prize in economics went to three economists (two from MIT and one from Harvard) who pioneered the use of randomized trials to establish causal effects in development economics. Many of their pioneering studies were carried out in places such as India and Kenya. The challenges these economists study go beyond what deep learning can currently do.
The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2019
The award citation is worth reading to see the sort of real-world problems these economists have spent their lives working on; it is truly inspiring. The opening paragraph reads:
https://www.nobelprize.org/uploads/2019/10/advanced-economicsciencesprize2019.pdf
“Despite massive progress in the past few decades, global poverty — in all its different dimensions — remains a broad and entrenched problem. For example, today, more than 700 million people subsist on extremely low incomes. Every year, five million children under five die of diseases that often could have been prevented or treated by a handful of proven interventions. Today, a large majority of children in low- and middle-income countries attend primary school, but many of them leave school lacking proficiency in reading, writing and mathematics. How to effectively reduce global poverty remains one of humankind’s most pressing questions. It is also one of the biggest questions facing the discipline of economics since its very inception.
“So how best to identify strategies to help the least well-off? This year’s Prize in Economic Sciences rewards the experimental approach that has transformed development economics. In just two decades, the pioneering work by this year’s Laureates has turned development economics ― the field that studies what causes global poverty and how best to combat it ― into a blossoming, largely experimental field.”
So, the problem here is to discover a “causal intervention,” which goes beyond traditional data science and deep learning. The problem is not to build a model of the world as it is now (e.g., almost a billion people in poverty), but to figure out how to change that reality! That is the aim of causal studies: figure out how to change the world for the better.
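Here is a minimal sketch of the core of a randomized trial, with made-up numbers rather than anything from the laureates’ actual studies: randomly assign who receives an intervention, then compare average outcomes. Randomization breaks the link between the treatment and any confounders, so a simple difference in means estimates the causal effect of the intervention.

```python
# A minimal randomized-trial sketch (illustrative numbers, not from any real study):
# randomize assignment, then estimate the average treatment effect from mean outcomes.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
baseline = rng.normal(size=n)                        # hidden confounder (e.g., prior skill)
treated = rng.integers(0, 2, size=n).astype(bool)    # coin-flip assignment to the program

true_effect = 0.5                                    # the intervention's real benefit
outcome = baseline + true_effect * treated + rng.normal(size=n)

# Difference in means is an unbiased estimate of the effect under randomization
ate = outcome[treated].mean() - outcome[~treated].mean()
se = np.sqrt(outcome[treated].var(ddof=1) / treated.sum()
             + outcome[~treated].var(ddof=1) / (~treated).sum())

print(f"estimated effect: {ate:.3f} +/- {1.96 * se:.3f}")    # close to the true 0.5
```

The statistics here are elementary; the hard, creative part, which is what the prize rewards, is designing interventions and experiments that can actually be run in the field.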
The MIT laureates, Professor Abhijit Banerjee and Professor Esther Duflo, have written a wonderful account of their work, which is well worth reading.
This work gives us a road map for redefining the future of AI and ML beyond deep learning. Consider the world’s most pressing problems: climate change, the most serious and existential crisis humans face, and poverty and illiteracy, socially the most damning problems that the wealthy “developed countries” have so far been unable to make a dent in. Tackling them is not a matter of “curve fitting,” as Judea Pearl has argued. It requires understanding the causal interventions that will help produce a better world: reduce, and hopefully eliminate, global warming, and help reduce poverty and improve literacy.
Deep learning will likely remain unchallenged in its area of strength: building massive, highly overparameterized models on huge datasets. Much of the work there is understanding why it works so well, which might suggest better, more transparent methods. But even if that effort pays dividends, and I for one am confident that folks like Sanjeev Arora will shed light on the current darkness, it will not solve humanity’s most pressing problems. For that, we must turn over a new leaf and begin anew.
That means no longer thinking that building a complex nonlinear model of data is sufficient. These economists did not win this year’s Nobel prize because they were successful in building models that “explained data.” No, they pioneered techniques that allow developing countries all over the world to tackle the most serious social problems facing them.
Professor Esther Duflo’s lab at MIT is currently running nearly 1,000 randomized trials in over 80 countries. She is the youngest winner of the economics Nobel, and an inspiration to many.
The Abdul Latif Jameel Poverty Action Lab
I can think of no better future for AI and ML than to follow in her pioneering footsteps, and make the world a better place.