Yes, Google’s search index is now in clear decline as of late 2020, though the downward trend started years ago.
The core issue is that the Web as we know it has been dying: people all over the world no longer bother to put up links to other high-quality content just for the sake of it. It is not that excellent content does not exist; it certainly does on the Web itself, which now holds tens of trillions of archived pages.
But what good are those pages if the supply of NEWLY created and added pages is shrinking, with fewer and fewer good links? This decline is very damaging, as it becomes harder and harder to surface good material among the FRESHEST portion of the indexed content.
This content quality problem has been exacerbated by Google’s switch, years ago, to emphasizing the freshness of their index. The rationale was that there was so much good material around, and that its supply was constantly (supposedly even exponentially) increasing, so Google would always be able to show amazing results from just the freshest portion of its index.
Such a notion seems quaint now, as of 2020. First, there are the big vertical silos, starting with Amazon, but also other large walled gardens such as Facebook and Twitter, and a host of others such as Netflix, Spotify, Shopify, eBay and Craigslist. So the best deals, social chatter and tweets, song and shopping recommendations, auction bargains and free classified ads are all to be found elsewhere.
The same really goes for basically every vertical. Way back (remember Google Base?) it was thought that nobody should bother building any vertical, as Google had it all covered anyway. Google Base is long gone, and people now go to CarGurus or Carvana for cars, Zillow for online house listings, Indeed and others for job postings; the list goes on and on.
So that leaves Google’s core results, where they might retrench and demonstrate ever-increasing superiority. Really? We have already mentioned the core problem there: dying (quality) links and the dearth of newly created high-quality content. One can easily see this by looking at the ratio of old vs. new results across search results.
AI was supposed to be another refuge and savior several years ago. The idea was that Google’s core mission had always been to give answers to questions, as opposed to serving ten blue links with a bunch of ads.
We can ask Google itself and see immediately that the results leave a lot to be desired:
Google recently announced the so-called BERT update, which is about using the latest NLP technique, Transformers, in search results. But BERT does not scale: it requires short snippets containing the answers to be prepared in advance, as opposed to indexing entire pages. In addition, it is computationally prohibitively expensive, even for Google, as Transformer models such as BERT are notorious memory hogs, never mind how long they take to train.
In addition, BERT has been quickly surpassed by OpenAI’s GPT-2 and GPT-3, which are simply huge: GPT-3 has 175 billion parameters and takes tens of thousands of powerful specialized GPU cards and weeks to train. Good luck trying to put something like that into production at the tens of thousands of queries per second (QPS) that Google requires.
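A back-of-the-envelope calculation makes the serving problem concrete. This is only a sketch under an assumption of 16-bit weights; OpenAI’s actual deployment details are not public, and this ignores activations and serving overhead entirely:

```python
# Rough memory footprint of GPT-3's weights alone, ignoring
# activations, optimizer state and any serving overhead.
N_PARAMS = 175e9        # 175 billion parameters (published figure)
BYTES_PER_PARAM = 2     # assumption: fp16 weights

weights_gb = N_PARAMS * BYTES_PER_PARAM / 1e9
print(f"~{weights_gb:.0f} GB just to hold the weights")  # ~350 GB
```

Hundreds of gigabytes for the weights alone, before a single query is answered, is a long way from a model one could replicate across a web-scale serving fleet.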
All these trends have been masked by rises in advertising revenue, i.e. the stuffing in of more and more ads. But that gravy train seems to have stopped in Q2 2020, when Google reported a decrease in search revenue, their first EVER:
The days of ever-increasing search revenue are gone; the issue now will be how to stem the decline.
The future lies in graph link analysis of huge amounts of quality text with NO human-authored links required, where links are instead inferred from matrices and graphs of Web connectivity. But the eyes of current AI, and of models such as GPT-3, are pointed in a completely different direction, as non-linearity in the activation layers of artificial neural networks is all the rage.
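The classical instance of link analysis over a connectivity matrix is PageRank-style power iteration. A minimal sketch follows, using a tiny hand-made adjacency matrix for illustration; in the scenario described above, the matrix would be inferred (for example, from textual similarity between pages) rather than taken from human-authored hyperlinks, but the linear algebra is identical:

```python
import numpy as np

# Toy connectivity matrix: A[i, j] = 1 means page i links to page j.
# In an inferred-link setting these entries would come from text
# analysis instead of explicit hyperlinks.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

damping = 0.85
n = A.shape[0]

# Column-stochastic transition matrix: column j spreads page j's
# score evenly across the pages it links to.
M = A.T / A.sum(axis=1)

# Power iteration: repeatedly apply the (linear) transition operator.
rank = np.full(n, 1.0 / n)
for _ in range(100):
    rank = (1 - damping) / n + damping * M @ rank
rank /= rank.sum()

print(rank)  # page 2, with the most inbound links, ranks highest
```

The entire computation is one linear operator applied repeatedly, which is the point: scoring quality at Web scale does not require the non-linear machinery of current deep models.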
A(G)I will not be the first to learn that there is LINEAR beauty hidden beneath all those non-linear appearances, and that linear models can handle amazing complexity, as Quantum Mechanics showed us more than a century ago.