When SEO met AI
Published 15th December 2017
If you have been around this industry for even a short period of time, you will no doubt have figured out that SEO can be a funny old game. In fact, it doesn’t matter what industry you work in: Google is now deeply integrated with all of our lives and has been for nearly two decades. However, if you are one of the lucky few who remain unaffected by it, I can only guess you’ve recently returned from a couple of decades of devoted Ayurvedic introspection or perhaps the pursuit of enlightenment with an isolated monastic order.
Or perhaps your government’s leadership insists you use the state-run alternative. Failing that, you may have simply messed up your parole review. In any of the aforementioned possible eventualities, you will have a lot of catching up to do.
My aim with this article is twofold:
- to provide a starting point for those who are currently peering down the SEO rabbit hole;
- for those who have already begun their venture into wonderland, to share some insights which I expect to be of interest to both pure SEOs and affiliates in general.
But first: happy “belated” birthday, Google
I’d like you to join me in a toast to Google, wishing it the very best belated birthday. If you missed it, it was Google’s 19th birthday on 27 September (a few days ago, at the time of writing). According to Google’s official birthday announcement (2017): “In 1997, one of Google’s co-founders, Larry Page, had just arrived at Stanford University to pursue his PhD in computer science.
Of all the students on campus, Google’s other co-founder, Sergey Brin, was randomly assigned to show Page around. This chance encounter was the happy surprise that started it all.” For reasons you’ll understand in a moment, I’d like to ask you to read the full quotation one more time. Seriously, it’s important!
Did you read it again… carefully? How would you know if the claim was true? I will use this example by way of illustration while I remind you that SEO (when practised responsibly) is actually a subfield of forensic science and, as such, requires the same investigative approach.
Consider the definition: Forensic (adjective) “belonging to, used in, or suitable to courts of judicature or to public discussion and debate” (Merriam-Webster).
Every detail, no matter how minor, in order to be considered admissible in court, must be supported by evidence, the reliability of which must be sufficiently robust to withstand scrutiny. For the purpose of this exercise, let’s specify the UK as our legal jurisdiction. Arbitrarily, let’s assume this to be a criminal court, because it probably best suits 70-80% of SEO practitioners randomly sampled from the wider, global SEO community. (Stay with me, you’ll see where I’m going.)
According to In Brief, a fantastic legal resource from which I took inspiration for this article, under UK criminal law evidence falls into 11 categories, eight of which are either not applicable to our analogy or inadmissible.
This leaves us with three remaining types of evidence: evidence; expert evidence; and corroboration. According to the Crown Prosecution Service’s definition: “Expert evidence can be used to assist the court in determining the issues in a case where it is relevant and where the opinion of an expert is needed to give the court a greater understanding of those issues” (2014).
Because this example is trivial, and considering the CPS guidelines on expert evidence, we do not require any such expert evidence. This leaves us with two remaining types of evidence with which to reach a verdict, reject the case or call an adjournment. Now read it with your own eyes: google.com/doodles/googles-19thbirthday (or tinyurl.com/ggl19).
I’ll ask again: is the statement from Google true or false? You might answer one or none of the following:
- It sounds reasonably accurate to the best of my knowledge.
- Why would they lie?
- What is to gain by making up falsehoods?
It’s actually untrue. My point, however, is not to pick holes in a statement made by Google. My point relates purely to the importance of taking a forensic approach.
The importance of forensics in SEO
Consider the small and ambitious affiliates with plans to do great things. Consider too the big affiliate with a growing team, mouths to feed, and responsibility as an employer. It doesn’t matter who you are. Google is a highly optimised technology. It runs the western world’s largest collective distributed application, employs the world’s largest AI research team and, based on that very premise, continues to attract talent of the highest calibre.
This results in more innovation than any other organisation on the planet can match. Moreover, the output of its AI research and innovation creates more data as a by-product and direct result of its aggregated intellectual capacity and near limitless computational capacity. And, having organised nearly all the world’s information, Google attracts even more talent, and thus the cycle continues.
If you, as an affiliate, regardless of size, trigger any of its anomaly detection algorithms, it will hurt your business.
Hold on, back up: what do you mean it’s untrue?
According to archive.org and a snapshot from 1 April, 2012, this page used to exist: https://web.archive.org/web/20120401005940/http://www.google.com/about/company/history
Nowadays, however, that page redirects to: google.com/about (Google’s blog).
However, the April 2012 page details a history that’s rather different from what you read earlier. To quote:
Our history in depth 1995-1997
- 1995: Larry Page and Sergey Brin meet at Stanford. (Larry, 22, a U Michigan grad, is considering the school; Sergey, 21, is assigned to show him around.) According to some accounts, they disagree about almost everything during this first meeting.
- 1996: Larry and Sergey, now Stanford computer science grad students, begin collaborating on a search engine called BackRub.
- BackRub operates on Stanford servers for more than a year — eventually taking up too much bandwidth to suit the university.
- 1997: Larry and Sergey decide that the BackRub search engine needs a new name. After some brainstorming, they go with Google — a play on the word “googol,” a mathematical term for the number represented by the numeral 1 followed by 100 zeros. The use of the term reflects their mission to organize a seemingly infinite amount of information on the web.
So when, according to the 2017 statement, did Page and Brin meet?
Check that first quotation again. In truth, Larry Page and Sergey Brin met at Stanford in 1995, and their collaboration began in 1996. More specifically, their crawler began exploring the web in March 1996. BackRub was the crawler, as you can see from this archive.org page: bit.ly/backrub1997. It’s a snapshot of 29 August, 1996. By then, BackRub had managed:
- Total indexable HTML URLs: 75.2306 million
- Total content downloaded: 207.022 gigabytes
- Total indexable HTML pages downloaded: 30.6255 million
- Total indexable HTML pages which have not been attempted yet: 30.6822 million
- Total robots.txt excluded: 0.224249 million
- Total socket or connection errors: 1.31841 million
Note the message: “Sergey Brin has also been very involved and deserves many thanks.” And look again at the date. The last time this summary was updated was 29 August, 1996. The snapshot was recorded by Archive.org on 10 December, 1997. However slight that difference may be, any discrepancy between the two is as vital in the practice of SEO as it is in any court of law. Although the difference in this trivial example might appear to be simple nitpicking, a similar misjudgment arising from an error of even such a narrow margin could cost a large corporate igaming operator literally billions.
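This kind of date check is easy to do mechanically. The following is a minimal Python sketch, not a forensic tool. It assumes that the 14-digit number in a Wayback Machine URL is a YYYYMMDDhhmmss capture stamp, which is consistent with the 1 April, 2012 snapshot cited earlier. The function name capture_time is hypothetical, and the two BackRub dates are simply those quoted above.

```python
import re
from datetime import datetime

# Assumption: Wayback Machine URLs carry the capture time as a
# 14-digit YYYYMMDDhhmmss stamp directly after "/web/".
WAYBACK_STAMP = re.compile(r"/web/(\d{14})/")


def capture_time(wayback_url: str) -> datetime:
    """Return the capture time encoded in a Wayback Machine URL (hypothetical helper)."""
    match = WAYBACK_STAMP.search(wayback_url)
    if match is None:
        raise ValueError("no 14-digit capture stamp found in URL")
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


# The archived history page quoted earlier in this article.
history_url = (
    "https://web.archive.org/web/20120401005940/"
    "http://www.google.com/about/company/history"
)
history_captured = capture_time(history_url)

# Dates quoted in the text for the BackRub page.
summary_updated = datetime(1996, 8, 29)     # when the page content was last updated
snapshot_recorded = datetime(1997, 12, 10)  # when Archive.org recorded it
gap_days = (snapshot_recorded - summary_updated).days

print("History page captured:", history_captured.isoformat())
print("Days between content date and capture date:", gap_days)
```

The point of keeping the two dates in separate variables is that the date a page claims for itself and the date a third party recorded it are different pieces of evidence, and they should never be silently merged.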
Case closed: fake news
By no means am I implying malicious intent. It’s only a tiny white lie. However, it is consistent with the public relations mastery through which we learned to trust Google, even with our most private of information. Almost childlike is our trust in it, whether it’s with our personal communications, our real-time location data, its news aggregation or even its entire legitimacy. We trust its claims to have fixed the AdWords click fraud. We used to trust its adherence to monopoly laws with respect to shopping results and now allow it into our homes, as Dave Snyder quite accurately predicted in 2011 on the iGaming Affiliate Demon SEO panel in Dublin (bit.ly/DavePredicts2011).
Can you imagine what would happen if your private search history were to become public information? Because I suspect no malice on Google’s part, I think instead that it is likely to have heeded the message of Seth Godin’s talk to Googlers from July 2007 (youtube.com/watch?v=AZnYRaQfjK4 or tinyurl.com/godin2007).
He clarified how ideas spread, and it makes sense to simplify the message regarding Google’s birthday. (At the time of his 2007 talk his most recent book was All Marketers Are Liars: The Power of Telling Authentic Stories in a Low-Trust World.) My point was simply to illustrate the approach required where search engines are concerned. So we start with the same methodology, first used to power Google’s inverted index, aka the magic that enables its distributed performance.
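For readers new to the term, an inverted index maps each word to the documents that contain it, so a query never has to scan every page. Here is a toy Python sketch of the idea, under the assumption of a tiny in-memory collection. It is in no way Google’s implementation, and the names tokenize, build_inverted_index and search are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Toy documents, loosely based on phrases quoted in this article.
documents = {
    1: "Larry Page and Sergey Brin meet at Stanford",
    2: "BackRub operates on Stanford servers",
    3: "Google is a play on the word googol",
}
index = build_inverted_index(documents)

print(sorted(search(index, "stanford")))
print(sorted(search(index, "stanford servers")))
```

Because each term’s document set is independent of the others, the index can be split across many machines by term, which is one reason the structure suits distributed search.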
Let me explain the importance of Google innovations before we address the implication.
Deep learning the Google parts
To illustrate the trajectory we need to observe Google’s history leading up to 2011 and the work of senior fellows Jeff Dean and Geoffrey Hinton. Hinton is currently emeritus professor of computer science at the University of Toronto and an engineering fellow at Google. He co-invented the Boltzmann machine in 1985 and is recognised as one of the pioneers of neural networks.
Dean, having worked with Google since mid 1999, designed and implemented many of the innovations there, including: MapReduce, a methodology whereby distributed computation may be carried out over multiple machines in parallel; and Bigtable, a distributed storage system for structured data designed to run on cheap commodity hardware connected via a network. In 2011, he and a small team of engineers invented DistBelief.
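To make the MapReduce idea concrete, here is a minimal single-machine Python sketch of its three stages (map, shuffle, reduce) applied to a word count. It assumes a thread pool as a stand-in for separate machines, and the function names are hypothetical. It illustrates the pattern only and is not Google’s implementation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word, 1) for word in chunk.lower().split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(chunks, workers=3):
    # Each chunk is mapped independently; the thread pool stands in
    # for the separate machines of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


chunks = ["search engine crawler", "crawler index search", "search ranking"]
counts = word_count(chunks)
print(counts)
```

The map step needs no knowledge of any other chunk, and the reduce step needs no knowledge of any other key, which is what allows both to be spread over many machines.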
The following quote is taken from the original DistBelief paper, Large Scale Distributed Deep Networks, which he co-authored with his colleagues and which describes using tens of thousands of CPU cores to apply a parallelised methodology to an object recognition task with 16 million images and 21k categories: “In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models.” He then led a team that generalised DistBelief into a library built on a Python interface.
Despite Python being a relatively slow language, it is popular among developers because it’s relatively easy to learn. It is currently ranked fifth on the TIOBE Index, the software development hit parade for language popularity and adoption. As with all successful v1.0 software projects, v2.0 delivered dramatic improvements. Dean, along with a team of scientists, worked on a refactor of DistBelief, which became TensorFlow. Despite Python’s relative slowness, TensorFlow remains unaffected by this performance limitation.
Its lower-level architecture is written in the statically typed C++ programming language. Furthermore, this low-latency, compiled layer merely bridges the gap between the popular, high-level, almost human-readable Python language and the real workhorses: CPUs, GPUs and, more recently, TPUs. This is where the computationally expensive operations take place. TensorFlow was built to use a combination of hardware depending on the required model.
Until recently, the GPU (graphics processing unit) was the specialist in the iterative and highly parallelisable matrix mathematics required to train these deep neural networks. Google has since developed an ASIC (application-specific integrated circuit), the TPU, to accelerate a number of deep network operations.
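To show what “matrix mathematics” means here, the following is a small pure-Python sketch of one dense neural network layer: a matrix multiplication, a bias addition and a ReLU activation. The numbers are made up for illustration and the function names are hypothetical. A real framework would hand this work to optimised kernels on a GPU or TPU rather than to Python loops.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both as nested lists."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a_row[k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for a_row in a
    ]


def dense_layer(inputs, weights, biases):
    """Compute relu(inputs @ weights + biases) for a batch of input rows."""
    pre_activation = matmul(inputs, weights)
    return [
        [max(0.0, value + bias) for value, bias in zip(row, biases)]
        for row in pre_activation
    ]


# Illustrative values only: a batch of 2 inputs with 2 features each,
# feeding a layer of 3 units.
inputs = [[1.0, 2.0], [0.0, -1.0]]
weights = [[1.0, 0.0, -1.0], [0.5, 1.0, 2.0]]
biases = [0.0, 1.0, 0.5]

outputs = dense_layer(inputs, weights, biases)
print(outputs)
```

Every output cell is an independent sum of products, so thousands of them can be computed at once. That independence is why hardware built for parallel arithmetic makes such a difference to training time.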
Because Google open sourced the deep learning research framework, TensorFlow is now more than three times as popular as its next most popular competitor. Additionally, two related projects are trending (at the time of writing) among GitHub’s most popular Python open source projects:
- TensorFlow models, which include some of the most sophisticated publicly available pre-trained models.
- Keras, which is deep learning for Python, a layer on top of TensorFlow that introduces improvements in usability, modularity and extensibility and offers Python compatibility.
The geometric advancement of artificial intelligence isn’t limited to computation and storage costs or available labelled data. As we can see, it may also be observed as a result of human collaboration.
Just to keep the current public domain capability in perspective: Inception-v4, for object recognition, has better-than-human capabilities on several computer vision tasks, including the ability to accurately identify dog breeds.
To compete with AI we require AI
The previous section illustrates the rapid evolution of superhuman artificial intelligence. Importantly, it is impossible to observe a deep neural network from the outside looking in. We know from the Google media team, when it announced RankBrain, the current ranking algorithm, that the previously supervised learning approach has been partially replaced by an entirely unsupervised learning-to-rank approach that is widely known to be serving the search results.
As early as 2008, Google’s Peter Norvig denied that its search engine relies exclusively on machine-learned ranking. Cuil’s CEO, Tom Costello, suggests that it prefers hand-built models because they can outperform machine-learned models when measured against metrics such as click-through rate or time spent on a landing page, because machine-learned models “learn what people say they like, not what people actually like”.
RankBrain, because of its deep neural network-based composition, cannot be audited, observed or held in any way accountable.
In order to keep up, and to avoid penalties, it’s important to maintain the capability to remain within many acceptable norms, all of which may be derived from increasingly complex sources using equally complex and increasingly accurate methods of anomaly detection. However, the bottom line is this: research conducted by Oxford University’s Future of Humanity Institute and Yale University’s Department of Political Science estimates that AI will write a top 40 song in 11.4 years and win the WSOP in 3.6 years.
The experts sampled in Europe expect that AI researchers themselves will be replaced by AI in 80 years (based on the median), with optimists estimating the time frame to be closer to about 45 years, and for AI to write a New York Times best-seller in 33 years (see Figure 1).
The estimates used 2016 as a starting point and came from the assessments of 352 published AI experts. Worryingly, the same researchers estimated AI to be capable of being world champion of the game Go after 17.6 years, which we now know was an estimate that was out by around 17 years. So my advice to SEO professionals would be to start retraining for something less likely to be replaced.
Personally, I’m thinking seriously of taking up flower arrangement because, based on backtested data, we’ve already solved many aspects of SEO with better-than-human capability. It would appear most of our numbers are up. And on that bombshell I wish you all the best of luck.