Internet Marketing Tips, Suggestions, & Ramblings

Problems with Machine Learning

I often use the [define:] search operator to find the meaning and spelling of words. With the Chrome Web browser it is especially convenient: all that you have to do is type a search query into the address bar. This has been useful and it has empowered me to expand my vocabulary.

Up until recently, these search results were citing a variety of different sources. The medical and legal definitions that sometimes crowded the search results were seldom useful to me; but when I needed to understand the less common uses of words, this diversity was almost always helpful.

[define:hubris]

Example of the new search result format for searches that use the [define:] operator.

This experience recently changed, and the following improvements were made to the [define:] search results:

  • The citations for definitions were removed.
  • The etymology of words was added.
  • The translation of words became an available option.
  • The usage of words is now being graphed over time.
  • And a few more minor things were changed or added…

These changes were motivated by a desire to improve the user’s search experience. Although some new uses were added, one use was taken away: the ability to check multiple sources within these results. This change seems to be benign, but how Google was able to remove this feature and still manage to “improve the search experience” is a point of interest.

Earlier this year, the Guardian published an article by Tim Adams called Google and the Future of Search: Amit Singhal and the Knowledge Graph. In it, Adams interviews Amit Singhal, head of Google Search, and writes that Google is “on the threshold of another epochal change.”

“Having searched for a decade or so using the original brilliant principle of hierarchies of web-based links, the great primary coloured knowledge domination machine has, Singhal suggests, ‘begun to learn how to understand the real world of people, places and things.’”


Google and the Future of Search: Amit Singhal and the Knowledge Graph by Tim Adams. Published by the Guardian.

This epochal change is Knowledge Graph, a system which learns from all of the data that Google collects. The associations that Knowledge Graph finds are then used to enhance the user’s search experience.

This may be how Google forgoes the inclusion of sources in its definitions. The sources are no longer necessary because the definitions have been computed. If you’d still like to find alternative definitions, then they are below the [define:] search results. However, you should be forewarned of the arduous and archaic search experience involved with visiting multiple Web sites to find an answer.

Although Google has computed the definitions of words with a high degree of accuracy, I recently experienced a problem with the results:

I wanted to know whether or not the term “four hundred” was hyphenated. To find that out I did what I’ve grown accustomed to doing: I typed the search query [define:four hundred] into the address bar of the Google Chrome Web browser.

[define:four hundred]

Example of the expanded search results for the search query [define:four hundred].

The search results did not answer whether or not the name of the number 400 was hyphenated. Instead, they prompted new questions for me: “Did I receive the right definition? If not, then was the error something that I did? Did I spell the word ‘hundred’ right? If this is the right word and wrong definition, then where are any of the alternative definitions?”

I wasn’t only presented with the wrong definition: I was presented with the etymology of the wrong definition, for a word that I didn’t care about at that time. I also wasn’t presented with any alternative definitions.

I felt a sense of exclusion from the convenience of this whole process. I’ve grown to depend on the [define:] search operator. I have long since phased out whatever resources I had formerly used to find the meaning and spelling of words.

When the process grinds to a halt, what will be the alternative? Will the Web sites that I formerly used still be in business? Will I be able to find my copy of the dictionary? Will there be anyone nearby to ask? If so, then will they know the answer without searching?

As we forget the traditional social structures that we once depended on, we are replacing them with technological infrastructure. Our trust and dependence are now being placed with technology companies like Google. When a part of this infrastructure fails, so will a deeply ingrained part of us and that feeling will be personal.

Similar definitions provide hints about why this particular problem happened. The [define:five hundred] search results have a similar issue: the definition has been commandeered by another noun, meaning “a form of euchre in which making 500 points wins a game.” This is in contrast to the [define:three hundred] search results which are universally useful; three hundred is defined as “being one hundred more than two hundred.”

Until Knowledge Graph finds enough associations to define these words another way, perhaps Google will continue to define these words in these terms. One day, Knowledge Graph will either include the mathematical definition of numbers or the mathematical definitions of numbers will be supplanted by deep cultural associations. If there’s one thing to be learned from this, it is that I should have used Wolfram|Alpha (http://www.wolframalpha.com/input/?i=four+hundred) for this type of search instead.

Four Hundred

Example of a search for [four hundred] using Wolfram|Alpha.
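For anyone who would rather script these lookups than type them into an address bar, here is a minimal sketch of how the two query URLs could be built. The Wolfram|Alpha address follows the link quoted in this post; the Google search address and both helper names are my own assumptions for illustration, not something either service documents here.

```python
from urllib.parse import urlencode

# Base address taken from the Wolfram|Alpha link quoted in this post.
WOLFRAM_ALPHA_INPUT = "http://www.wolframalpha.com/input/"
# Assumed base address for a Google search; it does not appear in the post.
GOOGLE_SEARCH = "https://www.google.com/search"


def wolfram_alpha_url(phrase: str) -> str:
    """Build a Wolfram|Alpha query URL (hypothetical helper)."""
    # urlencode turns spaces into "+" and escapes reserved characters.
    return WOLFRAM_ALPHA_INPUT + "?" + urlencode({"i": phrase})


def google_define_url(phrase: str) -> str:
    """Build a Google URL that uses the [define:] operator (hypothetical helper)."""
    return GOOGLE_SEARCH + "?" + urlencode({"q": "define:" + phrase})


if __name__ == "__main__":
    print(wolfram_alpha_url("four hundred"))
    print(google_define_url("four hundred"))
```

The only point of the sketch is that the query is an ordinary URL parameter, so the same phrase can be sent to more than one source and the answers compared.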

Today, tyranny is a bad word. We’ve learned to despise tyrants and we attribute a lot of bad qualities to them. However, there was a time when tyrants made some undeniably positive contributions. In the cradle of Western civilization, tyrants created economic prosperity and urban infrastructure, they held court and were patrons of the arts, and they eroded and redistributed the power of the aristocracy, which was an important factor leading to democracy.

In Professor Donald Kagan’s Introduction to Ancient Greek History, he explains that for these reasons, being for or against tyranny was not a simple question for the Greeks. However, one of the things that made the tyrants particularly terrible is that they had complete power and no responsibility to anyone. The Greeks believed that for a power to be legitimate, it had to be responsible to people. The tyrants “behaved as though they were gods,” and this form of arrogance is what the Greeks called hubris.

I think that it’s fair to say that by this definition, Google has a kind of tyrannical power over all of the data on the World Wide Web. They have copied and crawled it and now what they do with it is Google’s business. If Google has crawled the Web and Knowledge Graph has mapped its associations, and together they have determined that the most useful definition of the word “four hundred” is “the social elite of a community,” then who is to say otherwise?

“If we knew all the laws of Nature, we should need only one fact, or the description of one actual phenomenon, to infer all the particular results at that point. Now we know only a few laws, and our result is vitiated, not, of course, by any confusion or irregularity in Nature, but by our ignorance of essential elements in the calculation.”


WALDEN, AND ON THE DUTY OF CIVIL DISOBEDIENCE by Henry David Thoreau.

I also think that it’s fair to say that by this definition, Google is demonstrating a kind of hubris. There will inevitably be other kinds of errors in Knowledge Graph; I’ve already seen some of them. Considering these types of errors as “engineering problems” on the road to an omnipotent “knowledge graph” is hubris.

Another way to minimize the occurrence of errors in this search experience is by down-sampling the spectrum of searches. While Google is adding dimensions like Google Instant and Knowledge Graph to the search experience, they are also subtracting another dimension: diversity. By aiming to create the maximum amount of utility for the maximum number of people, they are also working to exclude peripheral users at an accelerated rate; reducing this diversity is essential in maximizing the utility of Google search.

Using machines to compute the definitions of words, increasing dependence on those definitions, and reducing access to alternative sources is an indirect way of accomplishing this.

In George Orwell’s novel 1984, Syme’s job was to cull the dictionary of words and narrow the range of thought. At one point he muses to Winston:

“Don’t you see that the whole aim of Newspeak is to narrow the range of thought? In the end we shall make thoughtcrime literally impossible, because there will be no words in which to express it. Every concept that can ever be needed, will be expressed by exactly one word, with its meaning rigidly defined and all its subsidiary meanings rubbed out and forgotten. Already, in the Eleventh Edition, we’re not far from that point. But the process will still be continuing long after you and I are dead. Every year fewer and fewer words, and the range of consciousness always a little smaller.”

“One of these days,” Winston thinks to himself with a sudden deep conviction, “Syme will be vaporized. He is too intelligent. He sees too clearly and speaks too plainly. The Party does not like such people. One day he will disappear. It is written in his face.”

About Garry Grant

Garry Grant is a veteran expert in search engine optimization and the digital marketing industry. With nearly 20 years of experience, Garry has successfully built a multi-service operation at SEO, Inc., developing proprietary technologies through complex strategic solutions. He has extensive experience in key initiatives and operational responsibilities grounded in information technology and performance management.

Garry’s expertise and esteemed reputation, coupled with SEO Inc.’s impressive client success record, have earned him such accolades as Entrepreneur Magazine's 2005 Hot List for the Hottest Internet Property, Inc. 500 2007 Honorary award for Fastest Growing Private Company in America, an Inc. 500 top 50 Company in San Diego, and interviews with The New York Times, The Wall Street Journal, WIRED, Entrepreneur and The Huffington Post.

Garry Grant began his online career in 1993 creating strategic Web and e-business solutions for Homepage.com, The Rush Limbaugh Show, Premiere Radio Networks, Clear Channel Communications, EarthLink and Artisan Motion Pictures. Today, Garry and SEO Inc.’s highly skilled digital strategists develop proprietary technology and strategic digital marketing direction for Fortune 500 companies including SC Johnson, McAfee, Entrepreneur.com, Inc. Magazine, IGN, Tacorri, LPL Financial, National Kidney Foundation, G4 TV, Fuel TV and Sony, just to name a few.