The Problems with Machine Learning
I often use the [define:] search operator to find the meaning and spelling of words. With the Chrome Web browser, it is especially convenient: you just type a search query into the address bar. This has been useful, and it has empowered me to expand my vocabulary.
Up until recently, these search results cited a variety of sources. The medical and legal definitions that sometimes crowded the search results were seldom useful to me, but when I needed to understand the less common uses of words, this diversity was almost always helpful.
Example of the new search result format for the searches that are using the [define:] operator.
This experience recently changed, and the following improvements were made to the [define:] search results:
- The citations for definitions were removed.
- The etymology of words was added.
- The translation of words became an available option.
- The usage of words is now being graphed over time.
- And a few more minor things were changed or added…
These changes were motivated by a desire to improve the user’s search experience. Although some new uses were added, one method was taken away: the ability to check multiple sources within these results. This change seems to be benign, but how Google was able to remove this feature and still manage to “improve the search experience” is a point of interest.
Earlier this year, the Guardian published an article called Google and the Future of Search: Amit Singhal and the Knowledge Graph by Tim Adams. In it, Adams interviews Amit Singhal, head of Google Search, and writes that Google is “on the threshold of another epochal change.”
“Having searched for a decade or so using the original brilliant principle of hierarchies of web-based links, the great primary colored knowledge domination machine has, Singhal suggests, ‘begun to learn how to understand the real world of people, places and things.’”
Google and the future of search: Amit Singhal and the knowledge graph by Tim Adams. Published by the Guardian.
This epochal change is Knowledge Graph, a system which learns from all of the data that Google collects. The associations that Knowledge Graph finds are then used to enhance the user’s search experience.
This may be how Google forgoes the inclusion of sources in their definitions. The references are no longer necessary because the definitions have been computed. If you’d still like to find alternative definitions, then they are below the [define:] search results. However, you should be forewarned of the arduous and archaic search experience involved with visiting multiple Web sites to find an answer.
Although Google has computed the definitions of words with a high degree of accuracy, I recently experienced a problem with the results:
I wanted to know whether or not the word “four hundred” was hyphenated. To find that out I did what I’ve grown accustomed to doing: I typed my search query into the address bar of the Google Chrome Web browser, [define: four hundred].
Example of the expanded search results for the search query [define: four hundred].
The search results did not answer whether or not the name of the number 400 was hyphenated. Instead, they prompted new questions for me: “Did I receive the correct definition? If not, then was the error something that I did? Did I spell the word ‘hundred’ right? If this is the right word and wrong definition, then where are any of the alternative definitions?”
I wasn’t only presented with the wrong definition: I was given the etymology of the wrong meaning, and it was a word that I didn’t care about at that time. I also wasn’t presented with any alternative definitions.
I felt a sense of exclusion from the convenience of this whole process. I’ve grown to depend on the [define:] search operator. I have long since phased out whatever resources I had formerly used to find the meaning and spelling of words.
When the process grinds to a halt, what will be the alternative? Will the Web sites that I formerly used still be in business? Will I be able to find my copy of the dictionary? Will there be anyone nearby to ask? If so, then will they know the answer without searching?
As we forget the traditional social structures that we once depended on, we are replacing them with technological infrastructure. Our trust and dependence are now being placed with technology companies like Google. When a part of this infrastructure fails, so will a deeply ingrained part of us and that feeling will be personal.
Similar definitions provide hints about why this particular problem happened. The [define: five hundred] search results have the same issue: the meaning has been commandeered by another noun, meaning “a form of euchre in which making 500 points wins a game.” This is in contrast to the [define: three hundred] search results, which are universally useful; three hundred is defined as “being one hundred more than two hundred.”
Knowledge Graph finds enough associations to define these words another way; perhaps Google will continue to define these words in these terms. One day, either the deep cultural associations in Knowledge Graph will supplant the mathematical definitions of numbers, or Knowledge Graph will come to include the precise definitions of numbers. If there’s one thing to be learned from this, it is that I should’ve used Wolfram|Alpha (http://www.wolframalpha.com/input/?i=four+hundred) for this type of search instead.
Example of a search for [four hundred] using Wolfram|Alpha.
Today, tyranny is a bad word. We’ve learned to despise tyrants, and we attribute a lot of bad qualities to them. However, there was a time when tyrants made some undeniably positive contributions. In the cradle of Western civilization, tyrants created economic prosperity and urban infrastructure, they held court and were patrons of the arts, and they eroded and redistributed the power of the aristocracy, which was an essential factor leading to democracy.
In Professor Donald Kagan’s Introduction to Ancient Greek History, he explains that for these reasons, being for or against tyranny was not a simple question for the Greeks. However, one of the things that made the tyrants particularly terrible is that they had full power and no responsibility to anyone. The Greeks believed that for a force to be legitimate, it had to be responsible to people. The tyrants “behaved as though they were gods,” and this form of arrogance is what the Greeks called hubris.
I think that it’s fair to say that by this definition, Google has a kind of absolute power over all of the data on the World Wide Web. They have copied and crawled it and now what they do with it is Google’s business. If Google has crawled the Web and Knowledge Graph has mapped its associations, and together they have determined that the most useful definition of the word “four hundred” is “the social elite of a community,” then who is to say otherwise?
“If we knew all the laws of Nature, we should need only one fact, or the description of one actual phenomenon, to infer all the particular results at that point. Now we know only a few laws, and our result is vitiated, not, of course, by any confusion or irregularity in Nature, but by our ignorance of essential elements in the calculation.”
WALDEN, AND ON THE DUTY OF CIVIL DISOBEDIENCE by Henry David Thoreau.
I also think that it’s fair to say that by this definition, Google is demonstrating a kind of hubris. There will inevitably be other kinds of errors in the Knowledge Graph; I’ve already seen some of them. Considering these types of mistakes as “engineering problems” on the road to an omnipotent “knowledge graph” is hubris.
Another way to minimize the occurrence of errors in this search experience is by down-sampling the spectrum of searches. While Google is adding dimensions like Google Instant and Knowledge Graph to the search experience, they are also subtracting another dimension: diversity. By aiming to create the maximum amount of utility for the maximum number of people, they are also working to exclude peripheral users at an accelerated rate; reducing this diversity is essential in maximizing the efficiency of Google search.
Using machines to compute the definitions of words, increasing dependence on those definitions, and reducing access to alternative sources is an indirect way of accomplishing this.
In George Orwell’s novel 1984, Syme’s job was to cull the dictionary of words and narrow the range of thought. At one point he muses to Winston:
“Don’t you see that the whole aim of Newspeak is to narrow the range of thought? In the end, we shall make thoughtcrime impossible, because there will be no words in which to express it. Every concept that can ever be needed will be expressed by exactly one word, with its meaning rigidly defined and all its subsidiary meanings rubbed out and forgotten. Already, in the Eleventh Edition, we’re not far from that point. But the process will still be continuing long after you and I are dead. Every year fewer and fewer words, and the range of consciousness always a little smaller.”
“One of these days,” Winston thinks to himself with a sudden deep conviction, “Syme will be vaporized. He is too intelligent. He sees too clearly and speaks too plainly. The Party does not like such people. One day he will disappear. It is written in his face.”