The Problems with Machine Learning

I often use the [define:] search operator to find the meaning and spelling of words. It is incredibly convenient with the Chrome Web browser: you simply type a search query into the address bar. This has been useful and has empowered me to expand my vocabulary.

Up until recently, these search results cited a variety of different sources. The medical and legal definitions sometimes crowded the search results and were seldom helpful to me. Still, this diversity was almost always beneficial when I needed to understand the less common uses of words.

Example of the new search result format for searches using the [define:] operator.

This experience recently changed, and the following improvements were made to the [define:] search results:

The citations for definitions were removed.
The etymology of words was added.
The translation of words became an available option.
The usage of words is now being graphed over time.
And a few more minor things were changed or added…

These changes were motivated by a desire to improve the user’s search experience. Although some new uses were added, one method was taken away: the ability to check multiple sources within these results. This change seems benign, but how Google removed this feature and still managed to “improve the search experience” is a point of interest.

Earlier this year, the Guardian published an article called Google and the Future of Search: Amit Singhal and the Knowledge Graph by Tim Adams. In it, Tim Adams interviews Amit Singhal, head of Google Search. Adams says that Google is “on the threshold of another epochal change.”

“Having searched for a decade or so using the original brilliant principle of hierarchies of web-based links, the great primary colored knowledge domination machine has, Singhal suggests, ‘begun to learn how to understand the real world of people, places and things.’”

Google and the future of search: Amit Singhal and the knowledge graph by Tim Adams, published by The Guardian.

This epochal change is Knowledge Graph, a system that learns from all of Google’s data. The associations that Knowledge Graph finds are then used to enhance the user’s search experience.

This may be how Google forgoes the inclusion of sources in its definitions. The references are no longer necessary because the definitions have been computed. If you’d still like to find alternative definitions, they are below the [define:] search results. However, you should be forewarned of the arduous and archaic search experience of visiting multiple Web sites to find an answer.

Although Google has computed the definitions of words with a high degree of accuracy, I recently experienced a problem with the results:

I wanted to know whether or not the word “four hundred” was hyphenated. To find that out, I did what I’ve grown accustomed to: I typed my search query into the address bar of the Google Chrome Web browser, [define: four hundred].

Example of the expanded search results for the search query [define: four hundred].

The search results did not answer whether or not the name of the number 400 was hyphenated. Instead, they prompted new questions for me: “Did I receive the correct definition? If not, then was the error something that I did? Did I spell the word ‘hundred’ right? If this is the right word and wrong definition, where are any alternative definitions?”

I wasn’t only presented with the wrong definition: I was given the etymology of the wrong meaning, and it was a word that I didn’t care about at that time. I also wasn’t presented with any alternative definitions.

I felt a sense of exclusion from the convenience of this whole process. I’ve grown to depend on the [define:] search operator. I have long since phased out whatever resources I had formerly used to find the meaning and spelling of words.

When the process grinds to a halt, what will be the alternative? Will the Web sites that I formerly used still be in business? Will I be able to find my copy of the dictionary? Will there be anyone nearby to ask? If so, then will they know the answer without searching?

As we forget the traditional social structures we once depended on, we are replacing them with technological infrastructure. Our trust and dependence are now being placed on technology companies like Google. When a part of this infrastructure fails, so will a deeply ingrained part of us, and that feeling will be personal.

Similar definitions provide hints about why this particular problem happened. The [define: five hundred] search results have the same issue: the meaning has been commandeered by another noun, meaning “a form of euchre in which making 500 points wins a game.” This contrasts with the [define: three hundred] search results, which are universally helpful; three hundred is defined as “being one hundred more than two hundred.”

Perhaps Knowledge Graph will eventually find enough associations to define these words another way; perhaps Google will continue to define these words in these terms. One day, Knowledge Graph will include either deep cultural associations that supplant the mathematical definitions of numbers or the precise definitions themselves. If there’s one thing to be learned from this, it is that I should’ve used Wolfram|Alpha (http://www.wolframalpha.com/input/?i=four+hundred) for this type of search instead.

Example of a search for [four hundred] using Wolfram|Alpha.

Today, tyranny is a bad word. We’ve learned to despise tyrants, and we attribute a lot of bad qualities to them. However, there was a time when tyrants made some undeniably positive contributions. In the cradle of Western civilization, tyrants created economic prosperity and urban infrastructure, held court and were patrons of the arts, and eroded and redistributed the power of the aristocracy, which was an essential factor leading to democracy.

In Professor Donald Kagan’s Introduction to Ancient Greek History, he explains that for these reasons, being for or against tyranny was not a simple question for the Greeks. However, one of the things that made the tyrants particularly terrible was that they had full power and no responsibility to anyone. The Greeks believed that for a force to be legitimate, it had to be responsible to people. The tyrants “behaved as though they were gods,” and this form of arrogance is what the Greeks called hubris.

I think it’s fair to say that by this definition, Google has absolute power over all of the data on the World Wide Web. They have copied and crawled it, and now what they do with it is Google’s business. If Google has crawled the Web and Knowledge Graph has mapped its associations, and together they have determined that the most helpful definition of the word “four hundred” is “the social elite of a community,” then who is to say otherwise?

“If we knew all the laws of Nature, we should need only one fact, or the description of one actual phenomenon, to infer all the particular results at that point. Now we know only a few laws, and our result is vitiated, not, of course, by any confusion or irregularity in Nature, but by our ignorance of essential elements in the calculation.”

WALDEN, AND ON THE DUTY OF CIVIL DISOBEDIENCE by Henry David Thoreau.

I also think it’s fair to say that Google is demonstrating a kind of hubris by this definition. There will inevitably be other errors in the Knowledge Graph; I’ve already seen some. Considering these types of mistakes as “engineering problems” on the road to an omnipotent “knowledge graph” is hubris.

Another way to minimize errors in this search experience is by down-sampling the spectrum of searches. While Google is adding dimensions like Google Instant and Knowledge Graph to the search experience, they are also subtracting another dimension: diversity. By aiming to create the maximum utility for the maximum number of people, they are also working to exclude peripheral users at an accelerated rate; reducing this diversity is essential in maximizing the efficiency of Google search.

Using machines to compute the definitions of words, increasing dependence on those definitions, and reducing access to alternative sources is an indirect way of accomplishing this.

In George Orwell’s novel 1984, Syme’s job was to cull the dictionary of words and narrow the range of thought. At one point, he muses to Winston:

“Don’t you see that the whole aim of Newspeak is to narrow the range of thought? Ultimately, we shall make thoughtcrime impossible because there will be no words to express it. Every concept that can ever be needed will be expressed by exactly one word, with its meaning rigidly defined and all its subsidiary meanings rubbed out and forgotten. Already in the Eleventh Edition, we’re not far from that point. But the process will continue long after you and I are dead. Every year fewer and fewer words and the range of consciousness is always a little smaller.”

“One of these days,” Winston thinks with a sudden deep conviction, “Syme will be vaporized. He is too intelligent. He sees too clearly and speaks too plainly. The Party does not like such people. One day he will disappear. It is written in his face.”
