Last week, Google released Parsey McParseface, a funny name for a state-of-the-art tool aimed at one of the most difficult problems in artificial intelligence. For all that computers have accomplished in the past five years, from winning on "Jeopardy!" to defeating a Go grandmaster, they are still terrible at figuring out what people are saying. Language is one of the most complex tasks that humans perform. That's why there has been such a hullabaloo over McParseface, which is pretty much a glorified sentence diagrammer.
McParseface does what most students learn to do in elementary school. It takes a sentence and breaks it down, identifying nouns, verbs and so forth -- and how all of these parts relate to one another. It can tell you, for instance, what the root verb of a sentence is, what is being done to whom, and who is doing it.
This is an important first step toward a day when we can talk naturally to our computers. Before Siri can even begin to understand a command like "Can you show me more cat photos," it has to recognise "cat photos" as the object of the sentence.
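Concretely, a dependency parse can be pictured as each word pointing at its "head" word. Here's a toy, hand-built sketch of the Siri example in Python -- the labels ("nsubj," "dobj") follow common dependency-grammar conventions, but this is an illustration of the idea, not Parsey McParseface's actual output format:

```python
# A toy dependency parse, represented as (word, head_index, relation) triples.
# Head index 0 stands for a virtual ROOT; real parsers' labels may differ --
# this is hand-built for illustration only.
parse = [
    ("Can",    3, "aux"),       # auxiliary of "show"
    ("you",    3, "nsubj"),     # subject of "show"
    ("show",   0, "root"),      # the root verb
    ("me",     3, "iobj"),      # indirect object
    ("more",   7, "amod"),      # modifies "photos"
    ("cat",    7, "compound"),  # modifies "photos"
    ("photos", 3, "dobj"),      # direct object: what is being shown
]

def direct_object(parse):
    """Return the word attached to the root verb as its direct object."""
    # Find the 1-based position of the root verb.
    root = next(i for i, (_, head, rel) in enumerate(parse, 1) if rel == "root")
    for word, head, rel in parse:
        if head == root and rel == "dobj":
            return word
    return None

print(direct_object(parse))  # -> photos
```

Once "photos" is identified as the object of "show," a system like Siri knows *what* to retrieve; everything else in the parse tells it how the request is framed.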
Interpreting a sentence's grammatical workings seems easy. Humans do it every day without really thinking about it. But even this basic task can frazzle your brain sometimes. Consider the following:
The horse raced past the barn fell.
This statement makes no sense at first glance. "The horse raced past the barn" is a fine sentence by itself, but why is the word "fell" tacked on the end? What is a "barn fell"?
The whole thing seems ungrammatical, until you realise that the words are actually assembled a different way. Here's how to read it:
The horse (that was raced past the barn) fell.
This is a famous example of what linguists call a "garden path" sentence. When you start to read it, you're led to believe one thing. It seems that the horse is racing past a barn ... somewhere? But when you reach the end of the sentence, you realise that your initial assumptions were wrong. The root verb here is "fell," not "raced." The skeleton of the sentence is: "The horse fell." The intervening words are embellishments.
The fact that people stumble on sentences like this illustrates one of language's major difficulties. Words get complicated fast. "It is not uncommon for moderate-length sentences -- say 20 or 30 words in length -- to have hundreds, thousands, or even tens of thousands of possible syntactic structures," Google researcher Slav Petrov wrote in his blog post about Parsey McParseface.
Our minds usually do a good job using context and real-world knowledge to throw out unlikely interpretations. For instance, when someone says "The dad mixed the batter with the blueberries," we understand that blueberries are an ingredient in the batter -- not that the dad used the blueberries as a tool to mix the batter.
Garden-path sentences trick us by turning our instincts against us. They incorporate common phrases that make us assume we're reading a kind of sentence we've seen many times before. Then surprise! It's not. The arrangement of words in a garden-path sentence can be so unfamiliar that we get stumped trying to search through all the possibilities. (Is "horse" being used as a verb? Is "fell" being used as a noun? Is there some other definition of "raced" that we're not picking up on?)
These are more than just funny brainteasers. "Most of these sentences were invented by psycholinguists to break the human mind," says Ted Gibson, a professor of cognitive science at MIT. "We make up these examples to test how humans understand language. Each one is really a little experiment." By watching people puzzle through them, scientists have discovered some of the tricks the brain uses to make language feel so easy (most of the time).
How Parsey McParseface deals with garden-path sentences
But before we get to those insights, how about Parsey McParseface? How does it fare with these sentences? Surprisingly well, it turns out.
Here's a GIF from Google explaining how SyntaxNet, the system underlying McParseface, works.
Parsey McParseface reads sentences a lot like humans do. It processes the words in order, from left to right, making guesses about each word's role in the sentence and how it relates to the others. It learned to do this by studying thousands of sentences that had already been analysed by linguists.
Parsey McParseface keeps in mind several alternative interpretations of a sentence as it reads; by the time it reaches the end, it decides which possibility is the most likely. The picture below, also from Google, shows two of the different guesses the program makes about "The horse raced past the barn fell." The first guess incorrectly takes "raced" to be the root verb of the sentence. The second guess correctly identifies "fell" as the root verb.
The artificial-intelligence routine is far from perfect, of course. Google says Parsey McParseface reaches about 94 percent accuracy identifying the root of an English sentence taken from a newspaper. That's slightly better than some competing systems. (In a recent paper, Google showed off a supercharged version of Parsey McParseface that does even better. As for the competition, you can play with a version of the Stanford NLP parser here. A simpler, but much faster parser is spaCy, which has a demo here. You have to download Parsey McParseface to your computer; it's a little tricky.)
It's interesting that Parsey McParseface can tackle some pretty confusing garden-path sentences. When I tested it, it correctly interpreted statements like:
The old man the boat.
While the man hunted the deer ran into the woods.
While Anna dressed the baby played in the crib.
On the other hand, it goofed on:
I convinced her children are noisy.
The coach smiled at the player tossed the Frisbee.
The cotton clothes are made up of grows in Mississippi.
(If you're stumped by some of these sentences, keep reading -- or just skip to the key at the end.)
When I showed some of these examples to a Google spokesman, he conceded that there's still work to be done in the world of AI. But in Parsey's defense, a lot of people also can't understand these sentences.
And nobody talks like this in real life. These grammar puzzles, as Gibson explained, were designed specifically to torment human brains -- to probe how people handle confusing situations.
What we've learned from garden-path sentences
"Human language is really complicated," Gibson says. "There are thousands and thousands of possible interpretations of any sentence of reasonable length."
A central question in linguistics is how humans can zero in on the right interpretation so quickly.
Experiments with garden-path sentences have shown that we rely heavily on experience. We have an innate sense of whether word patterns are common or rare. We are biased toward the more common words or groups of words -- we are faster to understand them, and we more easily decipher the grammar of a sentence with more familiar words. This tendency helps us quickly and accurately navigate everyday life. But with garden-path sentences, our useful shortcuts get exposed.
Consider: "The cotton clothes are made up of grows in Mississippi." We get misled into thinking the sentence is about cotton clothes because "cotton clothes" is such a common phrase.
Or consider: "While the man hunted the deer ran into the woods." We immediately assume that the man is hunting the deer, both because hunting deer is a familiar activity and because we're used to seeing the word "hunting" referring to some kind of prey.
A less confusing version of the sentence: "While the man hunted the vice president ran into the woods." That still sounds a little strange, but it's much easier to get the right interpretation, because, well, vice presidents are seldom hunted.
The research on garden-path sentences continues today. Scientists have discovered, for instance, that people often interpret these sentences in strange, contradictory ways. Take this sentence:
While Anna dressed the baby played in the crib.
You might believe, at first, that Anna is the subject of the sentence and that she is dressing the baby. But by the time you reach the end of the sentence, you should realise that the baby is the real subject of the sentence. A comma helps clear things up:
While Anna dressed, the baby played in the crib.
Because the baby is the subject of the sentence, it can no longer be the object that Anna is dressing. The baby can't fill two roles in the sentence at once. That prompts us to take another look at the beginning of the sentence. Upon second glance, we realise that Anna must be dressing herself.
That's how a patient, rational person might think through this garden-path sentence.
But when researchers quizzed people, they found that many believed both that Anna dressed the baby and that the baby was playing in the crib. This doesn't make any sense: the baby can't be both the object that Anna dressed and the subject that was playing. The researchers believe this shows how people tend to be lazy with language. They take shortcuts and don't fully think through the underlying grammar unless they have to.
One last example. Consider:
The coach smiled at the player tossed a Frisbee.
This is the opposite of traditional garden-path sentences, which begin by leading you down a false path. This sentence is like a kidnapping -- you start out on the right path, but at the very end it derails you.
The essence of the sentence is in the first six words: "The coach smiled at the player." The last three words are extraneous. They give you some more information about the player; that's it. But somehow, those last three words mess up the entire sentence. We get confused. We want to believe that either the coach or the player tossed a Frisbee, even though nothing in the grammar of the sentence suggests that.
There has been a lot of debate in recent years about what's going on in that sentence. Traditional theories about language processing can't explain why people have so much trouble with it.
A few years ago, Roger Levy, a cognitive scientist now at MIT, came up with an interesting idea: Maybe people get confused because they think they must have misread something. We encounter garbled language all the time. Levy thinks that our minds may try to repair sentences by predicting words that might have gotten lost or mumbled or overlooked. This fuzzy recognition strategy helps us repair bad sentences. In this case, it might be getting in the way.
There are a lot of ways to slightly tweak the Frisbee sentence. Here are a few examples from Levy's paper:
The coach who smiled at the player tossed a Frisbee.
The coach smiled as the player tossed a Frisbee.
The coach smiled and the player tossed a Frisbee.
The coach smiled at the player who tossed a Frisbee.
The coach smiled at the player that tossed a Frisbee.
The coach smiled at the player and tossed a Frisbee.
When someone reads the Frisbee sentence, all of these alternatives might crowd their mind and create confusion.
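One way to make Levy's idea concrete is word-level edit distance -- my illustration here, not Levy's actual model. Each of the alternatives above differs from the Frisbee sentence by exactly one word, which is why a reader might plausibly suspect a word was lost or misread:

```python
# Word-level edit distance: how many single-word insertions, deletions, or
# substitutions separate two sentences. Classic dynamic-programming
# Levenshtein, computed over words instead of characters.
def edit_distance(a, b):
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete wa
                           cur[j - 1] + 1,           # insert wb
                           prev[j - 1] + (wa != wb)  # substitute (or match)
                           ))
        prev = cur
    return prev[-1]

original = "The coach smiled at the player tossed a Frisbee"
alternatives = [
    "The coach who smiled at the player tossed a Frisbee",
    "The coach smiled as the player tossed a Frisbee",
    "The coach smiled and the player tossed a Frisbee",
    "The coach smiled at the player who tossed a Frisbee",
    "The coach smiled at the player that tossed a Frisbee",
    "The coach smiled at the player and tossed a Frisbee",
]

distances = [edit_distance(original, alt) for alt in alternatives]
print(distances)  # -> [1, 1, 1, 1, 1, 1]
```

Every alternative sits just one small edit away, so a mind that expects noisy input has six perfectly grammatical "repairs" competing with the sentence as actually written.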
Gibson points to a related example of this repair instinct. Consider:

The mother gave the candle the daughter.
"We often don't notice that it's incredibly implausible as written," Gibson says. "We implicitly correct it to 'The mother gave the candle to the daughter.'"
We've seen now how garden-path sentences have led to some fascinating insights into the human mind. The lesson from all of this research might be summed up as follows: Language is hard, even though it usually feels easy. When language starts to feel hard, that's when things get really interesting.
How to read these garden-path sentences:
The old man the boat. (The old operate the boat.)
While the man hunted the deer ran into the woods. (While the man hunted, the deer ran into the woods.)
While Anna dressed the baby played in the crib. (While Anna dressed, the baby played in the crib.)
I convinced her children are noisy. (I convinced her that children are noisy.)
The coach smiled at the player tossed the Frisbee. (The coach smiled at the player who was tossed the Frisbee.)
The cotton clothes are made up of grows in Mississippi. (The cotton that clothes are made up of grows in Mississippi.)