Natural language processing is the name for our attempts to create computer programs that can parse and extract meaning from the languages humans use. Like most developments in artificial intelligence, natural language processing is in its infancy. The most advanced systems we have today still cannot handle language the way a five-year-old child can. Natural language processing has proven to be extremely difficult.
The human mind is extremely complex, and we've barely scratched the surface of what cognitive science research has to offer. Linguists have not been able to come up with a foolproof grammar analysis that could be used in a computer program. Since we don't know how our own brains decode language, it's foolish to expect to reproduce human language analysis with computers in this day and age. Current natural language processing methods are built around the grammar analysis research already done by linguists, modified to fit an engineering problem. esapersona's writeup above mentions the notion of emotion driving language. Emotion gives language its beauty. It also contributes to the biggest problem facing developers of natural language processors: ambiguity.
Computer scientists like to break down big problems into smaller ones and work from there. Natural language processors use recursion, analyzing word by word and phrase by phrase until they hit something solid that they know exactly how to interpret. This isn't necessarily the way the human mind works, but it works pretty well for computer algorithms. There are many ways to structure a sentence. Much of the effort of building a natural language processor goes into forming the trees it follows as it goes word by word down a sentence.
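To make the word-by-word, phrase-by-phrase recursion concrete, here is a minimal sketch of a recursive descent parser. The tiny lexicon, the grammar rules, and every function name are assumptions made up for illustration; this is not how any particular system is built, only one simple way the idea could look.

```python
# Toy lexicon (hypothetical): every word has exactly one part of speech.
# Real language breaks this assumption constantly.
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "Noun", "cat": "Noun", "bridge": "Noun",
    "old": "Adj", "big": "Adj",
    "saw": "Verb", "chased": "Verb",
}

def parse_nominal(words, i):
    """Nominal -> Adj Nominal | Noun. Returns (tree, next_index) or None."""
    if i >= len(words):
        return None
    tag = LEXICON.get(words[i])
    if tag == "Noun":
        # Base case: something solid we know exactly how to interpret.
        return ("Nominal", [("Noun", words[i])]), i + 1
    if tag == "Adj":
        # Recursive case: consume one adjective, then parse the rest.
        rest = parse_nominal(words, i + 1)
        if rest:
            subtree, j = rest
            return ("Nominal", [("Adj", words[i]), subtree]), j
    return None

def parse_noun_phrase(words, i):
    """NP -> Det Nominal."""
    if i < len(words) and LEXICON.get(words[i]) == "Det":
        rest = parse_nominal(words, i + 1)
        if rest:
            subtree, j = rest
            return ("NP", [("Det", words[i]), subtree]), j
    return None

def parse_verb_phrase(words, i):
    """VP -> Verb NP."""
    if i < len(words) and LEXICON.get(words[i]) == "Verb":
        rest = parse_noun_phrase(words, i + 1)
        if rest:
            subtree, j = rest
            return ("VP", [("Verb", words[i]), subtree]), j
    return None

def parse_sentence(text):
    """S -> NP VP. Returns a nested tuple tree, or None if parsing fails."""
    words = text.lower().rstrip(".").split()
    subject = parse_noun_phrase(words, 0)
    if not subject:
        return None
    np_tree, i = subject
    predicate = parse_verb_phrase(words, i)
    if not predicate:
        return None
    vp_tree, j = predicate
    # Every word must be consumed for the parse to count.
    return ("S", [np_tree, vp_tree]) if j == len(words) else None

print(parse_sentence("The big old dog chased a cat."))
```

A simple sentence that fits the grammar comes back as a tree; anything outside the lexicon or the three rules comes back as None, which hints at how much work goes into building those trees for real language.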
As it turns out, natural language processors are not half bad when they are fed straightforward, simple sentences. When there is only one way to interpret something, it's not hard to break a sentence up and find the subject, verb, object, any modifiers, and so on. Whenever there is more than one possibility for how a sentence can be read, difficulty arises. Much of this is because we don't know how a person sorts out the differences between ambiguous statements. Consider the following:
I saw the river walking over the bridge today.
I saw my friend walking over the bridge today.
In the first sentence, it's obvious to us that the speaker is walking over the bridge, and saw the river as he passed over it. In the second sentence, it's probable that both the speaker and his friend were walking across the bridge and they saw each other in passing. But it's also possible that the speaker was riding in a car across the bridge and saw his friend walking across. Or perhaps the friend was in a car while the speaker was walking. All three are valid conclusions one could draw from reading the second sentence. If they wanted to know exactly what happened and remove all ambiguity, the reader of the sentence would ask the speaker to clarify exactly who was walking where. But there is no ambiguity in the first sentence at all.
So what is it exactly that makes us eliminate the possibility of the river walking over the bridge? Well, obviously rivers can't walk, and the bridge is probably over the river in the first place. The ability of people to walk is one of the rules that we know to be true. So we make an assumption that the person was walking while the river was being watched. The sentence is truly unambiguous. However, in the second sentence, both characters are equally likely to be walking, so there is valid ambiguity there. We can program the rule that rivers can't walk into the analysis system, but that opens up a whole new can of worms: that's a lot of rules to add.
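The "rivers can't walk" rule can be sketched in a few lines. The CAN_WALK table and both function names below are hypothetical, invented purely for illustration, and the sketch assumes the sentence has already been reduced to an observer and an observed thing.

```python
# Hypothetical world knowledge: things we assume are able to walk.
# Rivers and bridges are deliberately absent.
CAN_WALK = {"i", "friend", "man", "dog"}

def possible_walkers(observer, observed):
    """Return the candidates that could plausibly be doing the walking."""
    return [noun for noun in (observer, observed) if noun in CAN_WALK]

def describe(observer, observed):
    """Summarize whether the rule leaves one reading or several."""
    walkers = possible_walkers(observer, observed)
    if len(walkers) == 1:
        return f"unambiguous: {walkers[0]} was walking"
    if len(walkers) > 1:
        return "ambiguous: " + " or ".join(walkers) + " could be walking"
    return "no plausible walker"

print(describe("i", "river"))   # I saw the river walking over the bridge.
print(describe("i", "friend"))  # I saw my friend walking over the bridge.
```

The rule settles the river sentence but leaves the friend sentence open, and the table itself shows the cost: every noun and every action would need an entry of its own.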
Fine, you say. Let's add a basic set of rules to the system and it will work. Not quite. Doing that would take a very long time, and it won't magically make clarity out of haze for the entire English language. How would the system interpret this sentence, in comparison to a human?
The old man the machines.
A word that is used mostly as a noun is being used as a verb! The old people are manning the machines. Since natural language processors operate mainly on trees, do we follow the branch for a noun clause or for a verb clause? How do we handle this? This is just another example of the convoluted nature of natural language. There are many, many cases of ambiguity in language. It would be a mistake to try to conquer all of the ambiguity in natural language by introducing rules and other fixed paths. Instead, we must try to teach our systems to accept it and make assumptions, working more like the human mind.
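One way to see the branching problem is to let each word carry more than one possible part of speech and try every combination against a sentence pattern. The lexicon, the single pattern (Det Adj* Noun, then a Verb, then Det Adj* Noun), and the function names below are all assumptions chosen for illustration; it is a brute-force sketch, not a claim about how working systems resolve ambiguity.

```python
from itertools import product

# Hypothetical lexicon: some words can play more than one role.
LEXICON = {
    "the": {"Det"},
    "old": {"Adj", "Noun"},   # "the old" can mean "old people"
    "man": {"Noun", "Verb"},  # "to man" means "to operate"
    "machines": {"Noun"},
}

def match_np(tags, i):
    """Match Det Adj* Noun starting at index i; return the end index or None."""
    if i < len(tags) and tags[i] == "Det":
        j = i + 1
        while j < len(tags) and tags[j] == "Adj":
            j += 1
        if j < len(tags) and tags[j] == "Noun":
            return j + 1
    return None

def is_sentence(tags):
    """Check the pattern NP Verb NP over a fixed tag assignment."""
    end = match_np(tags, 0)
    if end is None or end >= len(tags) or tags[end] != "Verb":
        return False
    return match_np(tags, end + 1) == len(tags)

def valid_readings(text):
    """Try every combination of tags and keep those that form a sentence."""
    words = text.lower().rstrip(".").split()
    choices = [sorted(LEXICON[word]) for word in words]
    return [tags for tags in product(*choices) if is_sentence(tags)]

print(valid_readings("The old man the machines."))
```

Reading "old" as an adjective and "man" as a noun, the choice most people make first, leads to a dead end; only the branch that treats "old" as a noun and "man" as a verb survives. Trying every branch works for five words, but the number of combinations multiplies with each ambiguous word, which is part of why fixed paths alone do not scale.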