display | more...
Identification trees are one way for computers to express learning through classification. Artificial intelligence programs store data learned in some form of relational database. When those learned data are applied, they must be extracted from the database and used to make guesses about which classification the new quantity being analyzed should go under. One way to do that is to subject the new data to many conditional tests, and that is precisely how identification trees work.

When a new entry comes up to be tested, the identification tree algorithm starts. The algorithm runs tests on the data in the database and matches it up with the results found on the unknown entry. When the test is run on the data in the database, a certain number of instances in each classification will show up in the true and false cases. The test is then run on the unknown, and evaluates to true or false. Whichever elements in the database do not share that characteristic are eliminated, then another test is run. This continues down the "tree" until the unknown is in a homogeneous set with some group of training data with the same characteristics, according to the test. The unknown, then, is the same classification as those data.

Here's an example. The whole purpose of this algorithm is classification. We'll make a program that determines whether or not a restaurant is a good place to eat. Lots of factors pop into mind, but let's consider three: Food Quality, Service, and Price. You take a survey of 100 sample restaurants, using knowledge engineering, determining whether they are good or bad and taking down the data for Food Quality, Service, and Price.

After some magic database analysis, you come up with three different tests to do to narrow things down for a new restaurant:

Price > 50
Food Quality > medium
Service > great

The results of the test on the training data are as follows:

Price > 50:
True: 25 Good Restaurants, 25 Bad Restaurants
False: 25 Good Restaurants, 25 Bad Restaurants

Food Quality > medium:
True 40 Good Restaurants, 20 Bad Restaurants
False 10 Good Restaurants, 30 Bad Restaurants

Service > great
True 50 Good Restaurants, 0 Bad Restaurants
False 0 Good Restaurants, 50 Bad Restaurants

Now, it's very obvious which test you should run. If you run the service test, you can determine whether it is good or not in one step. How do you tell your program to go with the best test? How will it know?

Since you have the results of how the training data turned out for these tests, the answer is very simple. You can measure the disorder of each test. The service test has zero disorder, while the price test has the highest disorder possible. If you choose the test with the lowest disorder, you will end up with a more homogeneous set. Simply choosing the test that has the lowest disorder is known as a greedy algorithm, and does not take combinations or any other fancy tricks into account. When multiple tests are run, the remaining tests have to be run again, with data eliminated if it did not match the unknown's result for the last test. It's not hard to imagine how this can consume time and space easily.

Identification trees have advantages and disadvantages. For instance, they are much better than k-nearest-neighbour learning algorithms when the data measured are not in numerical quantities. Non-numerical quantities are really hard to represent in feature space. Another factor: those conditional tests are not foolproof. If we got those data in the example above, if the service was great, the restaurant is guaranteed to be good. However, what if we test a restaurant where the service is excellent, but the food is terrible? It wouldn't be a good restaurant, but our program would determine otherwise. Just like human beings, artificially intelligent systems make a lot of mistakes.