Researchers are constantly trying to develop computer systems that are capable of learning. The major types of learning systems that have been developed and implemented are k-nearest-neighbor learning, identification trees, neural nets, and support vector machines. Each has its advantages and disadvantages. I do not include genetic algorithms here because they don't classify or regress; they optimize. GAs are more of an efficient guess-and-check method than a learning method, though they do share many characteristics with the techniques above. More advanced and complicated systems, such as W learning, are in constant development and have no real worldwide standards just yet.
The applications of computer learning can be divided into two broad categories: classification and regression. Classification is exactly what it sounds like: given an unknown object, a learning system will correctly assign it to one of a fixed number of categories. Regression is essentially function approximation: given a set of inputs, a learning system will produce a specific, typically continuous, output.
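To make the distinction concrete, here is a toy sketch in Python. The data, names, and the nearest-mean rule are illustrative inventions, not any standard algorithm: the point is only that a classifier outputs one of a fixed set of labels, while a regressor outputs a number.

```python
def train_mean_classifier(examples):
    """Classification: learn the mean value per label, then assign
    an unknown to the label whose mean it is closest to."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(means, x):
    # Output is one of a fixed set of categories.
    return min(means, key=lambda label: abs(means[label] - x))

def fit_line(points):
    """Regression: ordinary least squares for y = a*x + b.
    Output is a specific number, not a category."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

means = train_mean_classifier([(1, "low"), (2, "low"), (9, "high"), (10, "high")])
print(classify(means, 8))               # a category: "high"
a, b = fit_line([(0, 1), (1, 3), (2, 5)])
print(a, b)                             # numbers: 2.0 1.0
```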
Computer learning systems all build on a store of training data. When they are trained by their human developers, they associate inputs and characteristics with outputs and classifications. When it comes time to analyze an unknown, an algorithm consults that stored data to help it make a decision. The algorithms themselves differ greatly. Nearest neighbors picks the k training elements most like the unknown, while identification trees subject the unknown to a series of conditional tests and compare those results to the results of the same tests on the training data. Neural nets attempt to simulate the human mind by passing inputs through "neurons" and assigning weights to inputs based on their importance. Support vector machines use complicated mathematical formulae to compute decision boundaries that divide up the feature space in which all of the sample points lie.
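Of the four, nearest neighbors is the simplest to sketch. Below is a minimal k-nearest-neighbor classifier in plain Python; the two-cluster toy data and the function name are my own illustrative choices.

```python
from collections import Counter
import math

def knn_classify(training, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training points. `training` is a list of (features, label)
    pairs, where features are tuples of numbers."""
    neighbors = sorted(training, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy feature space: two clusters, labeled "a" and "b".
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_classify(train, (1, 1)))   # "a": nearest points are all in cluster a
print(knn_classify(train, (5, 4)))   # "b": nearest points are all in cluster b
```

Note that the "learning" here is just storing the training set; all the work happens at query time, which is why the method is sometimes called lazy learning.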
For classification, nothing is better most of the time than support vector machines. The other algorithms all have flaws which make them clearly inferior. With k-nearest-neighbors algorithms, it is hard to pick an effective k. Too small a k, and the algorithm becomes too susceptible to noise: if it only considers its single nearest neighbor (k=1), for instance, any unknown that closely matches a noise point will be misclassified. Nearest neighbors does have a slight advantage over the other methods when there are many possible categories, because the calculations can become extremely complex for support vector machines.

Neural nets are terrible at classification. They take a long time to train, and they tend to produce bad results. Their method of backpropagation and developing weights does not fit classification well at all. Identification trees, on the other hand, have a good niche. They are effective when the characteristics being compared are symbolic rather than numerical. Because identification trees work on conditional statements, they don't need numerical data, unlike every other algorithm discussed here. It is possible to enumerate the symbolic data for the other algorithms, but that often proves to be more trouble than it's worth. Identification trees can take a long time to draw a conclusion, because they have to consider many tests; this can be a problem when there are many possible values. But most classification problems involve only numbers, so support vector machines are used. They make more concrete and better decisions than nearest neighbors algorithms do, and behave more like a human would in most situations.
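The identification tree's niche is easy to illustrate. The sketch below is a hand-built tree over made-up symbolic attributes; a real identification tree would be induced from training data (for example, by picking the most informative test at each node), but the learned result has the same shape: a cascade of conditional tests with no numeric comparisons anywhere.

```python
def classify_fruit(sample):
    """A hand-built identification tree: each node is a conditional
    test on a symbolic attribute, and each leaf is a category.
    The attributes and fruit names are illustrative only."""
    if sample["color"] == "green":
        return "lime" if sample["size"] == "small" else "watermelon"
    if sample["color"] == "yellow":
        return "banana" if sample["shape"] == "long" else "lemon"
    return "apple"

print(classify_fruit({"color": "yellow", "shape": "long", "size": "medium"}))
# banana
print(classify_fruit({"color": "green", "shape": "round", "size": "small"}))
# lime
```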
If it were not for regression, neural nets would be obsolete. All of the other algorithms pale in comparison to the way neural nets handle regression. Since we are dealing with function approximation, the neural net method of training and assigning weights to inputs is extremely well suited to the task.
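The core of that weight-training idea fits in a few lines. Here is the smallest possible case, a single linear neuron trained by gradient descent on squared error; the learning rate, epoch count, and toy target function are arbitrary choices for illustration, and a real net would stack many such neurons with nonlinear activations.

```python
def train_neuron(data, lr=0.05, epochs=2000):
    """Train one linear neuron y = w*x + b by gradient descent on
    squared error -- the essence of how a neural net adjusts its
    weights to approximate a function."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # prediction error on this sample
            w -= lr * err * x       # gradient step for the weight
            b -= lr * err           # gradient step for the bias
    return w, b

# Toy target function y = 3x + 2, sampled at a few points.
data = [(x, 3 * x + 2) for x in [-2, -1, 0, 1, 2]]
w, b = train_neuron(data)
print(round(w, 2), round(b, 2))   # close to 3.0 and 2.0
```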
All learning algorithms are susceptible to underfitting and overfitting, and when training one, care must be taken to avoid both. An overfitted algorithm pays too much attention to individual data points, so a single noise point can greatly distort its decisions. Underfitting is exactly the opposite: the algorithm smooths over so many points that it misses the real structure of the data, letting points that lie on the wrong side of wherever the decision boundary should be influence every decision.
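Nearest neighbors makes this tradeoff easy to see, because k directly controls the fit. In the made-up one-dimensional data below, one training point is deliberately mislabeled as noise: k=1 overfits to it, a moderate k smooths it away, and k equal to the whole training set underfits, predicting the global majority label no matter where the query is.

```python
from collections import Counter

def knn(training, query, k):
    """1-D nearest neighbors: majority vote among the k closest points."""
    nbrs = sorted(training, key=lambda ex: abs(ex[0] - query))[:k]
    return Counter(label for _, label in nbrs).most_common(1)[0][0]

# Two clean regions, plus one mislabeled noise point at x=2.
train = [(0, "a"), (1, "a"), (2, "b"), (3, "a"),   # the "b" at x=2 is noise
         (8, "b"), (9, "b"), (10, "b")]

print(knn(train, 2.1, k=1))           # "b": k=1 overfits to the noise point
print(knn(train, 2.1, k=3))           # "a": modest smoothing ignores the noise
print(knn(train, 0.5, k=len(train)))  # "b": k=n underfits; every query gets
                                      # the global majority label
```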