
The test set cross validation method is used in machine learning to estimate how accurately a learning algorithm will predict data that it was not trained on. In the test set method, 30% of the training data is chosen at random and set aside as a test set. For example, if there are 100 training data points, 30 of them are randomly chosen and held out. The regression or learning algorithm is then trained on the remaining 70 data points.

After training is complete, use the trained model to classify or estimate the points that were set aside. For a classification problem, calculate the percentage of held-out points that were classified incorrectly; for a regression problem, calculate the mean squared error. The resulting error is the test set cross validation error for the problem.
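The two error measures can be sketched as plain functions over the held-out targets and the model's predictions. The function names and the sample values below are illustrative, not taken from any particular library.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of held-out points whose predicted label is wrong."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between held-out targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical held-out labels/targets and model outputs, for illustration only.
print(misclassification_rate(["a", "b", "a", "a"], ["a", "a", "a", "b"]))  # 0.5
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # 1.25 / 3, about 0.4167
```

Multiply the misclassification rate by 100 to express it as a percentage, as in the text.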

Test set cross validation is useful because it is simple. Its main drawback, however, is that it wastes data. Because the algorithm is trained on only 70% of the available data, the resulting error may not accurately reflect how well the algorithm would perform if trained on the full data set. The estimate is also highly dependent on whether the test set happened to be chosen "luckily" or "unluckily".
