In statistics and pattern recognition, a classifier that assigns an identity to a new object by using the identity of the most similar known object.

Suppose you're trying to identify plants. Each time you find a plant you make a series of measurements from its leaves: general shape (e.g., oval, scalloped, pointed, ...), length, width, color, and so on. You then compare these measurements to data cataloged previously for known plant species, and you pick the species whose data are closest. If even the closest one isn't very close, you reject it and declare that you have found something not yet in the catalog.
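The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the leaf measurements are reduced to two numbers (length and width), closeness is taken to be Euclidean distance, and the catalog entries, the function name nearest_neighbor, and the reject_distance threshold are all made up for the example.

```python
import math

def nearest_neighbor(sample, catalog, reject_distance):
    """Return the label of the closest catalog entry,
    or None if even the closest entry is farther than reject_distance."""
    best_label, best_dist = None, math.inf
    for label, features in catalog:
        dist = math.dist(sample, features)  # Euclidean distance (an assumption)
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist > reject_distance:
        return None  # nothing is close enough: treat as not in the catalog
    return best_label

# Hypothetical catalog: (species, (leaf length in cm, leaf width in cm)).
catalog = [
    ("oak", (12.0, 7.0)),
    ("willow", (10.0, 1.5)),
    ("birch", (6.0, 4.0)),
]

print(nearest_neighbor((11.5, 6.5), catalog, reject_distance=2.0))   # oak
print(nearest_neighbor((30.0, 30.0), catalog, reject_distance=2.0))  # None
```

How large the rejection threshold should be depends entirely on the data; the value 2.0 here is arbitrary.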

The major problem with this algorithm is that closeness is often hard to define when the variables are measured on entirely different scales (e.g., color vs. length). Another problem is the choice of variables: two variables may be significant only when combined. For example, the size (length and width) of a plant's leaves may vary greatly but the aspect ratio may be relatively constant.
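Both problems can be illustrated with a short sketch. Standardizing each variable (subtracting its mean and dividing by its standard deviation) is one common way to put variables on a comparable footing; it is not the only remedy and is not prescribed by the text above. The helper names standardize and aspect_ratio and all of the numbers are hypothetical.

```python
import statistics

def standardize(rows):
    """Rescale each column to zero mean and unit standard deviation,
    so that no variable dominates the distance just because of its units."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stdevs = [statistics.pstdev(c) or 1.0 for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, stdevs))
        for row in rows
    ]

def aspect_ratio(length, width):
    """Combine two raw measurements into a single derived variable."""
    return length / width

# Hypothetical data: (leaf length in mm, color as a hue between 0 and 1).
raw = [(100.0, 0.1), (200.0, 0.2), (300.0, 0.3)]
scaled = standardize(raw)
# After scaling, both columns span the same range even though
# the raw lengths are a thousand times larger than the raw hues.

# Two leaves of very different size share the same aspect ratio.
small_leaf = aspect_ratio(6.0, 3.0)
large_leaf = aspect_ratio(12.0, 6.0)
print(small_leaf == large_leaf)  # True
```

Whether a derived variable such as the aspect ratio actually separates the species still has to be checked against the cataloged data.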
