A function that maps data to labels. Informally, it's a mathematical model (or computer program, if you wish) that attempts to identify objects based on measurements or other data. It may return the most likely identification, or the probability distribution for all possible identifications, or a simple "no idea".

You can think of Peterson's Field Guide to Birds as a kind of classifier, in that it maps characteristics like size, color, body shape, and locale to species. These data aren't precise, and they're usually not part of the formal definition of a species. But even though "seldom seen in deserts" isn't part of the definition of geese, it's certainly useful in identifying what's likely to be a goose and what isn't. The characteristics used by a classifier are known as predictor variables.

A classifier is built using the statistics of these data. Several factors may be considered:

  • How much classifying information is there in each predictor variable? A variable such as "Does it have feathers?" doesn't help much.
  • Are there overlapping or correlated variables? In other words, if we know the value of one variable do we know the value of other variables? If we know that a bird has large, nasty claws, we probably don't have to ask whether it also has a large, nasty beak.
  • Are certain variables significant only in combination with each other? Height and weight aren't significant in classifying people as skinny or fat by themselves but they are in combination. (Sorry, I couldn't think of any bird examples.)
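One common way to put a number on the first question is information gain: how much does knowing a variable reduce the uncertainty (entropy) about the identity? The sketch below is only an illustration, not the one true measure. The `birds` observations are made up, and the function and field names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, variable, label="species"):
    """How much knowing `variable` reduces uncertainty about `label`."""
    labels = [row[label] for row in rows]
    # Group the labels by the value the variable takes.
    groups = {}
    for row in rows:
        groups.setdefault(row[variable], []).append(row[label])
    # Average entropy left over once the variable's value is known.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Made-up observations, for illustration only.
birds = [
    {"feathers": True, "webbed_feet": True,  "species": "goose"},
    {"feathers": True, "webbed_feet": True,  "species": "goose"},
    {"feathers": True, "webbed_feet": False, "species": "hawk"},
    {"feathers": True, "webbed_feet": False, "species": "hawk"},
]

gain_feathers = information_gain(birds, "feathers")
gain_feet = information_gain(birds, "webbed_feet")
print(gain_feathers, gain_feet)
```

In this toy data every bird has feathers, so that variable tells us nothing, while webbed feet separate the two species completely.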
Once we know which variables are significant, we have to determine how to combine them. Again, there are many choices:
  • Convert the variables to numbers, multiply each by a weight, take the sum, then give labels to the ranges of results. This is known as regression, and if the sum ranges from 0 to 1 and represents a probability, we have logistic regression.
  • Make a decision tree. At the root, we use a variable to split the space of identities based on certain values of that variable, and at each split we apply other variables, and so on, until the leaves of the tree are unique identities. This is what field guides often do. (Formally, this is a classification tree.)
  • Train a neural network to accept the data and return classifications. Of course, we would have no clue as to which variables it was actually using or how it was combining them, but we'd have a decent classifier.
  • Compare new data with objects previously classified by an expert and choose the identity of the object that appears to be most similar. This is known as a nearest neighbor algorithm. ("This one sure looks like it...hey, the tag here says it's a goose".)
  • Calculate the conditional probability of the values of each variable given the identity, calculate the prior probability of each identity, then use Bayes' theorem to calculate the probability of the identity given the data. This is called a Bayesian classifier. If we make the strong assumption that the variables are independent (that is, the value of one variable doesn't tell you anything about the values of another variable), then we have a naive Bayesian classifier.
  • Create some wild-ass function that maps data to identities. This can get you published if you can pass the peer review.
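To make one of these choices concrete, here is a minimal sketch of the naive Bayesian classifier described above. It assumes categorical variables and uses add-one (Laplace) smoothing, which the description above doesn't mention, so that an unseen value doesn't force a probability to zero. The `sightings` data and all names are made up for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, label="species"):
    """Count how often each identity and each (variable, value) pair occurs."""
    priors = Counter(row[label] for row in rows)
    counts = defaultdict(Counter)  # (identity, variable) -> counts of values
    values = defaultdict(set)      # variable -> distinct values seen
    for row in rows:
        for var, val in row.items():
            if var != label:
                counts[(row[label], var)][val] += 1
                values[var].add(val)
    return priors, counts, values

def classify(model, observation):
    """Return P(identity | observation) for every identity, via Bayes' theorem."""
    priors, counts, values = model
    total = sum(priors.values())
    scores = {}
    for identity, n in priors.items():
        score = n / total  # prior probability of the identity
        for var, val in observation.items():
            # P(value | identity), with add-one smoothing (an added assumption).
            score *= (counts[(identity, var)][val] + 1) / (n + len(values[var]))
        scores[identity] = score
    norm = sum(scores.values())  # normalize so the posteriors sum to 1
    return {identity: s / norm for identity, s in scores.items()}

# Made-up sightings, for illustration only.
sightings = [
    {"size": "large", "habitat": "lake",   "species": "goose"},
    {"size": "large", "habitat": "lake",   "species": "goose"},
    {"size": "large", "habitat": "field",  "species": "goose"},
    {"size": "small", "habitat": "lake",   "species": "duck"},
    {"size": "small", "habitat": "lake",   "species": "duck"},
    {"size": "small", "habitat": "desert", "species": "roadrunner"},
]

model = train_naive_bayes(sightings)
posterior = classify(model, {"size": "large", "habitat": "lake"})
best = max(posterior, key=posterior.get)
print(best, posterior)
```

Multiplying the per-variable probabilities together is exactly where the "naive" independence assumption comes in.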
Although the specific process of creating (or as it's sometimes called, inducing) each type of classifier is too long to be described here, there is a strategy for creating classifiers in general:
  1. Create a set of pre-identified data and divide it randomly into a training set and a test set.
  2. Create a classifier using the training set.
  3. Apply the classifier to the test set.
  4. Compare the results of the classifier with the actual identities from the test set. In general, there are two kinds of mistakes. Assuming that we're really interested in identifying geese:
    • False positive – the classifier claims that it's found a goose when in fact it's not really a goose
    • False negative – the classifier claims it's not a goose when in fact it is a goose
The importance of false positives and false negatives is apparent when you think of important classifications, such as disease diagnosis. A false positive classification of appendicitis may mean someone gets an appendix removed needlessly. A false negative may mean someone dies.
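The four-step strategy above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic (random weights and wingspans, not real measurements), the 70/30 split is an arbitrary choice, and a nearest neighbor classifier stands in for whatever classifier you would actually build.

```python
import random

def nearest_neighbor(training_set, measurements):
    """Return the label of the training example closest to `measurements`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(training_set,
                   key=lambda example: squared_distance(example[0], measurements))
    return label

def evaluate(classifier, training_set, test_set, target="goose"):
    """Count false positives and false negatives for the `target` identity."""
    false_positives = false_negatives = 0
    for measurements, actual in test_set:
        predicted = classifier(training_set, measurements)
        if predicted == target and actual != target:
            false_positives += 1   # claimed a goose, but it wasn't
        elif predicted != target and actual == target:
            false_negatives += 1   # missed a real goose
    return false_positives, false_negatives

# Step 1: pre-identified data (synthetic: weight in kg, wingspan in m),
# shuffled and divided into a training set and a test set.
rng = random.Random(0)
data = [((rng.gauss(4.0, 0.5), rng.gauss(1.6, 0.1)), "goose") for _ in range(50)]
data += [((rng.gauss(1.1, 0.2), rng.gauss(0.9, 0.1)), "duck") for _ in range(50)]
rng.shuffle(data)
split = int(0.7 * len(data))
training_set, test_set = data[:split], data[split:]

# Steps 2-4: the nearest neighbor "classifier" is just the training set itself;
# apply it to the test set and compare with the actual identities.
false_positives, false_negatives = evaluate(nearest_neighbor, training_set, test_set)
print(false_positives, false_negatives)
```

Keeping the two error counts separate, rather than reporting a single accuracy figure, matters when the two kinds of mistake have very different costs.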

Much like the classifiers described in Percepied's post, some spoken languages have words that label certain kinds of data. They assign items into certain classes used for discussion of such objects in general or for counting of objects.

One such example in English would be counting "sheets" of paper. This expression classifies paper into a sheet class. (Much like child objects, in programming-ish terms.)

The Thai language uses classification to a much greater extent than English. Not that English doesn't have classes for things but we tend to use short specifics like "5 ashtrays" and assume what sort of objects ashtrays are rather than say "5 ashtrays belonging to a meta-object that defines ashtrays, toilet seats, and intangible objects such as words". Thai has classes for everything from people (คน, "kon") to vehicles (คัน, "kun") to round & hollow objects (ใบ, "bai"). These are always used when counting (e.g.: "students 5 people" would be the typical grammar structure) and often used when discussing the class or sub-objects in general.

I'm sure many other languages use such classification systems.


Gritchka says: Strongly characteristic of East Asian languages: in Chinese, Japanese, and Indonesian the classifier is obligatory. As far as I'm aware all number + noun pairs have classifiers of the 'sheet', 'pipe', 'person' kind in those three, and of course many other regional ones.

In sign language, a classifier is a handshape that represents an object already named, or an object whose visual appearance is being described.

Perhaps the simplest ASL classifier is the handshape used for the number one. This can represent a person standing, a candle, or an object of similar shape, regardless of size. The "person standing" classifier can be used to show a person's movement across a room. Once the spatial setup has been established, the classifier moves across the space to represent the object's motion.

Some classifiers do not resemble their associated objects. A car, truck, motorcycle, police car, train, and most other vehicles are represented using the three handshape, with the thumb pointed toward the ceiling (if the vehicle is right-side-up).

When used to describe the shape of an object, classifiers do not indicate movement or placement. The handshape used for the letter c is the classifier for a mug, cup, glass, bottle, and other objects of similarly uniform round shape. The signer's hand moves up from its starting point, stopping at the approximate height of the object - a glass, for example, will be taller than a mug. (Martini glasses, teacups, and others are not shown with the c classifier; instead, their shape must be explained with appropriate combinations of handshapes.)

Classifiers are an essential component of communicating in sign language. As such, sign language teachers must refer to the term "classifier" frequently. In these classes, and elsewhere when the term "classifier" represents a handshape, the term is abbreviated to the fingerspelled letters C-L.

Clas"si*fi`er (?), n.

One who classifies.

 

© Webster 1913.
