A method frequently used in statistical learning to extract, as the name says, the principal components of the variation in a data distribution.
Let's say we have a set of n vectors belonging to three-dimensional space. They are just points in that space. If you use PCA, you will get:
- the average point of the distribution (the centroid, i.e. the point that minimizes the sum of squared distances to all the others);
- a set of orthogonal, three-dimensional vectors, called principal components, that represent the directions along which the points are spread (there are as many of them as the dimension of the space, so three in this example).
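A minimal sketch of these two outputs, assuming NumPy is available; the array `points` and its size are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))           # n = 100 points in 3D (toy data)

mean = points.mean(axis=0)                   # the average point of the set
centered = points - mean                     # deviations from the mean

# Covariance matrix of the centered points: its eigenvectors are the
# principal components, its eigenvalues the variance along each of them.
cov = np.cov(centered, rowvar=False)
eigenvalues, components = np.linalg.eigh(cov)  # components are the columns
```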
The set is ordered by the variance of the points' projections onto each principal component, from the largest to the smallest. In this way you can get a dimensionality reduction of the set of points, keeping only the projections onto the first m components and discarding the others.
By construction, the components you keep are the most significant, and if you don't discard too many of the less significant ones, you get an accurate (at least from a statistical point of view) description of the starting set.
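Here is a rough sketch of that reduction step, again assuming NumPy; the names `points` and `m` and the chosen shapes are only illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.1])  # a stretched cloud
centered = points - points.mean(axis=0)

# The SVD of the centered data gives the principal components already ordered
# from the largest to the smallest variance (rows of Vt).
_, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)
variances = singular_values**2 / (len(points) - 1)

m = 2                                        # keep only the first m components
reduced = centered @ Vt[:m].T                # n x m coordinates in the reduced space

kept = variances[:m].sum() / variances.sum()
print(f"fraction of the total variance kept with m = {m}: {kept:.3f}")
```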
Note that, even though the principal components are geometrically orthogonal (and the projections onto them are uncorrelated), that doesn't mean they are statistically independent; that only holds in special cases, such as Gaussian data. Also, since the average point is subtracted before computing them, the centered data has rank at most n-1, so at most n-1 components can carry any variance: with n-1 smaller than the dimension of the space, only the first n-1 are meaningful.
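A quick way to see the last remark, under the same NumPy assumption: with only n = 3 points in three dimensions, the centered data has rank at most n - 1 = 2, so one eigenvalue of the covariance matrix comes out (numerically) zero:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(3, 3))             # n = 3 points in 3D
centered = points - points.mean(axis=0)

# Variances along the principal components, largest first; the last one is ~0.
eigenvalues = np.linalg.eigvalsh(np.cov(centered, rowvar=False))
print(np.sort(eigenvalues)[::-1])
```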