A method frequently used in statistical learning to find, as the name suggests, the principal components of variation of a data distribution.

Let's say we have a set of n vectors in three-dimensional space. They are just points in space. If you run PCA on them, you get:
  • the mean of the distribution (the point with the smallest sum of squared distances to the data points);
  • a set of orthogonal, three-dimensional vectors (at most three of them), called principal components, that represent the directions along which the points are spread out. A small sketch of both pieces follows this list.
The set is ordered by the variance of the points' projections onto each p.c., from largest to smallest. This gives you a dimensionality reduction of your set of points: keep only the first m principal components and discard the others.
By construction, the directions you keep are the most significant ones, and if you don't discard too many of the less significant p.c.s, you get an accurate (at least from a statistical point of view) description of the starting set.
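To make this concrete, here is a minimal sketch using numpy's SVD on some synthetic 3-D points; the names (X, components, m) and the stretched random data are just illustrative assumptions, not any particular library's PCA API.

```python
import numpy as np

rng = np.random.default_rng(0)

# n = 200 points in 3-D, deliberately stretched so the directions
# of variation are easy to see
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

# the average element of the distribution
mean = X.mean(axis=0)

# principal components: right singular vectors of the centered data,
# already ordered by the variance of the projections (largest first)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                      # rows = orthogonal directions
explained_variance = S**2 / (len(X) - 1)

# dimensionality reduction: keep only the first m components
m = 2
X_reduced = Xc @ components[:m].T    # shape (200, m)

# approximate reconstruction of the original points from m numbers each
X_approx = X_reduced @ components[:m] + mean
```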

Note that, even if the p.c.s are geometrically orthogonal, that doesn't mean the projections of the data onto them are statistically independent (they are merely uncorrelated). Also, since you compute and subtract an average point, the centered data has rank at most n-1, so at most the first n-1 p.c.s can carry nonzero variance.
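A quick illustration of that last point (again just a hypothetical numpy snippet): with n = 3 points in 3-D, centering removes one degree of freedom, so only two principal components can have nonzero variance even though the raw points span all three axes.

```python
import numpy as np

# three points in 3-D that do span all three axes...
X = np.eye(3)

# ...but after subtracting the mean the centered data has rank n - 1 = 2,
# so the third singular value (hence the third p.c.'s variance) is zero
Xc = X - X.mean(axis=0)
S = np.linalg.svd(Xc, compute_uv=False)
print(S)  # -> [1., 1., 0.]
```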