
Data mining is the art of extracting (hopefully valuable or useful) information or knowledge from extremely large amounts of accumulated data (usually in the form of data warehouses) without necessarily having any prior knowledge of the kind of information you're looking for.

That last part is what makes data mining difficult and exciting and different from the kind of data analysis that people are used to. Normally, you already have an idea what you're looking for and then test this hypothesis by conducting experiments that are designed to make it clearly visible and gathering only the data needed to prove (or disprove) the hypothesis.

Data mining, however, is done by collecting all available data - large amounts of unordered records, often with a high dimensionality - and then looking for interesting patterns: correlations between certain parameters, periodic cycles, etc. Of course, the main problem is that it is hard to find something without knowing what it is. Some things (such as correlations) are always interesting, so there are standard tests for them. But other, more complex results often require luck and intuition to find.
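
As a rough illustration of such a "standard test", here is a minimal sketch that computes the Pearson correlation for every pair of columns and reports the strong ones. The function names, the 0.8 threshold and the sample records are all made up for this example; real data sets would be far larger and would need more careful statistics.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, report none
    return cov / math.sqrt(var_x * var_y)


def strongest_correlations(columns, threshold=0.8):
    """columns maps a parameter name to its list of values.

    Returns (name_a, name_b, r) for every pair with |r| >= threshold,
    strongest first. The threshold is an arbitrary illustrative choice.
    """
    hits = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            hits.append((a, b, r))
    hits.sort(key=lambda hit: -abs(hit[2]))
    return hits


# Invented records, one value per production run.
records = {
    "temperature": [1500.0, 1520.0, 1540.0, 1560.0, 1580.0, 1600.0],
    "pressure": [210.0, 190.0, 205.0, 195.0, 200.0, 198.0],
    "hardness": [50.0, 52.5, 54.0, 57.0, 58.5, 61.0],
}

hits = strongest_correlations(records)
for a, b, r in hits:
    print(f"{a} ~ {b}: r = {r:.3f}")
```

Note that the number of pairs grows quadratically with the number of columns, and that a test like this only finds linear relationships between two parameters at a time.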

The biggest problem is the high dimensionality of the data, which makes it impossible to visualize the entire data set and let the most capable known pattern analyzer - the human brain - do the work. Therefore, many methods in data mining aim at reducing the dimensionality of the data while losing as little information as possible.
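
One common way to reduce dimensionality is principal component analysis, which looks for the directions along which the data varies most. The following is only a sketch under simplifying assumptions: it finds just the first principal component using power iteration on the covariance matrix, and the three-column sample rows are invented. In practice you would use a numerical library rather than hand-written loops.

```python
import math


def center(rows):
    """Subtract the column means so every column averages to zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(cov, iterations=200):
    """Power iteration: repeatedly multiply by cov and renormalize.

    Assumes the start vector is not orthogonal to the dominant
    direction, which holds for the sample data below.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


# Invented data: the second column is twice the first, the third is noise.
rows = [
    [1.0, 2.0, 0.1],
    [2.0, 4.0, -0.1],
    [3.0, 6.0, 0.1],
    [4.0, 8.0, -0.1],
    [5.0, 10.0, 0.0],
]

centered = center(rows)
cov = covariance(centered)
v = first_principal_component(cov)

# Project every 3-dimensional record onto the single direction v.
scores = [sum(r[j] * v[j] for j in range(len(v))) for r in centered]

# Share of the total variance that survives the projection.
variance_kept = sum(
    v[i] * cov[i][j] * v[j] for i in range(len(v)) for j in range(len(v))
)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = variance_kept / total_variance
```

Here three columns collapse into one number per record while keeping almost all of the variance, because the made-up data really only varies along one direction. Real data is rarely that cooperative.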

To give a practical example, a company that produces sheet metal may notice that the hardness of their product fluctuates quite a lot. They want it to be hard, of course, but they don't know which combination of the many parameters (temperature, pressure, the exact composition of the metal alloy, the presence of certain catalysts, the rates at which all of this changes, etc.) produces the optimal result. Careful data mining of sensor readings during the production process may reveal that the sheet metal is hardest when it contains 15% copper that is added slowly after all the other ingredients have been heated to no more than 1600 degrees Celsius at a pressure of at least 200 bar (I made these values up, of course).
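
A very crude first pass at the sheet metal question could look like the sketch below: group the production runs into ranges of one parameter and see which range gives the highest average hardness. The field names, the bin width and the readings are all invented for illustration, just like the values above, and this looks at only one parameter at a time, so it would miss effects that depend on a combination of parameters.

```python
from collections import defaultdict


def best_setting(records, param, target, bin_width):
    """Return (bin_start, mean_target) for the bin of `param`
    whose records have the highest average `target` value."""
    sums = defaultdict(lambda: [0.0, 0])  # bin start -> [total, count]
    for rec in records:
        bin_start = (rec[param] // bin_width) * bin_width
        sums[bin_start][0] += rec[target]
        sums[bin_start][1] += 1
    means = {start: total / count for start, (total, count) in sums.items()}
    best = max(means, key=means.get)
    return best, means[best]


# Invented sensor readings: copper content in percent, hardness in
# arbitrary units.
runs = [
    {"copper_pct": 5.0, "hardness": 40.0},
    {"copper_pct": 7.0, "hardness": 42.0},
    {"copper_pct": 11.0, "hardness": 50.0},
    {"copper_pct": 13.0, "hardness": 52.0},
    {"copper_pct": 15.0, "hardness": 60.0},
    {"copper_pct": 17.0, "hardness": 62.0},
    {"copper_pct": 21.0, "hardness": 45.0},
    {"copper_pct": 23.0, "hardness": 47.0},
]

best = best_setting(runs, "copper_pct", "hardness", bin_width=5.0)
print(best)
```

With these made-up numbers the best range starts at 15% copper, but a result like this only suggests where to look more closely; it says nothing about why that range works.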