Data mining is one of the current hot fields of Computer Science. The newer buzzword making the rounds for it is 'knowledge discovery'.

Humans are limited in how much data they can process, and one problem many organizations face is the huge amount of data they have collected and their inability to extract relevant information from it. A lot of research in Artificial Intelligence algorithms and statistics goes into supporting data mining.

One example of data mining is looking at a huge Excel spreadsheet and being able to say that so-and-so will probably buy chocolate pretzels on her next trip to the supermarket.
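A toy sketch of that idea in Python, assuming a made-up purchase history (the shopper names and items here are invented for illustration): predict each customer's next purchase as the item they have bought most often.

    from collections import Counter

    # Hypothetical transaction log: (customer, item) pairs.
    transactions = [
        ("alice", "chocolate pretzels"), ("alice", "milk"),
        ("alice", "chocolate pretzels"), ("bob", "coffee"),
        ("bob", "coffee"), ("bob", "bread"),
    ]

    def predict_next_purchase(customer, transactions):
        """Naive prediction: the item this customer buys most often."""
        history = Counter(item for c, item in transactions if c == customer)
        return history.most_common(1)[0][0] if history else None

    print(predict_next_purchase("alice", transactions))  # chocolate pretzels

Real systems use far more sophisticated models, but the shape of the task is the same: past records in, a prediction about a person out.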

Another issue with data mining is how to present information in a way that humans can make effective use of. This field needs to work with people who have intimate knowledge of visual perception, so that data is not only mined but also useful to its human users.

Data mining is the art of extracting (hopefully valuable or useful) information or knowledge from extremely large amounts of accumulated data (usually in the form of data warehouses) without necessarily having any prior knowledge of the kind of information you're looking for.

That last part is what makes data mining difficult and exciting and different from the kind of data analysis people are used to. Normally, you already have an idea of what you're looking for, and you test this hypothesis by conducting experiments designed to make it clearly visible, gathering only the data needed to prove (or disprove) it.
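For contrast, here is a minimal sketch of that traditional hypothesis-driven style, using scipy's two-sample t-test on made-up measurements: you decide up front what to compare and collect only those two samples.

    from scipy import stats

    # Hypothesis: treatment B yields higher values than treatment A.
    # Only the data needed for this one comparison is collected (values invented).
    sample_a = [10.1, 9.8, 10.3, 10.0, 9.9]
    sample_b = [10.9, 11.2, 10.8, 11.0, 11.1]

    # Two-sample t-test: a small p-value means the difference
    # is unlikely to be due to chance alone.
    t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")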

Data mining, however, is done by collecting all available data - large amounts of unordered records, often with a high dimensionality - and then looking for interesting patterns: correlations between certain parameters, periodic cycles, etc. Of course, the main problem is that it is hard to find something without knowing what it is. Some things (such as correlations) are always interesting, so there are some standard tests. But other, more complex results often require luck and intuition to find.
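One of those standard tests, a brute-force correlation screen, is easy to sketch with numpy. The data here is random with one planted relationship, so the scan will find exactly that pair - which is the point: you test everything against everything and keep the surprises.

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(size=(1000, 20))          # 1000 records, 20 parameters
    data[:, 7] = data[:, 3] * 2 + rng.normal(size=1000) * 0.1  # planted pattern

    corr = np.corrcoef(data, rowvar=False)      # 20 x 20 correlation matrix
    # Report every strongly correlated pair of parameters.
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[1]):
            if abs(corr[i, j]) > 0.8:
                print(f"parameters {i} and {j}: r = {corr[i, j]:.2f}")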

The biggest problem is the high dimensionality of the data, which makes it impossible to visualize the entire data set and let the most capable known pattern analyzer - the human brain - do the work. Therefore, many methods in data mining aim at somehow reducing the dimensionality of the data without losing information.
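A common method of that kind is principal component analysis (PCA), which projects the data onto the few directions that carry most of its variance. A minimal sketch with scikit-learn, on made-up data:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    data = rng.normal(size=(500, 50))    # 500 records, 50 dimensions

    # Project onto the 2 directions of greatest variance,
    # so a human can actually plot and eyeball the result.
    pca = PCA(n_components=2)
    reduced = pca.fit_transform(data)

    print(reduced.shape)                  # (500, 2)
    print(pca.explained_variance_ratio_)  # fraction of variance each axis keeps

The explained_variance_ratio_ line is the honesty check: it tells you how much information the reduction threw away.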

To give a practical example, a company that produces sheet metal may notice that the hardness of their product fluctuates quite a lot. They want it to be hard, of course, but they don't know which constellation of the many parameters (temperature, pressure, the exact composition of the metal alloy, the presence of certain catalysts, the rates at which all of this changes, etc.) produces the optimal result. Careful data mining of sensor readings during the production process may reveal that the sheet metal is hardest when it contains 15% copper that is added slowly after all the other ingredients have been heated to no more than 1600 degrees Celsius under at least 200 bar of pressure (I made these values up, of course).
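One way you might mine for such a constellation - a sketch only, with every column name, range, and threshold invented to match the made-up example above - is to fit a decision tree to the hardness readings and read off the splits it discovers:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor, export_text

    rng = np.random.default_rng(2)
    n = 2000
    copper = rng.uniform(0, 30, n)        # % copper (invented range)
    temp = rng.uniform(1400, 1800, n)     # degrees Celsius
    pressure = rng.uniform(100, 300, n)   # bar

    # Synthetic "sensor" target: hardness peaks near 15% copper,
    # temperature <= 1600 and pressure >= 200, plus measurement noise.
    hardness = (
        100 - abs(copper - 15)
        + 10 * (temp <= 1600)
        + 10 * (pressure >= 200)
        + rng.normal(0, 2, n)
    )

    X = np.column_stack([copper, temp, pressure])
    tree = DecisionTreeRegressor(max_depth=3).fit(X, hardness)
    # The printed splits approximate the thresholds that drive hardness.
    print(export_text(tree, feature_names=["copper", "temp", "pressure"]))

A shallow tree is a convenient miner here precisely because its output is human-readable: the splits come out as rules ("pressure >= 200 bar") that a process engineer can sanity-check against the plant.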
