The goal of data mining, or knowledge discovery from databases (KDD), as it is variously called, is to extract interesting and useful patterns from large data sets and use them to advance the objectives of an enterprise. The enterprise may be a business, but it may equally be a scientific endeavor. What is a pattern? Researchers have identified many kinds of worthwhile patterns, including frequent itemsets, association rules, correlations, clustering, classification, depth contours, and data spheres.
The state of the art in data mining research offers many efficient algorithms for extracting patterns that match user-supplied specifications, which attempt to capture usefulness or interestingness. When is a pattern useful? This is a difficult question with deep implications. Usefulness is application dependent and sometimes lies in the eye of the beholder. This suggests that data mining is necessarily a multi-step exploratory process. Current technology offers little support for the exploratory process that underlies data mining. A significant part of our research into data mining seeks to mitigate this by developing a framework that explicitly recognizes this exploratory process. Our framework has the following key components:
(1) A constraint language by means of which a user can express the focus of data mining, in terms of the insight they have based on the application semantics: e.g., find associations between cheap and expensive items, and between items coming from different types.
(2) A rich theory that exploits constraints to the fullest (in a precise sense) in pruning the search space of mining, thus making it very efficient, together with the fast mining algorithms that come with such a theory.
(3) Techniques for extracting patterns online even as the user alters either the numerical parameters that control the mining process, or the aforementioned constraints.
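To make component (2) concrete, the following is a minimal sketch of how a constraint can be pushed into a levelwise frequent itemset search rather than applied after the fact. It is an illustration under stated assumptions, not our actual algorithm: the function name mine_frequent, the toy transactions, the prices table, and the budget constraint are all hypothetical, and the sketch assumes the constraint is anti-monotone (if an itemset violates it, every superset does too), which is what makes early pruning safe.

```python
from itertools import combinations


def mine_frequent(transactions, min_support, constraint=lambda itemset: True):
    """Levelwise (Apriori-style) search for frequent itemsets.

    `constraint` is assumed to be anti-monotone, so a violating itemset
    is discarded before it can generate any larger candidates.
    """
    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({item for t in transactions for item in t})
    result = {}
    level = {}
    for item in items:
        single = frozenset([item])
        if constraint(single):  # prune on the constraint before counting
            count = support(single)
            if count >= min_support:
                level[single] = count

    while level:
        result.update(level)
        size = len(next(iter(level))) + 1
        # Join step: merge pairs of surviving itemsets into larger candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        next_level = {}
        for cand in candidates:
            # Every (size-1)-subset must itself have survived the last level.
            if not all(frozenset(sub) in level
                       for sub in combinations(cand, size - 1)):
                continue
            if not constraint(cand):
                continue
            count = support(cand)
            if count >= min_support:
                next_level[cand] = count
        level = next_level
    return result


# Hypothetical toy data: item prices and market-basket transactions.
prices = {"pen": 1, "ink": 3, "laptop": 900, "dock": 150}
transactions = [
    frozenset({"pen", "ink"}),
    frozenset({"pen", "ink", "laptop"}),
    frozenset({"laptop", "dock"}),
    frozenset({"pen", "laptop", "dock"}),
    frozenset({"pen", "ink", "dock"}),
]


def within_budget(itemset):
    # Anti-monotone because prices are non-negative: adding items
    # can only increase the total.
    return sum(prices[i] for i in itemset) <= 200


found = mine_frequent(transactions, min_support=2, constraint=within_budget)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this sketch the expensive item is rejected at the first level, so no candidate containing it is ever generated or counted. Constraints that are not anti-monotone cannot be pushed this naively and need a different treatment.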
The state of the art has been confined to developing methodologies for performing any single mining task effectively and efficiently. What if applications require sequences of mining tasks to be performed in such a way that the outcome of one affects how the next is performed? Unfortunately, the only way to do this today is to build manual hooks between successive steps in the sequence.
Another important part of our data mining research addresses this significant void by modeling mining tasks as operators that manipulate data and generate patterns. Indeed, we have developed a mining algebra where patterns can be manipulated as first-class citizens.
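The idea of composing mining tasks as operators can be sketched as follows. This is only an illustration of the general style, not the algebra itself: the operator names frequent_itemsets, select, and covered_by, their signatures, and the toy data are hypothetical. The point is that pattern sets are ordinary values, so the output of one step can be filtered and then used to decide which data the next step sees.

```python
from itertools import combinations


def frequent_itemsets(data, min_support, max_size=2):
    """Operator: data -> patterns (brute-force enumeration, illustration only)."""
    items = sorted({item for t in data for item in t})
    patterns = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = sum(1 for t in data if itemset <= t)
            if count >= min_support:
                patterns[itemset] = count
    return patterns


def select(patterns, predicate):
    """Operator: patterns -> patterns. Keeps patterns satisfying the predicate."""
    return {p: c for p, c in patterns.items() if predicate(p, c)}


def covered_by(data, patterns):
    """Operator: (data, patterns) -> data.

    Keeps the transactions that contain at least one of the patterns.
    """
    return [t for t in data if any(p <= t for p in patterns)]


# Hypothetical toy transactions.
data = [
    frozenset({"pen", "ink"}),
    frozenset({"pen", "ink", "laptop"}),
    frozenset({"laptop", "dock"}),
    frozenset({"pen", "laptop", "dock"}),
    frozenset({"pen", "ink", "dock"}),
]

# Step 1: mine patterns from the full data set.
all_patterns = frequent_itemsets(data, min_support=2)
# Step 2: manipulate the pattern set itself (pairs that involve a laptop).
laptop_pairs = select(all_patterns, lambda p, c: len(p) == 2 and "laptop" in p)
# Step 3: use those patterns to choose the data for the next task.
laptop_baskets = covered_by(data, laptop_pairs)
# Step 4: mine again, now on the data selected by the earlier outcome.
refined = frequent_itemsets(laptop_baskets, min_support=3)
print(refined)
```

Because each step is a function over data or pattern sets, the sequence is expressed as a composition rather than as manual glue between separate tools.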