Distance-Based Outliers

Most existing work in data mining has focused on the discovery of patterns. For some applications, however, the patterns are well-established, and it is the exceptions to those patterns that are of interest.

We are conducting ongoing research on the identification, explanation, and generalization of distance-based outliers (DB-outliers). In statistics, an outlier is a data value that appears out of place with respect to the rest of the data. Formally, given user-defined parameters p and D and a distance function F, an object O in a dataset T is a distance-based outlier if at least a fraction p of the objects in T lie at a distance greater than D from O, as measured by F.
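To make the definition concrete, here is a minimal sketch in Python that applies it directly by comparing every object with every other object. It is an illustration only, not the algorithm used in this research: the names db_outliers and euclidean are hypothetical, Euclidean distance is assumed as a stand-in for the distance function F, and the quadratic scan would be too slow for large datasets.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Assumed stand-in for the distance function F."""
    return math.dist(a, b)


def db_outliers(
    data: Sequence[Point],
    p: float,
    d: float,
    dist: Callable[[Point, Point], float] = euclidean,
) -> List[int]:
    """Return the indices of objects that are DB(p, D)-outliers.

    An object O qualifies if at least a fraction p of the objects in
    the dataset lie at a distance greater than d from O.
    """
    n = len(data)
    outliers = []
    for i, o in enumerate(data):
        # Count objects farther than d from O. O itself is at distance 0,
        # so it never counts as far.
        far = sum(1 for q in data if dist(o, q) > d)
        if far >= p * n:
            outliers.append(i)
    return outliers


# Four points clustered near the origin and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(db_outliers(points, p=0.75, d=3.0))
```

In this toy dataset, only the last point has at least 75% of the dataset farther than 3 units away, so it is the only object reported. Changing p or D changes which objects qualify, which is why both parameters are left to the user.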

Our research has been applied to identify outliers among players in the National Hockey League, based on the players' performance statistics. We have also applied our work to stock market, mutual fund, education, insurance, and video surveillance data.

