Distance-Based Outliers
Most existing work in data mining has focused on the discovery
of patterns. For some applications, however, the patterns are well-established,
and it is the exceptions to those patterns that are of interest.
We are performing ongoing research on the identification, explanation,
and generalization of distance-based outliers (DB-outliers). "Outlier"
is a statistical term for any data value that seems to be
out of place with respect to the rest of the data. Formally, given
user-defined parameters p and D, and a distance function F, an object
O in a dataset T is said to be a distance-based outlier if at least
a fraction p of the objects in T lie at a distance greater than D from
O, as measured by F.
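The definition above can be illustrated with a minimal Python sketch. It is
a naive pairwise check written only to make the definition concrete, not one
of the algorithms from the papers listed below. The function name
db_outliers, the use of Euclidean distance as the default F, and the choice
to count O itself among the objects of T are all assumptions made for this
example.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]


def db_outliers(
    T: List[Point],
    p: float,
    D: float,
    F: Callable[[Point, Point], float] = math.dist,  # assumed default: Euclidean
) -> List[int]:
    """Return the indices of the objects in T that are distance-based outliers.

    An object O qualifies when at least a fraction p of the objects in T
    lie at a distance greater than D from O. This is a direct, quadratic-time
    reading of the definition (hypothetical helper, for illustration only).
    """
    n = len(T)
    outliers = []
    for i, o in enumerate(T):
        # Count the objects farther than D from o; o itself is at distance 0.
        far = sum(1 for q in T if F(o, q) > D)
        if far >= p * n:
            outliers.append(i)
    return outliers


# Four clustered points and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(db_outliers(points, p=0.75, D=5.0))  # prints [4]
```

With p = 0.75 and D = 5, only the point (10, 10) has at least 75% of the
dataset farther than D away, so it is the only object reported.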
Our research has been applied to identify outliers among players
in the National Hockey League, based on the players' performance
statistics. We have also applied our work to stock market, mutual
fund, education, insurance, and video surveillance data.
Detailed information about distance-based outliers can be found
in:
Edwin M. Knorr and Raymond T. Ng. "Algorithms for Mining Distance-Based
Outliers in Large Datasets", Proceedings of the 24th VLDB Conference,
New York, August 24-27, 1998, pp. 392-403.
Edwin M. Knorr and Raymond T. Ng. "Finding Intensional Knowledge
of Distance-Based Outliers", Proc. VLDB, Edinburgh, Scotland, September
7-10, 1999, pp. 211-222.
Edwin M. Knorr, Raymond T. Ng, and Ruben H. Zamar. "Robust Space
Transformations for Distance-based Operations", Proc. SIGKDD, San
Francisco, August 26-29, 2001, pp. 126-135.
More information on outlier detection in video surveillance can
be found in:
Edwin M. Knorr, Raymond T. Ng, and Vladimir Tucakov. "Distance-Based
Outliers: Algorithms and Applications", The VLDB Journal, 8(3),
February, 2000, pp. 237-253.
