My PhD research areas were data mining and outlier detection. An outlier is an observation (point, tuple, record) in a dataset that doesn't seem to belong with the rest of the data. Formally, in the distance-based view, an outlier is a point that has sufficiently few other points in its D-neighbourhood, for some given radius D. Data mining refers to the efficient discovery of previously unknown and potentially useful information from (large) datasets. Most existing work in data mining has focused on discovering patterns or associations within data, while the detection of outliers has been largely overlooked. Yet in some applications (e.g., phone, credit card, or other financial transactions), the patterns are already well established, and it is the exceptions to those patterns that are of interest. Our case studies include NHL player performance statistics, stock market and mutual fund data, and student performance in computer science courses.
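The D-neighbourhood definition can be made concrete with a small nested-loop sketch. The function name `db_outliers` and the threshold parameter `max_neighbours` (how many neighbours counts as "sufficiently few") are hypothetical choices for illustration, and Euclidean distance is assumed; this is a naive O(n²) check, not an efficient mining algorithm.

```python
import math

def db_outliers(points, D, max_neighbours):
    """Hypothetical helper: return indices of points that have at most
    max_neighbours other points within Euclidean distance D."""
    outliers = []
    for i, p in enumerate(points):
        count = 0
        for j, q in enumerate(points):
            if i != j and math.dist(p, q) <= D:
                count += 1
                if count > max_neighbours:
                    break  # enough neighbours found: p is not an outlier
        if count <= max_neighbours:
            outliers.append(i)
    return outliers

# A tight cluster near (1, 1) plus one far-away point.
data = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (5.0, 5.0)]
print(db_outliers(data, D=1.0, max_neighbours=1))  # [4]
```

The early `break` stops counting as soon as a point is known to have enough neighbours, which is where most of the work is saved when the bulk of the data is dense.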
To account for scale, variability, and correlation within the attributes (dimensions) of a multivariate dataset, and to limit the adverse effects that some outliers may have on the search for other outliers, we employ methods from robust statistics. Robust methods are said to accommodate outliers because they can tolerate a large fraction of outliers before breaking down. For example, a single, very large outlier in a 1-D dataset can greatly inflate the mean and the standard deviation; however, at least half of the points would have to be sufficiently large to drive the median to undesirably high values. Thus, we say that the median is more robust than the mean.
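The mean-versus-median example can be checked numerically. The sketch below uses Python's `statistics` module on a made-up 1-D sample, with and without one very large value. As a robust counterpart to the standard deviation it also reports the median absolute deviation (MAD); MAD is shown only as a common illustration of a robust scale estimate, not necessarily the estimator used in this research, and `summarize` is a hypothetical helper name.

```python
import statistics

def summarize(xs):
    """Hypothetical helper: classical vs. robust location/scale estimates."""
    med = statistics.median(xs)
    # Median absolute deviation: median of distances from the median.
    mad = statistics.median([abs(x - med) for x in xs])
    return {"mean": statistics.mean(xs), "stdev": statistics.stdev(xs),
            "median": med, "mad": mad}

clean = [10.0, 11.0, 9.0, 10.5, 9.5]
contaminated = clean + [1000.0]  # add a single, very large outlier

print(summarize(clean))         # mean 10.0, median 10.0, mad 0.5
print(summarize(contaminated))  # mean 175.0, median 10.25, mad 0.75
```

One extreme value moves the mean from 10.0 to 175.0 and inflates the standard deviation by orders of magnitude, while the median and MAD barely change.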
Currently, my research is in CS education (evidence-based research and best practices in teaching and learning). As a Senior Instructor, I was also a part-time Science Teaching and Learning Fellow with UBC's Carl Wieman Science Education Initiative (2012-2014): http://www.cwsei.ubc.ca.