A Taxonomy of Visual Cluster Separation Factors

Michael Sedlmair, Andrada Tatu, Tamara Munzner, and Melanie Tory

Figure: A taxonomy of data characteristics with respect to class separation in scatterplots. Some factors are organized as axes (arrows) while others are binned. Between-Class factors often result from the variance of Within-Class factors (horizontal dependencies), and factors at the top can strongly influence factors below them (vertical dependencies). Class Separation is therefore dependent on all other factors.

Abstract

We provide two contributions: a taxonomy of visual cluster separation factors in scatterplots, and an in-depth qualitative evaluation of two recently proposed and validated separation measures. We initially intended to use these measures to provide guidance for the use of dimension reduction (DR) techniques and visual encoding (VE) choices, but found that they failed to produce reliable results. To understand why, we conducted a systematic qualitative data study covering a broad collection of 75 real and synthetic high-dimensional datasets, four DR techniques, and three scatterplot-based visual encodings. Two authors visually inspected over 800 plots to determine whether or not the measures created plausible results. We found that they failed in over half the cases overall, and in over two-thirds of the cases involving real datasets. Using open and axial coding of failure reasons and separability characteristics, we generated a taxonomy of visual cluster separability factors. We iteratively refined its explanatory clarity and power by mapping the studied datasets and success and failure ranges of the measures onto the factor axes. Our taxonomy has four categories, ordered by their ability to influence successors: Scale, Point Distance, Shape, and Position. Each category is split into Within-Cluster factors such as density, curvature, isotropy, and clumpiness, and Between-Cluster factors that arise from the variance of these properties, culminating in the overarching factor of class separation. The resulting taxonomy can be used to guide the design and the evaluation of cluster separation measures.
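The abstract does not spell out the two measures, so the following is only a rough illustration of how a simple centroid-based separation score can disagree with visual judgment. It is a minimal sketch under stated assumptions: the function name `centroid_separation` and the toy datasets are hypothetical and are not taken from the paper or its supplemental material.

```python
from collections import defaultdict
from math import dist


def centroid_separation(points, labels):
    """Return the fraction of points whose nearest class centroid
    belongs to their own class (hypothetical illustrative score)."""
    # Group the points by class label.
    groups = defaultdict(list)
    for p, c in zip(points, labels):
        groups[c].append(p)

    # Compute one centroid per class as the coordinate-wise mean.
    centroids = {
        c: tuple(sum(axis) / len(ps) for axis in zip(*ps))
        for c, ps in groups.items()
    }

    # Count the points that lie closest to their own class centroid.
    hits = 0
    for p, c in zip(points, labels):
        nearest = min(centroids, key=lambda k: dist(p, centroids[k]))
        hits += nearest == c
    return hits / len(points)


# Toy data 1: two compact, round, well-separated classes.
compact_pts = [(0, 0), (0, 1), (1, 0), (1, 1),
               (10, 10), (10, 11), (11, 10), (11, 11)]
compact_lbl = ["a"] * 4 + ["b"] * 4

# Toy data 2: two elongated, offset stripes that never touch.
stripe_pts = [(x, 0) for x in range(0, 11)] + [(x, 3) for x in range(6, 17)]
stripe_lbl = ["a"] * 11 + ["b"] * 11

compact_score = centroid_separation(compact_pts, compact_lbl)
stripe_score = centroid_separation(stripe_pts, stripe_lbl)
print(compact_score, stripe_score)
```

In this sketch the compact classes receive a perfect score, while the stripes score lower even though the two classes never overlap, because the ends of each stripe lie closer to the other class's centroid. This is the kind of shape-related effect that the Shape category of the taxonomy is meant to capture.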

Paper

A Taxonomy of Visual Cluster Separation Factors

Michael Sedlmair, Andrada Tatu, Tamara Munzner, and Melanie Tory

In Computer Graphics Forum (Proc. EuroVis), 31(3), 1335-1344, 2012.

→ PDF (1.1 MB)
→ BibTeX

Videos

Video 1: Lookup table of all 816 scatterplot representations we inspected in our study.
→ AVI video (60.8 MB, no audio, tested on VLC 1.1.12)

Video 2: The interactive 3D data viewer we used in our study.
→ MP4 video (5.1 MB, no audio, tested on VLC 1.1.12)

Talk

Presentation at EuroVis 2012:
→ PDF - slides without animations (13.3 MB)
→ MOV - slides with animations (21.0 MB, tested on QuickTime 10.0)
→ MOV - prerecorded talk with audio (15:33 min, 87.4 MB, tested on QuickTime 10.0)

Fast Forward

Video of the 30-second "Fast Forward" presented at EuroVis 2012:
→ MOV - prerecorded Fast Forward with audio (0:30 min, 6.3 MB, tested on QuickTime 10.0)

Supplemental Material

The supplemental material includes the following information:

Mathematical details about the measures used and the extensions we made
Parameterization of the dimension reduction (DR) techniques we used
A list of all datasets we analyzed in the qualitative data study
A condensed list of codes resulting from the open coding process
Plots from further grid size analysis

→ PDF (651 KB)

HighRes Figures

Taxonomy:

Mapping of tested datasets and measures onto taxonomy:

Michael Sedlmair

Last modified: Jul 17, 2012