DimStiller v 0.2.1
DimStiller is a visual tool for dimensional analysis and reduction. Users can create pipelines called "expressions" in which they chain together analysis techniques called "operators".
One way to think about the task of dimension reduction is to consider the space of all data tables. The user's data is a single point in this space, and the table the user wants to obtain from the analysis is another point. The task is then to find the mapping, or composition of mappings, that moves from the input table to the output table. Each DimStiller operator maps an input table to an output table and therefore represents an edge between two points in table space. DimStiller expressions, which are compositions of these operators, represent connected paths of points in table space.
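The table-space picture can be sketched in a few lines of Python. This is only an illustration, not DimStiller's implementation: the `Table` representation and the names `cull`, `center`, `expression`, and `trace` are hypothetical. Each operator is a function from table to table (an edge), and an expression is the composition of its operators (a path).

```python
from functools import reduce
from typing import Callable, Dict, List

# Hypothetical table model: dimension name -> column of values.
Table = Dict[str, List[float]]
Operator = Callable[[Table], Table]


def cull(names: List[str]) -> Operator:
    """Return an operator that drops the named dimensions."""
    def apply(table: Table) -> Table:
        return {k: v for k, v in table.items() if k not in names}
    return apply


def center() -> Operator:
    """Return an operator that subtracts each dimension's mean."""
    def apply(table: Table) -> Table:
        out = {}
        for name, col in table.items():
            mean = sum(col) / len(col)
            out[name] = [x - mean for x in col]
        return out
    return apply


def expression(*operators: Operator) -> Operator:
    """Compose operators left to right into one table-to-table mapping."""
    return lambda table: reduce(lambda t, op: op(t), operators, table)


def trace(table: Table, *operators: Operator) -> List[Table]:
    """Return every table visited along the path, input table first."""
    path = [table]
    for op in operators:
        path.append(op(path[-1]))
    return path


data = {"x": [1.0, 2.0, 3.0], "y": [4.0, 4.0, 4.0], "id": [0.0, 1.0, 2.0]}
ops = [cull(["id"]), center()]
result = expression(*ops)(data)   # the output point in table space
path = trace(data, *ops)          # input, intermediate, and output tables
```

Here `path` holds three tables: the input, the table after culling, and the final centered table, which is the same table that `expression` returns.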
The DimStiller UI consists of three parts indicated in the figure:
- WORKFLOWS. The top part labelled "WORKFLOWS" is the workflows selector. Workflows are like expression templates. Click "Add" to create a new expression from a workflow, or click "Apply" to add the workflow to a loaded expression.
- EXPRESSION TREE. The Expression Tree shows the loaded expressions and the operators in each one.
- OPERATOR CONTROL. When a user highlights an operator in the expression tree, the operator's control shows up here. The user can adjust the parameters of the operator using widgets that appear here.
DimStiller has three menus:
- New Expression... : Creates a new expression in the expression tree.
- Open Expression...: Loads a saved expression from disk.
- Save Expression...: Saves the currently selected expression to disk.
- Save Expression as Workflow...: Creates a new workflow (expression template) from the operators in the currently selected expression.
- Save Operator to File...: Saves just the output table of the currently selected operator to disk.
- Workflow: lists the workflows in the workflow directory. Selecting a workflow in the menu creates a new expression with the workflow applied to the selected input data.
- Operators: lists the operators in the operator directory. Selecting an operator in the menu applies it to the expression.
To use DimStiller, you first create a new expression. DimStiller will prompt you for an input table (see format description) and create a new expression with a single "Input:File" operator. You then have two choices: apply a workflow to the expression, or add operators one at a time from the Operators menu.
Watch a video demo of an earlier version of the tool here.
The first line of a DimStiller input file should be a comma-separated list of dimension titles, one entry for each dimension.
The second line should be a comma-separated list of dimension types, one for each dimension. Each type must be one of the following three case-insensitive values: "CATEGORICAL", "ORDINAL", or "NUMERIC". If this line is left out, all dimensions are treated as "NUMERIC" by default. Every following line is interpreted as a point in the dataset, given as a comma-separated list of point values. DimStiller does not currently support missing entries in data tables.
Here is an example file:
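The file below is a hypothetical illustration of the format (the dimension names and values are invented), embedded in a short Python sketch of how such a file could be read. The sketch is not DimStiller's own loader: `parse_table` is a made-up name, the use of Python's `csv` module is an assumption about quoting and whitespace, and ORDINAL and CATEGORICAL values are simply kept as strings.

```python
import csv
import io

VALID_TYPES = {"CATEGORICAL", "ORDINAL", "NUMERIC"}

# Hypothetical example file: titles, optional types line, then one point per line.
EXAMPLE = """\
name,weight,grade
CATEGORICAL,NUMERIC,ORDINAL
apple,0.25,2
pear,0.31,1
plum,0.08,3
"""


def parse_table(text):
    """Parse DimStiller-style text into (titles, types, points)."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        raise ValueError("empty table")
    titles = [t.strip() for t in rows[0]]
    body = rows[1:]
    # The types line is optional: treat the second line as types only if
    # every cell is one of the three allowed names, ignoring case.
    if (body and len(body[0]) == len(titles)
            and all(c.strip().upper() in VALID_TYPES for c in body[0])):
        types = [c.strip().upper() for c in body[0]]
        body = body[1:]
    else:
        types = ["NUMERIC"] * len(titles)
    points = []
    for n, row in enumerate(body, start=1):
        # Missing entries are not supported, so reject short rows and blanks.
        if len(row) != len(titles) or any(c.strip() == "" for c in row):
            raise ValueError(f"point {n} has missing entries")
        points.append([float(c) if t == "NUMERIC" else c.strip()
                       for c, t in zip(row, types)])
    return titles, types, points


titles, types, points = parse_table(EXAMPLE)
```

Dropping the second line of `EXAMPLE` would make every dimension NUMERIC, in which case the `name` column would fail to parse as a number.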
Command Line Arguments
This section is purely for reference. One can use the packaged shell script or batch file to invoke DimStiller rather than using the command line arguments.
Default command line invocation:
java -cp .:core.jar:Jama-1.0.2.jar still.gui.DimStiller -D still/operators/ -W workflows/
-D d1:d2:...:dn Location of directories d1 ... dn containing DimStiller operator binaries (default still/operators/).
-W d1:d2:...:dn Location of directories d1 ... dn containing workflow files.
-I filename Input file option. Creates a new expression at startup.
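For scripted launches, the default invocation can be assembled programmatically. The following Python sketch is an assumption-laden convenience, not part of DimStiller: `dimstiller_command` is a hypothetical helper, it uses only the -D, -W, and -I options described above, and it assumes the jar files sit in the current directory as in the default invocation.

```python
import os


def dimstiller_command(operator_dirs=("still/operators/",),
                       workflow_dirs=("workflows/",),
                       input_file=None):
    """Build the argument list for the default DimStiller invocation."""
    # Java classpath entries are joined with os.pathsep
    # (":" on Unix-like systems, ";" on Windows).
    classpath = os.pathsep.join([".", "core.jar", "Jama-1.0.2.jar"])
    cmd = ["java", "-cp", classpath, "still.gui.DimStiller",
           # Directory lists use ":" as documented for -D and -W.
           "-D", ":".join(operator_dirs),
           "-W", ":".join(workflow_dirs)]
    if input_file is not None:
        # -I creates a new expression from this file at startup.
        cmd += ["-I", input_file]
    return cmd


cmd = dimstiller_command(input_file="data.csv")
# To launch: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when directory or file names contain spaces.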
- Attrib:Color Lets the user select an input dimension to color by and choose between categorical and sequential coloring. For categorical coloring, users can permute the colors. By default, the coloring dimension is culled from the output of the operator. Uncheck the box to avoid this behavior. Coloring only shows up in operators that use the color attribute, which is currently only the View:SPLOM operator.
- Collect:Pearson's Correlation Collects correlated dimensions together. A slider in the operator control sets the threshold at which dimensions are collected. Displays a correlation matrix view of the output dimensions using a red-yellow-blue color ramp, with red highly negatively correlated, yellow uncorrelated, and blue highly positively correlated.
- Cull:Name Permits the user to manually filter out desired dimensions by name.
- Cull:Variance Remove any dimension whose variance is less than a certain threshold. Threshold selection is done using a scree control.
- Data:Normalize Normalize numerical dimensions to have mean 0 and standard deviation of 1. The user is additionally given a list of "opt-out" checkboxes for each numeric input dimension. When the user checks a box, the operator will leave that dimension's values unchanged while normalizing the other non-checked dimensions.
- Reduce:MDS Perform multidimensional scaling. When the operator is activated, the control measures the embedding stress for each possible output dimension. The user then selects the output dimensionality using a scree control. Avoid using on large (n>1000) datasets.
- Reduce:PCA Perform principal component analysis. The user is presented with a sorted list of eigenvalues indicating the variance captured by the different components. The user then selects the output dimensionality using a scree control. By default, the principal component dimensions replace the numeric input dimensions. Check the "append principal components" box to override this behavior.
- View:Histogram Displays a view window with histogram plots of the data dimensions. The user can use a slider to control the number of bins in the histograms and the layout of the plots. Supports linked highlighting.
- View:SPLOM Displays a view window with a scatterplot matrix of the data dimensions. Supports linked highlighting. Note that if the SPLOM windows are too small, the view shows a matrix view of the correlation matrix instead. Cull dimensions to reduce the number of dimensions visible.
- View:CorrespondenceAnalysis Displays a view window showing the symmetric map layout of the principal coordinates of a correspondence analysis between two dimensions. KNOWN ISSUE: Label occlusion due to overlap.
- Filter:Value Permits filtering table rows (or points) by value. The operator control permits 3 kinds of filtering.
- Category The user can select a categorical dimension in the dimension list and then a categorical value in the value list. The operator filters out all points that don't have the selected value.
- Numerical The control constructs a series of fixed-width bins, and the user can select one of the bins to filter by in the value list. The operator filters out all points that fall outside the bin. If there are no points in the bin, then no points are filtered (instead of all points being filtered). If the dimension has fewer than 100 unique values, the user is given the option to filter by the unique values of the numeric dimension. To do this, uncheck the "Use Bins" checkbox for the dimension.
- Selection If the user clicks the "Filter by selection attribute" checkbox, the operator filters out all points that are not selected in either a SPLOM or a histogram. If no points are selected, then no points are filtered (instead of all points being filtered).
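The three Filter:Value rules can be sketched in Python as follows. This is an illustration under stated assumptions, not DimStiller's code: the function names, the dict-per-point representation, and the bin count are hypothetical, and the source only says that the bins are fixed-width.

```python
def filter_category(points, dim, value):
    """Keep only points whose value in `dim` equals the selected category."""
    return [p for p in points if p[dim] == value]


def bin_index(value, low, high, n_bins):
    """Index of the fixed-width bin over [low, high] that contains value."""
    if high == low:
        return 0
    idx = int((value - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)  # the maximum value belongs to the last bin


def filter_numeric(points, dim, n_bins, selected_bin):
    """Keep points inside the selected bin; an empty bin filters nothing."""
    values = [p[dim] for p in points]
    low, high = min(values), max(values)
    kept = [p for p in points
            if bin_index(p[dim], low, high, n_bins) == selected_bin]
    return kept if kept else list(points)


def filter_selection(points, selected):
    """Keep the selected point indices; an empty selection filters nothing."""
    if not selected:
        return list(points)
    return [p for i, p in enumerate(points) if i in selected]


points = [
    {"kind": "a", "x": 0.0},
    {"kind": "b", "x": 1.0},
    {"kind": "a", "x": 9.0},
    {"kind": "b", "x": 10.0},
]
only_a = filter_category(points, "kind", "a")
first_bin = filter_numeric(points, "x", n_bins=5, selected_bin=0)
empty_bin = filter_numeric(points, "x", n_bins=5, selected_bin=2)
nothing_selected = filter_selection(points, set())
```

With five bins over the range 0 to 10, the first bin keeps the points with `x` of 0.0 and 1.0, while the empty third bin leaves all four points in place, matching the "no points are filtered" rule.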
Operator controls can be "torn off" of the main control by clicking the "Tear off" button at the bottom of the control. A torn-off control stays onscreen while you adjust other operator controls. To return the control to the main window, click the window-close button in the upper left corner of the window.
The following features are in the queue for addition to DimStiller:
- View PDFs Saving any view to a PDF
- Rich axis labelling Improved user control over how axis labels and ticks are generated.
Please see the feedback section to request additional operators for DimStiller.
DimStiller is designed to be extensible in that users can create their own operators. This section will eventually contain instructions on how to create and add operators.
- Fixed operator output.
- Maximum MDS embedding is 40.
- Added Procedural Expressions.
- Have color palette change only when forced by "permute colors" button.
- Added ability to freeze axis bounds on SPLOM operator.
- Changed tree selection rules to use reasonable defaults.
- Only enable operator menu when an input table or operator is selected.
- Switched from JAMA to jblas for the internal matrix library.
- Fixed a bug on the bubble chart where the ordering of the axis ticks was wrong.
- Fixed a color bug where the wrong colors were assigned in the bubble chart.
- Fixed Input Table modification.
- Added Bubble Plot to the Scatterplot (known issue: selection currently doesn't work with bubble plot)
- Added the ability to manually modify colors in the categorical color list.
- Added Correspondence Analysis Operator.
- Added vertical histogram labels.
- Updates cursor when processing occurs.
- Updated histogram axis labels to only use scientific notation when less than 100
- Updated SPLOM axis labels to only use scientific notation when less than 100
- Rotated y axis labels to be vertical.
- Added the ability to resize plots beyond the border of a frame using a checkbox in the view control.
- Fixed case-sensitivity bug in file loading.
- Fixed regular expression parsing to allow spaces.
- Added tear-off controls.
- Added non-binning option to the filter operator for numeric dimensions.
- Added ability to opt-out for certain dimensions in the normalization operator.
- Added option to append principal component dimensions.
- Fixed file importing bug on command line.
- Added labels to the Histogram operator.
- Added filtering control to the SPLOM operator.
- Changed the Expression save format.
- Fixed bug in Collect:Pearson control.
- Added labels to the SPLOM operator.
- Improved performance of Filter operator.
- Fixed plot display sizing bug.
- Added versioning command line argument check. (use -V on command line)
- Updated filter control to allow multiple item selections.
Please send feedback and requests to <sfingram (at) cs.ubc.ca>