UFinder: An interactive tool for data exploration

Jeanette Bautista, Micheline Manske
Department of Computer Science
University of British Columbia
2366 Main Mall
Vancouver, BC, V6T 1Z4, Canada
{bautista, manske}@cs.ubc.ca

Abstract

We present a proposal for an interface to interactively explore a data set consisting of 1300 colleges and universities described by 33 variables (number of students, cost of tuition, SAT scores, etc.) for the purpose of selecting a school to attend. This project may be extended to the more general problem of decision-support visualization, namely how to design interactive visualizations that help users choose one option from among many with differently weighted variables.

1     Domain, task, and dataset

Selecting a university is a difficult decision with many variables to consider, such as cost of living, quality of the school, and admission requirements. Humans are not very effective at reasoning about many variables at one time. We would like to design an interface that supports the task of selecting a college or university from among many choices. Our system is targeted at high school students, their parents, high school guidance counselors, and anyone else for whom ranking schools is an important task.

We have chosen as our test data the USNEWS data set taken from the 1995 U.S. News & World Report’s Guide to America’s Best Colleges. The data set is available from the StatLib website at http://lib.stat.cmu.edu/datasets/colleges.

The data set consists of information about 1300 colleges and universities in the United States for the 1993-94 school year. The variables for each school are listed in Table 1.

Variables for the USNEWS data set

College name                               % new students from top 25% of HS class
State                                      Number of full-time undergraduates
Public/private indicator                   Number of part-time undergraduates
Average Math SAT score                     In-state tuition
Average Verbal SAT score                   Out-of-state tuition
Average Combined SAT score                 Room and board costs
Average ACT score (qualifying test)        Room costs
First quartile Math SAT                    Board costs
Third quartile Math SAT                    Additional fees
First quartile Verbal SAT                  Estimated book costs
Third quartile Verbal SAT                  Estimated personal spending
First quartile ACT                         % of faculty with Ph.D.’s
Third quartile ACT                         % of faculty with terminal degrees
Number of applications received            Student/faculty ratio
Number of applications accepted            Instructional expenditure per student
Number of new students enrolled            Graduation rate
% new students from top 10% of HS class

Table 1: Variables for USNEWS data set
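
Since one of the possible extensions discussed later concerns missing data fields, it may help to see how a single record could be read. The following is a minimal sketch only. It assumes that each school is stored as one comma-separated line and that a missing value is written as "*"; both are assumptions to be checked against the data set's own documentation. The class and method names are hypothetical, and the naive split assumes that no field contains an embedded comma.

```java
import java.util.OptionalDouble;

/** Hypothetical helper for reading the numeric fields of one record. */
class FieldParser {

    /**
     * Parses one numeric field. Returns an empty value for a blank field
     * or for "*", which is ASSUMED here to be the missing-value marker.
     */
    static OptionalDouble parseField(String raw) {
        String s = raw.trim();
        if (s.isEmpty() || s.equals("*")) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(Double.parseDouble(s));
    }

    /** Splits a comma-separated line and parses every field as a number. */
    static OptionalDouble[] parseNumericFields(String line) {
        String[] parts = line.split(",", -1); // -1 keeps trailing empty fields
        OptionalDouble[] out = new OptionalDouble[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = parseField(parts[i]);
        }
        return out;
    }
}
```

Keeping missing values explicit (rather than substituting zero) lets a range filter decide separately whether schools with unknown values should be shown or hidden.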

Examination of Table 1 reveals that there is an inherent hierarchy in the variables. One interpretation of this hierarchy is shown as a tree in Figure 1 (note that some nodes have been added which do not directly correspond to variables in the original data set). We wish to exploit the hierarchical nature of the data to reduce the perception of an overwhelming number of variables by selectively hiding lower-level variables while the user explores.


Figure 1: USNEWS data set arranged hierarchically
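
One way such a hierarchy could be represented is sketched below. It is an illustration under stated assumptions, not the planned implementation: the class names are hypothetical, and the grouping of room and board costs under "Living expenses" is an assumed reading of the hierarchy. Collapsed nodes hide their descendants, so only one slider would be needed for the whole subtree.

```java
import java.util.ArrayList;
import java.util.List;

/** A node in a hypothetical variable hierarchy; leaves map to data set columns. */
class VariableNode {
    final String name;
    final List<VariableNode> children = new ArrayList<>();
    boolean expanded = false; // a collapsed node hides all of its descendants

    VariableNode(String name) {
        this.name = name;
    }

    /** Adds a child and returns this node so calls can be chained. */
    VariableNode add(VariableNode child) {
        children.add(child);
        return this;
    }

    /** Depth of the subtree rooted here; a single node has depth 1. */
    int depth() {
        int deepest = 0;
        for (VariableNode c : children) {
            deepest = Math.max(deepest, c.depth());
        }
        return 1 + deepest;
    }

    /** Collects the names of the nodes that would currently be given a slider. */
    void collectVisible(List<String> out) {
        if (children.isEmpty() || !expanded) {
            out.add(name);
            return;
        }
        for (VariableNode c : children) {
            c.collectVisible(out);
        }
    }
}

class VariableTreeDemo {
    /** Builds an ASSUMED cost subtree; the real grouping may differ. */
    static VariableNode buildCostTree() {
        VariableNode tuition = new VariableNode("Tuition")
                .add(new VariableNode("In-state tuition"))
                .add(new VariableNode("Out-of-state tuition"));
        VariableNode living = new VariableNode("Living expenses")
                .add(new VariableNode("Room costs"))
                .add(new VariableNode("Board costs"));
        return new VariableNode("Yearly cost").add(tuition).add(living);
    }

    public static void main(String[] args) {
        VariableNode cost = buildCostTree();
        List<String> visible = new ArrayList<>();
        cost.collectVisible(visible);
        System.out.println("Collapsed: " + visible);

        cost.expanded = true;
        visible.clear();
        cost.collectVisible(visible);
        System.out.println("Expanded:  " + visible);
    }
}
```

With the root collapsed, only "Yearly cost" is visible; expanding it reveals "Tuition" and "Living expenses" while their own children stay hidden.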

2     Proposed Solution

Our solution for this decision-support task is to allow users to manipulate sliders that control the range of data points (schools) to be displayed, in the spirit of HomeFinder [1] and FilmFinder [2]. Because our data set contains many variables, and hence would require many sliders, we wish to make use of the hierarchical nature of the variables to arrange the sliders by general categories. For example, users may wish to filter the range of schools based on the cost for one year of schooling, or they may wish to narrow this further and filter on the “cost of tuition” and “cost of living” measures directly.
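
The filtering behaviour described above can be sketched as follows. This is a minimal illustration rather than the planned implementation: the School and Range classes are hypothetical, the parent-level "yearly cost" is assumed to be the sum of tuition and living expenses, and each slider is reduced to a closed interval for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical school record holding two of the cost variables. */
class School {
    final String name;
    final double tuition;
    final double livingExpenses;

    School(String name, double tuition, double livingExpenses) {
        this.name = name;
        this.tuition = tuition;
        this.livingExpenses = livingExpenses;
    }

    /** Parent-level value, ASSUMED to be the sum of the two child variables. */
    double yearlyCost() {
        return tuition + livingExpenses;
    }
}

/** The model behind one slider: a closed interval [low, high]. */
class Range {
    final double low;
    final double high;

    Range(double low, double high) {
        this.low = low;
        this.high = high;
    }

    boolean contains(double value) {
        return value >= low && value <= high;
    }
}

class DynamicQuery {
    /** Keeps only the schools that fall inside every active slider range. */
    static List<School> filter(List<School> schools, Range yearlyCost, Range tuition) {
        List<School> shown = new ArrayList<>();
        for (School s : schools) {
            if (yearlyCost.contains(s.yearlyCost()) && tuition.contains(s.tuition)) {
                shown.add(s);
            }
        }
        return shown;
    }
}
```

A coarse query would use only the yearly cost range and leave the tuition range unbounded, for example `new Range(0, Double.POSITIVE_INFINITY)`. Narrowing the tuition range afterwards corresponds to expanding the parent slider and adjusting one of its children.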

A potential screen layout of the interface is shown in Figure 2. The bar on the right contains the sliders for filtering data points. The filtered schools will be displayed in the box on the upper left. When a data point in the display is clicked on, detailed information for that particular school is displayed in the box below.


Figure 2: Potential screen layout of interface

We are considering two options for the hierarchical sliders. The first is the version shown in Figure 2, which uses the standard Windows tree view control; it is shown in more detail in Figure 3(a). The second option is based on the Treemap approach [3], in which the tree of variables (with a slider for each variable) is mapped to a box that uses all of the available screen space. The Treemap implementation is shown in Figure 3(b).


Figure 3: The two potential approaches for displaying sliders hierarchically. (a) The standard Windows tree view. (b) The Treemap approach.
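
To make the Treemap option more concrete, the sketch below shows a slice-and-dice subdivision, the basic layout described in the Treemap literature [3]. It is an illustration under assumptions, not the layout UFinder will necessarily use: the class names are hypothetical, each leaf (slider) is given equal weight, and the space a real interface would reserve for parent labels and borders is omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tree node that receives a rectangle from the layout. */
class TreemapNode {
    final String name;
    final List<TreemapNode> children = new ArrayList<>();
    double x, y, width, height; // assigned by SliceAndDice.layout

    TreemapNode(String name) {
        this.name = name;
    }

    TreemapNode add(TreemapNode child) {
        children.add(child);
        return this;
    }

    /** Weight of a node: the number of leaves below it (a leaf counts as 1). */
    int leafCount() {
        if (children.isEmpty()) {
            return 1;
        }
        int n = 0;
        for (TreemapNode c : children) {
            n += c.leafCount();
        }
        return n;
    }
}

class SliceAndDice {
    /**
     * Assigns the rectangle (x, y, w, h) to the node, then divides it among
     * the children in proportion to their leaf counts. The split direction
     * alternates at each level of the tree.
     */
    static void layout(TreemapNode node, double x, double y,
                       double w, double h, boolean splitHorizontally) {
        node.x = x;
        node.y = y;
        node.width = w;
        node.height = h;

        int total = node.leafCount();
        double offset = 0;
        for (TreemapNode c : node.children) {
            double share = (double) c.leafCount() / total;
            if (splitHorizontally) {
                layout(c, x + offset, y, w * share, h, false);
                offset += w * share;
            } else {
                layout(c, x, y + offset, w, h * share, true);
                offset += h * share;
            }
        }
    }
}
```

Because every leaf has the same weight, each slider receives the same area regardless of how deep it sits in the hierarchy, which is one way to ensure that all of the available screen space is used.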

The filtered data will initially be displayed as a scatterplot, with the variables on the axes determined by the user. One extension of our work may be to display the filtered data using ValueCharts[4], an interactive visualization for displaying and comparing options based on up to five weighted variables.
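
To illustrate what comparing options on weighted variables can mean, the sketch below computes a generic weighted additive score. It is a textbook-style illustration and is not taken from ValueCharts; the class name, the linear rescaling, and the convention that weights sum to 1 are all assumptions made for the example.

```java
/** Hypothetical helper for a generic weighted additive score. */
class WeightedScore {

    /**
     * Linearly rescales a value to [0, 1] over the range [min, max].
     * For variables where lower is better (such as cost), the scale is flipped.
     */
    static double normalize(double value, double min, double max, boolean higherIsBetter) {
        if (max == min) {
            return 0.0; // degenerate range: no basis for preferring any value
        }
        double t = (value - min) / (max - min);
        return higherIsBetter ? t : 1.0 - t;
    }

    /** Weighted sum of already-normalized scores; weights are expected to sum to 1. */
    static double score(double[] normalized, double[] weights) {
        if (normalized.length != weights.length) {
            throw new IllegalArgumentException("one weight is required per variable");
        }
        double total = 0.0;
        for (int i = 0; i < normalized.length; i++) {
            total += normalized[i] * weights[i];
        }
        return total;
    }
}
```

Under this scheme a school is summarized by a single number between 0 and 1, and changing a weight changes how much the corresponding variable contributes to that number.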

A further extension may be to generalize the interface so that it works with any data set whose variables can be arranged as a hierarchical tree with depth less than five.

3     Scenario of Use

A user who wishes to attend a university or college in the United States uses UFinder to help make her decision. First she manipulates the sliders in the slider panel to pick a range of values for each of the variables: number of students, caliber of students, student/faculty ratio, and costs. For example, she sets the sliders to indicate that she is interested in seeing any schools with a “Yearly cost” less than $7000. She chooses to display her results as a scatterplot of number of students vs. cost. Forty schools appear in the scatterplot. She clicks on a few to view more detailed information in the lower window. After selecting Arkansas Tech University, she notices that “Tuition” is $1730 and “Living expenses” are $3650. Although this total of $5380 meets her yearly cost criterion, she knows that she will want to pay more for an apartment, and hence will need to find a school with a lower tuition. She expands the “Yearly cost” slider to reveal the sliders for “Tuition” and “Living expenses”. She refines the search to admit only schools with tuition less than $1500. She repeats this process until she is satisfied with her variable ranges and has narrowed her choices down to five schools. She prints the detailed information for each school to discuss with her parents later that evening.

4     Proposed implementation approach

The interface will be written in Java using the InfoVis Toolkit [5]. The InfoVis Toolkit is an interactive graphics toolkit which supports scatterplots, treemaps, and interactive sliders. Neither author has used the toolkit before.

As a possible extension mentioned in the Proposed Solution section, we may incorporate ValueCharts [4]. The code for ValueCharts is being provided by one of its authors (Giuseppe Carenini).

5     Milestones

Both authors have had to make the major life decision to leave home and attend university in another city based on incomplete information, and this is what motivates us to consider this problem. However, neither author has experience with interface programming or with the data set in question. As such, we have allocated time in our milestones to become familiar with the tools we will be using.

In addition, we have not yet decided on the extension that we will apply to the initial interface design. We expect that the most relevant problems will become more obvious as we get into the project. Some possible extensions are to analyze the InfoVis Toolkit, to incorporate ValueCharts into the visualization, to carry out a small study comparing UFinder to similar decision-support visualizations, to extend the interface to allow users to compare two data points directly, to extend the interface to work with missing data fields, or to extend the design so that it can accept any generic data set with a hierarchical nature.

Date        Milestone
March 7     •  Become familiar with the InfoVis Toolkit
            •  Meet with Giuseppe Carenini to discuss extensions of ValueCharts
March 15    •  Preliminary interface done (without functionality)
            •  Class presentation
March 29    •  Scatterplot working
            •  Extension chosen
April 5     •  Implementation complete
April 19    •  Extension complete

References

[1] Christopher Williamson and Ben Shneiderman. The Dynamic HomeFinder: Evaluating dynamic queries in a real-estate information exploration system. In Proc. ACM SIGIR '92, pp. 338-346.
 
[2] Christopher Ahlberg and Ben Shneiderman. Visual Information Seeking: Tight Coupling of Dynamic Query Filters with Starfield Displays. In Proc. ACM CHI 1994, pp. 313-317.
 
[3] Brian Johnson and Ben Shneiderman. Treemaps: a space-filling approach to the visualization of hierarchical information structures. In Proc. of the 2nd International IEEE Visualization Conference, pp. 284-291, October 1991.
 
[4] Giuseppe Carenini and John Lloyd. ValueCharts: Analyzing Linear Models Expressing Preferences and Evaluation. In publication.
 
[5] Jean-Daniel Fekete. The InfoVis Toolkit. Version 0.6alpha2, 2004. http://www.lri.fr/~fekete/InfovisToolkit/