CPSC 533C - INFORMATION VISUALIZATION
PROJECT PROPOSAL
Monday, March 1st, 2004
Team: Juan Gabriel Estrada Alvarez (estrada@cs.ubc.ca)
DOMAIN
Heart disease is one of the medical areas in which doctors spend a great
deal of time, time that is critical for a patient who may be affected by a
severe illness, trying to diagnose or narrow down the possibilities for what
exactly is going on with that patient. This is in addition to diseases that may
not yet have been discovered, precisely because of this difficulty. One thing
most doctors would like to do in an easy, uncluttered way is to analyze
time-series data corresponding to pulse intervals and/or blood pressure. With an
appropriate tool for this, they might be able to make faster and more accurate
diagnoses. Much study has been devoted to how Fourier or wavelet analysis can be
used, but only recently have new tools enabled researchers to gather enough data
to produce time series long enough for further study. In this paper I propose to
apply some techniques already in use in InfoVis, particularly clustering, to
examine some of the data recently taken from rabbits and rats. The approach is
intended as an alternative to Fourier or wavelet analysis solutions, whose
results are often hard to interpret.
PERSONAL EXPERTISE
As part of my B.Sc. requirements, I worked on a project whose goal was to
develop a visualization of the data mentioned above. In that project I became
familiar with the data format used by the researchers who initially posed the
question of developing such a tool to visualize the thousands of time series
they had recorded (and are still recording). As a result, I have several sources
of data for further testing. I have no previous experience with clustering
techniques. Similarly, I do not have a life sciences background beyond what is
needed to extract the data to be analyzed. Nevertheless, the data conversion
code and the visualization proposed in the past project might be useful for the
current purposes.
THE SOLUTION
The solution will use clustering to categorize the different time series
available. That is to say, the tool would ideally group the time series into
clusters, each corresponding to the same "state of the heart". Examining a
cluster would then mean examining the individual time series contained in it.
Upon examining a particular series, information about its origin would be
displayed, and perhaps other visualizations might be chosen for it (e.g. the 3D
fractal terrain view that was presented in the earlier project). At both the
cluster and individual time series levels, a querying tool like the one
presented in the TimeSearcher application (Hochheiser, H. and Shneiderman, B.,
"Visual Queries for Finding Patterns in Time Series Data", University of
Maryland, Computer Science Dept. Tech Report #CS-TR-4365, UMIACS-TR-2002-45)
would be useful to researchers and would allow comparison of time series that
are not necessarily in the same cluster. Given the lengths of the series
available, the user would first be presented with an overview on which she can
zoom in as necessary (i.e. the data is presented at different time scales).
Finally, if and once a categorization can be achieved, diagnosis would be
possible by providing a patient's recorded time series as input to the tool and
seeing which cluster it is assigned to.
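To make the clustering and the cluster-assignment ideas concrete, the following is a minimal sketch only. It assumes k-means with squared Euclidean distance, which this proposal has not committed to (the clustering algorithm is still to be established), and it assumes all series have the same length. The class name SeriesKMeans, its fields, and the naive seeding with the first k series are hypothetical choices made for illustration.

```java
import java.util.Arrays;

/** Minimal k-means sketch for equal-length time series (hypothetical helper). */
final class SeriesKMeans {
    final double[][] centroids; // one average series per cluster
    final int[] labels;         // cluster index assigned to each input series

    SeriesKMeans(double[][] series, int k, int maxIterations) {
        if (k < 1 || k > series.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of series");
        }
        int n = series.length;
        int len = series[0].length;
        centroids = new double[k][];
        // Assumption: seed the centroids with the first k series (deterministic but naive).
        for (int c = 0; c < k; c++) {
            centroids[c] = series[c].clone();
        }
        labels = new int[n];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every series to its closest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = nearest(series[i]);
                if (best != labels[i]) {
                    labels[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // assignments are stable, so the centroids would not move either
            }
            // Update step: each centroid becomes the pointwise mean of its members.
            double[][] sums = new double[k][len];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[labels[i]]++;
                for (int t = 0; t < len; t++) {
                    sums[labels[i]][t] += series[i][t];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // keep the previous centroid of an empty cluster
                }
                for (int t = 0; t < len; t++) {
                    centroids[c][t] = sums[c][t] / counts[c];
                }
            }
        }
    }

    /** Index of the centroid closest to the given series (squared Euclidean distance). */
    int nearest(double[] s) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0;
            for (int t = 0; t < s.length; t++) {
                double diff = s[t] - centroids[c][t];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }
}
```

Under these assumptions, the centroids double as the "average time series per cluster", and calling nearest on a newly recorded series gives the cluster it would be assigned to. Whether such an assignment is meaningful for diagnosis depends entirely on whether the clusters turn out to correspond to states of the heart.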
SCENARIO OF USE
Under the current conception, the user would perform the following steps in
order to process data:
1) Load the data set from the file menu;
2) Select the kind of clustering to be done on the data set from the
clustering menu;
3) Once the clustering has been completed, the query display is updated to
show an average time series per cluster;
4) The time series browser window is also updated and displays the individual
series;
5) The user may perform queries on the browser window by using "timeboxes" as
defined in the TimeSearcher application. Zooming in both the query window and
the series browser is also allowed. The action being carried out is selected by
pressing an icon in the lower left panel;
6) Upon placing a query (or timebox), the query window is updated to show only
those cluster series that fall within the boundaries of the timebox. Similarly,
the series browser is updated to show only the time series in those clusters;
7) The user may refine queries by placing further timeboxes, which causes step
6 to be repeated;
8) The user may zoom in on the resulting clusters in the query window. At this
point, the series browser might be refreshed to show the corresponding regions
zoomed in. Zooming out would have the same effect;
9) The user can now explore the categorizations and determine whether another
kind of clustering might help instead.
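The timebox filtering in steps 5 to 7 can be sketched as follows. This is an illustration only and is not TimeSearcher's actual code: the class Timebox, its fields, and the filter method are hypothetical names. A timebox is treated here as a rectangle over sample indices and values, and a series passes only if every sample inside the box's time range lies within the box's value range. Several timeboxes are combined conjunctively, which is how placing further timeboxes refines the result.

```java
import java.util.ArrayList;
import java.util.List;

/** Rectangular query region over a time series (hypothetical helper). */
final class Timebox {
    final int start;   // first sample index covered, inclusive
    final int end;     // last sample index covered, inclusive
    final double low;  // lower value bound, inclusive
    final double high; // upper value bound, inclusive

    Timebox(int start, int end, double low, double high) {
        this.start = start;
        this.end = end;
        this.low = low;
        this.high = high;
    }

    /** True if every sample of the series inside [start, end] lies within [low, high]. */
    boolean accepts(double[] series) {
        int last = Math.min(end, series.length - 1);
        for (int t = Math.max(start, 0); t <= last; t++) {
            if (series[t] < low || series[t] > high) {
                return false;
            }
        }
        return true;
    }

    /** Indices of the series that satisfy all of the given timeboxes. */
    static List<Integer> filter(double[][] series, List<Timebox> boxes) {
        List<Integer> kept = new ArrayList<Integer>();
        for (int i = 0; i < series.length; i++) {
            boolean ok = true;
            for (Timebox box : boxes) {
                if (!box.accepts(series[i])) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

In the scenario above, filter would be applied to the per-cluster average series shown in the query window, and the series browser would then list only the members of the clusters whose indices are returned.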
Overview of a possible arrangement of the application window:
Clockwise from upper left: i) time series browser; ii) individual info
display, if available; iii) list of the series currently being displayed; iv)
possible sliders in order to fine tune queries and zooms; vi) utilities panel,
where zoom, query and cluster icons reside. Note that this example is a
screenshot of the TimeSearcher application running. TimeSearcher does not do the
kind of clustering that is our objective, and so its utilities panel does not
meet our needs. This layout is currently only an idea and will most likely
change. In particular, we may need the query display to be much bigger, and the
series browser to be displayed only when requested.
The possible clustering menu:
The user can select among clustering approaches (if
available) and the number of clusters to produce.
How timeboxes and zooming would work:
Each timebox further reduces the number of clusters displayed. Starting from
the right, each query filters out more clusters until we arrive at the leftmost
graph. Zooming is performed similarly, but the boxes are a light, transparent
red instead of blue and do not filter out clusters.
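Since a zoom box only changes the time range being displayed, one way to present a long series at different time scales is to average the selected range into as many buckets as there are horizontal pixels. The sketch below assumes this bucket-averaging approach, which is my own choice for illustration and is not prescribed by TimeSearcher or Piccolo. The class ZoomView and the method downsample are hypothetical names.

```java
/** Bucket averaging for drawing a series at a coarser time scale (hypothetical helper). */
final class ZoomView {
    /**
     * Averages the samples in [from, to) into at most `buckets` values,
     * e.g. one value per horizontal pixel of the graph.
     */
    static double[] downsample(double[] series, int from, int to, int buckets) {
        if (from < 0 || to > series.length || from >= to || buckets < 1) {
            throw new IllegalArgumentException("invalid range or bucket count");
        }
        int span = to - from;
        int n = Math.min(buckets, span); // never more buckets than samples
        double[] out = new double[n];
        for (int b = 0; b < n; b++) {
            // Integer arithmetic splits the span into n nearly equal, non-empty buckets.
            int lo = from + (int) ((long) b * span / n);
            int hi = from + (int) ((long) (b + 1) * span / n);
            double sum = 0;
            for (int t = lo; t < hi; t++) {
                sum += series[t];
            }
            out[b] = sum / (hi - lo);
        }
        return out;
    }
}
```

The overview would call downsample over the whole series, while a zoom box would call it over the selected sub-range only. Averaging smooths out short spikes, so keeping the minimum and maximum of each bucket might be a better choice if brief irregularities in the pulse data need to remain visible.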
IMPLEMENTATION
Existing code for the previous project's visualization is written in Java 2,
whereas the format conversion code is written in Mathematica. Current plans are
to rewrite the Mathematica code in Java 2 and also to make use of the Piccolo
toolkit (for Java) for zoomable interfaces. Some of the existing code in
TimeSearcher, namely the querying code, would be useful as a base. The goal is
for the tool to be hardware independent.
PHASES
· Contact the authors of TimeSearcher about possible use of their querying
code.
· Establish and implement the clustering algorithm(s) to be used.
· Implement visualization through zoomable 2D time graphs. Integrate the
querying tool into the main display graph. (Depending on the response from the
authors of TimeSearcher, this might have to be implemented from scratch.)
· Implement the GUI.
· Integrate the visualizations into the display.
· Determine whether inclusion of the visualization from the previous project
can be useful and, if so, integrate it as an alternative view.