Project Proposal
Friday November 04th 2005, 2:15 am
Filed under: Project


Individual Project by: J. Karen Parker, parker AT cs.ubc.ca

Quick Links:
Domain, Task & Dataset
Personal Expertise
Proposed Solution
Scenario of Use
Implementation Approach
Milestones
References

DOMAIN, TASK & DATASET

Domain

In the field of Human-Computer Interaction (HCI), data analysis can be a complex problem. Researchers often use a combination of logging and qualitative methods to record data, resulting in a daunting amount of information which must be sifted through in order to do a thorough analysis.

This data overload is a common problem, as evidenced by a recent workshop at CHI 2005 which aimed to tackle the problems associated with combining data logging and qualitative methods [1]. The workshop organizers identified several key tasks which are difficult to achieve with existing data analysis solutions: detecting patterns in behavior, comparing patterns between users, and combining patterns in behavior with qualitative data [1].

Hilbert and Redmiles suggest, among other things, “[visualizing] the results of transformations and analyses of event streams so they can be explored with more ease” [2]. So, some form of information visualization may be able to aid HCI researchers in their data analysis problems.

Previous work in visualizing complex systems may provide some insight into the HCI log data problem. Bosch et al. state that “because of the complexity of computer systems, the analysis process is a highly unpredictable and iterative one: an initial look at the data often ends up raising more questions than it answers” [3]. This is also true for HCI log data.

Task: High Level Description

In trying to create a solution that is good for everybody, one may end up with a solution that suits nobody terribly well. In an attempt to avoid this pitfall we will take a somewhat narrow approach to visualizing HCI log data, focussing solely on the visualization of web browsing behaviour.

Most previous InfoVis research into web browsing has been concerned primarily with representing users’ navigation through websites. The main aims of this type of visualization include improving website usability [4], and characterizing how users navigate complex information spaces [5].

Recent research into web browsing has begun to examine user behaviour at a much deeper level than simple navigation. In particular, several researchers at Dalhousie University are examining web browsing behaviour by logging low-level browser events in order to gain information about users’ web browsing habits [6][7]. By combining browser events with user-provided data (e.g. the task a user was trying to accomplish when they were at a particular page), these researchers hope to reveal interesting trends and patterns in web browsing behaviour.

Dataset

The first dataset we will support is from a research study in which participants were asked to rate the “privacy level” of each page they visited over a week-long period [6]. The following information was logged for EVERY individual page a user visited during that week:

- browser window ID
- date
- time
- page title
- url
- primary content category (approx. 40 categories)
- secondary content category (approx. 50 categories)
- privacy level (4 categories)
- location (home/work/school)
- computer type (consistent for each participant)
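
As a rough sketch of how one logged record could be held in memory, the class below mirrors the fields listed above. The class, enum, and field names are our own placeholders rather than the study's actual log schema, and the privacy level names are taken from the four categories used in the study's colour scheme.

```java
import java.time.LocalDateTime;

/** The four user-assigned privacy levels (names are illustrative). */
enum PrivacyLevel { DONT_SAVE, PUBLIC, SEMI_PUBLIC, PRIVATE }

/** Where the browsing took place. */
enum BrowsingLocation { HOME, WORK, SCHOOL }

/** One logged page visit; field names are assumptions, not the real schema. */
final class PageVisit {
    final String windowId;
    final LocalDateTime visitedAt;      // combines the logged date and time
    final String pageTitle;
    final String url;
    final String primaryCategory;       // one of roughly 40 categories
    final String secondaryCategory;     // one of roughly 50 categories
    final PrivacyLevel privacyLevel;
    final BrowsingLocation location;
    final String computerType;          // constant for a given participant

    PageVisit(String windowId, LocalDateTime visitedAt, String pageTitle,
              String url, String primaryCategory, String secondaryCategory,
              PrivacyLevel privacyLevel, BrowsingLocation location,
              String computerType) {
        this.windowId = windowId;
        this.visitedAt = visitedAt;
        this.pageTitle = pageTitle;
        this.url = url;
        this.primaryCategory = primaryCategory;
        this.secondaryCategory = secondaryCategory;
        this.privacyLevel = privacyLevel;
        this.location = location;
        this.computerType = computerType;
    }
}
```

Keeping the categories as plain strings avoids hard-coding the category lists, which we have not enumerated here.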

Task: Low Level Description

Some basic tasks that the owner of this data would like to be able to accomplish include:
- examine privacy level changes within rapid bursts of browsing
- find temporal patterns
- see how browsing is partitioned between windows
- see how transitions between privacy levels relate to the content
- filter on specific variables
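
To make the first task more concrete, here is a minimal sketch of one way to find "rapid bursts" and count privacy level changes inside them. The definition of a burst (consecutive visits separated by no more than a chosen gap) and all names below are our own assumptions; the study does not define a burst this way.

```java
import java.util.ArrayList;
import java.util.List;

final class BurstAnalysis {
    /**
     * Splits a time-ordered sequence of visits into bursts: runs in which
     * each consecutive gap is at most maxGapSeconds.
     * Each burst is returned as {firstIndex, lastIndex}.
     */
    static List<int[]> findBursts(long[] timesSeconds, long maxGapSeconds) {
        List<int[]> bursts = new ArrayList<>();
        if (timesSeconds.length == 0) {
            return bursts;
        }
        int start = 0;
        for (int i = 1; i < timesSeconds.length; i++) {
            if (timesSeconds[i] - timesSeconds[i - 1] > maxGapSeconds) {
                bursts.add(new int[] {start, i - 1});
                start = i;
            }
        }
        bursts.add(new int[] {start, timesSeconds.length - 1});
        return bursts;
    }

    /** Counts privacy level changes between consecutive visits in one burst. */
    static int countTransitions(String[] levels, int first, int last) {
        int count = 0;
        for (int i = first + 1; i <= last; i++) {
            if (!levels[i].equals(levels[i - 1])) {
                count++;
            }
        }
        return count;
    }
}
```

The gap threshold is a free parameter; in practice the viewer would probably want to adjust it while exploring the data.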


PERSONAL EXPERTISE

While web browsing is not my particular area of interest, I am an HCI researcher and am keenly aware of the need for better analysis tools in our field. Furthermore, the research this project aims to support is being conducted by two friends from Dalhousie University. (We were all members of the same research lab when I was doing my Masters there.) I’m thrilled that something I do in a course project might actually help them in their doctoral data analyses!


PROPOSED INFOVIS SOLUTION

Event ordering is a very important component of our target dataset, so a time-series display is a natural fit. At the simplest level, we want to represent each currently open browser window on a timeline. Then, using visual attributes such as colour and markings on each “window”, we can indicate the various events and attributes of a given window at any given time. The resulting visualization is a sort of web browsing Gantt chart. While Gantt charts show “a graphical representation of the duration of tasks against the progression of time” [8], our solution will show a graphical representation of web browser windows against the progression of time.
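
The Gantt-style layout could be sketched roughly as follows: each browser window gets its own row, and each page becomes a bar within that row. Since the log records only when a page was visited, this sketch assumes a page stays on screen until the next page loads in the same window (or until the end of the log). That assumption, and all names below, are ours.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class WindowRows {
    /** Gives each distinct window ID a row index, in order of first appearance. */
    static Map<String, Integer> assignRows(List<String> windowIdsInTimeOrder) {
        Map<String, Integer> rows = new LinkedHashMap<>();
        for (String id : windowIdsInTimeOrder) {
            rows.putIfAbsent(id, rows.size());
        }
        return rows;
    }

    /**
     * For each visit, returns the time at which its bar ends: the start of
     * the next visit in the same window, or logEnd if there is none.
     * Visits must be given in time order.
     */
    static long[] segmentEnds(String[] windowIds, long[] startTimes, long logEnd) {
        long[] ends = new long[startTimes.length];
        Map<String, Long> nextStart = new HashMap<>();
        // Walk backwards so the next visit in each window is already known.
        for (int i = startTimes.length - 1; i >= 0; i--) {
            ends[i] = nextStart.getOrDefault(windowIds[i], logEnd);
            nextStart.put(windowIds[i], startTimes[i]);
        }
        return ends;
    }
}
```

For example, visits in windows A, B, A, B starting at times 0, 5, 10, 20 with the log ending at 30 would give bars ending at 10, 20, 30, 30.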

The dataset we are using already has an established colour scheme which corresponds to user-assigned privacy level (blue for “don’t save”, green for “public”, yellow for “semi-public”, and red for “private”). We will maintain this colour scheme in our solution, colouring each page in a window to show its privacy level. Further markings on each window will indicate important attributes such as category and location. These markings may be turned on/off by the viewer. Due to the large number of pages in the dataset, screen real estate for each individual page will be quite limited, so meta information such as URL and page title will be available only as a popup on mouseover.
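
In Java, the colour scheme described above might be captured as a small enum. The enum name is a placeholder, and the exact shades (the standard `java.awt.Color` constants) are an assumption, since only the colour names are specified.

```java
import java.awt.Color;

/** Maps each privacy level to the colour used in the existing scheme. */
enum PrivacyColour {
    DONT_SAVE(Color.BLUE),
    PUBLIC(Color.GREEN),
    SEMI_PUBLIC(Color.YELLOW),
    PRIVATE(Color.RED);

    private final Color colour;

    PrivacyColour(Color colour) {
        this.colour = colour;
    }

    /** The fill colour for a page with this privacy level. */
    Color colour() {
        return colour;
    }
}
```

Keeping the mapping in one place should make it easy to swap in different shades later, or a different scheme for another study.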

The viewer will also be able to control the width of the timeline. In some cases, they may want to view the data in true temporal time, while in other cases they may only be interested in sequence information. In addition to being able to switch between a “real timeline” and “sequential timeline”, a focus+context technique will be provided, allowing the viewer to stretch a particular section (or sections) of data along the horizontal axis. This technique will help viewers get a closer look at a particular area of the dataset, or compare two distant areas side-by-side.
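
The difference between the “real timeline” and the “sequential timeline” comes down to how a page's horizontal position is computed. The sketch below is one possible mapping, with names of our own choosing: in real mode, x is proportional to elapsed time, and in sequential mode, pages are spaced evenly by their order. It does not attempt the focus+context stretching.

```java
final class TimelineScale {
    enum Mode { REAL, SEQUENTIAL }

    /**
     * Maps time-ordered visit times to x positions in [0, width].
     * REAL spaces visits by elapsed time; SEQUENTIAL spaces them evenly.
     */
    static double[] xPositions(long[] times, double width, Mode mode) {
        int n = times.length;
        double[] xs = new double[n];
        if (n < 2) {
            return xs; // zero or one visit: everything sits at x = 0
        }
        long span = times[n - 1] - times[0];
        for (int i = 0; i < n; i++) {
            if (mode == Mode.REAL && span > 0) {
                xs[i] = width * (times[i] - times[0]) / span;
            } else {
                // Sequential mode, or all visits share one timestamp.
                xs[i] = width * i / (n - 1);
            }
        }
        return xs;
    }
}
```

With visits at times 0, 10 and 100 on a 100-pixel timeline, real mode places them at 0, 10 and 100, while sequential mode places them at 0, 50 and 100.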

Note that while we are explicitly identifying attributes of the privacy study data to which we will assign various visualizations in this project, it is expected that the same visualization techniques could be used for other studies of web browsing behaviour with different attributes.


SCENARIO OF USE


1. Kirstie wants to see an overview of the data for one of her users, so she loads the data file into our InfoVis tool. She is presented with a temporal representation of all the windows that user browsed, and the colour-coded privacy levels for each page in each window.

2. She sees some interesting privacy patterns in the data from the second half of the day on 01/01/05, so she zooms in on that time period.

3. She sees an anomalous red-marked area (private) in the middle of a window that is otherwise completely semi-public (yellow), so she turns on the “page change” markers to see when the user changes pages.

4. She mouses over the area (which she is now able to distinguish as a single red page) to get further details about it.


IMPLEMENTATION APPROACH

We’ll be using Java as our development language. We chose Java because it is multi-platform and we would like to be able to support data analysis tasks on Windows/Mac/Unix. We own a Mac and our target end user owns a Windows box, so at the very least our software has to work on both of those platforms.

At the moment, we do not plan to use a toolkit (e.g. prefuse or the InfoVis Toolkit). However, we’d like to do a bit more research on what these two toolkits offer before completely ruling them out.



MILESTONES

Week of November 6th: Do some more research on toolkits and such. Create paper prototype. Meet with end user to test prototype and collect more info on desired tasks.

Week of November 13th: Start playing around in Eclipse and programming bits and pieces of the interface.

Week of November 20th: Code code code.

Week of November 27th: Code code code, hopefully have a mostly final version by the end of this week.

Week of December 4th: Demo software for end user and get feedback. Mark CPSC 344 exams. ;)

Week of December 11th: Refine software based on end user feedback. Start writing report.

Week of December 18th: Finish report, present project in class, and hand in report.


REFERENCES

[1] Kort, J. and de Poot, H. Usage Analysis: Combining Logging and Qualitative Methods. CHI 2005 Workshops, April 2-7, 2005, Portland, Oregon, USA.

[2] D. Hilbert, and D.F. Redmiles, “Extracting Usability Information from User Interface Events,” ACM Computing Surveys, Dec. 2000, pp. 384-421.

[3] Robert Bosch, Chris Stolte, Diane Tang, John Gerth, Mendel Rosenblum, and Pat Hanrahan, “Rivet: A Flexible Environment for Computer Systems Visualization,” Computer Graphics, February 2000.

[4] Jason I. Hong, and James A. Landay, “WebQuilt: A Framework for Capturing and Visualizing the Web Experience.” In Proceedings of The Tenth International World Wide Web Conference (WWW10), Hong Kong, May 2001, pp. 717-724.

[5] Berendt, B. & Brenstein, E. (2001). Visualizing Individual Differences in Web Navigation: STRATDYN, a Tool for Analyzing Navigation Patterns. Behavior Research Methods, Instruments, & Computers, 33, 243-257.

[6] Hawkey, K. and Inkpen, K.M. (2005) Privacy gradients: exploring ways to manage incidental information during co-located collaboration. (Late Breaking Results: Short Papers) in Extended Abstracts of the Conference on Human Factors in Computing Systems (CHI 2005). Portland, OR, USA. pp. 1431 - 1434.

[7] Kellar, M. & Watters, C. (2005). Studying User Behaviour on the Web: Methods and Challenges. CHI 2005 Workshop on Usage Analysis: Combining Logging and Qualitative Methods, Portland, OR.

[8] “Gantt Charts.” Website. http://www.ganttcharts.com. Accessed Friday, November 4th, 2005.



Assignment 1: Visualization Critique
Wednesday September 14th 2005, 11:20 pm
Filed under: Assignments

Bad Visualization

Map of 2004 UK Election Results [link]

This map of the UK attempts to visually represent the results of the 2004 UK elections by showing which party won which constituency.

The map itself is not labelled, and no lines are drawn between constituencies, so one cannot easily match up the list of constituencies along the right-hand side with their corresponding location on the map. Users have to click on a constituency or area of the map to link up this information. Also, the “scoreboard” legend does not match up with the map. There are three parties listed in the legend, along with “other”; however, there are some decidedly large areas of colour on the map (e.g. the green, brown, and bright yellow areas) that would seem to warrant their own entries in the legend, yet they are missing. The colour scheme used to mark the map means that we can see which party won each area, but not by how much, so some information is lost here as well.

  • Clarity: Good (simplicity, colour differentiation, but missing information)
  • Accessibility: Poor (can’t see much information at a glance)
  • Accuracy: Poor (can see who won, but not by how much)

Good Visualization

CBC Archives Timeline [link]

The CBC Archives website contains information about various newsworthy people and events from Canada’s history. This interactive timeline allows users to browse the archived items both chronologically and by topic.

This is a simple but effective visualization. With this one image, users are able to clearly see the full collection of items in the CBC archives and ascertain the topic and time period of each item. The timeline along the top - coupled with the position of each title along this timeline - gives some indication of chronology for each item (although it does not provide detailed date information). Additionally, the colours of the bars behind each title, when viewed with the simple legend, allow readers to clearly see the topic of each item at a glance. The scheme even allows for multiple topics assigned to one item, as seen by the two-toned bars behind some items.

  • Clarity: Great (information is easily understood)
  • Accessibility: Great (don’t need to work too hard to find info)
  • Accuracy: Good (timeline info could be improved)