University of British Columbia
For this project, I propose to visualize log data collected from the Eclipse development environment in order to support its analysis. This data was collected through a user study run by Mik Kersten and Gail Murphy on a project called Mylar. Mylar was released to the Eclipse user community, and participants' usage data was collected to determine how both Eclipse and Mylar were used. For each of the 75+ users in the study, up to four months of usage data may have been collected. The types of events that were logged are selections, edits, command invocations and preference changes. The data set to be analyzed for each user can be very large, since many of these events, such as edits and selections, are logged at the level of individual keystrokes or mouse clicks. I am familiar with this data because I worked on Mylar with Mik Kersten and Gail Murphy for eight months part-time and four months full-time as a research assistant.
The task that I propose to assist with is the analysis of the log files to find usage patterns for Eclipse and Mylar. Tasks that should be supported include finding sequences of events, displaying the usage of Mylar, and filtering to show how frequently a single command or sequence of commands is used. Ideally, the final system would display multiple users' data side by side so that similarities can be found; this is beyond the scope of this course project, although it should be straightforward to add once a single user's data can be visualized. In this project, only exploring a single user's data will be supported.
This log data is stored in a flat XML format, with one node for each event generated by the system. Each logged event has the following attributes: a type, start and end date and time, origin, interest contribution, kind and navigation. The object that was acted upon is also recorded, but it is of little relevance: it has been obfuscated for privacy reasons and is not expected to provide any useful information. This data is useful not only for the Mylar user study but also as general Eclipse usage data, which has yet to be explored. A parser already exists that reads this data into objects and generates reports, but these reports do not provide low-level detail about individual events.
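To make the record structure concrete, the following is a minimal sketch of reading such a log into objects. It is not the existing Mylar parser: the element name `InteractionEvent`, the attribute names (`Type`, `Kind`, `StartDate`, `EndDate`, `Origin`, `Interest`, `Navigation`) and the timestamp layout are hypothetical stand-ins chosen to mirror the attributes listed above, and the obfuscated handle of the object acted upon is skipped on purpose.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// One logged interaction; the fields mirror the attributes listed in the proposal.
record InteractionEvent(String type, String kind, LocalDateTime start, LocalDateTime end,
                        String origin, double interest, String navigation) {}

class EventLogReader {
    // Hypothetical timestamp layout (millisecond precision); the real log may differ.
    private static final DateTimeFormatter TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Reads every InteractionEvent element (hypothetical tag and attribute names).
    static List<InteractionEvent> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("InteractionEvent");
            List<InteractionEvent> events = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // The obfuscated handle of the object acted upon is deliberately not read.
                events.add(new InteractionEvent(
                        e.getAttribute("Type"),
                        e.getAttribute("Kind"),
                        LocalDateTime.parse(e.getAttribute("StartDate"), TIME),
                        LocalDateTime.parse(e.getAttribute("EndDate"), TIME),
                        e.getAttribute("Origin"),
                        Double.parseDouble(e.getAttribute("Interest")),
                        e.getAttribute("Navigation")));
            }
            return events;
        } catch (Exception ex) {
            // Malformed XML, dates or numbers all end up here.
            throw new IllegalArgumentException("could not parse interaction log", ex);
        }
    }
}
```

Because every event becomes an ordinary object with a typed timestamp, the later steps (daily aggregation, timeline placement, filtering) can work on plain Java collections rather than on XML.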
Visualizing this information is difficult because logging every interaction a user has with Eclipse produces vast amounts of data. To overcome this problem, I propose the ability to drill down from an overview of the data, together with a focus plus context model for showing a timeline of a subset of the information. It is unrealistic to show all of the data in a single timeline, since there is so much of it (timestamps are recorded to the millisecond), so using overview plus detail and focus plus context to aggregate the data further will make analysis more manageable. Since the time of every interaction event is recorded, the data can be split into daily values as well as placed on a proper timeline.
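Splitting the data into daily values could look like the sketch below: each event is bucketed by the calendar date of its start time and counted per kind, which is exactly what a per-day bar graph needs. The class and record names are hypothetical, and the sketch assumes timestamps are interpreted in the local time zone in which they were logged.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class DailyAggregator {
    // Minimal stand-in for a parsed event: when it started and what kind it was.
    record Event(LocalDateTime start, String kind) {}

    // Buckets events by calendar day, then counts them per kind within each day.
    // Days are kept in date order; a day with no events simply has no entry.
    static SortedMap<LocalDate, Map<String, Integer>> countPerDay(List<Event> events) {
        SortedMap<LocalDate, Map<String, Integer>> days = new TreeMap<>();
        for (Event e : events) {
            days.computeIfAbsent(e.start().toLocalDate(), d -> new TreeMap<>())
                .merge(e.kind(), 1, Integer::sum);
        }
        return days;
    }
}
```

A calendar cell can then be drawn from `days.get(date)`, and grayed out when `days.containsKey(date)` is false.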
For the interface, I propose a persistent overview calendar and a dynamic timeline linked to the selected day. Each day in the overview calendar will display a simple bar graph of the usage of each event type, or be grayed out if there is no data for that day. Selecting a day activates the dynamic timeline. This timeline will have time on the horizontal axis and the type of interaction on the vertical axis, so that it is simple to locate a single type of interaction event to inspect. Since even a single day can contain a lot of data, a bifocal display will be used to save screen real estate, letting users closely inspect small areas of data without scroll bars. There will also be a persistent control panel that allows users to filter or search for sequences of events.
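The bifocal display can be thought of as a piecewise-linear scale: a focus interval of the day receives a fixed share of the screen width, and the context intervals on either side share the remainder, compressed uniformly. The sketch below is one way to compute that mapping under those assumptions; the class and parameter names are hypothetical, and the chosen toolkit may already provide an equivalent distortion.

```java
import java.time.LocalTime;

// Bifocal (piecewise-linear) mapping from a day's timeline onto a fixed screen width.
class BifocalScale {
    private final double focusStart, focusEnd, focusShare, width;

    // focusStart/focusEnd: focus region as fractions of the day (0..1);
    // focusShare: fraction of the screen width given to the focus; width: pixels.
    BifocalScale(double focusStart, double focusEnd, double focusShare, double width) {
        if (!(0 <= focusStart && focusStart < focusEnd && focusEnd <= 1))
            throw new IllegalArgumentException("need 0 <= focusStart < focusEnd <= 1");
        if (!(0 < focusShare && focusShare < 1))
            throw new IllegalArgumentException("focusShare must lie strictly between 0 and 1");
        this.focusStart = focusStart;
        this.focusEnd = focusEnd;
        this.focusShare = focusShare;
        this.width = width;
    }

    // Position of a time of day as a fraction of the day, keeping sub-second precision.
    static double fractionOfDay(LocalTime t) {
        return t.toNanoOfDay() / 86_400_000_000_000.0;
    }

    // Maps a fraction of the day u (0..1) to an x coordinate in [0, width].
    double toScreen(double u) {
        double focusLength = focusEnd - focusStart;
        double contextLength = 1 - focusLength;   // time covered by the two context regions
        if (contextLength == 0) return u * width;  // focus spans the whole day: linear scale
        double squeeze = (1 - focusShare) / contextLength; // scale factor for context regions
        double stretch = focusShare / focusLength;         // scale factor for the focus region
        double x;
        if (u < focusStart) {
            x = u * squeeze;                                          // left context
        } else if (u <= focusEnd) {
            x = focusStart * squeeze + (u - focusStart) * stretch;    // focus
        } else {
            x = focusStart * squeeze + focusShare + (u - focusEnd) * squeeze; // right context
        }
        return x * width;
    }
}
```

For example, `new BifocalScale(0.25, 0.5, 0.5, 100)` gives the 6:00 to 12:00 window half of a 100-pixel strip, while the remaining eighteen hours share the other half; moving the focus only means constructing a new scale.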
The following is an image of the proposed solution:
NOTE: The colors displayed in this design proposal are not indicative of what will be used in the final version of the project.
Upon opening the visualization view, the user will import the log data that they wish to analyze. The navigation tree on the side will then be populated with each of the participants whose logs were parsed during the import. The user chooses the study participant to examine from the navigation tree, and the overview calendar is then populated for that participant. From here, the user can drill down into the details of a single day, filter events, or specify sequences of events to find.
To specify a sequence of events in the participant's data, the user enters a partial or full text representation of an event into a text box and adds it to the list. As events are added to the sequence, both the overview and the timeline (if populated) will reflect the presence (or absence) of the specified sequence through highlighting. To drill down into daily details, the user selects the day to view from the calendar. There, the user can quickly locate a specified sequence of events thanks to the highlighting, and can navigate through the data using the perspective wall lens.
The user can also filter out individual events or types of events by entering a partial or full text value for them and adding it to the filter list. Both the overview and the timeline are updated to reflect the filtered data. After any of these actions, the user can modify the filters, sequences and selected day, and continue to investigate the data further.
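The partial-text filtering and sequence search described above could be sketched as follows. This assumes each event is shown as a text label (the `kind origin` label format is hypothetical), that a match is a case-insensitive substring match, and that a sequence must occur as a contiguous run of events; allowing gaps between steps would be a different design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class EventQueries {
    // Partial-text match: true if the pattern occurs anywhere in the label, ignoring case.
    static boolean matches(String label, String pattern) {
        return label.toLowerCase(Locale.ROOT).contains(pattern.toLowerCase(Locale.ROOT));
    }

    // Drops every event whose label matches any entry in the filter list.
    static List<String> filter(List<String> labels, List<String> filterList) {
        return labels.stream()
                .filter(label -> filterList.stream().noneMatch(f -> matches(label, f)))
                .collect(Collectors.toList());
    }

    // Start index of every contiguous run of events that matches the sequence entries in order.
    static List<Integer> findSequence(List<String> labels, List<String> sequence) {
        List<Integer> hits = new ArrayList<>();
        if (sequence.isEmpty()) return hits;
        for (int start = 0; start + sequence.size() <= labels.size(); start++) {
            int step = 0;
            while (step < sequence.size() && matches(labels.get(start + step), sequence.get(step))) {
                step++;
            }
            if (step == sequence.size()) hits.add(start);
        }
        return hits;
    }
}
```

Running the sequence search on the already filtered events means that filtered-out events cannot break a sequence; for example, hiding `command` events lets an `edit` followed by a `selection` match even if a save command originally sat between them. Whether that is the desired behaviour is a design decision for the final tool.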
I propose to implement this visualization in Java, both because of my experience with the language and because Eclipse itself is developed in Java, which means that the visualization can be integrated into it. Using Java also means that the visualization should run on any platform with a Java runtime. I plan to integrate it within Eclipse so that the data can be extracted easily, since a parser for the log data is already built into a plug-in. I intend to use a software toolkit such as prefuse or the InfoVis Toolkit to aid development, but I am still undecided as to which would be more useful for this visualization, as each has its strengths.
November 16, 2005
I will have a basic timeline displaying all of the interaction data for a single user. This timeline will not have any special features, but events will be aggregated into categories, each displayed in its own row along the vertical axis. By this time I also hope to have designed the architecture so that the data is stored for quick and easy access for all tasks.
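One possible way to store the data for quick access is a sorted index keyed by timestamp, so that the events of a selected day (or any time window) can be located without scanning the whole log. The sketch below assumes events are reduced to text labels; the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class EventIndex {
    // Sorted by time; several events can share a timestamp, so each key holds a list.
    private final NavigableMap<LocalDateTime, List<String>> byTime = new TreeMap<>();

    void add(LocalDateTime time, String label) {
        byTime.computeIfAbsent(time, t -> new ArrayList<>()).add(label);
    }

    // Events in the half-open window [from, to), in time order.
    List<String> between(LocalDateTime from, LocalDateTime to) {
        List<String> out = new ArrayList<>();
        for (List<String> batch : byTime.subMap(from, true, to, false).values()) {
            out.addAll(batch);
        }
        return out;
    }

    // All events of one calendar day, e.g. to populate the timeline for the selected day.
    List<String> onDay(LocalDate day) {
        return between(day.atStartOfDay(), day.plusDays(1).atStartOfDay());
    }
}
```

Because `TreeMap` keeps keys sorted, events can be added in any order, and locating the start of a window takes logarithmic time in the number of distinct timestamps.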
December 1, 2005
I will have created the calendar overview and linked it to the timeline to give a detailed view of the selected day. At this point I will also have added the perspective wall to the timeline, so that scrollbars are eliminated in favour of a focus plus context view.
December 10, 2005
I will add filtering and the ability to specify sequences of events to find and highlight. If time permits, I would also like to support combining multiple logs from a single user, as well as switching quickly and easily between different users' data.
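If each of a user's logs has already been parsed into a time-ordered list, combining them is a k-way merge. The sketch below uses a priority queue of per-log cursors; it is a minimal illustration under that assumption (names hypothetical) and makes no attempt to detect events duplicated across overlapping logs.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class LogMerger {
    record Event(LocalDateTime time, String label) {}

    // Points at the next unread event of one log.
    private record Cursor(int log, int pos) {}

    // Merges logs that are each already sorted by time into one time-ordered list.
    static List<Event> merge(List<List<Event>> logs) {
        // Earliest pending event first; ties keep the order in which the logs were given.
        PriorityQueue<Cursor> pending = new PriorityQueue<>(
                Comparator.comparing((Cursor c) -> logs.get(c.log()).get(c.pos()).time())
                          .thenComparingInt(Cursor::log));
        for (int i = 0; i < logs.size(); i++) {
            if (!logs.get(i).isEmpty()) pending.add(new Cursor(i, 0));
        }
        List<Event> merged = new ArrayList<>();
        while (!pending.isEmpty()) {
            Cursor c = pending.poll();
            List<Event> log = logs.get(c.log());
            merged.add(log.get(c.pos()));
            if (c.pos() + 1 < log.size()) pending.add(new Cursor(c.log(), c.pos() + 1));
        }
        return merged;
    }
}
```

Compared with concatenating and re-sorting everything, the merge only ever compares the current head of each log, which suits several long logs from the same participant.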
December 16, 2005
I will have completed the programming portion of the project so that the final report can be written before the final presentations. By this point, the color scheme will be finalized so that it is accessible to all users, and all interaction support will have been added.
December 19, 2005
I will have the report completed and the presentation ready to give on this day.