ClearEye: A Visualization System for Document Revision
 
Qiang Kong, Qixing Zheng

Department of Computer Science

University of British Columbia

201-2366 Main Mall

Vancouver BC Canada V6T 1Z4

 
Abstract

A document can go through dozens of revisions before it is finalized. As a result, maintaining multiple versions and selecting among them can be very time-consuming. To give users quick access to multiple versions, we introduce ClearEye, a visualization system for document revision. We believe our system helps users visualize the differences between two or more versions of a document at a high level and, at the same time, provides detailed comparisons between any two versions so that users can locate changes quickly and accurately. We also discuss a preliminary taxonomy for designing visualization systems for document revision in general.

1. Introduction

Most documents go through a revision process before they are finalized. For example, an academic paper submitted for publication may undergo dozens of revisions. People often keep multiple versions of a single document in a directory instead of maintaining just one up-to-date version. They may need valuable information that was previously deleted, or they may later decide to return to an earlier version. Another reason is that co-authorship is increasingly common in academia and other professional fields. As a result, more versions of the same document are created during the revision phase, and they are kept so that one or more final versions can be assembled later to suit different publication needs.

The major problem with keeping multiple versions of the same document is how to maintain them and pick the appropriate one when needed. People are not good at remembering the differences between versions. Typical questions include whether a particular paragraph was inserted in version A or B, which paragraph has changed most dramatically, and what is new in each revision compared to the one just before it. These questions get even harder in multi-author situations. To answer them, users usually need to open each version in the directory and compare them by hand. This process can be extremely time-consuming and error-prone.

People already use various techniques and tools to visualize revisions. The most traditional way is to print a hard copy of a document and edit it with pens of different colours so that the original text and the changes are distinguishable. Some users we interviewed in our informal user study told us that they actually prefer this traditional approach, since writing on paper does not interrupt their train of thought. On the other hand, many people use the Track Changes function in MS Word to keep track of changes made to a document. However, when many changes are made by multiple users, the document becomes very hard to read because the annotations can no longer be matched to the precise change locations. Also, users cannot view multiple versions independently and map one to another.

In this paper, we describe ClearEye, a visualization tool for document revision. It helps users visualize the differences between two or more versions of a document at a high level and, at the same time, provides detailed comparisons between any two versions so that users can locate changes quickly and accurately. One point worth noting is that, because of the scope of our project, ClearEye currently functions only as a visualization tool; it is not a version control system, unlike systems such as Clearcase[Cha96], which is discussed in the next section. ClearEye is designed to provide a quick overview of multiple document files with minimal user effort, as well as to act as a graphical "diff" between any two documents.

The rest of this paper is organized as follows. In Section 2, we describe related work. In Section 3, we describe the key features and components of ClearEye. In Section 4, we present a detailed sample scenario that illustrates how ClearEye could be used. Section 5 outlines the implementation, and Section 6 describes the two kinds of evaluation we carried out. In Section 7, we list the lessons we learnt from implementing the project and from our informal user evaluation of the system. Section 8 presents future work aimed at making ClearEye more practical. Section 9 summarizes the main points and concludes with a preliminary taxonomy for designing visualization systems for document revision in general.

2. Related Work

Although many visualization systems have been built for software development, they are designed mainly with modules of code in mind rather than plain documents. There are substantial differences between visualization for document revision and visualization for software version control. In software visualization, it is more important to represent information about the execution of the program and other program statistics such as age and functionality, whereas in document revision we need to focus on and represent all changes. Also, the location of a segment of code may sometimes be flexible, since it does not affect the overall running of the program. In contrast, the arrangement of paragraphs in a document is critical to its logic and flow.

Another important difference between software development and document revision is that code editing often consists of large chunks of local changes, while document revision usually involves many small changes scattered over the entire document. Current software visualization tools do not perform very well with large numbers of small, scattered changes. We now discuss a few of these tools and point out their weaknesses when used as visualization tools for document revision.

Apple's FileMerge application

FileMerge[Mal03] is a GUI-based file comparison and merging program that utilizes the diff command in UNIX. It is more user friendly than the common Concurrent Versions System (CVS). The major complaint about CVS is often that it is hard to use, since users need to remember the exact command for each action. FileMerge represents the differences between two files graphically, for example by using arrows to encode the direction of a change. However, FileMerge does not provide an overview of the compared files.

Clearcase

Clearcase[Cha96] uses a tree display to represent version evolution, showing how each version branches out from the master version and the points at which it merges back. The visualization does a good job of representing the development of the software, but it does not provide functionality for comparing multiple versions in parallel. Parallel comparisons, however, are very helpful when viewing structural changes in documents.

SeeSoft

SeeSoft[ESS92] visualizes text files by mapping each line to a thin row, coloured according to a statistic of interest. For example, it uses colour to represent the programmer, age, or functionality of each line. SeeSoft not only provides a file overview but also allows users to choose specific files to compare in detail. xxdiff[Mar00] uses a SeeSoft-like colour bar to give the user a global view of the two documents being compared, and uses different foreground and background colours in the text to encode the differences between the two documents. However, neither tool can visually encode document evolution information.

HistoryFlow

Our project is largely inspired by HistoryFlow[MaF03], current research work at IBM's Collaborative User Experience Research Group. It has some visual similarities to ThemeRiver[HHW02]. It employs parallel coordinates to provide a clear view of complex records of contributions and collaborations on group-authored texts. At a glance, users can answer questions like "Is the text's evolution marked by spurts of intense revision activity or does it reflect a smooth transition from its beginning to present?"[MaF03] Although the simplicity of HistoryFlow is appealing, it does not embed quantitative information about each revision and, more importantly, it does not provide content-level comparison, so users still need to open particular files to find the changes.

ClearEye extends HistoryFlow in three major ways. First, we implemented an overview option called Recent Change View, which not only highlights the recent changes of each version compared to its immediate predecessor but also provides a quantitative change report. Second, we added a content comparison view to give users the option to locate and view detailed changes in a single version. Finally, we provide users with a colour customization tool so that they can choose the colours they prefer and find comfortable when using our system.

3. ClearEye

ClearEye consists of two levels of views: an overview that displays the evolution of a series of documents, and a content comparison view of any two versions. Each view employs a set of visualization techniques to ensure that the information provided to users is as salient as possible.

3.1 Level One - Overview of multiple versions

This display is intended to give users an overview of the document's evolution. It offers two alternative views: the mapping view and the recent change view. In the mapping view, shown in Figure 1, users can easily tell when and where the document underwent a significant change, as well as how its structure changed over multiple versions. In the recent change view, shown in Figure 2, users can quickly find a summary of the changes to any paragraph relative to the immediately preceding version by simply spotting the coloured segments.

Figure 1
Figure 2

3.1.1 Parallel-Coordinate-like Layout

The main technique used in the layout of this level is parallel coordinates, in which each data dimension is represented as a vertical axis and all the axes are plotted parallel to each other in a plane. "A data element in an N-dimensional space is mapped to a polyline that traverses across all of the axes crossing each axis at a position proportional to its value for that dimension."[FRW99] The technique is especially useful for multivariate or high-dimensional data, which is the case in ClearEye.

However, we made some modifications to the original parallel coordinate scheme to make it suitable for multi-document visualization. In ClearEye, each axis represents a single version of a document, with the first version in the leftmost position of the view panel and the last version in the rightmost position. The length of each axis is proportional to the length of the document. Each segment of an axis represents a paragraph in the document, and the segment length is proportional to the length of the paragraph.
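The layout rule described above can be illustrated with a minimal Python sketch. It assumes that paragraph length is measured in characters and that a single scale (pixels per character) is shared by all versions so that axis lengths remain comparable. The function name `layout_axis` and all numbers are hypothetical and are not taken from the ClearEye source.

```python
def layout_axis(paragraph_lengths, pixels_per_char):
    """Return (top, bottom) pixel offsets for each paragraph segment on one axis.

    Segment height is proportional to paragraph length, so the total axis
    length is proportional to the document length.
    """
    segments = []
    top = 0.0
    for length in paragraph_lengths:
        bottom = top + length * pixels_per_char
        segments.append((top, bottom))
        top = bottom
    return segments


# Hypothetical example: two versions drawn with the same scale.
version_1 = [400, 1200, 800]        # paragraph lengths in characters
version_2 = [400, 1500, 800, 300]   # one paragraph grew, one was appended
scale = 0.25                        # pixels per character (assumed)

axis_1 = layout_axis(version_1, scale)
axis_2 = layout_axis(version_2, scale)
```

Because both axes share one scale, the second axis comes out longer than the first, which is how growth of the document between versions becomes visible in the overview.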

In order to show how one version evolves to another, we map semantically unchanged paragraphs in two consecutive versions by connecting the segments in vertical axes that represent them. This mapping provides users a way to observe how a particular paragraph in the document changes throughout each revision.

3.1.2 Color

Although no other perceptual channel is as salient as spatial position, hue is the second best perceptual cue for encoding nominal data[Tuf91]. Besides using colours that are as far apart as possible, we also try to avoid "excessive exuberance"[Tuf91], since large areas of the visualization are colour coded. Therefore, we chose the 8 dim colours used in [Mun00] to encode the different paragraphs and their mappings between versions of a document. The eight colours are 45 degrees apart on the HSB colour wheel.
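As a rough illustration of hues spaced 45 degrees apart, the following Python sketch generates eight colours with the standard `colorsys` module (HSB is the same model as HSV). The saturation and brightness values are assumptions chosen to give a muted palette and a brighter variant of the same hues; they are not the values used in ClearEye or in [Mun00].

```python
import colorsys


def palette(n=8, saturation=0.4, brightness=0.8):
    """Return n RGB colours (0-255 integers) with evenly spaced hues."""
    colours = []
    for i in range(n):
        hue_degrees = i * 360 / n          # 0, 45, 90, ... for n = 8
        r, g, b = colorsys.hsv_to_rgb(hue_degrees / 360, saturation, brightness)
        colours.append((round(r * 255), round(g * 255), round(b * 255)))
    return colours


dim = palette()                                   # muted fills (assumed values)
bright = palette(saturation=1.0, brightness=1.0)  # brighter variant of the same hues
```

Keeping the hue fixed and changing only saturation and brightness means a bright colour can still be recognised as belonging to the same paragraph as its dim counterpart.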

We also employ bright versions of these eight colours to highlight a single mapping when users move the mouse over its area. However, when users move the mouse over a single segment on an axis, only the edges of the segment are highlighted. We then used VisCheck to ensure that the colour choices in ClearEye are also suitable for users with colour deficiency.

3.1.3 User Interaction

As mentioned above, one overview choice in level one is the mapping view. To avoid the potential occlusion caused by the semantic mappings between paragraphs, we use mouse-over highlighting to indicate the paragraph of interest. For example, the purple paragraph is highlighted in Figure 3.

Figure 3
Figure 4

The other overview choice is the recent change view, in which users can get detailed change reports by hovering the mouse over the coloured segments, as shown in Figure 4. This mouse-over pop-up technique reduces the density of information presented to users at any one time.

3.2 Level Two - Content view of two versions

This level gives users detailed change information by placing two versions side by side. The view is intended to help users quickly discover the differences between the two versions and where they occur.

3.2.1 Overview+Detail

While providing users with the contents of the two documents, ClearEye also employs two coloured bars (Figure 5) to give an overview of the changes between the two documents. Each rectangle in a coloured bar represents a change, and the colour of the rectangle represents the change type. The spatial position of the rectangle within the overview bar corresponds to the actual location of the change in the document.
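The position mapping described above amounts to a linear scaling from document offsets to bar coordinates. The Python sketch below shows one way to do it, assuming changes are given as character offsets. The change-type labels, the minimum rectangle height of one pixel, and the function name `bar_rectangles` are illustrative assumptions, not details from ClearEye.

```python
def bar_rectangles(changes, document_length, bar_height):
    """Map changes to rectangles on a vertical overview bar.

    changes: list of (start_offset, end_offset, change_type) in characters.
    Returns a list of (y_top, y_bottom, change_type) in pixels.
    """
    rectangles = []
    for start, end, change_type in changes:
        y_top = start * bar_height / document_length
        y_bottom = end * bar_height / document_length
        # Assumed rule: keep very small changes visible (at least one pixel).
        if y_bottom - y_top < 1:
            y_bottom = y_top + 1
        rectangles.append((y_top, y_bottom, change_type))
    return rectangles


# Hypothetical changes in a 10,000-character document, drawn on a 400-pixel bar.
changes = [(0, 500, "insert"), (5000, 5010, "delete")]
rects = bar_rectangles(changes, document_length=10000, bar_height=400)
```

Here the ten-character deletion halfway through the document lands halfway down the bar and is padded to one pixel so that it does not disappear.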

3.2.2 Colour

To help users focus on the changes in the documents, we gray out all unchanged text and show it as background, so that the changes stand out in the foreground.

We chose three of the previous eight colours as the default colours for encoding the change types, both in the foreground of the text and in the rectangles of the coloured overview bar. ClearEye also allows users to customize the colours that encode change types (Figure 6).

Figure 5
Figure 6

4. Scenario of Use

We now provide a sample use of our system which includes exploring all the functionalities mentioned in the previous section. However, it is possible, maybe even common, that users just use ClearEye for an overview of multiple documents or use it to find the differences between two versions of a document.

Jane, Mike, Mary, Kevin and Gill are working on their final project report. Jane has written a draft. She sent it to Mike to revise first and then Mary, Kevin and Gill in turn. This is a sequential revision process. Finally, Jane receives all the revisions from her co-workers.

She wants to see what changes were made in each revision, so she loads all six versions, including her own draft of the final report, into ClearEye. Figure 7 shows what she sees. First, she notices that some middle paragraphs are mostly unchanged from her original draft. However, the position of paragraph seven has changed the most: Mike wants to put it after paragraph two, but Mary wants to put it before paragraph two. Both of them have also made a considerable number of changes within paragraph seven. Jane therefore wants to compare Mike's revision to Mary's in detail. She chooses versions two and three and clicks "Compare". Figure 8 shows what Jane sees. Next, Jane wants to see what new changes were made in each version. She goes back to the overview and chooses "Recent Change View". Figure 9 shows the recent change view. When Jane hovers her mouse over any coloured paragraph, she sees a summary of the changes to that paragraph compared to the previous version.

Figure 7
Figure 8

Figure 9

5. Implementation

ClearEye was implemented from scratch in Java. The system has three main building blocks, which contain approximately 3,000 lines of code in total.

  • The algorithm we used to match sections of two document revisions is based on Heckel's[Hec78] technique for isolating differences between files. One limitation of this technique is that a very small change, such as the addition of a single comma, results in a mismatch between two sentences. We overcame this limitation by setting a threshold for changes, so that tiny changes do not cause a mismatch in the overview.
  • The parallel coordinates and paragraph mappings in the overview, as well as the content view for two compared documents, are implemented using Java 2D graphics.
  • We used Java Swing and Fireworks to implement the graphical user interface (GUI) of our system. All the button icons were created by the authors to give ClearEye a unified and symbolic look.
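The idea of a change threshold in the first building block can be sketched in a few lines of Python. Note that this sketch does not implement Heckel's technique, and ClearEye itself is written in Java. As a stand-in, it uses the similarity ratio from the standard `difflib.SequenceMatcher` and a simple greedy pairing. The function name `match_paragraphs`, the threshold values, and the sample paragraphs are all hypothetical.

```python
from difflib import SequenceMatcher


def match_paragraphs(old, new, threshold=0.9):
    """Greedily map paragraphs of `old` to paragraphs of `new`.

    Returns a dict {old_index: new_index}. Two paragraphs are treated as the
    same when their similarity ratio reaches `threshold`, so a tiny edit
    (for example one added comma) does not break the mapping.
    """
    mapping = {}
    used = set()
    for i, old_par in enumerate(old):
        best_j, best_ratio = None, threshold
        for j, new_par in enumerate(new):
            if j in used:
                continue
            ratio = SequenceMatcher(None, old_par, new_par).ratio()
            if ratio >= best_ratio:
                best_j, best_ratio = j, ratio
        if best_j is not None:
            mapping[i] = best_j
            used.add(best_j)
    return mapping


old = ["Documents go through many revisions.",
       "People keep several versions in one directory."]
new = ["People keep several versions in one directory.",
       "Documents go through many revisions, often dozens.",
       "A new closing paragraph."]
mapping = match_paragraphs(old, new, threshold=0.8)
```

In this example the two original paragraphs are found in swapped order even though one of them was lightly edited, while the new closing paragraph stays unmatched. A mapping of this kind is what the overview would draw as connections between consecutive axes.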

6. Evaluation

We carried out two types of evaluation. First, we evaluated the system's scalability. Second, we conducted an informal user evaluation with six target users. We found both strengths and weaknesses in ClearEye. Some of the weaknesses stem from the limited time available for implementation. In the future work section, we discuss how these weaknesses could be overcome.

6.1 System Evaluation

We evaluated the system's scalability in the first-level view from two perspectives. First, we tested our system with a large number of versions. Figure 10 shows a snapshot of 20 versions. Second, we tested our system with long documents. Figure 11 shows the result for a 20-page document. Unfortunately, neither result is satisfactory.

Figure 10
Figure 11

When there are many versions in the display, the mappings between versions decrease dramatically in width. This can reach the point where users can no longer distinguish individual versions from their mappings, since the two have almost the same width.

In the second situation, when the documents get longer and have more paragraphs, the segments that represent paragraphs decrease quickly in length. This can reach the point where each paragraph looks like a thin line, and it becomes almost impossible to visualize its evolution over time.
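Both effects follow from simple arithmetic on a fixed-size view panel, as the short Python sketch below suggests. The panel dimensions, axis width, and paragraph count are hypothetical numbers chosen only to illustrate the trend; they are not measurements of ClearEye.

```python
def mapping_band_width(panel_width_px, version_count, axis_width_px):
    """Horizontal space left for each mapping band between adjacent axes."""
    gaps = version_count - 1
    return (panel_width_px - version_count * axis_width_px) / gaps


def average_segment_height(panel_height_px, paragraph_count):
    """Average height of a paragraph segment on a full-height axis."""
    return panel_height_px / paragraph_count


# Assumed 800 x 600 pixel panel with 20-pixel-wide axes.
few_versions = mapping_band_width(800, 5, 20)     # wide bands between axes
many_versions = mapping_band_width(800, 20, 20)   # bands about as wide as an axis
long_document = average_segment_height(600, 100)  # a few pixels per paragraph
```

With these assumed numbers, five versions leave 175 pixels per mapping band, whereas twenty versions leave about 21 pixels, which is nearly the width of an axis. A document of 100 paragraphs gets only 6 pixels per paragraph on average.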

6.2 User Evaluation

We carried out an informal user evaluation of our system. In each session, we first interviewed the user about their current methods of handling revisions, whether they were using any visualization system, and their collaborative writing activities. Then, we asked them to select several documents to load into ClearEye and tell us what they could conclude from the overview in level one. Next, we let them choose any two versions to compare in detail and tell us the location and content of each change. We concluded each session with another interview asking for their opinions on ClearEye.

None of the six users had used a visualization tool that can monitor changes across multiple versions of a document, either because they did not know of one or because they did not want to spend extra effort learning a new tool. Almost all of them had used Track Changes before, and they agreed that it is not perfect but better than nothing.

The following are some strengths and weaknesses of the system that our test users commented on:

6.2.1 Strengths

Accessibility

Accessibility is one of the major features of ClearEye that users commented on. They felt that the mappings between versions of documents are easy to understand and follow. Users do not need extra training to use the visualization system, and they could imagine themselves using it often in practice.

Reduced Information Density

In both the multiple-version overview and the content detail view of our system, the interesting information is highlighted, which helps users focus on the changes without being overwhelmed by the remaining information. This is especially important for long documents, since the changes may be just a tiny fraction of the entire document.

Easy Colour Customization

All users discovered the colour customization option during the 30 minutes of their evaluation sessions. They said they liked this function because it gave them more control over how the visualizations look. The customization function is very standard and easy to use.

Finding Detailed Changes Efficiently

One of the major attractions of ClearEye is that, in the second-level content view, users have an overview bar for quickly finding change locations within a document. The colour-coded changes then provide information on the type and content of each change between the two documents under comparison. Efficiency is critical when there are many revisions and each has many changes.

6.2.2 Weaknesses

Just like any other visualization system, ClearEye has its weaknesses, too.

More Direct Interaction

Although ClearEye provides a certain level of user interaction on the interface, it lacks some direct and intuitive interaction techniques. For instance, when users want to compare two versions in detail, the current interface requires them to go to the drop-down menus and choose two versions. A more straightforward and natural way to make the selection would be to select the corresponding versions directly in the visualization, so that users do not need to go back and forth between the visualization and the control panel.

Extra Visualization

This is more of a point of debate, since users had split opinions about using multiple colours to encode different paragraphs versus using just two colours such as white and blue. Some users think that eight colours may be overkill and confusing, because they try to figure out whether each colour carries some additional meaning.

Another instance of extra visualization is that some users did not understand the purpose of having two overview bars in the content view. They claimed that a single overview bar encoding all the change information between the two versions would be sufficient.

7. Lessons Learned

Throughout this course project, we learnt the following things:

  • Test with files that users know. One limitation of our user testing is that the sample files used in the evaluation were not files that users were familiar with, so the mapping of changes from one version to another did not help users build any visual memory of the actual documents. In future formal user studies, familiarizing users with the visualized files first is an important point to consider.
  • Use real data to test. We used many manually created files to test our system in order to exercise all of its functionality. However, we only discovered the scalability problems when testing with real data and, because of time limitations, we could not overcome them within the project. In future research, it would be a good idea to include real-data testing earlier in the process.
  • Choosing the appropriate number of colours is hard, as people have individual preferences. Leaving it to users to customize is an option worth considering.

8. Future Work

We would like to improve ClearEye in the following ways:

  • Overcome scalability limitations: use Focus+Context techniques such as a fisheye lens to distort part of the documents and focus on segments of interest. This technique can be applied to both types of scalability problems mentioned earlier.
  • Document merge: our system visualizes a sequential document revision process, in which one group member writes a draft of the document and then sends it to the other group members in turn. However, another collaborative writing behaviour is for each group member to write part of the document, with all the parts merged together at the end. We are also considering extending our system to support this kind of writing behaviour, which would require an intelligent text editing function in the visualization.
  • Support more direct interactions:
      • Comparing two versions can be accomplished by directly selecting two axes in the visualization.
      • In level two, when the user clicks on a square in the overview bars, the system automatically takes the user to the specific change location in the document.
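To make the fisheye idea in the first item more concrete, here is a minimal Python sketch of a one-dimensional fisheye transform along an axis. It uses the well-known Sarkar and Brown distortion function g(x) = (d + 1)x / (dx + 1) as one possible choice. ClearEye does not implement this, and the function name `fisheye`, the distortion factor, and the pixel values are assumptions for illustration only.

```python
def fisheye(position, focus, length, distortion=3.0):
    """Move `position` on the interval [0, length] away from `focus`.

    Points near the focus are spread out and points near the ends are
    compressed, while the two end points of the axis stay fixed.
    """
    if position == focus:
        return float(position)
    boundary = length if position > focus else 0.0
    d_max = boundary - focus                 # signed distance from focus to boundary
    x = (position - focus) / d_max           # normalised distance in (0, 1]
    g = (distortion + 1) * x / (distortion * x + 1)
    return focus + g * d_max


# Hypothetical 600-pixel axis with the focus at its centre.
near_focus = fisheye(330, 300, 600) - 300    # a 30-pixel span next to the focus
near_edge = 600 - fisheye(570, 300, 600)     # a 30-pixel span at the far end
```

With a distortion factor of 3, a 30-pixel span next to the focus is stretched to roughly 92 pixels, while a 30-pixel span at the far end shrinks to about 8 pixels. Applied vertically this would enlarge thin paragraph segments in long documents, and applied horizontally it would widen the mapping bands around the versions of interest.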

9. Conclusion

We have developed the ClearEye visualization system for document revision. The system lets users explore the various versions of a document to see the change patterns and the evolution history of these versions. It also supports detailed comparison between any two versions of the user's choice to help them locate changes in the document with ease. Throughout the development of our project, we identified several characteristics that we feel a successful visualization tool for document revision needs to have. These requirements are also the design guidelines for ClearEye.

  • The system should be highly accessible. This means that the system should be GUI-based, and users should not need to remember commands to perform any operation.
  • The system should provide an overview of multiple revisions of a document so that users can see the overall change patterns and the evolution of the document. This helps users to keep a visual map of these revisions and pick out the versions that they are most interested in.
  • The system should also provide a detailed content comparison of any two versions so that users can view the content and location of each change with ease.
  • In order to help users locate specific changes within documents, an overview of all changes in the document should be provided in the content level view to keep users oriented.
  • The system should scale well with respect to the number of versions and the length of documents.

Acknowledgement

We thank all the users who participated in our informal user study. We also thank Dr. Tamara Munzner for her helpful suggestions on the possible improvements and future work of the system. Finally, we thank Yuhan Cai for his critiques on this paper.

 

 

References
[Cha96] http://www-cad.eecs.berkeley.edu/HomePages/fchan/research/clearcase.html
[ESS92] Eick, S. G., Steffen, J. L., and Sumner, E. E. SeeSoft -- a tool for visualizing line oriented software statistics. IEEE Trans. Software Eng., 18(11):957-68, 1992.
[FRW99] Y. H. Fua, E. A. Rundensteiner, M. O. Ward. Hierarchical parallel coordinates.
[Hec78] Paul Heckel. A technique for isolating differences between files. Communications of the ACM, Volume 21, Issue 4 (April 1978), Pages 264-268.
[HHW02] Susan Havre, Elizabeth Hetzler, Paul Whitney, Lucy Nowell. ThemeRiver: Visualizing Thematic Changes in Large Document Collections. IEEE Transactions on Visualization and Computer Graphics, Volume 8, Issue 1 (January 2002), Pages 9-20.
[Mar00] http://xxdiff.sourceforge.net/
[MaF03] http://researchweb.watson.ibm.com/history/
[Mal03] http://www.macdevcenter.com/pub/a/mac/2003/08/08/version_control_two.html
[Mun00] Tamara Munzner, Interactive Visualization of Large Graphs and Networks (PhD thesis) Chapter 5, Stanford University, 2000, pp 87-122
[Tuf91] Edward Tufte. Envisioning Information. Graphics Press, 1991.
 
