Lightweight Structural Summarization as an Aid to Software Evolution

Gail C. Murphy
Ph.D. Thesis, University of Washington, July 1996

Abstract

To effectively perform a change to an existing software system, a software engineer needs to have some understanding of the structure of the system. All too often, though, an engineer must proceed to change a system without sufficient structural information because existing software understanding techniques are unable to help the engineer acquire the desired knowledge within the time and cost constraints specified for the task.

The thesis of this research is that an approach based on summarization can overcome the limitations associated with existing approaches, enabling an engineer to assess, plan, and execute changes to a software system more effectively. Summarization involves the production of overviews of vast amounts of user-selected information in a timely manner. I describe two techniques developed to support the summarization approach. The first technique, the software reflexion model technique, enables an engineer to summarize selected structural information in the context of a task-specific high-level model. The second technique, the lexical source model extraction technique, supports the summarization process by facilitating the scanning and analysis of system artifacts for structural information that is difficult or impossible to extract at low cost using existing approaches. Each of these techniques is lightweight and iterative: the engineer is able to quickly and easily gain access to partial and approximate structural information, and may then balance the completeness and accuracy of the information needed with the cost of further applying the technique. I demonstrate the viability of the approach by describing its use on a variety of change tasks and systems, including the use of the reflexion model technique by an engineer at Microsoft Corporation to aid with an experimental reengineering of the million-line Excel spreadsheet product.

Availability

The dissertation is available in three forms:

  • A PDF file of the complete dissertation.
  • A gzipped PostScript file of the complete dissertation, set up for double-sided printing. This file is quite large when uncompressed (around 800K compressed and 10MB uncompressed).
  • So, the dissertation is also available in some smaller chunks:
    • (part A) preliminary pages and the introduction (around 250K compressed and 550K uncompressed),
    • (part B) the next four chapters about Software Reflexion Models (around 280K compressed and 4MB uncompressed),
    • (part C) the next three chapters about the lexical source model extraction technique (around 115K compressed and 500K uncompressed), and
    • (part D) validation, related work, the conclusion, bibliography, and appendices (around 160K compressed and 630K uncompressed).

Slides from my interview talk are also available.


Last modified: April 06, 2004

Gail Murphy
murphy@cs.ubc.ca