Lightweight Lexical Source Model Extraction

Gail C. Murphy and David Notkin

ACM Transactions on Software Engineering and Methodology, vol. 5, no. 3, July 1996, p. 262-292.

This is a longer version of a paper that appeared in FSE '95.

The longer version is currently available only by request. Please send e-mail to request a copy.

The earlier conference version is available: [FSE '95 Postscript Version]


Software engineers maintaining an existing software system often depend on the mechanized extraction of information from system artifacts. Some useful kinds of information---source models---are well-known: call graphs, file dependences, etc. Predicting every kind of source model that a software engineer may need is impossible. We have developed a lightweight approach for generating flexible and tolerant source model extractors from lexical specifications. The approach is lightweight in that the specifications are relatively small and easy to write. It is flexible in that there are few constraints on the kinds of artifacts from which source models are extracted (e.g., we can extract from source code, structured data files, documentation, etc.). It is tolerant in that there are few constraints on the condition of the artifacts. For example, we can extract from source that cannot necessarily be compiled. Our approach extends the kinds of source models that can be produced from lexical information while avoiding the constraints and brittleness of most parser-based approaches. We have developed tools to support this approach and applied the tools to the extraction of a number of different source models (file dependences, event interactions, call graphs) from a variety of system artifacts (C, C++, CLOS, Eiffel, TCL, structured data). We discuss our approach and describe its application to extract source models not available using existing systems; for example, we compute the implicitly-invokes relation over Field tools. We compare and contrast our approach to the conventional lexical and syntactic approaches of generating source models.
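As a rough illustration of the lexical style of extraction described in the abstract, the Python sketch below pulls an approximate call relation out of C source using only regular expressions. It is not the authors' tool or their specification notation. The patterns, the keyword list, the function name `extract_calls`, and the sample fragment are all assumptions made for this example. The sample deliberately omits a semicolon to show that a purely lexical scan does not need compilable input.

```python
import re

# Hypothetical keyword list: names that look like calls lexically but are not.
KEYWORDS = {"if", "while", "for", "switch", "return", "sizeof"}

# Assumed convention: a definition starts in column 0 with a return type,
# followed by the function name and "(", and contains no ";" on that line
# (a ";" would make it a prototype or a statement).
DEF_RE = re.compile(r"^[A-Za-z_][\w\s\*]*?\b([A-Za-z_]\w*)\s*\([^;]*$")

# Any identifier directly followed by "(" is treated as a call site.
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def extract_calls(source):
    """Return a set of (caller, callee) pairs found by line-by-line scanning."""
    calls = set()
    current = None  # name of the function whose body is being scanned
    for line in source.splitlines():
        match = DEF_RE.match(line)
        if match and match.group(1) not in KEYWORDS:
            current = match.group(1)
            continue
        if current is None:
            continue  # text before the first definition is ignored
        for callee in CALL_RE.findall(line):
            if callee not in KEYWORDS:
                calls.add((current, callee))
    return calls


# Sample input with a syntax error: a parser would reject it, the scan does not.
SAMPLE = """\
#include <stdio.h>

static int helper(int x)
{
    return compute(x) + 1;
}

int main(void) {
    int y = helper(3)      /* missing semicolon: would not compile */
    if (y > 0) {
        printf("%d\\n", y);
    }
    return 0;
}
"""

if __name__ == "__main__":
    for caller, callee in sorted(extract_calls(SAMPLE)):
        print(caller, "->", callee)
```

The sketch trades precision for tolerance: it relies on layout conventions (definitions in column 0, bodies indented) and will misclassify code that breaks them, such as call-like text inside comments or strings.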

ACM Copyright Notice.

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Last modified: June 28, 1996

Gail Murphy