SMEGenerator Manual Page


This page describes version 1.6 of the SMEGenerator.


SMEGenerator -- Generates a scanner from an LSME specification


SMEGenerator [-cntirdv] -s SpecificationFile


The SMEGenerator reads the pattern and action descriptions from an LSME specification file and generates a scanner program. The scanner is an Icon program that reads from standard input, searches for the patterns, and executes the associated action code whenever a pattern matches. The generated Icon program is written to standard output and may be executed with the Icon translator or compiled with the Icon compiler. This manual page assumes you have Icon installed. For more information on Icon, see the Icon home page.

Currently, the best reference manual for the LSME specification language is in our paper. An initial language reference manual is available here. Hopefully, this will become better soon.

SMEGenerator may be used to generate either a set of deterministic finite state machines for scanning, or a single non-deterministic finite state machine. The older, non-deterministic machine is kept for compatibility tests. A flag selects the kind of finite state machine desired. By default, the deterministic finite state machines are generated.
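The distinction between the two kinds of machine can be illustrated with a small Python sketch. It is a generic textbook illustration under stated assumptions, not a description of SMEGenerator's internals or of the Icon code it emits: the transition tables, state numbers, and function names (NFA, DFA, run_nfa, run_dfa) are all hypothetical. A non-deterministic machine must track every state it could be in, while a deterministic machine follows exactly one transition per token.

```python
# Hypothetical toy machines (not SMEGenerator output) that accept the token
# sequences  # include <   and   # include "

# Non-deterministic: the first token leads to two candidate states at once.
NFA = {
    (0, "#"): {1, 3},
    (1, "include"): {2},
    (2, "<"): {"accept"},
    (3, "include"): {4},
    (4, '"'): {"accept"},
}

# Deterministic: the two alternatives are merged, one successor per token.
DFA = {
    (0, "#"): 1,
    (1, "include"): 2,
    (2, "<"): "accept",
    (2, '"'): "accept",
}


def run_nfa(tokens):
    """Simulate the NFA by tracking the set of all reachable states."""
    current = {0}
    for tok in tokens:
        current = set().union(*(NFA.get((s, tok), set()) for s in current))
        if not current:
            return False  # every candidate path has died
    return "accept" in current


def run_dfa(tokens):
    """Simulate the DFA by following a single state."""
    state = 0
    for tok in tokens:
        state = DFA.get((state, tok))
        if state is None:
            return False  # no transition for this token
    return state == "accept"
```

Both functions accept the same token sequences; the deterministic version simply does less bookkeeping per token.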


-c Uses the "comment" lines in the specification to ignore comments in the source files.

-t # Sets the maximum number of tokens to match before pruning a path (Heuristic #3).

-i [1,2] Sets the level of statistics to report (1 or 2).

-r Turns on run-time tracing (for debugging).

-d Dumps a description of the state machines.

-v Prints the version number and exits (a -s value must still be provided).


The patterns and actions used to extract a source model depend, in part, on the language and coding style used in the system's source code. To illustrate the basic features of the SMEGenerator, we use the example of extracting include information from C source and header files (i.e., .c and .h files). The scanner generated from these patterns may be used to scan both the source and header files.

(Include information could easily be extracted with a simpler lexical tool such as grep. It is used as an example here because of its simplicity. For examples that demonstrate more of the unique capabilities of the lexical source model extractor, see a paper describing the tool.)

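The desired source model is a stream of output consisting of one line per include directive, each naming the file being scanned, the word "includes", and the name of the included file.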


The specification file consists of (lines beginning with # are comment lines):

# The patterns are attached to a hierarchical description of
# the software structure.
# We start by scanning a file. The <file> introduces a
# variable named file.
file <file>

# Then once we are scanning a file, we want to look for include
# statements. The %% bracket a pattern.
\#include ( \< | " ) <filename> @

# We pass in the name of the file being scanned as the \n
# first argument to the scanning program \n
file := getArgument(1) \n

write ( file, " includes ", filename ) \n

@ ( \> | " )

Next, we generate the scanning program:

SMEGenerator -s specFile > imports.icn

Finally, we use the Icon translator (icont) to scan a C file and output the source model.

icont imports.icn -x < aCFile
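For readers without Icon at hand, the effect of this example scanner can be approximated with a short Python sketch. This is not the code SMEGenerator generates, and it swaps in a line-based regular expression for LSME's token-based matching; the names INCLUDE and extract_includes are hypothetical. It only shows the shape of the resulting source model, following the write action in the specification.

```python
import re

# Matches  #include <name>  or  #include "name"  at the start of a line,
# allowing optional whitespace between the pieces.
INCLUDE = re.compile(r'^\s*#\s*include\s*[<"]\s*([^>"\s]+)\s*[>"]')


def extract_includes(file_name, lines):
    """Yield one 'file includes filename' record per include directive."""
    for line in lines:
        match = INCLUDE.match(line)
        if match:
            # Mirrors the action: write(file, " includes ", filename)
            yield f"{file_name} includes {match.group(1)}"
```

Called as extract_includes("aCFile", open("aCFile")), the function yields one record for each include directive found in the file.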
