Lightweight Source Model Extraction (LSME) Language


An LSME specification file has two basic parts: the pattern and action descriptions, and an optional analysis description.

Pattern and Action Descriptions

Patterns are organized hierarchically. To indicate the position of a pattern within the hierarchy, each pattern description is preceded by a name known as the software structure entry name. The software structure entry name consists of a sequence of alphanumeric characters, with the dot character acting as a hierarchical separator. For instance, the name:

directory

indicates a software structure entry at the highest level of the hierarchy. The name:

directory.file

introduces a software structure entry at the second level of the hierarchy, and so on. The software structure entry name may optionally be followed by an identifier (see below for more information on identifiers). See the SMEGenerator manual page for an example.
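The naming rule above can be modelled in a few lines of Python. This is only an illustrative sketch: entry_level and entry_parent are hypothetical helper names, not part of LSME, and the sketch assumes that a name is one or more alphanumeric segments joined by single dots.

```python
import re

# Hypothetical model of the naming rule; not part of the LSME tools.
# Assumption: a name is one or more alphanumeric segments joined by dots.
ENTRY_NAME = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")


def entry_level(name):
    """Return the 1-based hierarchy level of a software structure entry name."""
    if not ENTRY_NAME.fullmatch(name):
        raise ValueError(f"not a valid entry name: {name!r}")
    return name.count(".") + 1


def entry_parent(name):
    """Return the parent entry name, or None for a top-level entry."""
    head, sep, _ = name.rpartition(".")
    return head if sep else None
```

Under these assumptions, entry_level("directory") gives 1, entry_level("directory.file") gives 2, and entry_parent("directory.file") gives "directory".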

Pattern Descriptions

Pattern descriptions are attached to software structure entry names by placing them between % signs. For example:

directory
%
<aPattern>
%

attaches a simple pattern to the directory software structure entry. A pattern description consists of pattern characters, one-character tokens and identifiers. The pattern characters include:

{ }+ indicating anything between the brackets repeats one or more times.

{ } indicating anything between the brackets repeats exactly once.

[ ] indicating anything between the brackets appears zero or one times.

( ) indicating alternative choices (each choice is separated by a |).
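For readers more familiar with regular expressions, the pattern characters listed above have rough analogues in Python's re module. The table below is an illustration only; it assumes nothing about how LSME matches internally, and the names ANALOGUES and matches are made up for this sketch.

```python
import re

# Rough regular-expression analogues of the LSME pattern characters.
# Illustration only: LSME's own matcher is not implemented this way,
# and "x" / "y" stand in for arbitrary sub-patterns.
ANALOGUES = {
    "{ x }+": r"(?:x)+",       # one or more repetitions
    "{ x }": r"(?:x){1}",      # exactly once
    "[ x ]": r"(?:x)?",        # zero or one occurrence
    "( x | y )": r"(?:x|y)",   # alternative choices
}


def matches(lsme_form, text):
    """Return True if text fully matches the analogue of lsme_form."""
    return re.fullmatch(ANALOGUES[lsme_form], text) is not None
```

For instance, matches("{ x }+", "xxx") holds, while matches("{ x }", "xx") does not, mirroring the difference between "one or more times" and "exactly once".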



One-character tokens consist of any single character within a pattern. For example:

directory
%
!
%

introduces ! as a one-character token. The following characters may be escaped as one-character tokens: ( ) { } [ ] \n.<

Identifiers consist of variable names within < and > symbols. For example:

directory
%
<aPattern>
%

introduces aPattern as a variable name. This identifier will match any non-whitespace sequence of characters other than a one-character token. The aPattern variable may be accessed within action code.

Action Code

Action code may be attached to any one-character token or identifier. The token or identifier is followed by an @ sign, and the action code then follows. The action code may be any Icon code, and each line of it must be terminated by \n. The action code terminates with another @ sign. For example:

directory
%
<aPattern>@
write ("foo") \n
@
%

will write out foo whenever the aPattern identifier is matched in the source artifact.

Quirks

Pattern characters, one-character tokens, and identifiers must be separated by whitespace in pattern descriptions. This is a limitation of the current parser.
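Because every element must be whitespace-separated, a pattern description can be split into its elements with an ordinary whitespace split. The Python sketch below labels each element according to the three kinds described earlier. It is a hypothetical model, not the LSME parser: the classify function is invented for illustration, it assumes the repetition close is written as }+ without an intervening space, and it does not model escaped characters.

```python
# Hypothetical classifier for the elements of a pattern description.
# Assumptions: "}+" is written without a space, and escapes are ignored.
PATTERN_CHARS = {"{", "}", "}+", "[", "]", "(", ")", "|"}


def classify(pattern):
    """Split a pattern description on whitespace and label each element."""
    labelled = []
    for element in pattern.split():
        if element in PATTERN_CHARS:
            kind = "pattern character"
        elif len(element) > 2 and element.startswith("<") and element.endswith(">"):
            kind = "identifier"
        elif len(element) == 1:
            kind = "one-character token"
        else:
            kind = "unrecognized"
        labelled.append((element, kind))
    return labelled
```

With this model, classify("[ <type> ] <name> ;") labels [ and ] as pattern characters, <type> and <name> as identifiers, and ; as a one-character token. Writing the same pattern without spaces, as in "[<type>]", yields a single unrecognized element, which is the situation the quirk describes.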

Comments

The starting and ending character sequences for comments within the source artifacts scanned may be described as:

comment start end

where start and end are the character sequences. For example:

comment /* */
comment // \n

describes how to ignore comments in C++ code.
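The effect of such comment declarations can be sketched in Python as a scan that drops everything from a start sequence through its matching end sequence. This is an assumption about the intended behaviour rather than a description of the LSME implementation: strip_comments and CPP_COMMENTS are hypothetical names, the sketch keeps the line break that closes a // comment, and it makes no attempt to recognize comment markers inside string literals.

```python
def strip_comments(text, delimiters):
    """Remove each span that begins with a start sequence and runs through its end."""
    out = []
    i = 0
    while i < len(text):
        for start, end in delimiters:
            if text.startswith(start, i):
                j = text.find(end, i + len(start))
                if j == -1:
                    i = len(text)          # unterminated comment: skip the rest
                else:
                    i = j + len(end)
                    if end == "\n":
                        out.append("\n")   # keep the line break itself
                break
        else:
            # No comment starts here, so copy the character through.
            out.append(text[i])
            i += 1
    return "".join(out)


# The two (start, end) pairs from the C++ example above.
CPP_COMMENTS = [("/*", "*/"), ("//", "\n")]
```

Calling strip_comments(source, CPP_COMMENTS) on a C++ fragment leaves the code and line structure in place while removing both block and line comments.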

LSME Comments

Comments may appear between pattern descriptions or within action code. Comments appearing between pattern descriptions are lines beginning with #. Comments in action code are lines beginning with # and ending with \n.

Initialization Code

Initialization (Icon) code (including global variable declarations) may be placed within an:

init @
@

block at the beginning of the file.

Analysis Descriptions

Under construction!

Examples

Under construction!

Comments to murphy@cs.ubc.ca
