RMTool: File Formats

Structure Description

Source Model
Encoding Entities
Representing Relationships

Mapping

High-level Model

Config (Optional)

Computing a reflexion model requires several files: a structure description, a source model, a map, and a high-level model. An optional graph configuration file controls how the result is displayed. This page describes the format of each file.

Structure Description

The structure description file describes how you will refer to source entities in the map and what information you will have available about source entities. The precise information you want will depend on the language(s) in which the system you are analyzing is written and on the kind of task you are trying to perform.

As an example, suppose that you are investigating a system written in C++ and the information you will be extracting from the system consists of calls between methods and references from methods to global variables. In the map, you may want to refer to entities by:

the name of a method,
the name of a global variable,
all methods in some class,
all classes in some file, or
all files in some directory.

The structure description file lets you describe a hierarchical naming scheme (a source model entity naming tree) to support these different ways of referring to an entity. For the C++ example, you might choose:

directory
   directory.file
      directory.file.class
         directory.file.class.method
      directory.file.variable

This lets you say in a map file:

[ directory=foo method=bar mapTo=something ]

to name a method "bar" in any class in any file that is in the "foo" directory. Or, you could say:

[ file=afile variable=avar ]

to name a global variable named "avar" in the "afile" file.

The structure description file dictates how source model entities are encoded in the source model file. See the description for the Source Model for further details.

Source Model

Each line in a source model file describes a relationship between two source entities. Each line is of the form:

sourceEntity1 sourceEntity2 [type]

where type is an optional item describing the type of the relationship between the entities. Items on a line are separated by spaces.
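To make the line format concrete, here is a minimal Python sketch of reading a source model file. It assumes that encoded entities never contain spaces and that blank lines may be skipped; the names Relation, parse_source_model_line, and read_source_model are hypothetical and not part of RMTool.

```python
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    source: str
    target: str
    rel_type: Optional[str]  # None when the optional type item is absent


def parse_source_model_line(line: str) -> Relation:
    """Split one source model line into its two entities and optional type."""
    items = line.split()  # items are separated by whitespace
    if len(items) == 2:
        return Relation(items[0], items[1], None)
    if len(items) == 3:
        return Relation(items[0], items[1], items[2])
    raise ValueError(f"expected 2 or 3 items, got {len(items)}: {line!r}")


def read_source_model(text: str) -> list[Relation]:
    # Assumption: blank lines are ignored; every other line is one relationship.
    return [parse_source_model_line(l) for l in text.splitlines() if l.strip()]
```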

Encoding Entities

Each of the two source entities is encoded according to the structure description file. The naming tree in the structure description file determines the number used to refer to each part of an entity's name. Numbers are assigned in depth-first (preorder) order. For the example in the Structure Description section above:

directory -> 1
   directory.file -> 2
      directory.file.class -> 3
         directory.file.class.method -> 4
      directory.file.variable -> 5
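The depth-first numbering can be sketched in Python as follows. This page does not show the concrete syntax of the structure description file, so the nested (name, children) list below is a hypothetical in-memory form of the example tree, keyed by the last component of each dotted name; number_depth_first is likewise an illustrative name.

```python
# Hypothetical in-memory form of the example naming tree: (keyword, children).
NAMING_TREE = [
    ("directory", [
        ("file", [
            ("class", [
                ("method", []),
            ]),
            ("variable", []),
        ]),
    ]),
]


def number_depth_first(tree, start=1):
    """Return {keyword: number}, numbering nodes in depth-first (preorder) order."""
    numbers = {}
    counter = start

    def visit(nodes):
        nonlocal counter
        for name, children in nodes:
            numbers[name] = counter  # a node is numbered before its children
            counter += 1
            visit(children)

    visit(tree)
    return numbers


print(number_depth_first(NAMING_TREE))
# {'directory': 1, 'file': 2, 'class': 3, 'method': 4, 'variable': 5}
```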

A source model entity is encoded by preceding each piece of naming information with its number enclosed in @ symbols (for example, @1@) and concatenating everything together. So if there is a method "bar" in a class "barClass" in a file "barFile" in a directory "barDir", it is encoded as:

@1@barDir@2@barFile@3@barClass@4@bar@5@

And if there is a method "foo" in a class "fooClass" in a file "fooFile" in a directory "fooDir", it is encoded as:

@1@fooDir@2@fooFile@3@fooClass@4@foo@5@

And if there is a global variable "gv" in the "fooFile", it is encoded as:

@1@fooDir@2@fooFile@3@@4@@5@gv

As shown above, you do not have to supply every piece of naming information. For instance, if you do not know which file "fooClass" is in, the entry is encoded by leaving that field empty:

@1@fooDir@2@@3@fooClass@4@foo@5@

When your source model entity naming tree contains mutually exclusive branches, some fields will always be blank. For instance, in the examples above, it does not make sense to specify both a method name and a global variable name, since one entity cannot be both.
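The encoding rule can be sketched as a small Python function. It assumes the keyword-to-number table from the example above and, following the examples, writes out every field number with empty text for unknown or inapplicable fields; encode_entity and FIELD_NUMBERS are hypothetical names.

```python
# Keyword-to-number table from the example naming tree (an assumption here).
FIELD_NUMBERS = {"directory": 1, "file": 2, "class": 3, "method": 4, "variable": 5}


def encode_entity(fields: dict[str, str], numbers: dict[str, int] = FIELD_NUMBERS) -> str:
    """Encode an entity as @1@value1@2@value2...; missing fields stay empty."""
    unknown = set(fields) - set(numbers)
    if unknown:
        raise ValueError(f"keywords not in naming tree: {sorted(unknown)}")
    parts = []
    # Emit fields in numeric order, each preceded by its @number@ marker.
    for keyword, number in sorted(numbers.items(), key=lambda kv: kv[1]):
        parts.append(f"@{number}@{fields.get(keyword, '')}")
    return "".join(parts)


print(encode_entity({"directory": "fooDir", "file": "fooFile", "variable": "gv"}))
# @1@fooDir@2@fooFile@3@@4@@5@gv
```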

Representing Relationships

If the bar method calls the foo method, the source model contains the following line (it is a single line, even if it wraps when displayed):

@1@barDir@2@barFile@3@barClass@4@bar@5@ @1@fooDir@2@fooFile@3@fooClass@4@foo@5@ call

where the "call" string at the end of the line specifies that this is a relationship of type "call". If the source model contains only one kind of relationship, there is no need to provide a value for the type item. If the source model contains both calls and data references, the type item can be used to distinguish between the different kinds of relationship in the file.
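Going the other way, a relationship line can be split and each entity decoded back into its named fields. This Python sketch assumes that names never contain @ and uses the numbering from the example; decode_entity, FIELD_NAMES, and FIELD_RE are hypothetical names.

```python
import re

# Number-to-keyword table from the example naming tree (an assumption here).
FIELD_NAMES = {1: "directory", 2: "file", 3: "class", 4: "method", 5: "variable"}

# Each field is "@<number>@" followed by text up to the next "@" (or the end).
FIELD_RE = re.compile(r"@(\d+)@([^@]*)")


def decode_entity(encoded: str, names: dict[int, str] = FIELD_NAMES) -> dict[str, str]:
    """Turn '@1@barDir@2@...' into {'directory': 'barDir', ...}, dropping empty fields."""
    decoded = {}
    for number, value in FIELD_RE.findall(encoded):
        if value:  # empty fields carry no naming information
            decoded[names[int(number)]] = value
    return decoded


line = ("@1@barDir@2@barFile@3@barClass@4@bar@5@ "
        "@1@fooDir@2@fooFile@3@fooClass@4@foo@5@ call")
caller, callee, rel_type = line.split()
print(decode_entity(caller)["method"], rel_type, decode_entity(callee)["method"])
# bar call foo
```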

Mapping

A map file consists of an ordered sequence of map entries. Each entry is enclosed in square brackets and may span several lines. A source model entity is mapped by the first entry in the map that it matches (this behaviour can be overridden in the command-line interface to the tools).

A map entry names entities in the source model and associates them with entities in the high-level model.

Source model entities are named by using the keywords specified in the structure description file. For example, if the C++ structure description given above is used, you may use the keywords directory, file, class, method, and variable, as in:

[ file=Parser.cpp mapTo=Parser ]

specifies that all entities in the file Parser.cpp will be mapped to the high-level model entity Parser.

Regular expressions may be used when naming the source entities (but not when naming high-level model entities). The regular expression syntax is that of Perl.

By default, wildcards are placed around the start and end of a source item regular expression. For instance, stating file=Parser means file=.*Parser.*. This wildcarding can be overridden with the ^ and $ anchors. For instance, file=^Parser\.cpp$ names only the file Parser.cpp, whereas file=Parser also matches files such as Parser.h.
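In Python, the implicit wildcarding corresponds roughly to an unanchored re.search, which finds the pattern anywhere in the field value. This is only an approximation: Python's re module is close to, but not identical to, Perl's regular expression syntax, and field_matches is a hypothetical helper.

```python
import re


def field_matches(pattern: str, value: str) -> bool:
    """Unanchored match: 'Parser' behaves like '.*Parser.*' unless ^ or $ are used."""
    # re.search looks for the pattern anywhere in the value, which has the same
    # effect as surrounding it with .* wildcards; ^ and $ still anchor it.
    return re.search(pattern, value) is not None


assert field_matches("Parser", "Parser.h")
assert field_matches(r"^Parser\.cpp$", "Parser.cpp")
assert not field_matches(r"^Parser\.cpp$", "Parser.h")
```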

Multiple high-level model entities may be named on the right hand side of a mapping entry. For example:

[ file=Parser.cpp mapTo=Parser mapTo=AnotherEntity]

maps the entities in Parser.cpp to both Parser and AnotherEntity.
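First-match mapping with one or more targets can be sketched as follows. The sketch assumes that every keyword in an entry must match (an implicit AND), that a field the entity lacks is matched against the empty string, and that an entity matching no entry is left unmapped; map_entity and MapEntry are hypothetical names, and only the default first-match rule is modelled, not the command-line override.

```python
import re

# A map entry: {keyword: regex} conditions plus one or more mapTo targets.
MapEntry = tuple[dict[str, str], list[str]]


def map_entity(fields: dict[str, str], entries: list[MapEntry]) -> list[str]:
    """Return the mapTo targets of the first entry whose conditions all match."""
    for conditions, targets in entries:
        # Unanchored search mirrors the implicit .* wildcards around each pattern.
        if all(re.search(pattern, fields.get(keyword, ""))
               for keyword, pattern in conditions.items()):
            return targets
    return []  # no entry matched: the entity is unmapped


entries = [
    ({"file": r"Parser\.cpp"}, ["Parser", "AnotherEntity"]),
    ({"class": "Ast"}, ["AST"]),
]
print(map_entity({"file": "Parser.cpp", "method": "parse"}, entries))
# ['Parser', 'AnotherEntity']
```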

Lines in the map file that begin with # in the first column are comment lines. An example map file for a compiler follows.

# .* in next line is unnecessary
[ file=scanner.* mapTo=Parse ]
[ file=buf\.[ch] mapTo=Parse ]
[ class=Parser mapTo=Parse ]
[ class=Ast mapTo=AST ]
[ class=AssignStmt mapTo=AST ]
[ class=BinOp mapTo=AST ]
[ class=Block mapTo=AST ]
[ directory=401 class=CallStmt mapTo=AST ]
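Because entries may span lines and patterns such as buf\.[ch] contain brackets of their own, a reader cannot simply split the file on ]. The Python sketch below tracks bracket depth (keeping backslash-escaped characters verbatim) to find where each entry ends, then splits the entry into keyword=value items. This is one plausible approach, not RMTool's actual parser; it assumes patterns contain no spaces, and split_entries, parse_entry, and read_map_file are hypothetical names.

```python
def split_entries(text: str) -> list[str]:
    """Return the text inside each top-level [ ... ] entry of a map file."""
    # Lines with # in the first column are comments.
    body = "\n".join(l for l in text.splitlines() if not l.startswith("#"))
    entries: list[str] = []
    current: list[str] = []
    depth = 0
    i = 0
    while i < len(body):
        ch = body[i]
        if depth > 0 and ch == "\\":
            current.append(body[i:i + 2])  # keep escapes such as \. or \[ verbatim
            i += 2
            continue
        if ch == "[":
            if depth > 0:
                current.append(ch)  # bracket inside a pattern, e.g. [ch]
            depth += 1
        elif ch == "]" and depth > 0:
            depth -= 1
            if depth == 0:
                entries.append("".join(current))  # outermost bracket closed
                current = []
            else:
                current.append(ch)
        elif depth > 0:
            current.append(ch)
        i += 1
    return entries


def parse_entry(entry: str) -> tuple[dict[str, str], list[str]]:
    """Split an entry into {keyword: pattern} conditions and mapTo targets."""
    conditions: dict[str, str] = {}
    targets: list[str] = []
    for item in entry.split():  # whitespace, including newlines, separates items
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected keyword=value, got {item!r}")
        if key == "mapTo":
            targets.append(value)
        else:
            conditions[key] = value
    return conditions, targets


def read_map_file(text: str) -> list[tuple[dict[str, str], list[str]]]:
    return [parse_entry(e) for e in split_entries(text)]
```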

High-level Model

Each line in a high-level model file introduces either a node or an arc, according to the following syntax:

node_name

to define a node named "node_name" with no dependencies,

node_1 node_2

to define two nodes with a directional arc from "node_1" to "node_2",

node_1 node_2 type

to define two nodes with a directional arc from "node_1" to "node_2" with a particular type.

A line beginning with a # (in the first column) is a comment line.

Nodes may be introduced simply by defining arcs.
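A reader for this format only needs to distinguish one-, two-, and three-item lines. In this Python sketch (read_high_level_model is a hypothetical name), nodes are recorded in first-seen order and arcs add their endpoints as nodes; skipping blank lines is an assumption.

```python
from typing import Optional

Arc = tuple[str, str, Optional[str]]  # (from, to, type or None)


def read_high_level_model(text: str) -> tuple[list[str], list[Arc]]:
    """Return (nodes, arcs) from a high-level model file."""
    nodes: list[str] = []  # first-seen order
    arcs: list[Arc] = []

    def add_node(name: str) -> None:
        if name not in nodes:
            nodes.append(name)

    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # comment (# in first column) or blank line
        items = line.split()
        if len(items) == 1:
            add_node(items[0])
        elif len(items) in (2, 3):
            add_node(items[0])  # arcs introduce their endpoints as nodes
            add_node(items[1])
            arcs.append((items[0], items[1], items[2] if len(items) == 3 else None))
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return nodes, arcs
```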

As an example, the high-level model file:

Parse AST
AST SymTab
AST CodeGen
CodeGen Object
UnattachedObject

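specifies a high-level model with the nodes Parse, AST, SymTab, CodeGen, Object, and UnattachedObject, and with arcs from Parse to AST, from AST to SymTab, from AST to CodeGen, and from CodeGen to Object. UnattachedObject has no arcs.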

Config (Optional)

The graph configuration file describes how the nodes and edges of a reflexion model are displayed. Each line in the file has one of the following two formats:

node <name> [shape|color]=<value>
edgetype <name> color=<value>

Lines of the first format specify how particular nodes should appear; the allowable shape and color values are those supported by AT&T's Graphviz package. Lines of the second format specify how edges of a particular type should appear.

For example, the line:

edgetype call color=blue

specifies that edges of type "call" should be coloured blue.

The special type "notype" is used to indicate the colour of untyped edges.

An example of a graph configuration file is:

edgetype call color=blue
edgetype data color=green
node Foo shape=diamond
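The sketch below parses a graph configuration file and shows one way the colours could be handed to Graphviz as DOT edge attributes. The fallback to the "notype" entry for untyped edges follows the rule above; falling back to black when no entry exists, allowing blank lines, and the names read_config, edge_color, and dot_edge are assumptions for illustration, not RMTool's behaviour.

```python
from typing import Optional


def read_config(text: str) -> tuple[dict[str, dict[str, str]], dict[str, str]]:
    """Return (node attributes by name, edge colours by type)."""
    node_attrs: dict[str, dict[str, str]] = {}
    edge_colors: dict[str, str] = {}
    for line in text.splitlines():
        items = line.split()
        if not items:
            continue  # assumption: blank lines are allowed
        if len(items) != 3:
            raise ValueError(f"expected 3 items: {line!r}")
        kind, name, setting = items
        key, _, value = setting.partition("=")
        if kind == "node" and key in ("shape", "color"):
            node_attrs.setdefault(name, {})[key] = value
        elif kind == "edgetype" and key == "color":
            edge_colors[name] = value
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    return node_attrs, edge_colors


def edge_color(edge_type: Optional[str], edge_colors: dict[str, str],
               default: str = "black") -> str:
    """Untyped edges (edge_type None) use the special "notype" entry."""
    key = "notype" if edge_type is None else edge_type
    return edge_colors.get(key, default)  # assumed default when unconfigured


def dot_edge(src: str, dst: str, edge_type: Optional[str],
             edge_colors: dict[str, str]) -> str:
    """Format one edge as a Graphviz DOT statement with its configured colour."""
    return f'"{src}" -> "{dst}" [color={edge_color(edge_type, edge_colors)}];'
```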

Last Updated: October 20, 2001

Contact murphy@cs.ubc.ca for more information or any problems with this page.