CpSc 513 Verification in Real Life                            Jan. 7, 2013
SLAM and verifying the i7 design

0. Topics for today
   I.   Formal verification methods in today's papers
   II.  SLAM/SDV
   III. Verification in the i7 design
   IV.  Review and Preview

I. Formal verification methods in today's papers
   A. Model checking
      1. Model a program as having a finite set of states
         a. For example, a program that has a fixed number of boolean
            variables.
         b. If a program has N boolean variables, then these can take on
            2^N combinations of values.
         c. If the same program has K possible values for the program
            counter (for now, assume no recursion), then there are K*2^N
            possible program states.
         d. Not all of these states can occur in real executions.
      2. The verification problem:
         a. Let q0 be the initial state of the program
            * e.g. PC=0, all program variables are 'false'
            * more generally, we might have Q0, a set of possible initial
              states (e.g. some variables may be uninitialized).
         b. Let Q be the set of all possible states.
         c. Let R be the state transition relation of the program
            * (q, q') IN R iff the program can go from state q to state q'
              in a single step.
            * Why is R a relation?  Allow uncertainty in the model to
              account for: inputs, actions by the environment,
              simplifications that we made to the program to make
              verification tractable, etc.
            * Usually we don't write down R explicitly (i.e. we don't list
              all of the state pairs in R)
              + Instead, for any q, we can find all q' such that
                (q, q') IN R by looking at the program source code.
              + Note that q tells us the current PC and the values of all
                program variables.
              + We just consider executing the statement at that PC with
                those values for the program variables.
         d. Let Spec: Q -> boolean be the specification.
            * We want to show that for all reachable states q, Spec(q)
              holds.
      3.
Model checking

            let Pending   := { q0 };
            let Reachable := { q0 };
            while Pending != EmptySet do
               let q := choose(Pending);
               Pending := Pending - {q};
               if NOT Spec(q) return(fail); endif
               for each q' such that (q, q') IN R do
                  if NOT (q' IN Reachable)
                     Pending   := Pending UNION {q'};
                     Reachable := Reachable UNION {q'};
                  endif
               end // for
            end // while
            return(pass);

         a. This algorithm returns 'fail' if it is possible to reach a
            state that doesn't satisfy Spec.  Otherwise, all reachable
            states satisfy Spec and model checking returns 'pass'.
            (Note: Reachable is initialized to { q0 } rather than the empty
            set; otherwise q0 could be re-added to Pending if some state
            transitions back to it.)
         b. The algorithm terminates because:
            * There are a finite number of possible states.
            * Each state is added to Reachable at most once.
            * Each state is added to Pending at most once.
            * Each iteration of the while loop removes a state from
              Pending; thus, the while loop body is executed at most |Q|
              times.
            * Likewise, each time the for loop is encountered, its body is
              executed at most |Q| times.
            * The total runtime is at most |Q|^2.
         c. The algorithm can be used to produce counterexample traces.
            For every state q' in Reachable, either q' = q0 or we can find
            a state q such that q is in Reachable and (q, q') IN R.
            Note: q is just the state we were processing when q' was added
            to Pending; recording this predecessor for each state lets us
            walk backwards from a failing state to q0.
         d. Later, we'll learn more about model checking:
            * Algorithms that work well in practice.
            * How to check other kinds of properties as well.
   B. Abstraction
      1. In general
         a. Real programs can have an unbounded number of variables
            * Recursion
            * Dynamic allocation
            * Even a program with a small number of int's, float's, or
              double's has astronomically more states than we can hope to
              explore.
         b. In practice, model checking is performed on a "simplified"
            version of the program.  Abstraction is one way to do this.
         c. Let Q be the set of states of the program, and A be a set of
            "abstracted" states.
         d. Define a function abs: Q -> A so that for q in Q, abs(q) is the
            abstraction of q.
         e. Now, define an abstract transition relation R_abs so that
            (a, a') IN R_abs if Exists (q, q') IN R s.t.
            a = abs(q) AND a' = abs(q')
         f.
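The worklist algorithm and the abstraction map (abs, R_abs) defined above can be combined into a short runnable Python sketch.  The three-location toy program here (one boolean variable b, program counter pc) is assumed purely for illustration:

```python
def successors(q):
    """The transition relation R, given as a successor function."""
    pc, b = q
    if pc == 0:
        return [(1, True)]      # location 0: b := true
    if pc == 1:
        return [(2, b)]         # location 1: skip
    return []                   # location 2: halt

def check(init, succ, spec):
    """Worklist reachability check; returns (verdict, Reachable)."""
    pending, reachable = [init], {init}
    while pending:
        q = pending.pop()       # 'choose' any pending state
        if not spec(q):
            return ("fail", reachable)
        for q2 in succ(q):
            if q2 not in reachable:
                reachable.add(q2)
                pending.append(q2)
    return ("pass", reachable)

# Abstraction: abs forgets b and keeps only the program counter.
STATES = [(pc, b) for pc in range(3) for b in (False, True)]

def abs_state(q):
    return q[0]

def abs_successors(a):
    """R_abs: a -> a' iff some (q, q') in R has abs(q)=a, abs(q')=a'."""
    return {abs_state(q2) for q in STATES if abs_state(q) == a
                          for q2 in successors(q)}

# The concrete program satisfies "b holds at the final location":
verdict, reach = check((0, False), successors,
                       lambda q: not (q[0] == 2 and not q[1]))
# verdict == "pass"; reach == {(0, False), (1, True), (2, True)}
```

Note the over-approximation at work: abstract location 2 is reachable under R_abs, and it "covers" the concrete state (2, False) even though that state is not reachable in the concrete program.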
Let Spec(q) = Spec'(abs(q))
         g. If Spec' is a safety property of the program with state space
            A, transition relation R_abs, and initial state abs(q0), then
            Spec is a safety property of the original program.
         h. Thus, we can prove safety properties of complicated programs by
            proving the property for a simpler program.
      2. Predicate abstraction
         a. An abstraction function where state variables of the abstract
            program correspond to predicates over the values of the
            variables of the detailed program.
         b. Often, we can use the predicates that appear as conditions of
            if-statements, while-loops, etc. as the variables for the
            abstract program.
         c. There's a bunch of cool theory (that we'll cover later) that
            shows this approach is very general.
      3. CEGAR -- Counter-Example Guided Abstraction Refinement
         a. If our desired property fails to hold for the abstract program,
            what can we conclude about the original program?  Not much.
            * The original program may have a bug,
            * Or, the abstraction may be too coarse.
         b. Let a_0, a_1, ..., a_k be a counter-example for the abstract
            program.
            * Can we find q_0, q_1, ..., q_k in the original program such
              that
                 ForAll 0 <= i <= k. a_i = abs(q_i)
                 AND ForAll 0 <= i < k. (q_i, q_(i+1)) IN R ?
            * If "yes", then we've found a counter-example for the original
              program.
            * If "no", then in many cases, this tells us why our
              abstraction is too coarse, and we can use this to (in some
              cases automatically) find a better abstraction.
            * In future lectures we will:
              + Look at abstraction in more detail.
              + Consider more general versions where the original program
                may take several steps for one step of the abstract
                program.
              + ...
   C. Inductive Invariants
      1. Another way to show that a safety property, Spec, holds in all
         reachable states.
         a. Find a predicate I such that
               ForAll (q, q') IN R. I(q) => I(q')
         b. In English, this means that if I holds in some state, it will
            hold in the next state as well.
         c.
By induction (hence the name "inductive invariant"), if I holds
            in some state, it will hold in all future states.
         d. Thus, if I holds in state q0, it will hold in all reachable
            states.
         e. If I holds in q0 and I => Spec, then Spec holds in all
            reachable states.
      2. Showing that a predicate, I, is an invariant can be *much* easier
         than finding all reachable states.
      3. Finding I is the hard part:
         a. human ingenuity (scarce)
         b. software that "discovers" invariants (hot research topic)
         c. combined human/computer methods
   D. Symbolic Simulation
      1. Replace some of the values in a simulation with symbolic
         variables.
         a. Propagate the variables through the simulation.
         b. Check properties at the end.
      2. Generalizes simulation
         a. Increases coverage.
         b. To keep it tractable, not all variables are symbolic.
      3. There are systematic ways to combine symbolic simulation results
         to get a complete proof.

II. SLAM/SDV
   0. Overview
      A. Motivation
         1. Research goals: demonstrate an application of
            * Boolean programs (predicate abstraction)
            * Model-checking for pushdown automata
            * Pointer-alias analysis
         2. Practical goal: provide better bug-finding for Windows device
            drivers.
      B. Glossary:
         API -- Application Programming Interface
         DDK -- Driver Development Kit (for Windows Server 2003), see also
                WDK
         ML  -- originally "MetaLanguage", a functional programming
                language, see
                http://en.wikipedia.org/wiki/ML_(programming_language)
                http://www.cs.cmu.edu/~rwh/introsml/
                Functional languages are widely used in formal verification
                because it is relatively easy to give a mathematically
                precise meaning (i.e. semantics) to programs written in
                functional languages.
SDV -- Static Driver Verifier
         SLAM -- Software, programming Languages, And Model checking
                 or Specifications, programming Languages, And Model
                 checking
         SLIC -- Specification Language for Interface Checking
         WDK -- Windows Driver Kit (for Windows Vista)
         WDF -- Windows Driver Framework (or "Windows Driver Foundation")
                A toolkit for writing drivers for Windows.  Includes
                Microsoft-supplied "good" drivers that the programmer can
                extend or override to create their own driver.  See:
                http://en.wikipedia.org/wiki/Windows_Driver_Foundation
         WDM -- Windows Driver Model
                Predecessor to the WDF.  A framework for Windows device
                drivers that was used with Windows 98, 2000, XP, Server
                2003, and Vista.  The programming model is more "write the
                driver from scratch" compared with the "override and
                extend" approach of the WDF.  See:
                http://en.wikipedia.org/wiki/Windows_Driver_Model
   A. SLAM -- Software, programming Languages, And Model checking
      1. Stateful usage rules
         a. Many rules for device drivers are about the order in which
            kernel operations may be performed:
            + E.g. it is an error to attempt to acquire a lock that the
              thread already holds.
            + E.g. it is an error to attempt to release a lock that the
              thread does not hold.
         b. These can be expressed as rules about state machines.  Model
            checking can be used to show that the program never reaches a
            bad state.
      2. Boolean programs
         a. All state variables are boolean valued.
         b. These variables correspond to predicates in the original
            program and predicates identified in the CEGAR process.
      3. Monitors: SLIC
         a. Properties are specified in a C-like language that describes
            monitoring operations in the kernel functions.
         b. These functions aren't actually compiled into the kernel.
         c. Instead, they are used by the verifier to say: whenever you see
            the driver call kernel function KeFunc, take the following
            actions in the abstract program.
         d.
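A SLIC rule for the locking discipline might look roughly like the following.  This is a from-memory reconstruction in the style of the published SLAM examples, not the paper's exact text; KeAcquireSpinLock and KeReleaseSpinLock are the kernel functions being monitored:

```
state {
   enum { Unlocked, Locked } state = Unlocked;
}

KeAcquireSpinLock.entry {
   if (state == Locked) abort;      // acquiring a lock already held
   else state = Locked;
}

KeReleaseSpinLock.entry {
   if (state == Unlocked) abort;    // releasing a lock not held
   else state = Unlocked;
}
```

The verifier weaves these monitor updates into the abstract program at each call site; reaching "abort" in the abstract program is what the model checker searches for.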
In their example, the function to acquire a lock sets a variable
            called "state" to the value Locked, and the function to release
            the lock sets the value of "state" to Unlocked.
      4. Model-checking in SLAM
         a. Same idea as described in I.A.3.
         b. Extended to handle pushdown automata (machines with stacks).
            * They use inter-procedural analysis to reduce the complexity
              of the pushdown automaton to keep the analysis practical.
         c. They use binary decision diagrams (to be covered in the Jan. 9
            lecture) to represent machine states symbolically.
            * A large number of machine states can be represented by a
              "simple" formula.
      5. CEGAR (CounterExample Guided Abstraction Refinement) in SLAM
         a. Start with a very simple, automatically generated abstraction.
            * The abstraction is a boolean program.
         b. Model check the abstract program.
         c. If the abstract program passes the check, then the real program
            does as well.
         d. If the abstract program fails the check, try to create a
            counter-example in the original program.  This is done using an
            "SMT" solver (to be covered later in this course).
         e. If a counter-example is found, it is reported.
         f. If there is no such counter-example, the SMT solver provides a
            proof that the formula describing the counter-example is
            unsatisfiable.
         g. The really cool thing is that by using this proof, they can
            automatically extract one or more predicates that can be added
            to the abstract program (as additional boolean valued
            variables).
            * This produces a boolean program whose executions are a subset
              of those of the previous boolean program.  In particular, the
              bogus counterexample is now excluded.
            * Steps b and beyond are re-executed with this new abstraction.
         h. Although each refinement produces a tighter abstraction, the
            verification is not guaranteed to terminate.  Why?
      6. Example:
         a. Verifying a lock...unlock sequence where both operations are
            guarded by (x > 0).
         b.
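This example can be rendered as a few lines of Python (the program shape follows the description above; the concrete code is assumed, not taken from the paper).  The coarse abstraction turns each occurrence of the guard (x > 0) into an independent non-deterministic boolean; the refined abstraction tracks the predicate with a single boolean variable b, so the two guards always agree:

```python
from itertools import product

def run(guard1, guard2):
    """One execution of the abstract lock/unlock program for a given
    resolution of the two guards; 'error' means unlock without lock."""
    locked = False
    if guard1:                  # abstracted test of (x > 0)
        locked = True           # lock()
    # ... work ...
    if guard2:                  # abstracted test of (x > 0) again
        if not locked:
            return "error"      # release of a lock we don't hold
        locked = False          # unlock()
    return "ok"

# Coarse abstraction: the two guards vary independently, so the model
# checker explores all four resolutions, including the spurious trace
# guard1=False, guard2=True.
coarse = {run(g1, g2) for g1, g2 in product((False, True), repeat=2)}

# Refined abstraction: boolean variable b tracks (x > 0), so both guards
# take the same value -- the spurious trace disappears.
refined = {run(b, b) for b in (False, True)}
```

Here "error" is in `coarse` but not in `refined`, which is exactly the refinement step described next.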
Originally the boolean program allows an unlock without a lock.
            * The x > 0 conditions had been replaced with non-deterministic
              choices.
            * The model-checker finds a trace where the lock is skipped but
              the unlock is performed.
         c. This trace of the boolean program doesn't correspond to any
            execution of the real program.
            * The refinement process adds a boolean variable corresponding
              to the predicate "x > 0" to the boolean program.
         d. Now, the condition for performing the lock and the condition
            for performing the unlock always agree, and the program passes
            the check.
   B. SDV -- Static Driver Verifier
      0. Verification needs more than a model checker:
         a. Need to specify properties to verify.
         b. Need an environment model.
         c. Need to integrate into the development environment.
      1. Specifying properties to verify
         a. Specify properties for each driver API.
         b. Separate properties specified for
            * network drivers
            * storage device drivers
            * graphics device drivers
         c. Writing these rules requires formal methods expertise.
            * The effort is amortized over the large number of drivers to
              be verified.
            * Some rules developed by the formal methods team at Microsoft
              Research.
            * Most developed by other groups in Microsoft: the formal
              methods research group provides formal methods training so
              other groups can have their own "rule writers".  They
              describe the success of hiring summer interns to write rules.
      2. Modeling the environment
         a. Model the application program that calls the driver:
            * Use non-determinism to express that the application program
              may make an arbitrary sequence of calls.
            * If there are restrictions on how the application program can
              call the driver, these are included in the model as well.
         b. Model the behaviour of system calls:
            * Use non-determinism for
              + return codes
              + return values (e.g. pointers into buffers) -- need to make
                this finite-state for the model checker.
      3.
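The use of non-determinism for system-call results can be sketched as explicit enumeration.  Everything here is a made-up miniature, not SDV's actual harness; the "driver" and its allocation stub are hypothetical:

```python
from itertools import product

def driver(r1, r2):
    """Hypothetical driver body: makes two allocation calls whose results
    (a buffer, or None on failure) are supplied by the environment model.
    Returns True iff the driver never dereferences a null pointer."""
    if r1 is None:
        return True             # first allocation failure: handled
    # BUG the checker should find: second result used without a check.
    return r2 is not None

# Environment model: enumerate every combination of return values the
# kernel might produce, rather than fixing one concrete outcome.
failures = [(r1, r2)
            for r1, r2 in product(("buf", None), repeat=2)
            if not driver(r1, r2)]
# failures == [("buf", None)]: the unchecked second allocation.
```

A real verifier explores these outcomes symbolically instead of by brute-force enumeration, but the modeling idea is the same: every allowed environment behaviour is a possible execution.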
Integrating into the software development environment
         a. Scripts to combine driver code with rules and environment
            model.
         b. The user runs these scripts.
         c. The script produces an error summary along with counter-example
            traces.
   C. Experience
      1. Successes
         a. Found many real bugs in real drivers.
         b. Boolean abstraction works well with programs up to a few tens
            of thousands of source code lines.
      2. Limitations
         a. Only checks for "sequence of operation" rules.
            I don't believe it finds pointer errors, array bound violations
            or similar programming errors.
         b. Only checks sequential programs.
            But some recent work has studied verification of concurrent
            programs assuming "bounded context switching" (see ref. 40).
         c. CEGAR and predicate abstraction impractical for programs with
            hundreds of thousands of source code lines or more.

III. Verification in the i7 design
   0. Motivation and Glossary
      A. Motivation
         1. Find more design errors earlier in the "execution cluster" of
            the i7 processor architecture.
         2. Claim that verification is superior to simulation for designs
            of this complexity in terms of
            a. quality of design
            b. time and cost of verification effort
      B. Glossary
         AGU -- "Address Generation Unit".  See III.A.4.b.iv.
         AVX -- "Advanced Vector Extensions".  Instructions for 256-bit
                "vectors".
         CTE -- "Cluster Test Environment", for simulation.
         CVE -- "Cluster Verification Environment", for formal
                verification.
         EXE -- The "EXecution Cluster".  See III.A.4.
         FEC -- The "Front End Cluster".  See III.A.3.a.
         FPU -- "Floating Point Unit".  See III.A.4.b.iii.
         IA-32 -- The Intel version of the x86 instruction set.
         IEU -- "Integer Execution Unit".  See III.A.4.b.i.
         JEU -- "Jump Execution Unit".  See III.A.4.b.v.
         MEC -- "MEmory Cluster", see III.A.3.d.
         MIU -- "Memory Interface Unit".  See III.A.4.b.vi.
MMX -- SIMD instructions for "packed integers".  Uses the 64-bit
                wide FPU datapath to do any of
                   2 32-bit integer operations,
                   4 16-bit integer operations, or
                   8  8-bit integer operations
                as a single operation.
         OOO -- Out-Of-Order cluster.  See III.A.3.b.
         SIMD -- Single-Instruction-Multiple-Data, a simple form of
                parallel computation where multiple execution units perform
                the same operation on several sets of operands.
         SIU -- SIMD Integer Unit.  See III.A.4.b.ii.
         SMT -- Simultaneous Multi-Threading, see III.A.2.c.
         SSE -- "Streaming SIMD Extensions".  Instructions for 128-bit
                "vectors" that can be treated as any of: 2 doubles,
                4 floats, 4 32-bit integers, 8 16-bit integers, or
                16 8-bit integers.
         STE -- "Symbolic Trajectory Evaluation", a form of symbolic
                simulation.
         vPro -- Hardware support for security.
   A. Organization of an i7 computer
      1. The CPU (central processing unit) interfaces with
         a. memory (DRAM)
         b. storage devices (disk, FLASH, etc.)
         c. displays
         d. networks (ethernet, WiFi, Infiniband, etc.)
         e. other I/O devices
      2. The CPU consists of multiple "cores".
         a. These cores form a shared-memory, parallel computer
            * abbreviated CMP for "Chip MultiProcessor"
            * a.k.a. SMP for "Shared Memory multiProcessor"
         b. Each core can fetch and execute instructions independently.
         c. In fact, an i7 core can fetch and execute instructions from two
            different threads or processes at the same time.  This is known
            as SMT ("Simultaneous Multi-Threading").  In Intel marketing
            lingo, it's called "hyperthreading".
      3. Each i7 core is organized with the following top-level units,
         called "clusters" in the paper:
         a. Front End Cluster (FEC):
            * Fetches instructions from the L1 instruction cache (see the
              MEC below).  Each core can fetch up to four instructions per
              cycle.
            * Translates each x86 instruction into one or more
              "microinstructions".  The microinstructions are basically
              RISC instructions.  An x86 instruction may perform multiple
              operations.
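For instance, a memory-to-register add bundles a read, an ALU operation, and a write; its translation into microinstructions might look roughly like this (an illustrative decomposition, not Intel's actual micro-op encoding):

```
add [mem], eax           ; one x86 instruction: read memory, add, write back

    t1 := load  [mem]        ; microinstruction 1: memory read
    t2 := add   t1, eax      ; microinstruction 2: ALU operation
          store [mem], t2    ; microinstruction 3: memory write
```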
RISC instructions, in general, each perform a single operation.
              Translated instructions are stored in a microinstruction
              cache to save power when the same batch of instructions is
              executed multiple times (e.g. in a loop).
            * Branch prediction:
              + Each core fetches up to four instructions per cycle.
              + It takes multiple cycles to execute a branch.
              => 10's of instructions may be fetched between fetching a
                 branch and knowing for sure whether or not the branch is
                 taken.
              ~> Each core records the history of recently executed
                 branches and guesses the outcome based on the history.  If
                 it guesses right (in practice, most of the time),
                 execution continues.  When a guess turns out to be wrong,
                 the CPU aborts instructions that were fetched after the
                 branch, rolls back its state to the point of the branch,
                 and resumes from there.
         b. Out-Of-Order cluster (OOO):
            * receives microinstructions from the FEC.
            * dynamically computes the dependency graph for instructions.
            * an instruction can execute as soon as its operands are
              available.
            * The OOO commits instructions in program order.
              + If an instruction generates an exception (e.g. a page
                fault, an arithmetic exception, etc.), the instruction and
                all subsequent ones are aborted.
              + This is like committing transactions in databases.
         c. EXEcution cluster (EXE):
            * Has separate hardware for various classes of instructions.
            * Details given under III.A.4.
         d. MEmory Cluster (MEC):
            * on-chip caches
            * cache coherence: make sure that reads and writes to the same
              location from different cores produce consistent results.
            * interface to off-chip DRAM
      4. The EXEcution cluster
         a. The focus of this paper.
         b. Has separate hardware for various classes of instructions:
            i.   scalar integer (IEU)
                 Performs integer add, subtract, comparisons, bitwise
                 logical, shifts, etc.
            ii.  SIMD integer (SIU)
                 Performs SIMD integer instructions (MMX and SSE).
            iii. floating point (FPU)
                 Performs floating point operations such as FADD, FMULT,
                 FDIV, FCOS, etc.
            iv.
Address generation for memory loads and stores (AGU)
                 Computes addresses for loads and stores, access checks,
                 etc.
            v.   Jumps and branches (JEU)
                 Handles jump and branch instructions to confirm (or not)
                 the branch predictor.
            vi.  Memory interface unit (MIU)
                 Handles data transfers with the caches.  Receives data for
                 loads.  Handles store buffering and bypasses.
         c. Queues for (roughly) instructions of each class.
         d. Instructions are executed as soon as their operands are ready.
            * If multiple instructions from the same queue are ready to
              execute on the same cycle, the oldest one is taken first (I'm
              guessing because this is the standard way of doing it).
         e. Some added clever tricks for SSE and AVX instructions that use
            multiple functional units in parallel.
         f. More cleverness for vPro instructions, special floating point
            "shuffling", etc.
         g. Integer ALUs are included on many of the execution ports
            because:
            * They are relatively small.
            * They can use the queue and other resources when the program
              doesn't make heavy use of the dedicated unit for that
              execution port.
   B. Design verification at Intel
      1. Overall objectives
         a. Minimize time-to-market:
            * profit margins are highest when the processor is first sold
            * trade-off between pre-silicon verification and post-silicon
              debug
              + pre-silicon lets the designer/verifier control and/or
                observe every signal.
                - But, simulation is *much* slower than the real chip.
              + post-silicon lets the tests run at full speed.
                - But, it's much harder to figure out what went wrong (or
                  even if something went wrong).
         b. Need to find nearly all the bugs before selling the chip.
            + recalls are expensive, and lower the value of the brand
            ~ need to keep the bug-rate low enough that crashes can be
              blamed on Microsoft. ;)
      2. Simulation-based methods
         a. Many kinds of simulation:
            * exercise the design: "easy" checks, finds simple bugs.
            * stress tests: corner cases, etc.
            * coverage tests: lots of random and biased-random tests to
              look for unanticipated bugs.
         b.
Multiple levels of simulation:
            * Unit testing is the norm in hardware.
            * The paper focuses on cluster simulation/verification, and
            * below the cluster, a mix of simulation (for easy cases) and
              formal methods (for coverage).
            * At the whole-chip level, simulation is used.
      3. Formal verification
         a. Based on symbolic simulation.
         b. A theorem proving approach is used to combine symbolic
            simulation results into a complete proof.
         c. Proofs are hard.
            * This has led to an emphasis on
              + proof engineering
              + proof re-use
            * This works well in the Intel environment where successive
              microprocessors have very similar specifications and
              architectures.
   C. Verifying the i7 execute cluster
      1. Scope
         a. Formal verification used instead of simulation for stress
            testing and coverage.
         b. Formal verification included control state and bypassing logic.
         c. Formal methods set up as nightly regression checks.
         d. But, not all verification completed by tape-out.
            * The expectation is that some bugs will be found post-Si.
            * A bug escape may be OK if it doesn't block other tests.
            * Especially good if you know what it is and how to handle it
              by the time the chip is back from fabrication.
            * Of course, it's always better to find the bug before
              fabrication.
      2. Some technical aspects
         a. Most instructions verified by symbolic simulation.
         b. Interactions between the execution units complicated the
            verification
            * but this is probably a big reason *why* verification at the
              cluster level has value beyond just verifying the individual
              units.
         c. Symbolic simulation handles most instructions.
            * Floating-point multiply and divide must be broken into
              smaller steps:
              + BDDs don't represent "product" efficiently.
              + This is where theorem proving excels.
            * Some operations required case-splits and parametric
              simulations; in other words, multiple symbolic simulations to
              cover all of the cases while keeping each simulation
              tractable.
* Special cases for the AGU because it involves indexing into
              tables of addresses for pending memory operations.
         d. Use of inductive invariants for the control path
            * complicated by clock-gating:
              + can't assume that all state variables are updated each
                cycle
              ? are they up-to-date when they are needed?
      3. Success
         a. Fewest post-Si bugs of any cluster.
         b. Value seen when verification *replaces* simulation.
            * They argue against verification merely supplementing
              simulation:
              - it incurs the costs of both, and
              - the extra cost of formal "only" has the return of the
                additional bugs that it finds.
         c. Shows the value of moving formal verification to higher levels
            of the design hierarchy.
      4. Limitations: six bugs made it past tape-out.
         a. One because of an incomplete formal specification.
         b. Two because the verification was incomplete before tape-out.
         c. One because the protocol didn't work as intended.
            * Formal verification showed the hardware implemented the
              protocol.
            * The chip-level outcome of using the protocol had not been
              verified.
         d. Two more bugs because of re-use of specifications and designs
            from an earlier processor even though an architectural change
            had been intended.

IV. Review and Preview
   A. Review
      1. Summary of SLAM
         a. Model checking finds real bugs in production code.
         b. Extensive use of abstraction.  As much as possible,
            * ignore data values
            * focus on control flow
         c. To make this useful for real designs, the model checker must be
            integrated with the software development toolkit to be as
            "pushbutton" as possible.
      2. Summary of i7 verification
         a. Symbolic simulation finds real bugs in big chip designs.
         b. As much as possible,
            * focus on data values using symbolic simulation
            * capture control flow using invariants, manual effort
         c. To make this useful for real designs,
            * formal methods must *replace* entire categories of
              simulation
            * "proof engineering" is needed to re-use the large manual
              effort required for specifications and proofs.
   B. Where are we?
      1. Done with course intro
         a. Jan.
2: course overview and quick sketch of the topic
         b. Jan. 7: two examples of applications of formal verification to
            large, industrial designs.
      2. Next: some formal verification basics
         a. Symbolic booleans with OBDDs.
            OBDDs are, perhaps, passé, but they are fairly easy to explain
            and they work well for many examples.
         b. equivalence checking
         c. semantics of programs: from source code to a mathematical
            representation
         d. model checking
      3. Then, scaling this up to real-world applications
         a. SAT solvers
         b. SMT solvers
         c. abstraction, interpolation
         d. maybe some theorem proving
      4. Finally, other domains
         a. robots and vehicles
         b. analog circuits
         c. other things involving real numbers
   C. Preview of the OBDD paper
      1. Key ideas:
         a. OBDDs represent boolean functions as "compressed" decision
            trees.
         b. This representation is canonical, and works well with many
            operations that are commonly needed for boolean functions:
            AND, OR, EXISTS, FORALL, function composition, etc.
         c. While OBDDs work well in practice for many problems, they are
            not magical.  There are problems (e.g. multiplication) for
            which OBDDs require exponential space.
      2. Applications
         a. representing finite domains (Section 4)
         b. logic circuit verification (Section 5.1)
         c. error correction, logic optimization (Sections 5.1 & 5.2)
         d. model checking (Section 6)
         e. Note: the probabilistic stuff (Section 5.4) and the AI stuff
            (Section 7) seemed like cool ideas at the time, but I don't
            think they've really gone anywhere.  Likewise, the intro
            promises applications of OBDDs to problems from mathematical
            logic, but the paper doesn't seem to give any examples.
      3. Now, read the paper, and form your own evaluation, ask your own
         questions, etc.

V. Further references:
   A. Some papers about the i7
      "A fully integrated multi-CPU, GPU and memory controller 32nm
      processor"
      Two-page paper about the Sandy Bridge family (IEEE Solid State
      Circuits Conf.)
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5746311

      "Power-Management Architecture of the Intel Microarchitecture
      Code-Named Sandy Bridge"
      IEEE Micro paper (accessible to a general audience) describing
      design decisions in the Sandy Bridge family.
      http://dx.doi.org/10.1109/MM.2012.12

      http://hardware-review24.com/load/cpu/intel_sandy_bridge_microarchitecture/1-1-0-22
      An on-the-web description of Sandy Bridge.  Of the three references
      that I've found, this one has the block diagrams that can help you
      decipher the names of all the clusters and units described in the i7
      verification paper.