CpSc 513 Verification in Real Life                            Jan. 7, 2013
SLAM and verifying the i7 design

0. Topics for today
   I.   Formal verification methods in today's papers
   II.  SLAM/SDV
   III. Verification in the i7 design
   IV.  Review and Preview

I. Formal verification methods in today's papers
   A. Model checking
      1. Model a program as having a finite set of states
         a. For example, a program that has a fixed number of boolean
            variables.
         b. If a program has N boolean variables, then these can take on
            2^N combinations of values.
         c. If the same program has K possible values for the program
            counter (for now, assume no recursion), then there are K*2^N
            possible program states.
         d. Not all of these states can occur in real executions.
      2. The verification problem:
         a. Let q0 be the initial state of the program
            * e.g. PC=0, all program variables are 'false'
            * more generally, we might have Q0, a set of possible initial
              states (e.g. some variables may be uninitialized).
         b. Let Q be the set of all possible states.
         c. Let R be the state transition relation of the program
            * (q, q') IN R iff the program can go from state q to state q'
              in a single step.
            * Why is R a relation?  Allow uncertainty in the model to
              account for: inputs, actions by the environment,
              simplifications that we made to the program to make
              verification tractable, etc.
            * Usually we don't write down R explicitly (i.e. we don't list
              all of the state pairs in R)
              + Instead, for any q, we can find all q' such that
                (q, q') IN R by looking at the program source code.
              + Note that q tells us the current PC and the values of all
                program variables.
              + We just consider executing the statement at that PC with
                those values for the program variables.
         d. Let Spec: Q -> boolean be the specification.
            * We want to show that for all reachable states q, Spec(q)
              holds.
      3.
Model checking

            let Pending   := { q0 };
            let Reachable := { q0 };
            while Pending != EmptySet do
               let q := choose(Pending);
               Pending := Pending - {q};
               if NOT Spec(q) return(fail); endif
               for each q' such that (q, q') IN R do
                  if NOT (q' IN Reachable)
                     Pending   := Pending UNION {q'};
                     Reachable := Reachable UNION {q'};
                  endif
               end // for
            end // while
            return(pass);

         a. This algorithm returns 'fail' if it is possible to reach a
            state that doesn't satisfy Spec.  Otherwise, all reachable
            states satisfy Spec and model checking returns 'pass'.
            (Note: Reachable is initialized to { q0 } rather than the empty
            set; otherwise q0 could be re-added to Pending if some state
            transitions back to it.)
         b. The algorithm terminates because:
            * There are a finite number of possible states.
            * Each state is added to Reachable at most once.
            * Each state is added to Pending at most once.
            * Each iteration of the while loop removes a state from
              Pending; thus, the while loop body is executed at most |Q|
              times.
            * Likewise, each time the for loop is encountered, its body is
              executed at most |Q| times.
            * The total runtime is at most |Q|^2.
         c. The algorithm can be used to produce counterexample traces.
            For every state q' in Reachable, either q' = q0 or we can find
            a state q such that q is in Reachable and (q, q') IN R.
            Note: q is just the state we were processing when q' was added
            to Pending; recording this predecessor for each state lets us
            walk backwards from a failing state to q0.
         d. Later, we'll learn more about model checking:
            * Algorithms that work well in practice.
            * How to check other kinds of properties as well.
   B. Abstraction
      1. In general
         a. Real programs can have an unbounded number of variables
            * Recursion
            * Dynamic allocation
            * Even a program with a small number of int's, float's, or
              double's has astronomically more states than we can hope to
              explore.
         b. In practice, model checking is performed on a "simplified"
            version of the program.  Abstraction is one way to do this.
         c. Let Q be the set of states of the program, and A be a set of
            "abstracted" states.
         d. Define a function abs: Q -> A so that for q in Q, abs(q) is the
            abstraction of q.
         e. Now, define an abstract transition relation R_abs so that
            (a, a') IN R_abs if Exists (q, q') IN R s.t.
            a = abs(q) AND a' = abs(q')
         f.
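The worklist algorithm and the abstraction map (abs, R_abs) defined above can be combined into a short runnable Python sketch.  The three-location toy program here (one boolean variable b, program counter pc) is assumed purely for illustration:

```python
def successors(q):
    """The transition relation R, given as a successor function."""
    pc, b = q
    if pc == 0:
        return [(1, True)]      # location 0: b := true
    if pc == 1:
        return [(2, b)]         # location 1: skip
    return []                   # location 2: halt

def check(init, succ, spec):
    """Worklist reachability check; returns (verdict, Reachable)."""
    pending, reachable = [init], {init}
    while pending:
        q = pending.pop()       # 'choose' any pending state
        if not spec(q):
            return ("fail", reachable)
        for q2 in succ(q):
            if q2 not in reachable:
                reachable.add(q2)
                pending.append(q2)
    return ("pass", reachable)

# Abstraction: abs forgets b and keeps only the program counter.
STATES = [(pc, b) for pc in range(3) for b in (False, True)]

def abs_state(q):
    return q[0]

def abs_successors(a):
    """R_abs: a -> a' iff some (q, q') in R has abs(q)=a, abs(q')=a'."""
    return {abs_state(q2) for q in STATES if abs_state(q) == a
                          for q2 in successors(q)}

# The concrete program satisfies "b holds at the final location":
verdict, reach = check((0, False), successors,
                       lambda q: not (q[0] == 2 and not q[1]))
# verdict == "pass"; reach == {(0, False), (1, True), (2, True)}
```

Note the over-approximation at work: abstract location 2 is reachable under R_abs, and it "covers" the concrete state (2, False) even though that state is not reachable in the concrete program.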
Let Spec(q) = Spec'(abs(q))
         g. If Spec' is a safety property of the program with state space
            A, transition relation R_abs, and initial state abs(q0), then
            Spec is a safety property of the original program.
         h. Thus, we can prove safety properties of complicated programs by
            proving the property for a simpler program.
      2. Predicate abstraction
         a. An abstraction function where state variables of the abstract
            program correspond to predicates over the values of the
            variables of the detailed program.
         b. Often, we can use the predicates that appear as conditions of
            if-statements, while-loops, etc. as the variables for the
            abstract program.
         c. There's a bunch of cool theory (that we'll cover later) that
            shows this approach is very general.
      3. CEGAR -- Counter-Example Guided Abstraction Refinement
         a. If our desired property fails to hold for the abstract program,
            what can we conclude about the original program?  Not much.
            * The original program may have a bug,
            * Or, the abstraction may be too coarse.
         b. Let a_0, a_1, ..., a_k be a counter-example for the abstract
            program.
            * Can we find q_0, q_1, ..., q_k in the original program such
              that
                 ForAll 0 <= i <= k. a_i = abs(q_i)
                 AND ForAll 0 <= i < k. (q_i, q_(i+1)) IN R ?
            * If "yes", then we've found a counter-example for the original
              program.
            * If "no", then in many cases, this tells us why our
              abstraction is too coarse, and we can use this to (in some
              cases automatically) find a better abstraction.
            * In future lectures we will:
              + Look at abstraction in more detail.
              + Consider more general versions where the original program
                may take several steps for one step of the abstract
                program.
              + ...
   C. Inductive Invariants
      1. Another way to show that a safety property, Spec, holds in all
         reachable states.
         a. Find a predicate I such that
               ForAll (q, q') IN R. I(q) => I(q')
         b. In English, this means that if I holds in some state, it will
            hold in the next state as well.
         c.
By induction (hence the name "inductive invariant"), if I holds
            in some state, it will hold in all future states.
         d. Thus, if I holds in state q0, it will hold in all reachable
            states.
         e. If I holds in q0 and I => Spec, then Spec holds in all
            reachable states.
      2. Showing that a predicate, I, is an invariant can be *much* easier
         than finding all reachable states.
      3. Finding I is the hard part:
         a. human ingenuity (scarce)
         b. software that "discovers" invariants (hot research topic)
         c. combined human/computer methods
   D. Symbolic Simulation
      1. Replace some of the values in a simulation with symbolic
         variables.
         a. Propagate the variables through the simulation.
         b. Check properties at the end.
      2. Generalizes simulation
         a. Increases coverage.
         b. To keep it tractable, not all variables are symbolic.
      3. There are systematic ways to combine symbolic simulation results
         to get a complete proof.

II. SLAM/SDV
   0. Overview
      A. Motivation
         1. Research goals: demonstrate an application of
            * Boolean programs (predicate abstraction)
            * Model-checking for pushdown automata
            * Pointer-alias analysis
         2. Practical goal: provide better bug-finding for Windows device
            drivers.
      B. Glossary:
         API -- Application Programming Interface
         DDK -- Driver Development Kit (for Windows Server 2003), see also
                WDK
         ML  -- originally "MetaLanguage", a functional programming
                language, see
                http://en.wikipedia.org/wiki/ML_(programming_language)
                http://www.cs.cmu.edu/~rwh/introsml/
                Functional languages are widely used in formal verification
                because it is relatively easy to give a mathematically
                precise meaning (i.e. semantics) to programs written in
                functional languages.
SDV -- Static Driver Verifier
         SLAM -- Software, programming Languages, And Model checking
                 or Specifications, programming Languages, And Model
                 checking
         SLIC -- Specification Language for Interface Checking
         WDK -- Windows Driver Kit (for Windows Vista)
         WDF -- Windows Driver Framework (or "Windows Driver Foundation")
                A toolkit for writing drivers for Windows.  Includes
                Microsoft-supplied "good" drivers that the programmer can
                extend or override to create their own driver.  See:
                http://en.wikipedia.org/wiki/Windows_Driver_Foundation
         WDM -- Windows Driver Model
                Predecessor to the WDF.  A framework for Windows device
                drivers that was used with Windows 98, 2000, XP, Server
                2003, and Vista.  The programming model is more "write the
                driver from scratch" compared with the "override and
                extend" approach of the WDF.  See:
                http://en.wikipedia.org/wiki/Windows_Driver_Model
   A. SLAM -- Software, programming Languages, And Model checking
      1. Stateful usage rules
         a. Many rules for device drivers are about the order in which
            kernel operations may be performed:
            + E.g. it is an error to attempt to acquire a lock that the
              thread already holds.
            + E.g. it is an error to attempt to release a lock that the
              thread does not hold.
         b. These can be expressed as rules about state machines.  Model
            checking can be used to show that the program never reaches a
            bad state.
      2. Boolean programs
         a. All state variables are boolean valued.
         b. These variables correspond to predicates in the original
            program and predicates identified in the CEGAR process.
      3. Monitors: SLIC
         a. Properties are specified in a C-like language that describes
            monitoring operations in the kernel functions.
         b. These functions aren't actually compiled into the kernel.
         c. Instead, they are used by the verifier to say: whenever you see
            the driver call kernel function KeFunc, take the following
            actions in the abstract program.
         d.
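A SLIC rule for the locking discipline might look roughly like the following.  This is a from-memory reconstruction in the style of the published SLAM examples, not the paper's exact text; KeAcquireSpinLock and KeReleaseSpinLock are the kernel functions being monitored:

```
state {
   enum { Unlocked, Locked } state = Unlocked;
}

KeAcquireSpinLock.entry {
   if (state == Locked) abort;      // acquiring a lock already held
   else state = Locked;
}

KeReleaseSpinLock.entry {
   if (state == Unlocked) abort;    // releasing a lock not held
   else state = Unlocked;
}
```

The verifier weaves these monitor updates into the abstract program at each call site; reaching "abort" in the abstract program is what the model checker searches for.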
In their example, the function to acquire a lock sets a variable
            called "state" to the value Locked, and the function to release
            the lock sets the value of "state" to Unlocked.
      4. Model-checking in SLAM
         a. Same idea as described in I.A.3.
         b. Extended to handle pushdown automata (machines with stacks).
            * They use inter-procedural analysis to reduce the complexity
              of the pushdown automaton to keep the analysis practical.
         c. They use binary decision diagrams (to be covered in the Jan. 9
            lecture) to represent machine states symbolically.
            * A large number of machine states can be represented by a
              "simple" formula.
      5. CEGAR (CounterExample Guided Abstraction Refinement) in SLAM
         a. Start with a very simple, automatically generated abstraction.
            * The abstraction is a boolean program.
         b. Model check the abstract program.
         c. If the abstract program passes the check, then the real program
            does as well.
         d. If the abstract program fails the check, try to create a
            counter-example in the original program.  This is done using an
            "SMT" solver (to be covered later in this course).
         e. If a counter-example is found, it is reported.
         f. If there is no such counter-example, the SMT solver provides a
            proof that the formula describing the counter-example is
            unsatisfiable.
         g. The really cool thing is that by using this proof, they can
            automatically extract one or more predicates that can be added
            to the abstract program (as additional boolean valued
            variables).
            * This produces a boolean program whose executions are a subset
              of those of the previous boolean program.  In particular, the
              bogus counterexample is now excluded.
            * Steps b and beyond are re-executed with this new abstraction.
         h. Although each refinement produces a tighter abstraction, the
            verification is not guaranteed to terminate.  Why?
      6. Example:
         a. Verifying a lock...unlock sequence where both operations are
            guarded by (x > 0).
         b.
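This example can be rendered as a few lines of Python (the program shape follows the description above; the concrete code is assumed, not taken from the paper).  The coarse abstraction turns each occurrence of the guard (x > 0) into an independent non-deterministic boolean; the refined abstraction tracks the predicate with a single boolean variable b, so the two guards always agree:

```python
from itertools import product

def run(guard1, guard2):
    """One execution of the abstract lock/unlock program for a given
    resolution of the two guards; 'error' means unlock without lock."""
    locked = False
    if guard1:                  # abstracted test of (x > 0)
        locked = True           # lock()
    # ... work ...
    if guard2:                  # abstracted test of (x > 0) again
        if not locked:
            return "error"      # release of a lock we don't hold
        locked = False          # unlock()
    return "ok"

# Coarse abstraction: the two guards vary independently, so the model
# checker explores all four resolutions, including the spurious trace
# guard1=False, guard2=True.
coarse = {run(g1, g2) for g1, g2 in product((False, True), repeat=2)}

# Refined abstraction: boolean variable b tracks (x > 0), so both guards
# take the same value -- the spurious trace disappears.
refined = {run(b, b) for b in (False, True)}
```

Here "error" is in `coarse` but not in `refined`, which is exactly the refinement step described next.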
Originally the boolean program allows an unlock without a lock.
            * The x > 0 conditions had been replaced with non-deterministic
              choices.
            * The model-checker finds a trace where the lock is skipped but
              the unlock is performed.
         c. This trace of the boolean program doesn't correspond to any
            execution of the real program.
            * The refinement process adds a boolean variable corresponding
              to the predicate "x > 0" to the boolean program.
         d. Now, the condition for performing the lock and the condition
            for performing the unlock always agree, and the program passes
            the check.
   B. SDV -- Static Driver Verifier
      0. Verification needs more than a model checker:
         a. Need to specify properties to verify.
         b. Need an environment model.
         c. Need to integrate into the development environment.
      1. Specifying properties to verify
         a. Specify properties for each driver API.
         b. Separate properties specified for
            * network drivers
            * storage device drivers
            * graphics device drivers
         c. Writing these rules requires formal methods expertise.
            * The effort is amortized over the large number of drivers to
              be verified.
            * Some rules developed by the formal methods team at Microsoft
              Research.
            * Most developed by other groups in Microsoft: the formal
              methods research group provides formal methods training so
              other groups can have their own "rule writers".  They
              describe the success of hiring summer interns to write rules.
      2. Modeling the environment
         a. Model the application program that calls the driver:
            * Use non-determinism to express that the application program
              may make an arbitrary sequence of calls.
            * If there are restrictions on how the application program can
              call the driver, these are included in the model as well.
         b. Model the behaviour of system calls:
            * Use non-determinism for
              + return codes
              + return values (e.g. pointers into buffers) -- need to make
                this finite-state for the model checker.
      3.
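The use of non-determinism for system-call results can be sketched as explicit enumeration.  Everything here is a made-up miniature, not SDV's actual harness; the "driver" and its allocation stub are hypothetical:

```python
from itertools import product

def driver(r1, r2):
    """Hypothetical driver body: makes two allocation calls whose results
    (a buffer, or None on failure) are supplied by the environment model.
    Returns True iff the driver never dereferences a null pointer."""
    if r1 is None:
        return True             # first allocation failure: handled
    # BUG the checker should find: second result used without a check.
    return r2 is not None

# Environment model: enumerate every combination of return values the
# kernel might produce, rather than fixing one concrete outcome.
failures = [(r1, r2)
            for r1, r2 in product(("buf", None), repeat=2)
            if not driver(r1, r2)]
# failures == [("buf", None)]: the unchecked second allocation.
```

A real verifier explores these outcomes symbolically instead of by brute-force enumeration, but the modeling idea is the same: every allowed environment behaviour is a possible execution.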
Integrating into the software development environment
         a. Scripts to combine driver code with rules and environment
            model.
         b. The user runs these scripts.
         c. The script produces an error summary along with counter-example
            traces.
   C. Experience
      1. Successes
         a. Found many real bugs in real drivers.
         b. Boolean abstraction works well with programs up to a few tens
            of thousands of source code lines.
      2. Limitations
         a. Only checks for "sequence of operation" rules.
            I don't believe it finds pointer errors, array bound violations
            or similar programming errors.
         b. Only checks sequential programs.
            But some recent work has studied verification of concurrent
            programs assuming "bounded context switching" (see ref. 40).
         c. CEGAR and predicate abstraction impractical for programs with
            hundreds of thousands of source code lines or more.

III. Verification in the i7 design
   0. Motivation and Glossary
      A. Motivation
         1. Find more design errors earlier in the "execution cluster" of
            the i7 processor architecture.
         2. Claim that verification is superior to simulation for designs
            of this complexity in terms of
            a. quality of design
            b. time and cost of verification effort
      B. Glossary
         AGU -- "Address Generation Unit".  See III.A.4.b.iv.
         AVX -- "Advanced Vector Extensions".  Instructions for 256-bit
                "vectors".
         CTE -- "Cluster Test Environment", for simulation.
         CVE -- "Cluster Verification Environment", for formal
                verification.
         EXE -- The "EXecution Cluster".  See III.A.4.
         FEC -- The "Front End Cluster".  See III.A.3.a.
         FPU -- "Floating Point Unit".  See III.A.4.b.iii.
         IA-32 -- The Intel version of the x86 instruction set.
         IEU -- "Integer Execution Unit".  See III.A.4.b.i.
         JEU -- "Jump Execution Unit".  See III.A.4.b.v.
         MEC -- "MEmory Cluster", see III.A.3.d.
         MIU -- "Memory Interface Unit".  See III.A.4.b.vi.
MMX -- SIMD instructions for "packed integers".  Uses the 64-bit
                wide FPU datapath to do any of
                   2 32-bit integer operations,
                   4 16-bit integer operations, or
                   8  8-bit integer operations
                as a single operation.
         OOO -- Out-Of-Order cluster.  See III.A.3.b.
         SIMD -- Single-Instruction-Multiple-Data, a simple form of
                parallel computation where multiple execution units perform
                the same operation on several sets of operands.
         SIU -- SIMD Integer Unit.  See III.A.4.b.ii.
         SMT -- Simultaneous Multi-Threading, see III.A.2.c.
         SSE -- "Streaming SIMD Extensions".  Instructions for 128-bit
                "vectors" that can be treated as any of: 2 doubles,
                4 floats, 4 32-bit integers, 8 16-bit integers, or
                16 8-bit integers.
         STE -- "Symbolic Trajectory Evaluation", a form of symbolic
                simulation.
         vPro -- Hardware support for security.
   A. Organization of an i7 computer
      1. The CPU (central processing unit) interfaces with
         a. memory (DRAM)
         b. storage devices (disk, FLASH, etc.)
         c. displays
         d. networks (ethernet, WiFi, Infiniband, etc.)
         e. other I/O devices
      2. The CPU consists of multiple "cores".
         a. These cores form a shared-memory, parallel computer
            * abbreviated CMP for "Chip MultiProcessor"
            * a.k.a. SMP for "Shared Memory multiProcessor"
         b. Each core can fetch and execute instructions independently.
         c. In fact, an i7 core can fetch and execute instructions from two
            different threads or processes at the same time.  This is known
            as SMT ("Simultaneous Multi-Threading").  In Intel marketing
            lingo, it's called "hyperthreading".
      3. Each i7 core is organized with the following top-level units,
         called "clusters" in the paper:
         a. Front End Cluster (FEC):
            * Fetches instructions from the L1 instruction cache (see the
              MEC below).  Each core can fetch up to four instructions per
              cycle.
            * Translates each x86 instruction into one or more
              "microinstructions".  The microinstructions are basically
              RISC instructions.  An x86 instruction may perform multiple
              operations.
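For instance, a memory-to-register add bundles a read, an ALU operation, and a write; its translation into microinstructions might look roughly like this (an illustrative decomposition, not Intel's actual micro-op encoding):

```
add [mem], eax           ; one x86 instruction: read memory, add, write back

    t1 := load  [mem]        ; microinstruction 1: memory read
    t2 := add   t1, eax      ; microinstruction 2: ALU operation
          store [mem], t2    ; microinstruction 3: memory write
```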
RISC instructions, in general, each perform a single operation.
              Translated instructions are stored in a microinstruction
              cache to save power when the same batch of instructions is
              executed multiple times (e.g. in a loop).
            * Branch prediction:
              + Each core fetches up to four instructions per cycle.
              + It takes multiple cycles to execute a branch.
              => 10's of instructions may be fetched between fetching a
                 branch and knowing for sure whether or not the branch is
                 taken.
              ~> Each core records the history of recently executed
                 branches and guesses the outcome based on the history.  If
                 it guesses right (in practice, most of the time),
                 execution continues.  When a guess turns out to be wrong,
                 the CPU aborts instructions that were fetched after the
                 branch, rolls back its state to the point of the branch,
                 and resumes from there.
         b. Out-Of-Order cluster (OOO):
            * receives microinstructions from the FEC.
            * dynamically computes the dependency graph for instructions.
            * an instruction can execute as soon as its operands are
              available.
            * The OOO commits instructions in program order.
              + If an instruction generates an exception (e.g. a page
                fault, an arithmetic exception, etc.), the instruction and
                all subsequent ones are aborted.
              + This is like committing transactions in databases.
         c. EXEcution cluster (EXE):
            * Has separate hardware for various classes of instructions.
            * Details given under III.A.4.
         d. MEmory Cluster (MEC):
            * on-chip caches
            * cache coherence: make sure that reads and writes to the same
              location from different cores produce consistent results.
            * interface to off-chip DRAM
      4. The EXEcution cluster
         a. The focus of this paper.
         b. Has separate hardware for various classes of instructions:
            i.   scalar integer (IEU)
                 Performs integer add, subtract, comparisons, bitwise
                 logical, shifts, etc.
            ii.  SIMD integer (SIU)
                 Performs SIMD integer instructions (MMX and SSE).
            iii. floating point (FPU)
                 Performs floating point operations such as FADD, FMULT,
                 FDIV, FCOS, etc.
            iv.
Address generation for memory loads and stores (AGU)
                 Computes addresses for loads and stores, access checks,
                 etc.
            v.   Jumps and branches (JEU)
                 Handles jump and branch instructions to confirm (or not)
                 the branch predictor.
            vi.  Memory interface unit (MIU)
                 Handles data transfers with the caches.  Receives data for
                 loads.  Handles store buffering and bypasses.
         c. Queues for (roughly) instructions of each class.
         d. Instructions are executed as soon as their operands are ready.
            * If multiple instructions from the same queue are ready to
              execute on the same cycle, the oldest one is taken first (I'm
              guessing because this is the standard way of doing it).
         e. Some added clever tricks for SSE and AVX instructions that use
            multiple functional units in parallel.
         f. More cleverness for vPro instructions, special floating point
            "shuffling", etc.
         g. Integer ALUs are included on many of the execution ports
            because:
            * They are relatively small.
            * They can use the queue and other resources when the program
              doesn't make heavy use of the dedicated unit for that
              execution port.
   B. Design verification at Intel
      1. Overall objectives
         a. Minimize time-to-market:
            * profit margins are highest when the processor is first sold
            * trade-off between pre-silicon verification and post-silicon
              debug
              + pre-silicon lets the designer/verifier control and/or
                observe every signal.
                - But, simulation is *much* slower than the real chip.
              + post-silicon lets the tests run at full speed.
                - But, it's much harder to figure out what went wrong (or
                  even if something went wrong).
         b. Need to find nearly all the bugs before selling the chip.
            + recalls are expensive, and lower the value of the brand
            ~ need to keep the bug-rate low enough that crashes can be
              blamed on Microsoft. ;)
      2. Simulation-based methods
         a. Many kinds of simulation:
            * exercise the design: "easy" checks, finds simple bugs.
            * stress tests: corner cases, etc.
            * coverage tests: lots of random and biased-random tests to
              look for unanticipated bugs.
         b.
Multiple levels of simulation:
            * Unit testing is the norm in hardware.
            * The paper focuses on cluster simulation/verification, and
            * below the cluster, a mix of simulation (for easy cases) and
              formal methods (for coverage).
            * At the whole-chip level, simulation is used.
      3. Formal verification
         a. Based on symbolic simulation.
         b. A theorem proving approach is used to combine symbolic
            simulation results into a complete proof.
         c. Proofs are hard.
            * This has led to an emphasis on
              + proof engineering
              + proof re-use
            * This works well in the Intel environment where successive
              microprocessors have very similar specifications and
              architectures.
   C. Verifying the i7 execute cluster
      1. Scope
         a. Formal verification used instead of simulation for stress
            testing and coverage.
         b. Formal verification included control state and bypassing logic.
         c. Formal methods set up as nightly regression checks.
         d. But, not all verification completed by tape-out.
            * The expectation is that some bugs will be found post-Si.
            * A bug escape may be OK if it doesn't block other tests.
            * Especially good if you know what it is and how to handle it
              by the time the chip is back from fabrication.
            * Of course, it's always better to find the bug before
              fabrication.
      2. Some technical aspects
         a. Most instructions verified by symbolic simulation.
         b. Interactions between the execution units complicated the
            verification
            * but this is probably a big reason *why* verification at the
              cluster level has value beyond just verifying the individual
              units.
         c. Symbolic simulation handles most instructions.
            * Floating-point multiply and divide must be broken into
              smaller steps:
              + BDDs don't represent "product" efficiently.
              + This is where theorem proving excels.
            * Some operations required case-splits and parametric
              simulations; in other words, multiple symbolic simulations to
              cover all of the cases while keeping each simulation
              tractable.
* Special cases for the AGU because it involves indexing into
              tables of addresses for pending memory operations.
         d. Use of inductive invariants for the control path
            * complicated by clock-gating:
              + can't assume that all state variables are updated each
                cycle
              ? are they up-to-date when they are needed?
      3. Success
         a. Fewest post-Si bugs of any cluster.
         b. Value seen when verification *replaces* simulation.
            * They argue against verification merely supplementing
              simulation:
              - it incurs the costs of both, and
              - the extra cost of formal "only" has the return of the
                additional bugs that it finds.
         c. Shows the value of moving formal verification to higher levels
            of the design hierarchy.
      4. Limitations: six bugs made it past tape-out.
         a. One because of an incomplete formal specification.
         b. Two because the verification was incomplete before tape-out.
         c. One because the protocol didn't work as intended.
            * Formal verification showed the hardware implemented the
              protocol.
            * The chip-level outcome of using the protocol had not been
              verified.
         d. Two more bugs because of re-use of specifications and designs
            from an earlier processor even though an architectural change
            had been intended.

IV. Review and Preview
   A. Review
      1. Summary of SLAM
         a. Model checking finds real bugs in production code.
         b. Extensive use of abstraction.  As much as possible,
            * ignore data values
            * focus on control flow
         c. To make this useful for real designs, the model checker must be
            integrated with the software development toolkit to be as
            "pushbutton" as possible.
      2. Summary of i7 verification
         a. Symbolic simulation finds real bugs in big chip designs.
         b. As much as possible,
            * focus on data values using symbolic simulation
            * capture control flow using invariants, manual effort
         c. To make this useful for real designs,
            * formal methods must *replace* entire categories of
              simulation
            * "proof engineering" is needed to re-use the large manual
              effort required for specifications and proofs.
   B. Where are we?
      1. Done with course intro
         a. Jan.
2: course overview and quick sketch of the topic
         b. Jan. 7: two examples of applications of formal verification to
            large, industrial designs.
      2. Next: some formal verification basics
         a. Symbolic booleans with OBDDs.
            OBDDs are, perhaps, passé, but they are fairly easy to explain
            and they work well for many examples.
         b. equivalence checking
         c. semantics of programs: from source code to a mathematical
            representation
         d. model checking
      3. Then, scaling this up to real-world applications
         a. SAT solvers
         b. SMT solvers
         c. abstraction, interpolation
         d. maybe some theorem proving
      4. Finally, other domains
         a. robots and vehicles
         b. analog circuits
         c. other things involving real numbers
   C. Preview of the OBDD paper
      1. Key ideas:
         a. OBDDs represent boolean functions as "compressed" decision
            trees.
         b. This representation is canonical, and works well with many
            operations that are commonly needed for boolean functions:
            AND, OR, EXISTS, FORALL, function composition, etc.
         c. While OBDDs work well in practice for many problems, they are
            not magical.  There are problems (e.g. multiplication) for
            which OBDDs require exponential space.
      2. Applications
         a. representing finite domains (Section 4)
         b. logic circuit verification (Section 5.1)
         c. error correction, logic optimization (Sections 5.1 & 5.2)
         d. model checking (Section 6)
         e. Note: the probabilistic stuff (Section 5.4) and the AI stuff
            (Section 7) seemed like cool ideas at the time, but I don't
            think they've really gone anywhere.  Likewise, the intro
            promises applications of OBDDs to problems from mathematical
            logic, but the paper doesn't seem to give any examples.
      3. Now, read the paper, and form your own evaluation, ask your own
         questions, etc.

V. Further references:
   A. Some papers about the i7
      "A fully integrated multi-CPU, GPU and memory controller 32nm
      processor"
      Two-page paper about the Sandy Bridge family (IEEE Solid State
      Circuits Conf.)
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5746311

      "Power-Management Architecture of the Intel Microarchitecture
      Code-Named Sandy Bridge"
      IEEE Micro paper (accessible to a general audience) describing
      design decisions in the Sandy Bridge family.
      http://dx.doi.org/10.1109/MM.2012.12

      http://hardware-review24.com/load/cpu/intel_sandy_bridge_microarchitecture/1-1-0-22
      An on-the-web description of Sandy Bridge.  Of the three references
      that I've found, this one has the block diagrams that can help you
      decipher the names of all the clusters and units described in the i7
      verification paper.