CpSc 538G: Automatic Exploit Generation
September 12, 2016
0. Announcements:
A. Registration issues debugged -- you can't take three versions of
CPSC 538 in the same year. I think you need my signature to take
two. Let me know if you need a form signed. We've also created
a section of CPSC 513 (same class, old name). You can sign-up
for that instead.
B. I've created a Piazza class for CPSC 538G.
1. I'll start a thread for claiming papers you want to present.
2. Any and all other discussion is welcome.
I. Introduction
A. Problem statement
1. Programs have lots of bugs.
2. How can we identify those that are security concerns?
3. Aside: Why don't we just fix the bugs that we know about?
a. Sometimes the fix isn't simple, and the patch might introduce
more bugs
i. Example: replace a fixed sized array with one from malloc.
Now we need to make sure that it is freed properly, or
the patch introduces a memory leak.
ii. Example: what if the original algorithm isn't quite right
for the actual problem. Replacing the existing algorithm
and data structures could require changes throughout the
code.
iii. Example: a former UBC student who worked on the Space Shuttle
software told me of a known, divide-by-zero vulnerability in
the code. It only occurred if the shuttle flew through the
centre of the earth. The NASA people documented the "bug"
in case the software was re-used in a different context where
the origin of the coordinate system was somewhere that could
be on a flight path.
b. Sometimes we can't agree on whether or not something is a bug.
B. Basic approach:
1. Use "black-box fuzzing" to find bugs.
a. Run existing test cases.
b. Replace inputs with random and extreme values.
c. "Black box" means this is done without looking at the program
source code.
i. Would "white box" be better?
ii. White box lets the test generator see code constructs that
are likely to be vulnerable and generate test cases for
those.
iii. Black box works on the machine code, and can pick up machine
specific issues (such as bugs due to memory layout) that
might not be apparent to white box testing. Black box
works even if we don't have access to the original source
code.
2. For each bug found, attempt to generate a security exploit.
a. The paper focuses on buffer overflows that lead to
control hijacking
b. Command injection, format-string exploits, information leaks,
"return-to-libc" and other techniques for security exploits
are also mentioned.
3. The key innovations are finding ways to handle the path explosion
problem and keeping the formulas tractable for the SMT solver.
II. Hijacking control by exploiting a buffer-overflow bug.
A. Memory layout
1. C doesn't perform array-bounds checks:
There is code in use where inputs to the program cause data to
be written outside of the memory block allocated for the array.
2. The function call stack is implemented in memory as an array
that grows downward.
a. In other words, the callee has its stack
frame at a lower range of addresses than the caller.
b. At the interface between the caller's stack frame and the callee's
stack frame, the stack has locations for parameters, return
values, and the return address for the machine's program counter.
3. Local variables for a function are allocated onto the stack.
a. The bottom of the stack is moved down far enough to make room for
all of the variables.
b. Larger structures such as structs and arrays just take bigger
blocks of memory from the stack.
4. In C, an array expression decays to a pointer to the memory location
for the value at index=0. Arrays grow upward (toward higher addresses).
a. address(A[i]) = address(A[0]) + i*sizeof(element-of(A))
b. This means that if A is allocated in the callee's stack frame,
we can find an i so that A[i] refers to values of the caller's
stack frame, or the caller's caller's stack frame, or
*the return address for this function*, or ...
B. Basic approach to a buffer-overflow exploit.
1. Write data beyond the valid end-of-array for an array that is a
local variable.
a. In particular, write a sequence of bytes that changes the
return-address for the function to a block of code that the
hacker wants executed.
b. The hacker can write a sequence of bytes into the array that
are the code the hacker wants executed. This code from the
hacker is called "shellcode".
c. When the callee returns, it doesn't go back to the caller!
Instead, it jumps to the code inserted by the hacker.
C. Defenses against buffer-overflow exploits.
1. These exploits depend on knowing many details of the memory layout
of the program when it is executed.
If the buffer is overwritten with a string provided as program
input, the hacker has to know all of these addresses *before*
the program is executed by the victim.
2. Some defenses:
a. Randomize the memory layout each time the program starts --
This is called "Address Space Layout Randomization" (ASLR)
b. Don't allow a page to be both writeable and executable this
is called "Data-Execution Prevention" (DEP) and also referred
to as W XOR X (a page can be writable, or have program
instructions, but not both).
c. Many others.
3. These defenses make security exploits harder, but not impossible.
a. References [29,31,33] describe how to make "hardened" exploits.
b. To make the paper a good introduction, the paper focuses on the
example of a simple buffer overflow.
III. Exploit Generation
A. High-level approach
1. Find a program with bugs
a. Bug logs.
b. Fuzz testing (the paper uses black box).
2. Use symbolic execution to find executions that exploit these bugs
to create security exploits.
3. That sounds easy -- so what's the problem?
a. Path explosion: a real program has lots of conditional branches.
i. It's not practical to explore all branches.
ii. How do we find promising paths?
b. Too many possible inputs:
i. For example, a command line parameter can be a string of
arbitrary length -- what strings should we consider?
ii. Even harder for parsers that handle various media formats,
document representations, etc.
B. Symbolic execution
1. Simple program: all instructions just update registers.
a. For example: add $i <- $j, $k
b. With symbolic execution,
start with symbolic variables for each register.
On an x86, we could call these eax_0, and so on.
c. The simulator has an array that has a symbolic expression for the
current value of each register.
d. To symbolically execute
add $i <- $j, $k
Read the symbolic expressions for $j and $k from the array.
Create a symbolic expression for $j + $k.
Store this expression in the array entry for $i.
e. At the end, we have a symbolic expression for each register.
If we want to know if we can end with $i = 42 (as an example)
we can add the constraint $i == 42, and ask if the formulas
are satisfiable.
2. Control-flow instructions: if-then-else, loops, etc.
a. In this paper, Mayhem constructs the symbolic formulas for a
particular path.
b. This means that at a branch instruction, Mayhem adds a constraint
of the form "the branch goes a particular way".
For example:
bleq L5, $i, $j # go to L5 if $i <= $j
If we are looking at the path where the branch is taken, then
Mayhem reads the symbolic expressions for $i and $j, creates
the symbolic expression for $i <= $j, and adds this constraint
to what is often called "the path constraint".
c. At the end, we have a symbolic expression for each register *and*
a path constraint. Now we can ask if there is an execution
along this path that ends with register $i holding the value 42
(or other constraints).
d. To get good coverage, we have to explore *lots* of paths. This is
called the "path explosion" problem. The paper talks about it.
e. An alternative is to consider the state of the program at each
execution point as a symbolic expression. We'll examine this
when we look at "model-checking" next month.
3. Memory reads and writes, pointers
a. When the program reads from memory, can we figure out what symbolic
value is there?
b. We can if we can identify the last write to that location.
c. This involves showing that the address of the read and the write are
the same, ...
d. ... and that no write in between the two wrote to that address.
e. This is called "shape analysis".
f. The paper didn't go into details other than referring to [12].
C. Finding promising paths:
1. Pre-conditioned search
a. Given a bug, determine a possible exploit.
i. For example, if there is a buffer-overflow bug
ii. What would be needed to modify the return-address for the
current stack frame?
iii. What would be needed to place the "shellcode" at the
hacked return-address?
b. Find simple conditions that must hold if the exploit is to occur.
i. For example, the data that causes the overflow must cause a
big enough overflow to overwrite the return address. This
imposes a minimum length on the corresponding input string
(for example).
ii. There might be other constraints on the data. For example,
if the hacker is exploiting the lack of bounds checks in
strcpy, then we require the data to be copied to be a
valid string. In other words, it can't contain any
null bytes.
iii. Note that satisfying these conditions doesn't guarantee that
there will be an exploit. We just know that any exploit
must satisfy these conditions.
c. Use these conditions to reduce the set of inputs that need to
be considered when looking for an exploit.
2. Buggy path first
a. We know of one or more bugs in the program.
b. Execute code along the paths with the bug.
c. Observation: a bug often indicates an incorrect understanding
by the programmer. If we continue along a path that includes
the bug; then, we're looking at code where the programmer had
faulty assumptions. These are likely places for security
exploits.
D. Keeping the SMT problems tractable.
1. The authors mentioned that they started this work viewing the SMT
solver as a magical oracle that answers logical queries.
a. That is a good place to start -- it helps you think about your
problem as a logic problem.
b. BUT, SMT solvers aren't magical. You'll need to think about
how they work if you're going to solve large-scale problems.
c. Another way to say this:
* If you can't solve simple examples treating the SMT solver as a
magical oracle, then you probably won't be able to handle big
examples no matter what you do.
* Once you've shown that the SMT-based approach is promising,
you'll almost certainly end up dropping down into thinking
about what the SMT solver is doing when you try to scale your
prototype up to solving larger problems.
2. Example: memory references
a. If a pointer can point to anything, then the SMT solver faces a
combinatorial explosion of possibilities.
b. There's no solution that works for all problems (a great
research topic)!
c. Mayhem models writes as being to concrete addresses.
d. Read addresses may be symbolic.
i. Mayhem uses the SMT solver as a constraint solver to find
lower and upper bounds for the symbolic read address.
It does this with binary search using the SMT solver's
ability to reason about linear inequalities to test the
bounds.
ii. Mayhem then models the read as a symbolic read to this
smaller segment of memory.
e. For more details, see [12].
IV. Evaluation
A. The paper describes how this approach finds a security exploit in
acpi_listen
1. They stress that they chose this example because:
a. It doesn't expose a likely security hole (acpi_listen isn't
on the "attack surface").
b. It has been fixed:
But that doesn't mean everyone has installed the patch.
c. It is similar to real bugs that lead to real exploits.
2. They state that they have used Mayhem to find zero-day
security exploits in ezstream and imview (media applications)
and latd and ndtpd (network applications).
3. They state that their approach can be extended to other kinds of
exploits:
a. Exploits that work even if the OS won't let you execute from a
writeable page:
* return-to-libc
* return-oriented-programming
b. Command-injection
c. Others.
B. A few comments of my own
1. The paper does a great job of citing related work. I found myself
spending quite a bit of time looking up references to find out
more about various topics from the paper.
2. On the other hand, the paper doesn't provide a quantitative
comparison of their approach with the alternatives from similar
research
a. For example, they give the intuition behind "buggy path first".
They also mention alternatives: depth first, breadth first,
randomized, and generational search.
Do they have data to validate their choice of bugs-first?
Is there a clear winner, or does it depend on properties
of the program being analysed?
b. Likewise for some of the other design choices.
3. That said, the paper does do a good job of pointing out
that their approach works best for paths that encounter the
exploit fairly early in the execution (e.g. while processing
command line arguments). Deeper exploits are harder to find
by their methods because the code has to go through many
branches and the path explosion problem becomes more severe.
4. Overall, I found that the paper was a good introduction that made
me more interested in the area than I had been before I read it.
It left me with lots of questions. I think that's good. Most
of my questions seem like they are beyond the scope of the paper
given the trade-offs the authors need to make to keep the paper
short, focused, and accessible to a general computer science
audience.
5. If we get this far, ask me about Rice's theorem and what it says
about formal verification.