CpSc 538G: Automatic Exploit Generation
September 12, 2016
0. Announcements:
A. Registration issues debugged -- you can't take three versions of
CPSC 538 in the same year. I think you need my signature to take
two. Let me know if you need a form signed. We've also created
a section of CPSC 513 (same class, old name). You can sign-up
for that instead.
B. I've created a Piazza class for CPSC 538G.
1. I'll start a thread for claiming papers you want to present.
2. Any and all other discussion is welcome.
I. Introduction
A. Problem statement
1. Programs have lots of bugs.
2. How can we identify those that are security concerns?
3. Aside: Why don't we just fix the bugs that we know about?
a. Sometimes the fix isn't simple, and the patch might introduce
more bugs
i. Example: replace a fixed sized array with one from malloc.
Now we need to make sure that it is freed properly, or
the patch introduces a memory leak.
ii. Example: what if the original algorithm isn't quite right
for the actual problem. Replacing the existing algorithm
and data structures could require changes throughout the
code.
iii. Example: a former UBC student who worked on the Space Shuttle
software told me of a known, divide-by-zero vulnerability in
the code. It only occurred if the shuttle flew through the
centre of the earth. The NASA people documented the "bug"
in case the software was re-used in a different context where
the origin of the coordinate system was somewhere that could
be on a flight path.
b. Sometimes we can't agree on whether or not something is a bug.
B. Basic approach:
1. Use "black-box fuzzing" to find bugs.
a. Run existing test cases.
b. Replace inputs with random and extreme values.
c. "Black box" means this is done without looking at the program
source code.
i. Would "white box" be better?
ii. White box lets the test generator see code constructs that
are likely to be vulnerable and generate test cases for
those.
iii. Black box works on the machine code, and can pick up machine
specific issues (such as bugs due to memory layout) that
might not be apparent to white box testing. Black box
works even if we don't have access to the original source
code.
2. For each bug found, attempt to generate a security exploit.
a. The paper focuses on buffer overflows that lead to
control hijacking
b. Command injection, format-string exploits, information leaks,
"return-to-libc" and other techniques for security exploits
are also mentioned.
3. The key innovations are finding ways to handle the path explosion
problem and keeping the formulas tractable for the SMT solver.
II. Hijacking control by exploiting a buffer-overflow bug.
A. Memory layout
1. C doesn't perform array-bounds checks:
There is code in use where inputs to the program cause data to
be written outside of the memory block allocated for the array.
2. The function call stack is implemented in memory as an array
that grows downward.
a. In other words, the callee has its stack
frame at a lower range of addresses than the caller.
b. At the interface between the caller's stack frame and the callee's
stack frame, the stack has locations for parameters, return
values, and the return address for the machine's program counter.
3. Local variables for a function are allocated onto the stack.
a. The bottom of the stack is moved down far enough to make room for
all of the variables.
b. Larger structures such as structs and arrays just take bigger
blocks of memory from the stack.
4. In C, an array expression decays to a pointer to the memory location
for the value at index=0. Arrays grow upward (toward higher addresses).
a. address(A[i]) = address(A[0]) + i*sizeof(element-of(A))
b. This means that if A is allocated in the callee's stack frame,
we can find an i so that A[i] refers to values of the caller's
stack frame, or the caller's caller's stack frame, or
*the return address for this function*, or ...
B. Basic approach to a buffer-overflow exploit.
1. Write data beyond the valid end-of-array for an array that is a
local variable.
a. In particular, write a sequence of bytes that changes the
return-address for the function to a block of code that the
hacker wants executed.
b. The hacker can write a sequence of bytes into the array that
are the code the hacker wants executed. This code from the
hacker is called "shellcode".
c. When the callee returns, it doesn't go back to the caller!
Instead, it jumps to the code inserted by the hacker.
C. Defenses against buffer-overflow exploits.
1. These exploits depend on knowing many details of the memory layout
of the program when it is executed.
If the buffer is overwritten with a string provided as program
input, the hacker has to know all of these addresses *before*
the program is executed by the victim.
2. Some defenses:
a. Randomize the memory layout each time the program starts --
This is called "Address Space Layout Randomization" (ASLR)
b. Don't allow a page to be both writeable and executable this
is called "Data-Execution Prevention" (DEP) and also referred
to as W XOR X (a page can be writable, or have program
instructions, but not both).
c. Many others.
3. These defenses make security exploits harder, but not impossible.
a. References [29,31,33] describe how to make "hardened" exploits.
b. To make the paper a good introduction, the paper focuses on the
example of a simple buffer overflow.
III. Exploit Generation
A. High-level approach
1. Find a program with bugs
a. Bug logs.
b. Fuzz testing (the paper uses black box).
2. Use symbolic execution to find executions that exploit these bugs
to create security exploits.
3. That sounds easy -- so what's the problem?
a. Path explosion: a real program has lots of conditional branches.
i. It's not practical to explore all branches.
ii. How do we find promising paths?
b. Too many possible inputs:
i. For example, a command line parameter can be a string of
arbitrary length -- what strings should we consider?
ii. Even harder for parsers that handle various media formats,
document representations, etc.
B. Symbolic execution
1. Simple program: all instructions just update registers.
a. For example: add $i <- $j, $k
b. With symbolic execution,
start with symbolic variables for each register.
On an x86, we could call these eax_0, and so on.
c. The simulator has an array that has a symbolic expression for the
current value of each register.
d. To symbolically execute
add $i <- $j, $k
Read the symbolic expressions for $j and $k from the array.
Create a symbolic expression for $j + $k.
Store this expression in the array entry for $i.
e. At the end, we have a symbolic expression for each register.
If we want to know if we can end with $i = 42 (as an example)
we can add the constraint $i == 42, and ask if the formulas
are satisfiable.
2. Control-flow instructions: if-then-else, loops, etc.
a. In this paper, Mayhem constructs the symbolic formulas for a
particular path.
b. This means that at a branch instruction, Mayhem adds a constraint
of the form "the branch goes a particular way".
For example:
bleq L5, $i, $j # go to L5 if $i <= $j
If we are looking at the path where the branch is taken, then
Mayhem reads the symbolic expressions for $i and $j, creates
the symbolic expression for $i <= $j, and adds this constraint
to what is often called "the path constraint".
c. At the end, we have a symbolic expression for each register *and*
a path constraint. Now we can ask if there is an execution
along this path that ends with register $i holding the value 42
(or other constraints).
d. To get good coverage, we have to explore *lots* of paths. This is
called the "path explosion" problem. The paper talks about it.
e. An alternative is to consider the state of the program at each
execution point as a symbolic expression. We'll examine this
when we look at "model-checking" next month.
3. Memory reads and writes, pointers
a. When the program reads from memory, can we figure out what symbolic
value is there?
b. We can if we can identify the last write to that location.
c. This involves showing that the address of the read and the write are
the same, ...
d. ... and that no write in between the two wrote to that address.
e. This is called "shape analysis".
f. The paper didn't go into details other than referring to [12].
C. Finding promising paths:
1. Pre-conditioned search
a. Given a bug, determine a possible exploit.
i. For example, if there is a buffer-overflow bug
ii. What would be needed to modify the return-address for the
current stack frame?
iii. What would be needed to place the "shellcode" at the
hacked return-address?
b. Find simple conditions that must hold if the exploit is to occur.
i. For example, the data that causes the overflow must cause a
big enough overflow to overwrite the return address. This
imposes a minimum length on the corresponding input string
(for example).
ii. There might be other constraints on the data. For example,
if the hacker is exploiting the lack of bounds checks in
strcpy, then we require the data to be copied to be a
valid string. In other words, it can't contain any
null bytes.
iii. Note that satisfying these conditions doesn't guarantee that
there will be an exploit. We just know that any exploit
must satisfy these conditions.
c. Use these conditions to reduce the set of inputs that need to
be considered when looking for an exploit.
2. Buggy path first
a. We know of one or more bugs in the program.
b. Execute code along the paths with the bug.
c. Observation: a bug often indicates an incorrect understanding
by the programmer. If we continue along a path that includes
the bug; then, we're looking at code where the programmer had
faulty assumptions. These are likely places for security
exploits.
D. Keeping the SMT problems tractable.
1. The authors mentioned that they started this work viewing the SMT
solver as a magical oracle that answers logical queries.
a. That is a good place to start -- it helps you think about your
problem as a logic problem.
b. BUT, SMT solvers aren't magical. You'll need to think about
how they work if you're going to solve large-scale problems.
c. Another way to say this:
* If you can't solve simple examples treating the SMT solver as a
magical oracle, then you probably won't be able to handle big
examples no matter what you do.
* Once you've shown that the SMT-based approach is promising,
you'll almost certainly end up dropping down into thinking
about what the SMT solver is doing when you try to scale your
prototype up to solving larger problems.
2. Example: memory references
a. If a pointer can point to anything, then the SMT solver faces a
combinatorial explosion of possibilities.
b. There's no solution that works for all problems (a great
research topic)!
c. Mayhem models writes as being to concrete addresses.
d. Read addresses may be symbolic.
i. Mayhem uses the SMT solver as a constraint solver to find
lower and upper bounds for the symbolic read address.
It does this with binary search using the SMT solver's
ability to reason about linear inequalities to test the
bounds.
ii. Mayhem then models the read as a symbolic read to this
smaller segment of memory.
e. For more details, see [12].
IV. Evaluation
A. The paper describes how this approach finds a security exploit in
acpi_listen
1. They stress that they chose this example because:
a. It doesn't expose a likely security hole (acpi_listen isn't
on the "attack surface").
b. It has been fixed:
But that doesn't mean everyone has installed the patch.
c. It is similar to real bugs that lead to real exploits.
2. They state that they have used Mayhem to find zero-day
security exploits in ezstream and imview (media applications)
and latd and ndtpd (network applications).
3. They state that their approach can be extended to other kinds of
exploits:
a. Exploits that work even if the OS won't let you execute from a
writeable page:
* return-to-libc
* return-oriented-programming
b. Command-injection
c. Others.
B. A few comments of my own
1. The paper does a great job of citing related work. I found myself
spending quite a bit of time looking up references to find out
more about various topics from the paper.
2. On the other hand, the paper doesn't provide a quantitative
comparison of their approach with the alternatives from similar
research
a. For example, they give the intuition behind "buggy path first".
They also mention alternatives: depth first, breadth first,
randomized, and generational search.
Do they have data to validate their choice of bugs-first?
Is there a clear winner, or does it depend on properties
of the program being analysed?
b. Likewise for some of the other design choices.
3. That said, the paper does do a good job of pointing out
that their approach works best for paths that encounter the
exploit fairly early in the execution (e.g. while processing
command line arguments). Deeper exploits are harder to find
by their methods because the code has to go through many
branches and the path explosion problem becomes more severe.
4. Overall, I found that the paper was a good introduction that made
me more interested in the area than I had been before I read it.
It left me with lots of questions. I think that's good. Most
of my questions seem like they are beyond the scope of the paper
given the trade-offs the authors need to make to keep the paper
short, focused, and accessible to a general computer science
audience.
5. If we get this far, ask me about Rice's theorem and what it says
about formal verification.