CpSc 538G: Automatic Exploit Generation

September 12, 2016
Paper:   “Automatic Exploit Generation”, T. Avgerinos, et. al.
Date:    September 12, 2016
My name: Mark Greenstreet

1. What problem does the paper address?
     Large software projects (e.g. linux distros) have thousands of known
     bugs.  We need a way to identify those bugs that are most vulnerable
     for creating security exploits.

2. What is the key insight/idea in the paper’s solution to this problem?
     The paper uses symbolic execution to find exploit-examples.  They
     introduce several techniques such as "preconditioned symbolic execution"
     and "buggy path first" to try to focus the search on traces that are
     most likely to have exploits.

3. What did the authors do to demonstrate their claims? (e.g. implement a tool,
   present a proof, run simulations, etc.)
     They have implemented an open source tool, Mayhem.  They described it's
     operation showing how it found an exploit in the acpi_listen utility.
     They stated that they've found exploitable bugs in several applications
     for streaming media and for networking.

4. Is the support for the claims convincing? Why or why not?
     The authors have used Mayhem to find zero-day exploits in code that is
     in widespread use.  That's pretty compelling.  OTOH, I would have liked
     to have seen a comparison that gave a more compelling justification for
     their design choices.  For example, they use "buggy path first" -- can
     they compare this with randomized path selection in KLEE [0] or
     "generational search" in [20]?

5. What are your questions or other comments about the paper?
     I like the paper as an intro to exploit generation, and the role of
     formal analysis techniques for generating exploits.  The paper is
     definitely aimed at a general, computer science audience, and left
     many slightly deeper "But what about...?" questions unanswered.  On
     the other hand, the references are *very* good.  I enjoyed skimming
     4 or 5 of the references when I was curious about "return-oriented
     programming" [33], reading about modeling the x86 shl instruction
     in SAGE [4], etc.