CpSc 538G: The DPLL algorithm

September 19, 2016

Based on "The Handbook of Satisfiability", (A. Biere, M. Heule, H. van Maaren,
and T. Walsh, Eds), Chapter 3: "Complete Algorithms", A. Darwiche and
K. Pipatsrisawat.  Sections 3.5-3.7

  I.  Basic DPLL: Decisions and Unit-Propagation
    A.  Given a Boolean formula (in CNF), find a satisfying assignment
    B.  Problem formulation
      1.  View the problem as searching a tree
        a.  Vertices are labeled with variables
        b.  edges are labeled "true" or "false" according to the choice
              for that variable
        c.  leaves either satisfy the formula or don't.
        d.  Note: there are many possible trees:
              for each subtree, we just need to pick a variable that isn't
              already assigned.
      2.  A formula is satisfiable iff the decision tree has a satisfying leaf
        a.  The path to the leaf gives such an assignment
        b.  Showing that all leaves fail to satisfy the formula is a
              refutation.
        c.  The number of leaves is 2^n where n is the number of variables.
    C.  Unit propagation
      1.  With each decision
        a.  Eliminate clauses that are now satisfied
        b.  Reduce the number of disjuncts in clauses that 
              have a false literal due to the decision.
      2.  If this produces a clause that has one literal
        a.  Such a clause is called a "unit clause"
        b.  We can make a decision for that variable right away --
              the other choice falsifies the formula.
        c.  We can do this for each unit clause produced by the decision
        d.  These choice can produce more unit clauses, and allow
              choices to be made for more variables.
      3.  If this produces an empty clause
        a.  The clause is unsatisfiable
        b.  Back-up to the most recent decision where there's a choice
             remaining
        c.  Make the other choice, and continue
      4.  If all clauses are eliminated (i.e. satisfied), then we've
            found a satisfying assignment.
    D.  Observations
      1.  DPLL doesn't have to go all the way to the leaves when exploring
              assignments
        a.  Unit propagation can refute a subtree or find an assignment by
              exploring a single path in that subtree.
        b.  Note that a satisfying assignment can be found without making
              assignments to all variables.  In this case, the remaining
              variables are "don't cares" -- they can be set arbitrarily
              to true or false and still produce a satisfying assignment.
        c.  Likewise, a subtree can be refuted without making assignments
              to all variables.
      2.  These optimizations make DPLL useful in practice
        a.  However, it is still, worst-case exponential time.


 II.  Key improvements: conflict-clauses and non-chronological backtracking
    A.  Implication graphs
      1.  Track the unit-propagation
      2.  This is a valuable data structure for analysing conflicts
      3.  Details:
        a.  Vertices are labelled with a tuple: (level, assignment)
              * An assignment is a variable and a value
              * If an assignment is a decision, its level is the number
                  of decisions that were made prior to this one.
              * If an assignment is an inference from unit propagation,
                  its level is the level for the last decision that 
                  led to this inference.
              * Special case: a label of {} means that a conflict,
                  i.e. empty clause, was inferred.
        b.  Edges are the inferences, they are labelled with clause
              instances.
    B.  conflict clauses
      1.  Conflict analysis.  When a clause becomes unsatisfiable,
        a.  The implication graph shows how the decisions led to this
              conflict.
        b.  Any cut in the graph that separates the decisions from the
              conflict yields a clause that excludes this conflict.
      2.  We want to find a small cut -- fewer literals in the learned
              clause make the learned clause more general.
      3.  In practice, we choose exactly one literal from the last
            level.
      4.  Add this inferred clause to the clauses of the problem
        a.  Any satisfying assignment must satisfy this clause.
        b.  Thus, we haven't changed the set of assignments that
              satisfy the formula.
        c.  But, the new clause allows the SAT solver to avoid repeating
              variations of this conflict.
    C.  non-chronological backtracking
      1.  What decisions should we undo?
        a.  Original DPLL goes back to the most recent decision.
              * This can result in lots of exploration of the same conflict
              * We'll do the example from the book.
        b.  Key observation:
              * The learned clause forces the choice of the most recent
                  decision, once the previous decision (in the learned clause)
                  is made.
      2.  Asserting level
        a.  Each literal in a learned clause has an associated level.
        b.  The highest level is the last decision, the one that led to the
              conflict.
        c.  The second highest level is called the "asserting level"
        d.  Once all of the decisions have been made up to the asserting level,
              the learned clause forces a unit-propagation for the variable at
              the last level.
      3.  With non-chronological backtracking, the solver backtracks to the
              asserting level.
        a.  Effectively, it discards all of the "irrelevant" decisions that
              it made between the asserting level and the actual conflict.
        b.  The solver retains the decision at the asserting level, adds
              the new clause, and continues with unit-propagation.

III.  More refinements
    A.  Asserting clauses
      1.  non-chronological backtracking exploits unit-propagation from
            the asserting level.
        a.  this means that once we've made decision up to and including
              the asserting level, the final decision should be implied
              by unit propagation.
        b.  this requires that we have exactly one variable in the conflict
              clause from the level at which the conflict was detected.
      2.  Now we see why the SAT solver makes this restriction on cuts.
        a.  The restriction is always satisfiable.
        b.  The decision variable at the last decision level suffices.
    B.  Unique implication points
      1.  There may be more than one variable at the last level that
              can be chosen as for the cut.
        a.  Each such variable is called a "unique implication point" (UIP)
        b.  Note that there can be more than one *unique* implication point
        c.  UIPs can be ordered, because all paths from the last decision
              to the conflict go through all UIPs, and in the same order.
        d.  Heuristically, picking the UIP closest to the conflict seems
              best.
      2.  On the other hand, picking the UIP closest to the conflict at
            other levels seems to degrade performance.
    C.  Deleting conflict driven clauses
      1.  Even with all these optimizations, SAT can still be hard
        a.  This means that the solver learns a lot of conflict clauses
        b.  If a problem has n variables, the solver must learn a new clause
              at least once for every n decisions.
        c.  The performance of the solver can be degraded by having too many
                clauses that aren't very useful
              * extra memory
              * extra work to maintain data structures
      2.  Solution: delete some clauses
        a.  subsumed clauses: clause A subsumes clause 

    D.  Restarts

 IV.  A few more observations
    A.  2SAT
    B.  Discovering equivalences
    C.  Exploiting symmetry
    D.  Certifying results