Probabilistic Programming

Homework # 5

Learning Goal

Learn how to evaluate a “higher-order” language with first class functions.

Task

In this homework you will implement an interpreter for the HOPPL minus the observe case (i.e. an evaluator/interpreter with no conditioning). This homework is in the same spirit as HW 2 where the task is to simply write an evaluator that samples from the prior on return values. And in this case you can only write an evaluator. However, unlike in HW 2 this evaluator will have to deal with first class functions. In essence you should be able to start from and use a large amount of the code you wrote in developing your FOPPL evaluating interpreter, however, the base cases you need to handle will change. In particular you will need to implement a fn case and your procedure application code will have to change to accommodate “compound procedures.”

Note that, unlike in HW 2, pseudo-code for such an evaluator is not provided in the book (the book instead advocates doing a continuation passing style transformation and intercepting calls to observe and sample). While the CPS transformation is amazing and allows for implementation of probabilistic programming inference in languages without writing a custom compiler/interpreter, it is ultimately probably too large of an ask for a course assignment. Plus you don’t get to learn about interpretation of higher order languages which, trust us, you only get from actually implementing an interpreter once (or twice).

So you are recommended to either or both read the relevant sections of SICP or consume an amazing online resource that beautifully describes the evaluation of a language that is very, very similar to the HOPPL (hint, hint). In fact, let me go further. You should “copy” (by suitably adapting and merging its key aspects into your FOPPL evaluator) Peter Norvig’s Lispy and spend your time understanding it.

Like in HW 2 Daphne will output a parsed and desugared form of the following programs for you. Note that the desugaring of HOPPL programs results in an IR that is substantially harder to read than the HW 2 desugared programs. This is principally because let bindings disappear and addresses are introduced.

In particular the address transformation from Chapter 6.2 of the book will be applied by Daphne for you, which is to say that there will be an extra argument (the first argument) passed to every function; this will be an address. sample and observe will also now also get such an address as an extra, first, argument. Further, the program will, as described, be wrapped in a fn which expects an address string prefix as the first argument. The value of this address prefix can be set in an outer scope to whatever value you like.

For example

(defn fib [x]
  (if (<= x 1) 1
      (+ (fib (- x 1)) (fib (- x 2)))))
(fib 5)

desugars to

["fn", ["alpha"],
 [["fn", ["alpha", "fib"],
   ["fib", ["push-address", "alpha", "addr1"], 5]],
  ["push-address", "alpha", "addr0"],
  ["fn", ["alpha", "x"],
   [["fn", ["alpha", "x", "fib"],
     ["if", ["<=", ["push-address", "alpha", "addr3"], "x", 1], 1,
      ["+", ["push-address", "alpha", "addr4"],
       ["fib", ["push-address", "alpha", "addr5"],
        ["-", ["push-address", "alpha", "addr6"], "x", 1], "fib"],
       ["fib", ["push-address", "alpha", "addr7"],
        ["-", ["push-address", "alpha", "addr8"], "x", 2], "fib"]]]],
    ["push-address", "alpha", "addr2"], "x",
    ["fn", ["alpha", "x", "fib"],
     ["if", ["<=", ["push-address", "alpha", "addr9"], "x", 1], 1,
      ["+", ["push-address", "alpha", "addr10"],
       ["fib", ["push-address", "alpha", "addr11"],
        ["-", ["push-address", "alpha", "addr12"], "x", 1], "fib"],
       ["fib", ["push-address", "alpha", "addr13"],
        ["-", ["push-address", "alpha", "addr14"], "x", 2],
        "fib"]]]]]]]]

Careful readers of this desugaring will note the use of the Y Combinator in generating the recursion with a letrec or equivalent special form.

In order to accommodate push-address a primitive that implements its functionality is provided in the support code.

Also, like in HW 3, the support code provides a number of tests that you should employ and expand upon to do test-driven development.

Important: in this homework you must use pure datastructure primitives in order to build towards the next homework efficiently. You will suffer a serious reduction in marks if you use regular Python datastructures.

Rubric

Submit a .pdf file to gradescope.ca showing your answers for each section. As in previous homeworks copy the provided wandb report and replace the plotted runs with your own.

Convergence of prior expectations against wall-clock time will be assessed for the short HOPPL programs below. Prior expectations (and or variances where applicable) that are within 5% of ground truth within 30 minutes of wall-clock runtime will be given 10 points each. Ten points will also be awarded for passing all tests (“Program” 1).

Provide histogram plots of the distributions obtained for each return value of each program.

Program 1)

Report successful in passing all of the deterministic tests in the HW5 support code.

Program 2)

A geometrically distributed random quantity. Effectively a very small rejection sampler.

(defn until-success [p n]
   (if (sample (flip p))
     n
     (until-success p (+ n 1))))

(let [p 0.01]
  (until-success p 0))     

Program 3)

A normally distributed random quantity, via transformation and rejection. Take a little time to think about the sampled values of x and y and be amazed that this works. Think a little about how to deal with this kind of case in amortized inference settings.

(defn marsaglia-normal [mean var]
   (let [d (uniform-continuous -1.0 1.0)
         x (sample d)
         y (sample d)
         s (+ (* x x ) (* y y ))]
    (if (< s 1)
        (+ mean (* (sqrt var)
                   (* x (sqrt (* -2 (/ ( log s) s))))))
        (marsaglia-normal mean var))))

(let [mu (marsaglia-normal 1 5)
      sigma (sqrt 2)
      lik (normal mu sigma)]
  (observe lik 8)
  (observe lik 9)
  mu)

Program 4)

It’s an HMM again, this time implemented generically. Take the time to read this source code to see how this works. When you get this working you can be very proud. You will be most of the way towards a very powerful HOPPL language implementation.

(defn reduce [f x values]
               (if (empty? values)
                  x
                  (reduce f (f x (first values)) (rest values))))

(let [observations [0.9 0.8 0.7 0.0 -0.025 -5.0 -2.0 -0.1 0.0 0.13 0.45 6 0.2 0.3 -1 -1]
      init-dist (discrete [1.0 1.0 1.0])
      trans-dists {0 (discrete [0.1 0.5 0.4])
                   1 (discrete [0.2 0.2 0.6])
                   2 (discrete [0.15 0.15 0.7])}
      obs-dists {0 (normal -1 1)
                 1 (normal 1 1)
                 2 (normal 0 1)}]
      (reduce
        (fn [states obs]
          (let [state (sample (get trans-dists
                                   (peek states)))]
            (observe (get obs-dists state) obs)
            (conj states state)))
        [(sample init-dist)]
        observations))

Infrastructure

Support Code

Additionally, helpful Python scaffolding is provided, highlighting in particular primitives implemented in terms of PyTorch. Reminder: in this homework you must use pure datastructure primitives in order to build towards the next homework efficiently. You will suffer a serious reduction in marks if you use regular Python datastructures.

CS532-HW5.zip