Homework #2

Learning Goal

Learn how to write the first part of an interpreter for the FOPPL. In particular, learn how to manipulate the Daphne compiler's abstract-syntax-tree and graphical-model outputs to sample from the prior, i.e. with no conditioning (yet).

Task

Write two different FOPPL evaluators: one that evaluates the FOPPL program as described in lecture and in the book (Alg. 6), and one that evaluates it via ancestral sampling in the graphical model. The task is to sample from the prior distribution denoted by the program. Note that this means observe statements must be treated differently: they can either be transformed into sample statements, ideally in a pre-interpretation transformation pass, or simply ignored. Clearly, the observed values themselves, $Y$, should be ignored as well.

  • Hint 1: you should probably study and complete the pure deterministic language evaluator first, as it will be required to evaluate the link expressions when you interpret the program via ancestral sampling in the graphical model. Test-driven development is the way to go here; a minimal evaluator sketch follows these hints.

  • Hint 2: sampling from the prior in the probabilistic programming context means generating samples of the return value.

  • Hint 3: writing your own suite of unit tests, starting with simple fully deterministic programs and then proceeding to models with fixed and known variable names, is a good way to test both correctness and coverage. In other words, you may wish both to write tests that do only deterministic computation in the FOPPL and to write tests directly in the JSON intermediate language.
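
To make Hint 1 concrete, here is a minimal sketch of an evaluation-based interpreter core in the style of Alg. 6. It assumes the desugared AST arrives as nested JSON lists and that distribution expressions evaluate to torch.distributions objects; the names evaluate and env are ours, not the support code's, and the exact AST shapes are assumptions to verify against Daphne's output. Note that observe is handled by sampling, per the task description above.

import torch

def evaluate(ast, env):
    # Sketch only: recursively evaluate a desugared FOPPL expression
    # (nested JSON lists) under the environment env (a dict).
    if isinstance(ast, (int, float)):           # constant
        return torch.tensor(float(ast))
    if isinstance(ast, str):                    # variable lookup
        return env[ast]
    op, *args = ast
    if op == 'let':                             # ['let', [v, e1], e2]
        (v, e1), e2 = args
        return evaluate(e2, {**env, v: evaluate(e1, env)})
    if op == 'if':                              # ['if', e1, e2, e3]
        e1, e2, e3 = args
        return evaluate(e2 if evaluate(e1, env) else e3, env)
    if op == 'sample':                          # draw from the prior
        return evaluate(args[0], env).sample()
    if op == 'observe':                         # prior sampling: treat as
        return evaluate(args[0], env).sample()  # sample, ignore the value
    vals = [evaluate(a, env) for a in args]     # primitive or defn call
    return env[op](*vals)

Procedures introduced by defn can be bound in env as Python callables, so that the final case covers both primitives and user-defined procedures.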

1)

The support code referenced below contains a large (but not necessarily complete) set of tests, along with scaffolding to run those tests and more (see run_deterministic_tests and run_probabilistic_tests). The first task is to ensure that both of your evaluators pass all of these tests. We strongly suggest that you expand the test set to cover an increasing amount of functionality, with an eye towards matrix operations in particular.
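
For the graph-based evaluator, sampling from the prior reduces to evaluating each vertex's link function in a topological order of the graph, reusing the deterministic evaluator for the link expressions. The sketch below assumes the graph translation provides vertices V, arcs A (vertex to list of children), link functions P, and observations Y, and that link functions look like ['sample*', e] or ['observe*', e1, e2]; verify all of this against your own Daphne output.

def topological_sort(vertices, arcs):
    # Depth-first post-order, reversed, so that parents precede children.
    visited, order = set(), []
    def visit(v):
        if v not in visited:
            visited.add(v)
            for child in arcs.get(v, []):
                visit(child)
            order.append(v)
    for v in vertices:
        visit(v)
    return list(reversed(order))

def sample_from_graph(graph, env):
    # Ancestral sampling sketch; graph = {'V': ..., 'A': ..., 'P': ..., 'Y': ...}.
    values = {}
    for v in topological_sort(graph['V'], graph['A']):
        link = graph['P'][v]                    # e.g. ['sample*', dist-expr]
        dist = evaluate(link[1], {**env, **values})
        values[v] = dist.sample()               # observe* is also just sampled
    return values

The program's return expression is then evaluated deterministically against the sampled values.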

2)

This program is a standard unit test for probabilistic programming systems, often referred to as a Gaussian unknown mean problem.

(let [mu (sample (normal 1 (sqrt 5)))
      sigma (sqrt 2)
      lik (normal mu sigma)]
  (observe lik 8)
  (observe lik 9)
  mu)

3)

This program specifies a univariate Bayesian linear regression problem.

(defn observe-data [_ data slope bias]
  (let [xn (first data)
        yn (second data)
        zn (+ (* slope xn) bias)]
    (observe (normal zn 1.0) yn)
    (rest (rest data))))

(let [slope (sample (normal 0.0 10.0))
      bias  (sample (normal 0.0 10.0))
      data  (vector 1.0 2.1 2.0 3.9 3.0 5.3
                    4.0 7.7 5.0 10.2 6.0 12.9)]
  (loop 6 data observe-data slope bias)
  (vector slope bias))

4)

This program encodes a hidden Markov model.

(defn hmm-step [t states data trans-dists likes]
  (let [z (sample (get trans-dists
                       (last states)))]
    (observe (get likes z)
             (get data t))
    (append states z)))

(let [data [0.9 0.8 0.7 0.0 -0.025 -5.0 -2.0 -0.1
            0.0 0.13 0.45 6 0.2 0.3 -1 -1]
      trans-dists [(discrete [0.10 0.50 0.40])
                   (discrete [0.20 0.20 0.60])
                   (discrete [0.15 0.15 0.70])]
      likes [(normal -1.0 1.0)
             (normal 1.0 1.0)
             (normal 0.0 1.0)]
      states [(sample (discrete [0.33 0.33 0.34]))]]
  (loop 16 states hmm-step data trans-dists likes))

5)

This program corresponds to Bayesian neural network learning.

(let [weight-prior (normal 0 1)
      W_0 (foreach 10 []
            (foreach 1 [] (sample weight-prior)))
      W_1 (foreach 10 []
            (foreach 10 [] (sample weight-prior)))
      W_2 (foreach 1 []
            (foreach 10 [] (sample weight-prior)))

      b_0 (foreach 10 []
            (foreach 1 [] (sample weight-prior)))
      b_1 (foreach 10 []
            (foreach 1 [] (sample weight-prior)))
      b_2 (foreach 1 []
            (foreach 1 [] (sample weight-prior)))

      x   (mat-transpose [[1] [2] [3] [4] [5]])
      y   [[1] [4] [9] [16] [25]]
      h_0 (mat-tanh (mat-add (mat-mul W_0 x)
                             (mat-repmat b_0 1 5)))
      h_1 (mat-tanh (mat-add (mat-mul W_1 h_0)
                             (mat-repmat b_1 1 5)))
      mu  (mat-transpose
            (mat-tanh (mat-add (mat-mul W_2 h_1)
                               (mat-repmat b_2 1 5))))]
  (foreach 5 [y_r y
              mu_r mu]
    (foreach 1 [y_rc y_r
                mu_rc mu_r]
      (observe (normal mu_rc 1) y_rc)))
  [W_0 b_0 W_1 b_1])

Rubric

Submit a .pdf file to gradescope.ca showing your answers for each section. Provide code snippets that document the critical aspects of your implementation, sufficient to allow us to quickly determine whether or not you individually completed the assignment. Draw 1000 samples for each of the task programs and report marginal expectations for all return-value dimensions, labeled appropriately. Include histogram plots of the resulting per-dimension marginal sample distribution(s) as well. This homework is worth 50 points: each program whose results are within tolerance is worth 10 points, and demonstrably passing all unit tests is worth 10 points. Include a link to a publicly accessible GitHub repo that allows us to review all of your code.
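
As an illustration of the reporting we expect, something like the following sketch will do, where sample_program and ast are hypothetical placeholders for your evaluator and compiled program, and matrix-valued returns (as in program 5) need per-entry flattening:

import torch
import matplotlib.pyplot as plt

# 'sample_program' and 'ast' are hypothetical placeholders.
samples = torch.stack([torch.atleast_1d(sample_program(ast))
                       for _ in range(1000)])   # shape (1000, D)
for d in range(samples.shape[1]):
    mean = samples[:, d].mean().item()
    print(f'dimension {d}: marginal expectation = {mean:.3f}')
    plt.hist(samples[:, d].numpy(), bins=50)
    plt.title(f'Marginal distribution of return dimension {d}')
    plt.savefig(f'hist_dim_{d}.png')
    plt.clf()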

Infrastructure

Daphne

We are providing you with a compiler called Daphne that implements the desugaring and FOPPL compilation rules discussed in class and described in the book. Daphne transforms FOPPL programs into JSON datastructures.

These datastructures correspond to two different translations of a FOPPL program: 1) an abstract syntax tree of the desugared FOPPL program, and 2) a graphical-model datastructure that results from the compilation process described in the Introduction to Probabilistic Programming book.
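
For orientation, the graph translation is a JSON list of three elements: the user-defined functions, the graph itself, and the program's return expression. The Python literal below illustrates the approximate shape we have seen for a trivial one-sample program; treat it as an assumption and inspect your own Daphne output rather than relying on it.

# Illustrative shape only -- inspect your own Daphne output.
graph_output = [
    {},                                  # user-defined functions (from defn)
    {'V': ['sample2'],                   # vertices (random variables)
     'A': {},                            # arcs: vertex -> list of children
     'P': {'sample2': ['sample*', ['normal', 0, 1]]},  # link functions
     'Y': {}},                           # observed values
    'sample2',                           # return expression
]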

Clone Daphne from git@github.com:plai-group/daphne.git

Installation

To use Daphne you will need to have both a JVM and Leiningen installed.
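
A small subprocess wrapper makes Daphne convenient to call from Python. The flags below follow the support code we are aware of and are an assumption; double-check them against the README of the Daphne version you clone.

import json
import subprocess

def daphne(args, cwd='../daphne'):
    # Run the Daphne compiler via Leiningen and parse its JSON output.
    proc = subprocess.run(['lein', 'run', '-f', 'json'] + args,
                          capture_output=True, cwd=cwd)
    if proc.returncode != 0:
        raise RuntimeError(proc.stdout.decode() + proc.stderr.decode())
    return json.loads(proc.stdout)

# e.g. ast   = daphne(['desugar', '-i', 'path/to/program.daphne'])
#      graph = daphne(['graph',   '-i', 'path/to/program.daphne'])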

Python

Your solution must be in Python. Helpful Python scaffolding is provided; in particular, it highlights primitives implemented in terms of PyTorch.

The set of distributions you must recognize and support includes normal, uniform-continuous, beta, and bernoulli; recognizing that this list may grow will help you in future assignments.
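
One convenient approach, offered only as a suggestion, is to map each FOPPL distribution name onto the corresponding torch.distributions class, so that distribution expressions evaluate to objects supporting .sample() (and, in later assignments, .log_prob()); the dictionary name is ours.

import torch.distributions as dist

DISTRIBUTIONS = {
    'normal':             lambda mu, sigma: dist.Normal(mu, sigma),
    'uniform-continuous': lambda lo, hi: dist.Uniform(lo, hi),
    'beta':               lambda a, b: dist.Beta(a, b),
    'bernoulli':          lambda p: dist.Bernoulli(p),
    # program 4 above also uses discrete; assuming vectors evaluate to tensors:
    'discrete':           lambda probs: dist.Categorical(probs),
}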

vector, hashmap, and list primitive operations must be supported as well. This means that first, rest, last, nth, conj, and cons must be implemented as described in the book.

Primitive mathematical operations (+, -, *, /, and so on) must be supported as well.

Many of these primitives are provided for you in the Python support code.
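
Where a primitive is missing, something along the following lines usually suffices. This is a sketch under the assumption that vectors are represented as Python lists; indices may arrive as tensors, hence the int() casts, and conj/cons should follow the book's semantics (conj appends to vectors, cons prepends to lists).

import torch

PRIMITIVES = {
    '+': torch.add, '-': torch.sub, '*': torch.mul, '/': torch.div,
    'sqrt': torch.sqrt, 'exp': torch.exp,
    'vector': lambda *xs: list(xs),
    'first':  lambda v: v[0],
    'second': lambda v: v[1],
    'rest':   lambda v: v[1:],
    'last':   lambda v: v[-1],
    'nth':    lambda v, i: v[int(i)],
    'get':    lambda c, k: c[int(k)] if isinstance(c, list) else c[k],
    'conj':   lambda v, x: list(v) + [x],
    'cons':   lambda x, v: [x] + list(v),
    'append': lambda v, x: list(v) + [x],
    'hash-map': lambda *kv: dict(zip(kv[::2], kv[1::2])),
}

Representing vectors as torch tensors instead makes the mat-* operations in program 5 easier, which is presumably why the provided primitives are implemented in terms of PyTorch.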

Extras

What follows is a list of example programs of the kind your interpreter should support. You may wish to establish them as unit tests in your implementation framework.

(let [x (sample (normal 0 1))]
  x)

(let [data (vector 1 2 3)
      a (vector 2)]
  (vector (first (rest (rest data))) a))

(let [data (vector 1 2 (sample (normal 1 1)))
      a (conj [] (sample (normal 0 2)))
      b (conj a (sample (normal 0 3)))]
  (observe (normal (second b) 4) (first (rest data)))
  b)

(let [x (sample (normal 0 1))]
  (sample (normal x 1)))

(let [p (sample (beta 1 1))
      x (sample (beta (exp p) 1))
      d (bernoulli (* x p))]
  (observe d 1)
  p)

(defn observe-data [_ data slope bias]
  (let [xn (first data)
        yn (second data)
        zn (+ (* slope xn) bias)]
    (observe (normal zn 1.0) yn)
    (rest (rest data))))

(let [slope (sample (normal 0.0 10.0))
      bias  (sample (normal 0.0 10.0))
      data  (vector 1.0 2.1 2.0 3.9 3.0 5.3
                    4.0 7.7 5.0 10.2 6.0 12.9)]
  (loop 6 data observe-data slope bias)
  (vector slope bias))