Probabilistic Programming

Extra Homework # 4

Learning Goal

Learn how to write a compiler, in particular a compiler that supports a non-standard interpretation of a program as a declarative language for graphical models (with control structure).

Task

You must write a compiler for programs written in the sugared first order probabilistic programming language (FOPPL) [1, Chap. 3] :

$v ::=$ $\textrm{variable}$
$c ::=$ $\textrm{constant value or primitive operation}$
$f ::=$ $\textrm{procedure}$
$e ::=$ $c$ | $v$ | (let [$v$ $e_1$] $e_2$) | (if $e_1$ $e_2$ $e_3$) | ($f$ $e_1$ $\ldots$ $e_n$) | ($c$ $e_1$ $\ldots$ $e_n$) | (sample $e$) | (observe $e_1$ $e_2$)
$q ::=$ $e$ (defn $f$ [$v_1$ $\ldots$ $v_n$] $e$) $q$

that compiles to a graphical model datastructure.

The choice of implementing language is yours. The choice of underlying datastructure is yours. An interface to the datastructure produced must have methods implemented that include count-vertices, count-edges, print-graph, sample-from-prior, sample-from-joint. You may optionally support bind-free-variables. The functionality of all but the last should be obvious from their names.

The set of distributions to recognize and support must include normal, uniform-continuous, beta, and bernoulli.

Vector and hashmap sugar and primitive operations must be supported as well. List operations must be supported. This means that first, rest, last, nth, conj, cons must be supported.

Primitive mathematical operations must be supported as well, such as +, -, etc.

A reasonable set of primitives can be had, if implementing in Clojure, by stealing the definition of primitive procedures from Anglican and modifying it to also count your own implemented primitive procedures. Note that in the following foppl-compiler.primitives would be replaced with your own namespace. Thanks to David Tolpin for the Anglican implementation of this function.

In other words, with the following in your namespace

(:require [anglican.runtime :refer :all])
(:require [anglican.core :refer :all])

and, implementations like this in your own namespaces

(ns foppl-compiler.primitives
  (:require [clojure.core.matrix :as m])
  (:require [anglican.runtime :refer [tanh]]))

  (defn append [& args] (apply conj args))

  (defn mat-mul [& args] (apply m/mmul args))
  (defn mat-add [& args] (apply m/add args))
  (defn mat-transpose [& args] (apply m/transpose args))
  (defn mat-tanh [M] (m/emap tanh M))
  (defn mat-relu [M] (m/emap (fn [x] (if (> x 0) x 0)) M))
  (defn mat-repmat [M r c]
    (let [R (reduce (partial m/join-along 0) (repeat r M))]
      (reduce (partial m/join-along 1) (repeat c R))))

you can define a “white-list” of primitive procedures with a function like this:

(def ^:dynamic *primitive-procedures*
  "primitive procedures, do not exist in CPS form"
  (let [;; higher-order procedures cannot be primitive
        exclude '#{loop
                   map reduce
                   filter keep keep-indexed remove
                   repeatedly
                   every? not-any? some
                   every-pred some-fn
                   comp juxt partial}
        ;; runtime namespaces
        runtime-namespaces '[clojure.core anglican.runtime foppl-compiler.primitives]]
    (set (keep (fn [[k v]]
                 (when (and (not (exclude k))
                            (fn? (var-get v)))
                   k))
               (mapcat ns-publics runtime-namespaces)))))

Rubric

Your compiler will be evaluated on the following four programs:

1)

(let [mu (sample (normal 1 (sqrt 5)))
           sigma (sqrt 2)
           lik (normal mu sigma)]
       (observe lik 8)
       (observe lik 9)
       mu)

2)

(defn observe-data [_ data slope bias]
        (let [xn (first data)
              yn (second data)
              zn (+ (* slope xn) bias)]
          (observe (normal zn 1.0) yn)
          (rest (rest data))))
(let [slope (sample (normal 0.0 10.0))
             bias  (sample (normal 0.0 10.0))
             data (vector 1.0 2.1 2.0 3.9 3.0 5.3
                          4.0 7.7 5.0 10.2 6.0 12.9)]
   (loop 6 data observe-data slope bias)
   (vector slope bias))

3)

(defn hmm-step [t states data trans-dists likes]
      (let [z (sample (get trans-dists
                           (last states)))]
        (observe (get likes z)
                 (get data t))
        (append states z)))
(let [data [0.9 0.8 0.7 0.0 -0.025 -5.0 -2.0 -0.1
            0.0 0.13 0.45 6 0.2 0.3 -1 -1]
      trans-dists [(discrete [0.10 0.50 0.40])
                   (discrete [0.20 0.20 0.60])
                   (discrete [0.15 0.15 0.70])]
      likes [(normal -1.0 1.0)
             (normal 1.0 1.0)
             (normal 0.0 1.0)]
      states [(sample (discrete [0.33 0.33 0.34]))]]
  (loop 16 states hmm-step data trans-dists likes))

4)

(let [weight-prior (normal 0 1)
      W_0 (foreach 10 []
            (foreach 1 [] (sample weight-prior)))
      W_1 (foreach 10 []
            (foreach 10 [] (sample weight-prior)))
      W_2 (foreach 1 []
            (foreach 10 [] (sample weight-prior)))

      b_0 (foreach 10 []
            (foreach 1 [] (sample weight-prior)))
      b_1 (foreach 10 []
            (foreach 1 [] (sample weight-prior)))
      b_2 (foreach 1 []
            (foreach 1 [] (sample weight-prior)))

      x   (mat-transpose [[1] [2] [3] [4] [5]])
      y   [[1] [4] [9] [16] [25]]
      h_0 (mat-tanh (mat-add (mat-mul W_0 x)
                             (mat-repmat b_0 1 5)))
      h_1 (mat-tanh (mat-add (mat-mul W_1 h_0)
                             (mat-repmat b_1 1 5)))
      mu  (mat-transpose
            (mat-tanh (mat-add (mat-mul W_2 h_1)
                               (mat-repmat b_2 1 5))))]
(foreach 5 [y_r y
            mu_r mu]
   (foreach 1 [y_rc y_r
               mu_rc mu_r]
      (observe (normal mu_rc 1) y_rc)))
[W_0 b_0 W_1 b_1])

One point will be awarded for each vertex and edge count that matches. To clarify, each test program will be worth two points.

Extras

What follows is an example list of the kinds of programs that your compiler should support.

(let [x (sample (normal 0 1))]
  x)

(let [data (vector 1 2 3)
      a (vector 2)]
  (vector (first (rest (rest data))) a))

(let [data (vector 1 2 (sample (normal 1 1)))
      a (conj [] (sample (normal 0 2)))
      b (conj a (sample (normal 0 3)))]
  (observe (normal (second b) 4) (first (rest data)))
  b)

(let [x (sample (normal 0 1))]
  (sample (normal x 1)))

(let [p (sample (beta 1 1))
      x (sample (beta (exp p) 1))
      d (bernoulli (* x p))]
  (observe d 1)
  p)

(defn observe-data [_ data slope bias]
      (let [xn (first data)
            yn (second data)
            zn (+ (* slope xn) bias)]
        (observe (normal zn 1.0) yn)
        (rest (rest data))))

(let [slope (sample (normal 0.0 10.0))
      bias  (sample (normal 0.0 10.0))
      data (vector 1.0 2.1 2.0 3.9 3.0 5.3
                  4.0 7.7 5.0 10.2 6.0 12.9)]
  (loop 6 data observe-data slope bias)
  (vector slope bias))

J. W. van de Meent, B. Paige, H. Yang, and F. Wood, “Introduction to Probabilistic Programming,” Foundations and Trends in Machine Learning, pp. in review, 2018.