Probabilistic Programming

Extra Homework # 5

Learning Goal

The goal of this fifth exercise is to understand and implement a (toy) reverse-mode automatic differentiation system.

Task

Your should write a reverse-mode automatic differentiation implementation for the zero-eth order pure deterministic language.

As a reminder, this language contains only the following forms

$v ::=$ $\textrm{variable}$
$c ::=$ $\textrm{constant value or primitive operation}$
$e ::=$ $c$ | $v$ | (if $e_1$ $e_2$ $e_3$) | ($c$ $e_1$ $\ldots$ $e_n$)

Assume that such an expression in this language is wrapped in a clojure fn form and that any expression (in the zeroeth order language) wrapped in this way and provided to you as a homework problem returns a scalar. For example, the following expression is one your autodiff system should work on

(fn [a b c]
  (/ (+ a (* 7 b)) (sin c)))

Call this function $f$. Your task is to write a function which consumes an expression of this form and returns a function of the same arguments that, when applied to an appropriately shaped list of arguments, returns the primal value of evaluating the function on those arguments and a map that contains key value pairs in which each key is an argument symbol and the corresponding value is the value of partial derivative of the return value of the original function with respect to each of its arguments evaluated at the same input value.

In other words you should write code that writes a function, call it autodiff which takes the expression $f$ above in quoted form and a list of argument values and returns the value of (apply (eval f) [2 3 4]) and a map {'a (df-da 2 3 4), 'b (df-db 2 3 4), 'c (df-dc 2 3 4)} where, for instance, df-da is the function $\frac{\partial f}{\partial a}$ .

Primitive functions you must support

You can expect all functions to be scalar valued with scalar arguments. With respect to random quantities you will only be tested on expressions that include (normpdf x mu sigma).

Rubric

One point will be given for each gradient that is computed correctly for the programs below:

1)

(fn [x] (exp (sin x)))

2)

(fn [x y] (+ (* x x) (sin x)))

3)

(fn [x] (if (> x 5) (* x x) (+ x 18)))

4)

(fn [x] (log x))

5)

(fn [x mu sigma] (+ (- 0 (/ (* (- x mu) (- x mu))
                                                  (* 2 (* sigma sigma))))
                                       (* (- 0 (/ 1 2)) (log (* 2 (* 3.141592653589793 (* sigma sigma)))))))

6)

(fn [x mu sigma] (normpdf x mu sigma))

7)

(fn [x1 x2 x3] (+ (+ (normpdf x1 2 5)
                        (if (> x2 7)
                          (normpdf x2 0 1)
                          (normpdf x2 10 1)))
                    (normpdf x3 -4 10)))

Helpful links

I personally found, in addition to the Evaluating Derivatives book, Cohen’s lecture notes and the following blog are quite helpful.

Helpful code

Here is a list of partial derivative implementations that you may find useful

  (def partial-fns
    (reduce
      merge
      (list
        {'* [(fn [a b] b) (fn [a b] a)]}
        ; f(a,b) = a * b <-> (* a b)
        ; df/da = b
        ; df/db = a

        {'- [(fn [a b] 1) (fn [a b] -1)]}
        ; f(a,b) = a - b <-> (- a b)
        ; df/da = 1
        ; df/db = -1

        {'+ [(fn [a b] 1) (fn [a b] 1)]}
        ; f(a,b) = a + b <-> (+ a b)
        ; df/da = 1
        ; df/db = 1

        {'/ [(fn [a b] (/ 1 b)) (fn [a b] (* a (/ -1 (* b b))))]}
        ; f(a,b) = a / b <-> (/ a b)
        ; df/da = 1
        ; df/db = -1/b^2

        {'exp [(fn [a] (exp a))]}
        ; f(a) = (exp a)
        ; df/da = (exp a)

        {'relu [(fn [a] (if (> a 0) 1 0))]}
        ; f(a) = (relu a)
        ; df/da = 1 if a > 0, 0 otherwise

        {'log [...]} <--- base e

        {'normpdf [...]}


        {'sin [(fn [a] (cos a))]})))

as well as a simple implementation of normpdf

(defn normpdf [y m s]
  (observe* (normal m s) y))

You can test your own code by using finite differences to check your reverse mode automatic differentiation implementation. Here is some code that will produce a finite difference function expression that can be evaluated to produce a gradient of the fn form given above with respect to all of its arguments.

(defn addd [exprl i d]
  (if (= i 0)
    (reduce conj [`(~'+ ~d ~(first exprl))] (subvec exprl 1))
    (reduce conj (subvec exprl 0 i)
            (reduce conj [`(~'+ ~d ~(get exprl i))] (subvec exprl (+ i 1))))))

(defn finite-difference-expr [expr args i d]
  `(~'/ (~'- (~expr ~@(addd args i d)) (~expr ~@args)) ~d))

(defn finite-difference-grad [expr]
  (let [[op args body] expr
        d (gensym)
        fdes (map #(finite-difference-expr expr args % d) (range (count args)))
        argsyms (map (fn [x] `(~'quote ~x)) args)]
    `(~'fn [~@args]
       (~'let [~d 0.001]
         ~(zipmap argsyms fdes)))))