For those who weren't in class - this week we discussed some of the problems people were having with the Bayesian inference section of the assignment and went through getting started in Tensorflow. The code for the Tensorflow tutorial is below...

import tensorflow as tf
import numpy as np

Basics

There are loads of good tensorflow tutorials online, so I'm not going to try to cover everything but rather focus on the bare essentials to get you started. The key idea in tensorflow is that we build computational graphs that get compiled and run, rather than working in an imperative fashion as you would in Numpy or Matlab. This is sort of abstract, so let's look at an example instead.

If we were doing linear regression using gradient descent in numpy, our code would look like this:

# generate random input data
n = 1000 # training examples
d = 5 # features
X = np.random.randn(n*d).reshape((n,d)) # design matrix
Y = np.dot(X, np.array([2., 4., 1., -2., 3.])) + np.random.randn(n) # made up coefficients

# model code
W = np.zeros(d)
alpha = 0.1
old_loss = np.inf
for i in range(100):
    y_hat = np.dot(X, W) # no biases because I'm lazy :)
    loss = np.mean(np.square(Y - y_hat))
    grad_loss = 1./n * np.dot(X.T, -2 * (Y - y_hat))
    W -= alpha * grad_loss
    print i, loss, np.round(W, 1)
    if np.abs(old_loss - loss) < 1e-6:
        break
    else:
        old_loss = loss

0 35.8501804302 [ 0.4  0.8  0.3 -0.5  0.6]
1 22.8662342392 [ 0.8  1.4  0.5 -0.8  1.1]
2 14.7328965693 [ 1.1  1.9  0.6 -1.1  1.5]
3 9.63245213428 [ 1.3  2.3  0.7 -1.3  1.8]
4 6.43018547634 [ 1.4  2.6  0.8 -1.5  2. ]
5 4.41714843987 [ 1.6  2.9  0.9 -1.6  2.2]
6 3.15000244966 [ 1.7  3.1  0.9 -1.7  2.4]
7 2.35123736047 [ 1.7  3.3  0.9 -1.8  2.5]
8 1.8469630683 [ 1.8  3.4  1.  -1.8  2.6]
9 1.52809698268 [ 1.8  3.5  1.  -1.9  2.7]
10 1.32612910047 [ 1.9  3.6  1.  -1.9  2.8]
11 1.19797648785 [ 1.9  3.7  1.  -1.9  2.8]
12 1.11650942625 [ 1.9  3.7  1.  -1.9  2.8]
13 1.06461944898 [ 1.9  3.8  1.  -2.   2.9]
14 1.03150116399 [ 2.   3.8  1.  -2.   2.9]
15 1.01031904398 [ 2.   3.8  1.  -2.   2.9]
16 0.996741530312 [ 2.   3.8  1.  -2.   2.9]
17 0.988018857997 [ 2.   3.9  1.  -2.   3. ]
18 0.982402130452 [ 2.   3.9  1.  -2.   3. ]
19 0.978776830203 [ 2.   3.9  1.  -2.   3. ]
20 0.976431253216 [ 2.   3.9  1.  -2.   3. ]
21 0.974909954995 [ 2.   3.9  1.  -2.   3. ]
22 0.973920839086 [ 2.   3.9  1.  -2.   3. ]
23 0.973276146681 [ 2.   3.9  1.  -2.   3. ]
24 0.972854906383 [ 2.   3.9  1.  -2.   3. ]
25 0.972578992346 [ 2.   3.9  1.  -2.   3. ]
26 0.972397827503 [ 2.   3.9  1.  -2.   3. ]
27 0.972278589313 [ 2.   3.9  1.  -2.   3. ]
28 0.972199924797 [ 2.   3.9  1.  -2.   3. ]
29 0.972147908299 [ 2.   3.9  1.  -2.   3. ]
30 0.972113435609 [ 2.   3.9  1.  -2.   3. ]
31 0.972090540068 [ 2.   3.9  1.  -2.   3. ]
32 0.972075301804 [ 2.   3.9  1.  -2.   3. ]
33 0.972065139459 [ 2.  4.  1. -2.  3.]
34 0.972058349154 [ 2.  4.  1. -2.  3.]
35 0.972053803632 [ 2.  4.  1. -2.  3.]
36 0.972050755468 [ 2.  4.  1. -2.  3.]
37 0.972048708017 [ 2.  4.  1. -2.  3.]
38 0.972047330583 [ 2.  4.  1. -2.  3.]
39 0.972046402534 [ 2.  4.  1. -2.  3.]
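The `grad_loss` line is the hand-derived gradient of the mean squared error: for loss = mean((Y - XW)^2), the gradient with respect to W is -2/n * X^T (Y - XW). A quick way to check a hand-derived gradient like this is to compare it against finite differences. Here's a minimal sketch of that check; the helper names (`mse`, `analytic_grad`, `numerical_grad`) and the smaller dataset are my own choices for illustration, not part of the code above.

```python
import numpy as np

rng = np.random.RandomState(0)
n, d = 200, 5
X = rng.randn(n, d)
Y = X.dot(np.array([2., 4., 1., -2., 3.])) + rng.randn(n)

def mse(W):
    # mean squared error for a linear model without a bias term
    return np.mean(np.square(Y - X.dot(W)))

def analytic_grad(W):
    # same expression as grad_loss in the training loop
    return 1. / n * X.T.dot(-2 * (Y - X.dot(W)))

def numerical_grad(f, W, eps=1e-6):
    # central finite differences, one coordinate at a time
    grad = np.zeros_like(W)
    for j in range(W.size):
        step = np.zeros_like(W)
        step[j] = eps
        grad[j] = (f(W + step) - f(W - step)) / (2 * eps)
    return grad

W0 = rng.randn(d)
max_diff = np.max(np.abs(analytic_grad(W0) - numerical_grad(mse, W0)))
print(max_diff)  # should be tiny if the analytic gradient is right
```

If the two disagree by more than rounding error, the derivation (or the code) has a mistake somewhere.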

If we were being more generic about things, we could write functions for some of those operations. For example, we could write the following function to calculate the loss...

def loss(x,y,W):
    y_hat = np.dot(x, W) # no biases because I'm lazy :)
    return np.mean(np.square(y - y_hat))
loss(X,Y,W)

0.97204577638871159

This is essentially a python version of a tensorflow "computational graph" that calculates the loss given placeholders x and y and Variable W. In tensorflow, we'd write the following:

x = tf.placeholder(tf.float32, shape=(None, d)) # we use None so we can choose an arbitrary batch size
y = tf.placeholder(tf.float32, shape=(None, 1))
W = tf.Variable(tf.random_normal([d,1], stddev=0.3), name='W')
b = tf.Variable(tf.zeros([1]), name='b') # let's use biases this time

y_hat = tf.matmul(x, W) + b
loss = tf.reduce_mean(tf.square(y - y_hat))

To actually run the code, you need to initialize the variables and run it in a tensorflow "session" (see the documentation for more on this).

init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    loss_output = sess.run(loss, feed_dict = {x:X, y:Y.reshape((-1,1))})
    print loss_output

33.9292
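To make the "build first, run later" idea a bit more concrete, here's a toy sketch of a deferred graph in plain Python. This is only an illustration of the concept and not how TensorFlow is actually implemented: the `Node` class and the `placeholder`, `matmul`, `sub`, `mean_square` and `run` helpers are all made up for this example, and the weights are fed in like a placeholder rather than stored as a variable.

```python
import numpy as np

class Node(object):
    """A node in a toy computational graph: an op plus its input nodes."""
    def __init__(self, op=None, inputs=()):
        self.op = op
        self.inputs = inputs

def placeholder():
    # no op and no inputs: the value must come from feed_dict at run time
    return Node()

def matmul(a, b):
    return Node(np.dot, (a, b))

def sub(a, b):
    return Node(np.subtract, (a, b))

def mean_square(a):
    return Node(lambda v: np.mean(np.square(v)), (a,))

def run(node, feed_dict):
    # use a fed value if there is one, otherwise evaluate inputs and apply the op
    if node in feed_dict:
        return feed_dict[node]
    if node.op is None:
        raise ValueError("placeholder was not fed a value")
    args = [run(i, feed_dict) for i in node.inputs]
    return node.op(*args)

# building the graph performs no arithmetic
x, y, w = placeholder(), placeholder(), placeholder()
toy_loss = mean_square(sub(y, matmul(x, w)))

# only run() computes anything, using whatever data we feed in
X_small = np.array([[1., 2.], [3., 4.]])
w_val = np.array([1., -1.])
y_val = np.array([0., 0.])
result = run(toy_loss, {x: X_small, y: y_val, w: w_val})
print(result)
```

The point is that `toy_loss` is just a description of the computation; the same graph can be run again with different data without rebuilding it.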

Why bother with all of this? Tensorflow gives you two big advantages over numpy:

You can run your code on a GPU (more on that later)
Given the computational graph, you can calculate gradients automatically... this helps a lot in complicated models!

params = tf.trainable_variables() # collects every trainable variable in the default graph

params = [W, b] # or just list the parameters explicitly

grads = tf.gradients(loss, params)
alpha = 0.1
train_op = []
for v, g in zip(params, grads):
    update = tf.assign(v, v - alpha * g)
    train_op.append(update)

train_op

[<tf.Tensor 'Assign_2:0' shape=(5, 1) dtype=float32_ref>,
 <tf.Tensor 'Assign_3:0' shape=(1,) dtype=float32_ref>]
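Roughly speaking, `tf.gradients` gets these gradients by applying the chain rule backwards through the graph (reverse-mode automatic differentiation). Here's a toy scalar version of that idea in plain Python. It's a sketch for intuition only: the `Value` class and `backward` function are hypothetical names I've made up, and TensorFlow's real implementation works on tensors and is far more involved.

```python
class Value(object):
    """A scalar that remembers how it was computed (toy reverse-mode autodiff)."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        # pairs of (parent Value, local derivative d(self)/d(parent))
        self.parents = parents

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        return Value(self.data - other.data, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

def backward(output):
    # build a topological order, then apply the chain rule from the output back
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)
    visit(output)
    output.grad = 1.0
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += local * v.grad

# squared error of a one-feature linear model: (y - (w * x + b))^2
x, y = Value(3.0), Value(10.0)
w, b = Value(2.0), Value(1.0)
err = y - (w * x + b)
toy_loss = err * err
backward(toy_loss)
print(w.grad, b.grad)
```

By hand, d(loss)/dw = -2 * x * err = -18 and d(loss)/db = -2 * err = -6, which is what the backward pass accumulates in `w.grad` and `b.grad`.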

with tf.Session() as sess:
    sess.run(init_op)
    for i in range(50):
        _, loss_output, curr_par = sess.run([train_op, loss, W], feed_dict={x:X, y:Y.reshape((-1,1))})
        print i, loss_output, np.array(curr_par.flatten())

0 32.3069 [ 0.73563361  0.59163547  0.74043798 -0.49197033  0.93631476]
1 20.7954 [ 1.02590358  1.23854554  0.84207535 -0.83534873  1.35853434]
2 13.5342 [ 1.25106323  1.75943589  0.91375804 -1.10196698  1.69442129]
3 8.94701 [ 1.42547286  2.17914486  0.96341401 -1.3087908   1.9617275 ]
4 6.04435 [ 1.56036484  2.51756144  0.99699205 -1.46906614  2.17453551]
5 4.20449 [ 1.66451859  2.79062057  1.01893842 -1.59313035  2.34402108]
6 3.0362 [ 1.74479079  3.01109815  1.0325613  -1.68904662  2.47905564]
7 2.29295 [ 1.80653203  3.18924356  1.04030919 -1.7631011   2.58668423]
8 1.81918 [ 1.85391366  3.33328509  1.0439831  -1.82019138  2.67250299]
9 1.51656 [ 1.89018488  3.44983268  1.04489827 -1.86413097  2.74095869]
10 1.32285 [ 1.91787362  3.54419947  1.04400659 -1.89788735  2.79558635]
11 1.19859 [ 1.93894458  3.62065959  1.04198968 -1.92376781  2.83919692]
12 1.11869 [ 1.95492256  3.68265367  1.03932905 -1.94356465  2.87402654]
13 1.0672 [ 1.96698987  3.73295259  1.03635895 -1.9586693   2.90185475]
14 1.03394 [ 1.97606146  3.7737906   1.03330588 -1.9701606   2.92409801]
15 1.0124 [ 1.98284447  3.80696917  1.03031862 -1.97887433  2.94188476]
16 0.998421 [ 1.98788452  3.83394289  1.02749014 -1.98545718  2.95611358]
17 0.989323 [ 1.99160147  3.85588646  1.02487385 -1.99040878  2.96750093]
18 0.983389 [ 1.99431813  3.87374949  1.02249575 -1.99411476  2.97661805]
19 0.979509 [ 1.99628186  3.88829994  1.02036333 -1.99687231  2.98392057]
20 0.976965 [ 1.99768186  3.9001596   1.01847184 -1.99890983  2.98977208]
21 0.975293 [ 1.99866223  3.90983224  1.01680875 -2.00040269  2.99446297]
22 0.974192 [ 1.99933243  3.9177258   1.01535714 -2.00148535  2.99822474]
23 0.973465 [ 1.99977541  3.92417121  1.01409781 -2.00226045  3.00124288]
24 0.972984 [ 2.00005364  3.9294374   1.0130111  -2.00280643  3.00366521]
25 0.972665 [ 2.00021386  3.93374252  1.01207757 -2.00318265  3.00560999]
26 0.972453 [ 2.00029111  3.93726397  1.01127875 -2.00343418  3.00717211]
27 0.972312 [ 2.00031137  3.94014597  1.01059759 -2.00359488  3.00842738]
28 0.972218 [ 2.00029349  3.94250584  1.01001859 -2.00369024  3.00943637]
29 0.972155 [ 2.00025129  3.94443917  1.00952768 -2.00373936  3.01024771]
30 0.972113 [ 2.00019455  3.94602394  1.00911248 -2.00375652  3.0109005 ]
31 0.972084 [ 2.00013041  3.94732356  1.008762   -2.00375199  3.01142573]
32 0.972065 [ 2.00006366  3.94839001  1.00846684 -2.0037334   3.01184869]
33 0.972052 [ 1.99999774  3.94926548  1.00821865 -2.00370622  3.01218915]
34 0.972044 [ 1.99993467  3.94998431  1.00801027 -2.00367427  3.01246357]
35 0.972038 [ 1.99987578  3.95057511  1.00783563 -2.00364041  3.01268458]
36 0.972034 [ 1.99982178  3.95106053  1.00768936 -2.00360656  3.01286268]
37 0.972031 [ 1.99977279  3.95145988  1.00756717 -2.00357389  3.01300645]
38 0.97203 [ 1.99972892  3.95178843  1.007465   -2.00354314  3.01312232]
39 0.972028 [ 1.99969006  3.95205879  1.00737977 -2.00351477  3.01321578]
40 0.972027 [ 1.99965584  3.95228124  1.00730872 -2.00348878  3.01329112]
41 0.972027 [ 1.9996258   3.95246458  1.00724959 -2.00346541  3.01335192]
42 0.972027 [ 1.9995997   3.9526155   1.00720036 -2.00344467  3.01340103]
43 0.972026 [ 1.99957705  3.95273995  1.00715947 -2.00342631  3.01344085]
44 0.972026 [ 1.9995575   3.95284247  1.0071255  -2.0034101   3.0134728 ]
45 0.972026 [ 1.99954069  3.95292711  1.00709724 -2.0033958   3.01349878]
46 0.972026 [ 1.99952626  3.95299697  1.00707388 -2.0033834   3.01351976]
47 0.972026 [ 1.99951386  3.95305443  1.00705445 -2.00337267  3.01353669]
48 0.972026 [ 1.99950337  3.95310187  1.00703835 -2.00336337  3.01355028]
49 0.972026 [ 1.99949443  3.95314121  1.007025   -2.0033555   3.01356125]
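A note on the `Y.reshape((-1,1))` in the `feed_dict`: the `y` placeholder is declared with shape `(None, 1)`, so the targets have to be fed as a column. The column shape matters for the loss too. NumPy and TensorFlow share the same broadcasting rules, so subtracting an `(n, 1)` prediction from an `(n,)` target silently produces an `(n, n)` matrix instead of raising an error. Here's a small NumPy sketch of that pitfall, using made-up toy arrays:

```python
import numpy as np

n = 4
y_flat = np.arange(n, dtype=float)                   # shape (n,), like Y
y_hat = np.arange(n, dtype=float).reshape((-1, 1))   # shape (n, 1), like the model output

# (n,) minus (n, 1) broadcasts to an (n, n) matrix of all pairwise differences
bad_diff = y_flat - y_hat
# reshaping the targets to (n, 1) gives the intended elementwise difference
good_diff = y_flat.reshape((-1, 1)) - y_hat

print(bad_diff.shape, good_diff.shape)
print(np.mean(np.square(bad_diff)), np.mean(np.square(good_diff)))
```

Here the predictions equal the targets, so the correct loss is zero, but the broadcast version reports a non-zero loss. It's worth checking shapes whenever a loss looks suspiciously large.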

Feed forward net

Let's do something a little more interesting and build a feed forward network.

We essentially need to write:

h1 = relu(tf.matmul(x, W1) + b1)
h2 = relu(tf.matmul(h1, W2) + b2)
...
y_hat = tf.matmul(h_{n-1}, Wn) + bn

But given that we're going to be reusing this structure over and over, it makes sense to write a feed forward layer class or function to hold the variables and perform the forward pass.

tf.reset_default_graph() # clear everything from earlier so we can build a new graph

def relu(x):
    return tf.maximum(0., x)

class FFlayer(object):
    def __init__(self, num_in, num_out):
        self.W = tf.Variable(tf.random_normal([num_in,num_out], stddev=0.3), name='W')
        self.b = tf.Variable(tf.zeros([num_out]) + 0.2, name='b')
        
    def __call__(self, layer_below):
        return tf.matmul(layer_below, self.W) + self.b

# inputs
x = tf.placeholder(tf.float32, shape=(None, d)) 
# outputs
y = tf.placeholder(tf.float32, shape=(None, 1))

#network
h1 = relu(FFlayer(d, 64)(x))
h2 = relu(FFlayer(64, 32)(h1))
y_hat = FFlayer(32, 1)(h2)
loss = tf.reduce_mean(tf.square(y - y_hat))

# Let's use one of tensorflow's built-in optimizers (Adam) instead of our hand-written gradient descent
optimizer = tf.train.AdamOptimizer(0.01)
train_op = optimizer.minimize(loss) 

# Variable init
init_op = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init_op)
    for i in range(500):
        _, loss_output = sess.run([train_op, loss], feed_dict={x:X, y:Y.reshape((-1,1))})
        if i % 10 == 9:
            print i, loss_output

9 3.93888
19 2.03188
29 1.73759
39 1.46565
49 1.2259
59 1.14244
69 1.08653
79 1.04232
89 1.00749
99 0.978529
109 0.954659
119 0.93411
129 0.915953
139 0.900058
149 0.885994
159 0.872881
169 0.860811
179 0.849106
189 0.837863
199 0.827669
209 0.818564
219 0.810392
229 0.80296
239 0.796053
249 0.789899
259 0.78425
269 0.778845
279 0.773519
289 0.768071
299 0.762862
309 0.757706
319 0.752704
329 0.747952
339 0.743424
349 0.739149
359 0.734949
369 0.730867
379 0.726698
389 0.72274
399 0.718806
409 0.714963
419 0.711232
429 0.707439
439 0.703587
449 0.699875
459 0.696158
469 0.692529
479 0.688962
489 0.685394
499 0.681739
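If it helps to see what the three layers are doing to the data, here's the same forward pass written in plain NumPy, printing the shape at each step. This is a sketch under my own assumptions: `init_layer` and `apply_layer` are hypothetical stand-ins that mirror `FFlayer.__init__` and `FFlayer.__call__`, the batch is a small random one, and there's no training here.

```python
import numpy as np

rng = np.random.RandomState(0)
n, d = 8, 5
X_demo = rng.randn(n, d)

def relu(a):
    return np.maximum(0., a)

def init_layer(num_in, num_out):
    # mirrors FFlayer.__init__: small random weights, biases of 0.2
    W = 0.3 * rng.randn(num_in, num_out)
    b = np.zeros(num_out) + 0.2
    return W, b

def apply_layer(params, layer_below):
    # mirrors FFlayer.__call__: affine map only, the nonlinearity is applied outside
    W, b = params
    return np.dot(layer_below, W) + b

layers = [init_layer(d, 64), init_layer(64, 32), init_layer(32, 1)]
h1 = relu(apply_layer(layers[0], X_demo))
h2 = relu(apply_layer(layers[1], h1))
y_hat = apply_layer(layers[2], h2)
print(h1.shape, h2.shape, y_hat.shape)
```

Each layer's `num_in` has to match the `num_out` of the layer below it, which is why the sizes chain as 5, 64, 32, 1.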

Writing our own classes for each layer we use gets tedious quickly (although it's a good skill to have). Fortunately, there are plenty of packages that build on top of tensorflow and do this for you.

Tensorflow has:

Keras built in (high level but powerful)
TFSlim (lightweight which is nice if you want something more like raw tensorflow code)
Sonnet from DeepMind (released today! Looks good...)

Also worth taking a look at PyTorch - it is imperative, so it feels more like working with Numpy / Matlab. Personally, I really like the look of it... but on the downside it doesn't have the ecosystem that TensorFlow has (yet!)... tools like TensorBoard are really nice for visualizing your training.

As an example, let's take a look at the PyTorch GAN tutorial to see how you'd build a GAN in a higher level framework.