Computer Graphics (CG) enables artists to realize their creative visions.
Technically, one is concerned with efficiently simulating light transport for photorealistic rendering,
finding the right parametrization for the shape and appearance of objects, and making these digital models accessible to the artist.
Creating digital content is a tedious process, which Computer Vision (CV) alleviates by reconstructing real scenes from natural images.
This course will explore the recent trend of using Deep Learning (DL) for the above-mentioned tasks: neural rendering, generative networks, disentangled representation learning, and reconstruction using neural networks. The first third of the course will introduce the essential tools (CNN architectures, attention models, output representations, and batch-norm layers) needed to understand the advanced topics covered in the main part (self-supervised and generative models, such as VAEs and GANs).
A central theme is the learning of neural representations from labeled and unlabeled data. At one end, we will discuss approaches that can reconstruct a person photorealistically from a single image, down to the scale of wrinkles, by using a volumetric grid with thousands of parameters. At the other end, we will study techniques for learning compact and meaningful embeddings, such as a parametric face model that decomposes expression, appearance, and pose and makes these latent dimensions controllable by the artist. At the end of the course, students will be able to work with a diverse spectrum of DL tools and be prepared to conduct research on how to create models that offer both high visual quality and artistic utility.
Compute: The assignments and course projects will deal with images of relatively high resolution and require a CUDA-capable GPU with at least 4 GB of GPU RAM. While you can use your own GPU, we will also provide a limited amount of cloud-service credits per student. Keep an eye on your execution times so that sufficient compute resources remain for the final project.
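To check whether a machine meets the memory requirement stated above, a minimal sketch along the following lines may help. It assumes that PyTorch is installed and only looks at the first CUDA device. The helper names `meets_requirement` and `query_gpu_bytes` are made up for this illustration, and reading "4GB" as 4e9 bytes is an assumption (drivers often report slightly less than the advertised size).

```python
from typing import Optional

REQUIRED_GPU_GB = 4.0  # minimum stated in the course description


def meets_requirement(total_bytes: int, required_gb: float = REQUIRED_GPU_GB) -> bool:
    """Return True if a GPU with `total_bytes` of memory meets the minimum.

    Assumption: one GB is counted as 1e9 bytes, which is the lenient reading.
    """
    return total_bytes >= required_gb * 1e9


def query_gpu_bytes() -> Optional[int]:
    """Return the total memory of CUDA device 0 in bytes, or None if unavailable."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


if __name__ == "__main__":
    total = query_gpu_bytes()
    if total is None:
        print("No CUDA-capable GPU (or no PyTorch installation) detected.")
    else:
        status = "OK" if meets_requirement(total) else "below the stated minimum"
        print(f"GPU memory: {total / 1e9:.1f} GB ({status})")
```

If the check reports no suitable GPU, the cloud credits mentioned above are the intended fallback.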
Machine Learning. It is recommended to first study CPSC 340 or an equivalent course. Before starting this course, the following ML terms should be familiar to you: stochastic gradient descent and momentum; classification and regression; weight initialization and data whitening; training, validation, and test sets; convolution and translational invariance; and regularization and weight decay.
Please contact me if you have any doubts.
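As a quick self-check for some of the terms listed above, the following is a minimal sketch of stochastic gradient descent with momentum and weight decay on a one-parameter regression problem. It is written in plain Python rather than PyTorch, and the function name `sgd_momentum`, the toy data, and all hyperparameter values are assumptions chosen for illustration. If every line reads naturally to you, the optimization prerequisites are likely covered.

```python
import random


def sgd_momentum(data, lr=0.05, momentum=0.9, weight_decay=0.0, epochs=200, seed=0):
    """Fit y ~ w * x by minimizing the squared error with SGD and momentum."""
    rng = random.Random(seed)
    samples = list(data)
    w, velocity = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)  # "stochastic": visit the samples in random order
        for x, y in samples:
            grad = 2.0 * (w * x - y) * x  # derivative of (w*x - y)^2 w.r.t. w
            grad += weight_decay * w      # weight decay pulls w towards zero
            velocity = momentum * velocity - lr * grad  # running update direction
            w += velocity
    return w


if __name__ == "__main__":
    # Noise-free toy data generated with a true slope of 3.
    toy_data = [(x, 3.0 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
    print("without weight decay:", sgd_momentum(toy_data))
    print("with weight decay:   ", sgd_momentum(toy_data, weight_decay=0.1))
```

Without weight decay the estimate approaches the true slope, while a non-zero `weight_decay` trades some accuracy for a smaller weight.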
Python. The exercises are designed to prepare students for the practical project and provide a step-by-step introduction to the PyTorch machine learning framework. While no prior knowledge of PyTorch is required, basic Python experience is expected; no Python programming tutorial will be offered.
Prior courses in CG and CV are a plus and projects that build upon these are encouraged.
- Challenges in using deep learning for creative tasks
- Course expectations and grading
Homework 1 posted
|SIGGRAPH program / video|
|Jan 9||Deep learning basics and best practices
- regression/classification, objective functions
- stochastic gradient descent, vanishing and exploding gradients
Deep Learning Book - Chapter 8
|W2||Jan 14||Network architectures
- Which neural network architectures work, why and how?
- ResNet, DenseNet, UNet
|Deep Learning Book - Chapter 9, ResNet, UNet|
|Jan 16||Representing images and sparse 2D keypoints
- edges, segmentation
- heat maps, part-affinity fields
Homework 1 due
Part Affinity Fields
|W3||Jan 21||Representing dense and 3D keypoints
- location maps, joint-angle skeleton
- uv-coordinates, warp-fields
Homework 2 posted
|Jan 23||Representing geometry and shape
- voxels, implicit functions
- PointNet, graph CNN, spiral convolution
|W4||Jan 28||Representation learning I (deterministic)
- principal component analysis (PCA)
- auto-encoder (AE)
PCA face model
Deep Learning Book - Chapter 14
|Jan 30||Representation learning II (probabilistic)
- variational auto encoder (VAE)
- generative adversarial network (GAN)
|Deep Learning Book - Chapter 20|
|W5||Feb 4||Sequential decision making
- Monte Carlo methods
- reinforcement learning
Homework 2 due, Homework 3 posted
|Deep Learning Book - Chapter 17|
|Feb 6||Unpaired image translation
- cycle consistency
- style transfer
|W6||Feb 11||Attention models
- spatial transformers, RoI pooling, attention maps
- camera models and multi-view
|Feb 13||Project Pitches (3 min pitch)
|W7||Feb 17||Homework 3 due
|Feb 18-21||Midterm Break. (no class)
Conditional content generation
Park et al., Semantic Image Synthesis with Spatially-Adaptive Normalization paper
Li et al., Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments paper
Read one of the two papers listed for each class.
Chan et al., Everybody Dance Now paper
Gao et al., Automatic Unpaired Shape Deformation Transfer paper
Rhodin et al., Interactive Motion Mapping for Real-time Character Control paper
Holden et al., Phase-Functioned Neural Networks for Character Control paper
Vondrick et al., Tracking Emerges by Colorizing Videos paper
Doersch et al., Unsupervised visual representation learning by context prediction paper
Novel view synthesis
Hinton et al., Transforming Auto-encoders paper
Rhodin et al., Geometry-Aware Representation Learning paper
Rhodin et al., Differentiable Visibility paper
Wen et al., tba
Learning person models
Lorenz et al., Unsupervised Part-Based Disentangling of Object Shape and Appearance paper
Rhodin et al., Neural Scene Decomposition for Human Motion Capture paper
Object parts and physics
Li et al., GRASS: Generative Recursive Autoencoders for Shape Structures paper
Xie et al., tempoGAN: A Temporally Coherent, Volumetric GAN for Super-resolution Fluid Flow paper
Objective functions - log likelihood
Jonathan T. Barron, A General and Adaptive Robust Loss Function paper
Christopher Bishop, Mixture Density Networks paper
Self-supervised object detection
Crawford et al., Spatially invariant unsupervised object detection with convolutional neural networks paper
Proposal optimization, tba
Bagautdinov et al., Modeling Facial Geometry using Compositional VAEs paper
Verma et al., FeaStNet: Feature-Steered Graph Convolutions for 3D Shape Analysis paper
Sitzmann et al., DeepVoxels: Learning Persistent 3D Feature Embeddings paper
Saito et al., PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization paper
|W14||April 7||Project Presentations. (10 min talk per group, first half of groups)
|April 9||Project Presentations. (10 min talk per group, second half of groups)
|April 14||Project Report submission. (8 page PDF document, 11:59 pm via Piazza)
While the assignments add points to the final grade, their primary goal is to prepare you for the project.
The first assignment is to set up a general training framework that lets one experiment with different datasets, neural network architectures, and objective functions. This framework will be extended and used in the remaining two assignments and also the final project.
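To illustrate what such a general framework could look like, here is a minimal sketch in plain Python in which the dataset, the model, and the objective function are interchangeable arguments of one training loop. It is not the assignment's reference solution. `LinearModel`, `squared_error`, `absolute_error`, and `train` are hypothetical names, and a real solution would use PyTorch modules, data loaders, and optimizers in their place.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]  # (input, target)
# An objective returns (loss value, derivative of the loss w.r.t. the prediction).
Objective = Callable[[float, float], Tuple[float, float]]


@dataclass
class LinearModel:
    """Toy stand-in for a neural network: y = w * x + b."""
    w: float = 0.0
    b: float = 0.0

    def __call__(self, x: float) -> float:
        return self.w * x + self.b

    def step(self, x: float, grad_pred: float, lr: float) -> None:
        """Gradient step, given the loss derivative w.r.t. the prediction."""
        self.w -= lr * grad_pred * x  # chain rule: d(pred)/dw = x
        self.b -= lr * grad_pred      # chain rule: d(pred)/db = 1


def squared_error(pred: float, target: float) -> Tuple[float, float]:
    diff = pred - target
    return diff * diff, 2.0 * diff


def absolute_error(pred: float, target: float) -> Tuple[float, float]:
    diff = pred - target
    sign = float((diff > 0) - (diff < 0))
    return abs(diff), sign


def train(model, dataset: Sequence[Sample], objective: Objective,
          lr: float = 0.05, epochs: int = 300) -> List[float]:
    """Generic loop: any dataset, model, and objective with these interfaces."""
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in dataset:
            loss, grad_pred = objective(model(x), y)
            model.step(x, grad_pred, lr)
            total += loss
        history.append(total / len(dataset))  # mean loss of this epoch
    return history


if __name__ == "__main__":
    line_data = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    model = LinearModel()
    losses = train(model, line_data, squared_error)
    print(model, losses[-1])
```

Swapping `squared_error` for `absolute_error`, or `line_data` for another list of samples, requires no change to `train`, which is the kind of flexibility the later assignments and the project rely on.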
Instead of step-by-step instructions, exercises are posed as high-level tasks. Questions are set up to train independent thinking, which is integral to solving actual problems in research and industry settings. This does not mean that you are without support; you are highly encouraged to search online and consult the abundance of deep learning tutorials as a basis for your solution. Furthermore, office hours are set up to provide personal assistance.
Academic Integrity. Assignments must be solved individually using the available course material and other online sources. You are allowed neither to copy nor to look at parts of any of the assignments from anyone else. Accordingly, posting solutions online or distributing them in (private) forums is not allowed. The university policies on academic integrity are rigorously applied.
Submission. Solutions must be handed in through the Piazza system and be kept private.
Deadline and grading. Assignments are due on the dates specified in the schedule, always at 11:59 pm PST. Each late day reduces the score by 25%. The grading is based on correctness and completeness, as detailed in each exercise description.
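The late rule can be made concrete with a small worked example. The sketch below assumes that the deduction is 25% of the achieved score for every late day and that the score cannot drop below zero. The course text does not spell out either detail, so treat both as assumptions; the function name `late_score` is hypothetical.

```python
def late_score(raw_score: float, days_late: int, penalty_per_day: float = 0.25) -> float:
    """Score after the late deduction, floored at zero.

    Assumption: each late day removes `penalty_per_day` of the achieved score.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    factor = max(0.0, 1.0 - penalty_per_day * days_late)
    return raw_score * factor


if __name__ == "__main__":
    for days in range(5):
        print(days, "late day(s):", late_score(80.0, days))
```

Under this reading, a submission that is four or more days late receives no points.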
The second half of the course will constitute a reading group, focussing on seminal and recent research. Each student will write reviews every week and is expected to participate actively in the discussion, along with presenting once during the course. Preferences will be considered for the presentation topic assignment.
Reviews. All students are expected to read one of the two papers that will be presented and discussed in each class. Nevertheless, it is recommended to at least skim through both papers. Each presentation will be followed by an in-depth discussion on technical aspects, relations to other works, and practical utility. To spark new arguments, a short review of the selected paper has to be submitted through Piazza on the day before the class. Try to raise non-obvious aspects with the following:
These reviews will be distributed across all course participants. Keep your review concise (less than half a page) and self-contained.
Presentation. The presentation should be compact and precisely 25 minutes long. Focus on the most crucial points, those needed to understand the content, novelty, and impact of the given work. Most importantly, draw links to previous presentations, lecture content, and the related work you know. Embedding the work in a broader context in this way will make your presentation unique and stimulating, especially for those who have read the underlying paper in detail. You can use the paper illustrations and other available online material, such as videos supplementing the paper, as a basis for your presentation, as long as you reference the source adequately on each slide. You will be the expert and be required to study the paper in sufficient depth to answer questions and trigger a rich discussion. Practice your talk beforehand, ideally in front of a small audience, to get the timing and the messages right. Note that part of your audience will have read the paper in detail while others will only have skimmed it. Therefore, try to address experts and newcomers alike. The tutor will assist you in this and give feedback on your planned presentation once.
The whole course is structured to prepare for the final research project. The goal is to either extend an existing approach, combine existing tools for solving a new problem, or come up with an entirely new algorithm. Some topics and open research problems are given in class, but we highly encourage you to come up with your own idea on the basis of your unique background and experience. Projects can deviate from the core lecture topics, as long as they are within CG, CV, and ML. You are encouraged to work in groups of two. Joining forces will allow you to complement your strengths and work on more challenging and rewarding problems. Nevertheless, both of you must have unique responsibilities, which have to be explicitly listed, and be involved in all of the following project stages:
Project proposal, pitch, report, and final presentations will be due on the dates and times specified in the schedule above (proposal and pitch before the midterm break, presentations in the last two weeks, report submission on the last day of classes). Again, a deduction of 25% per late day applies. The grade will mainly depend on the project definition, how well you present it in the report, how distinctly you position your work in the related literature, how thorough your experiments are, and how thoughtfully your contribution is derived.