When Convolutions Meet Reality - Structured Neural Networks in Vision and Graphics - Andreas Lehrmann, Facebook Reality Labs


X836 - 2366 Main Mall, V6T1Z4


In recent years, deep neural networks have led to tremendous progress in many areas of computer vision and computer graphics, including character animation, scene understanding, video synthesis, and virtual avatars. At the same time, it has become apparent that generic architectures (e.g., VGG, GoogLeNet, ResNet) are not the ideal solution to many complex synthesis tasks, because they do not account for the intrinsic structure of the problem domain. As a consequence, they ignore physical realities, lack interpretability, complicate interactive manipulation, and are prone to overfitting.

Probabilistic graphical models (e.g., Bayesian networks, Markov random fields), on the other hand, have a long history in machine learning and provide principled frameworks for such structured data. With the rise of automatic differentiation and dynamic computation graphs, it is thus natural to use their factored representations as a means of enforcing structure and injecting domain knowledge in deep neural networks.

In this talk, we cover a series of structured tasks in vision and graphics and show how they can be approached with architectures that combine the benefits of deep neural networks, probabilistic graphical models, and explicit integration of domain knowledge. We begin with a discussion of a principled encoder-decoder architecture that formulates vision and graphics in a common framework. In such a framework, the description of the intrinsic representations and the structure of the processes operating on these representations have important repercussions on the bias and variance of the resulting model. We explore this spectrum, from fully non-parametric formulations and vastly overspecified black-box processes to coarse approximations and exact, physics-based operations, in a number of different architectural elements, including priors, latent spaces, decoders, and output spaces. We accompany these statistical principles with applications to various topics in the visual domain, such as scene analysis, animation, and video synthesis in 2D and 3D.

Short Bio:

Andreas Lehrmann is a postdoctoral research scientist with Facebook AI Research working at the intersection of machine learning, computer vision, and computer graphics. His research focuses on the development of deep generative models for structured data and approximate methods for the associated inference tasks. He is also interested in semantic latent representations and efficient encodings of contextual information in space and time using graphical models. Fields of application in computer vision and computer graphics include scene understanding, inverse rendering, and human pose estimation. Before joining Facebook in 2018, Andreas was a postdoctoral research associate at Disney Research and a Microsoft Research Ph.D. scholar at ETH Zurich (Switzerland) and the Max Planck Institute for Intelligent Systems (Germany). Prior to that, he obtained a Master’s in bioinformatics from the University of Tuebingen (Germany).