PhD Thesis Defense - Mohammed Suhail

Date

March 13, 2024, 10:00 AM – 1:00 PM

Name: Mohammed Suhail

Date: March 13, 2024

Time: 10:00 AM – 1:00 PM

Location: 204

Supervisor: Leonid Sigal

Title: Understanding Semantics and Geometry of Scenes

Abstract:

In this dissertation, we present new approaches for structured scene understanding from images and videos. Structured scene understanding has numerous applications, including robotics, autonomous vehicles, 3D content creation, and video editing. This research focuses on three specific tasks: scene graph generation, novel view synthesis, and layered scene representation.

Scene graph generation involves creating a graph structure that represents the objects in a scene and the relationships between them. Generating a scene graph from an image demands a thorough understanding of the constituent objects and their associations. We explore integrating the often-overlooked structure of the output space into the reasoning framework. Additionally, we extend beyond bounding-box granularity by leveraging pixel-level masks to ground objects, even when such mask annotations are absent from scene graph datasets.

Novel view synthesis involves generating new views of a scene from input images. This demands a deep understanding of the scene's underlying geometry so that the rendered pixels are consistent with the scene's structure. In this dissertation, we focus on methods capable of accurately rendering scenes, particularly those with non-Lambertian surfaces. Moreover, we address the challenge of developing view-synthesis techniques that can generate new views of a scene without requiring per-scene training.

Layered scene representation involves decomposing a scene into semantically meaningful layers. Existing methods rely on homography-based modeling, which limits their ability to handle videos with parallax effects. To address this, we focus on learning a three-dimensional (3D) layered representation that overcomes these limitations and enables a more comprehensive scene decomposition.

The main contributions of this thesis are thus the exploration and advancement of these three tasks.