Raghav Goyal

PhD Student (Fall, 2019 - Present)
| in Computer Vision and Machine Learning
| at the University of British Columbia and Vector Institute
| advised by Prof. Leonid Sigal

Email: rgoyal14 (at) cs.ubc.ca
Physical Mail: 201-2366 Main Mall, Vancouver, BC, V6T 1Z4, Canada
Other: Google Scholar | GitHub | Blog

I'm broadly interested in video understanding, data-efficient (few-shot and weakly-supervised) learning paradigms, and structured prediction. My research is primarily supported by UBC's Four Year Doctoral Fellowship (4YF).
Prior to this, I spent three years at Twenty Billion Neurons GmbH (20bn) in Berlin, Germany, working on video understanding under the supervision of Roland Memisevic, PhD. I also spent five months as an intern at Xerox Research Centre Europe (now Naver Labs) in Grenoble, France, working on Natural Language Generation with Dr. Marc Dymetman.
I obtained an Integrated M.Tech. (5-year programme) from IIT Delhi, majoring in Mathematics and Computing.


A Simple Baseline for Weakly-Supervised Human-centric Relation Detection
Raghav Goyal, Leonid Sigal
In BMVC. Virtual. 2021. [pdf]

Random highlight: This project started out as an observation from experimental results of a different project.

UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation
Siddhesh Khandelwal*, Raghav Goyal*, Leonid Sigal (* equal contribution)
In CVPR. Virtual. 2021. [pdf]

Random highlight: The weakly-supervised component was notoriously hard, and it took us nearly two months to make it competitive with SOTA.

Improved Few-Shot Visual Classification
Peyman Bateni, Raghav Goyal, Vaden Masrani, Frank Wood, Leonid Sigal
In CVPR. Seattle, USA. 2020. [pdf]

Random highlight: Mahalanobis distance was surprisingly effective on Meta-Dataset compared to Euclidean distance.

Evaluating visual "common sense" using fine-grained classification and captioning tasks
Raghav Goyal, Farzaneh Mahdisoltani, Guillaume Berger, Waseem Gharbieh, Ingo Bax, Roland Memisevic
In ICLR Workshop. Vancouver, Canada. 2018. [pdf]

Random highlight: Grad-CAM gave some pretty cool saliency maps for videos. Some examples can be found here.

The "something something" video database for learning and evaluating visual common sense
Raghav Goyal, Samira Ebrahimi Kahou, Vincent Michalski, *, Ingo Bax, Roland Memisevic (* see paper for additional authors)
In ICCV. Venice, Italy. 2017. [pdf] [supp] [code] [data]

Random highlight: When the dataset was still being collected and contained roughly 10K videos, the top-1 accuracy on it was about 2%. Fast-forward to the leaderboard here.

Natural Language Generation through Character-based RNNs with Finite-state Prior Knowledge
Raghav Goyal, Marc Dymetman, Eric Gaussier
In COLING. Osaka, Japan. 2016. [pdf]

Random highlight: I hacked into Theano's infamous scan function to write a beam search decoding algorithm. It took me around three weeks, but it was pretty satisfying!

ML Challenges