Collaborative Filtering and the Missing at Random Assumption

By Benjamin Marlin

In this talk I will present a broad overview of my research on the problem of non-random missing data in collaborative filtering. I will introduce the concept of a missing data mechanism, following Little and Rubin, describe how the missing at random assumption can easily be violated in a recommender system, and discuss the implications of such violations for modeling, learning, inference, prediction, and error estimation. I will describe work done at Yahoo! Research and Yahoo! Music to collect a novel data set that allows us to study these questions in the context of a real recommender system. Finally, I will describe some of the models we have examined that incorporate simple non-random missing data mechanisms, and discuss empirical results on both the collaborative prediction and collaborative ranking tasks.
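As a rough illustration of why a violated missing at random assumption matters for estimation, the sketch below simulates ratings whose chance of being observed depends on the rating value itself. It is not one of the models from the talk and uses no real data: the function name `simulate_rating_bias`, the uniform distribution of true ratings, and the observation probabilities in `p_observe` are all assumptions made up for this example.

```python
import random


def simulate_rating_bias(n_ratings=10000, seed=0):
    """Compare the mean of all ratings with the mean of the observed ones.

    Hypothetical setup: true ratings are uniform on 1..5, and a rating is
    observed with a probability that grows with its value, so the data
    are NOT missing at random.
    """
    rng = random.Random(seed)
    # Assumed observation probabilities per rating value (illustrative only).
    p_observe = {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.40, 5: 0.60}

    all_ratings = []
    observed_ratings = []
    for _ in range(n_ratings):
        rating = rng.randint(1, 5)  # true rating, uniform on 1..5
        all_ratings.append(rating)
        # Whether the rating is observed depends on its own value.
        if rng.random() < p_observe[rating]:
            observed_ratings.append(rating)

    true_mean = sum(all_ratings) / len(all_ratings)
    observed_mean = sum(observed_ratings) / len(observed_ratings)
    return true_mean, observed_mean


if __name__ == "__main__":
    true_mean, observed_mean = simulate_rating_bias()
    print(f"mean of all ratings:      {true_mean:.2f}")
    print(f"mean of observed ratings: {observed_mean:.2f}")
```

Under these assumed probabilities the observed ratings over-represent high values, so any statistic computed from observed entries alone, including a held-out test error, describes the observed ratings rather than the full set of user-item pairs.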

Visit the LCI Forum page