Probability and Equality: A Probabilistic Model of Identity Uncertainty

ID
TR-2005-02
Authors
R. Sharma and David Poole
Publishing date
April 03, 2006
Length
12 pages
Abstract
Identity uncertainty is the task of deciding whether two descriptions correspond to the same object. It is a difficult and important problem in real-world data analysis, and it arises whenever objects are not assigned unique identifiers or when those identifiers may not be observed perfectly. Traditional approaches to identity uncertainty assume that the attributes in the descriptions are independent of each other given whether or not the descriptions refer to the same object. However, this assumption is often faulty. For example, in the person identity uncertainty problem (the problem of deciding whether two descriptions refer to the same person), the attributes "date of birth" and "last name" have the same values for twins. In this paper we discuss the identity uncertainty problem in the context of person identity uncertainty. We model the interdependence of the attributes, and the probabilistic relations between the observed values of attributes and their actual values, using a similarity network representation. Our approach allows queries such as "what is the distribution over the actual names of a person given the names that appear in the description of the person?" or "what is the probability that two descriptions refer to the same person?". We present results showing that our method outperforms the traditional approach to person identity uncertainty, which treats the attributes as independent of each other.
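
To make the twins example concrete, here is a minimal Python sketch of how the independence assumption can overstate the evidence for a match. It is not the similarity network model of the report. It is only a two-attribute Bayes calculation, and the function name `posterior_same` and every probability in it are hypothetical values chosen for illustration.

```python
from math import prod


def posterior_same(prior_same, likelihoods):
    """Return P(same | observations) by Bayes' rule.

    likelihoods is a list of (P(obs | same), P(obs | different)) pairs,
    one pair per feature; features are multiplied together, i.e. treated
    as conditionally independent given the same/different hypothesis.
    """
    p_same = prior_same * prod(ps for ps, _ in likelihoods)
    p_diff = (1.0 - prior_same) * prod(pd for _, pd in likelihoods)
    return p_same / (p_same + p_diff)


# All numbers below are made up for illustration only.
prior = 0.001                      # prior that two descriptions co-refer
last_name_agrees = (0.95, 0.01)    # (given same, given different)
birth_date_agrees = (0.98, 0.003)  # (given same, given different)

# Traditional approach: multiply the per-attribute likelihoods.
independent = posterior_same(prior, [last_name_agrees, birth_date_agrees])

# Dependent view: treat "both attributes agree" as one joint feature.
# Because of twins, two different people agree on both attributes far
# more often than the product 0.01 * 0.003 suggests (0.001 is assumed).
both_agree = (0.95 * 0.98, 0.001)
dependent = posterior_same(prior, [both_agree])

print(f"independent attributes: {independent:.3f}")
print(f"joint attribute model:  {dependent:.3f}")
```

With these assumed numbers the independent calculation yields a noticeably higher match probability than the joint one, which is the kind of overconfidence that modelling attribute dependence is meant to avoid.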