A tractable pseudo-likelihood function for Bayes nets applied to relational data
By Oliver Schulte, Simon Fraser University
Bayes nets (BNs) for relational databases are a major research topic in machine learning and artificial intelligence. When the database exhibits cyclic probabilistic dependencies, measuring the fit of a BN model to relational data with a likelihood function is challenging. A common way to address the difficulty of defining a likelihood function is to use a pseudo-likelihood instead; a prominent example is the pseudo-likelihood defined for Markov Logic Networks (MLNs). This paper proposes a new pseudo-likelihood P* for Parametrized Bayes Nets (PBNs) [Poole IJCAI 2003] and other relational versions of Bayes nets. The pseudo log-likelihood L* = ln(P*) resembles the single-table BN log-likelihood, with row counts in the data table replaced by frequencies in the database. We introduce a new type of semantics based on the concept of random instantiations (groundings) from classic AI research [Halpern 1990, Bacchus 1990]: the measure L* is the expected log-likelihood of a random instantiation of the first-order variables in the PBN. For parameter learning, the L*-maximizing estimates are the empirical conditional frequencies in the database. For PBN structure learning, we show that the state-of-the-art learn-and-join method of Khosravi et al. [AAAI 2010] implicitly maximizes the L* measure. The measure thus provides a theoretical foundation for this algorithm, while the algorithm's empirical success provides experimental validation of the measure's usefulness. This work will be presented at the SIAM SDM data mining conference.
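To make the frequency-based form of L* concrete, here is a minimal sketch, not the paper's implementation. It computes L* as the sum over families (a node plus its parents) of p_D(pa, v) * ln theta(v | pa), where p_D(pa, v) is the fraction of all groundings of the family's first-order variables that satisfy the assignment, i.e. the probability of that assignment under a uniformly random instantiation. It assumes frequencies are taken over the full cross product of populations, so groundings where a relationship is false are counted too. The toy database, the two-node PBN, and the helper names (family_frequencies, conditional_frequencies, pseudo_log_likelihood) are hypothetical and chosen only for illustration.

```python
import itertools
import math
from collections import Counter

# Hypothetical toy database (illustrative only, not from the paper):
# intelligence(S) is an attribute of students, Registered(S,C) a relationship.
students = {"anna": "hi", "bob": "lo", "carl": "hi"}
courses = ["c1", "c2"]
registered = {("anna", "c1"), ("anna", "c2"), ("bob", "c1"), ("carl", "c1")}


def family_frequencies(populations, evaluate):
    """Joint frequency of (parent values, child value) over all groundings.

    populations: one list of constants per first-order variable in the family.
    evaluate: maps a grounding tuple to (parent_values, child_value).
    Each frequency equals the probability of that assignment under a
    uniformly random instantiation of the family's first-order variables.
    """
    groundings = list(itertools.product(*populations))
    counts = Counter(evaluate(g) for g in groundings)
    return {key: n / len(groundings) for key, n in counts.items()}


def conditional_frequencies(freqs):
    """Empirical conditional frequencies p_D(child | parents)."""
    parent_totals = Counter()
    for (parents, _), f in freqs.items():
        parent_totals[parents] += f
    return {(pa, v): f / parent_totals[pa] for (pa, v), f in freqs.items()}


def pseudo_log_likelihood(families, params):
    """L*: sum over families of sum_{pa, v} p_D(pa, v) * ln theta(v | pa)."""
    return sum(
        f * math.log(params[name][key])
        for name, freqs in families.items()
        for key, f in freqs.items()
    )


# Hypothetical PBN structure: intelligence(S) -> Registered(S,C).
families = {
    "intelligence(S)": family_frequencies(
        [list(students)], lambda g: ((), students[g[0]])
    ),
    "Registered(S,C)": family_frequencies(
        [list(students), courses],
        lambda g: ((students[g[0]],), (g[0], g[1]) in registered),
    ),
}

# Parameters set to the empirical conditional frequencies.
mle = {name: conditional_frequencies(f) for name, f in families.items()}
print(round(pseudo_log_likelihood(families, mle), 4))  # prints -1.2425
```

In this toy example, replacing the parameters in mle with any other conditional distributions (for instance, uniform ones) yields a lower value, in line with the parameter-learning result stated above.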
