A tractable pseudo-likelihood function for Bayes nets applied to relational data
By Oliver Schulte, Simon Fraser University
Abstract:
Bayes nets (BNs) for relational databases are a major research topic in machine learning and artificial intelligence. When the database exhibits cyclic probabilistic dependencies, measuring the fit of a BN model to relational data with a likelihood function is a challenge. A common response to difficulties in defining a likelihood function is to employ a pseudo-likelihood; a prominent example is the pseudo-likelihood defined for Markov Logic Networks (MLNs). This paper proposes a new pseudo-likelihood P* for Parametrized Bayes Nets (PBNs) [Poole IJCAI 2003] and other relational versions of Bayes nets. The pseudo log-likelihood L* = ln(P*) is similar to the single-table BN log-likelihood, with row counts in the data table replaced by frequencies in the database. We introduce a new type of semantics based on the concept of random instantiations (groundings) from classic AI research [Halpern 1990, Bacchus 1990]. The measure L* is the expected log-likelihood of a random instantiation of the first-order variables in the PBN. For parameter learning, the L*-maximizing estimates are the empirical conditional frequencies in the database. For PBN structure learning, we show that the state-of-the-art learn-and-join method of Khosravi et al. [AAAI 2010] implicitly maximizes the L* measure. The measure provides a theoretical foundation for this algorithm, while the algorithm's empirical success provides experimental validation for its usefulness.
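To make the random-instantiation reading concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical toy database (students, courses, and a Registered relation) and a hypothetical three-node PBN in which Registered(S, C) has parents Intelligence(S) and Difficulty(C). All names and data are illustrative assumptions. The sketch averages the BN log-likelihood over every grounding of the first-order variables S and C, and estimates parameters as empirical conditional frequencies.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy database (an assumption for illustration only).
intelligence = {"anna": "hi", "bob": "lo", "carl": "hi"}
difficulty = {"c1": "hard", "c2": "easy"}
registered = {("anna", "c1"), ("anna", "c2"), ("carl", "c2"), ("bob", "c2")}

# Every instantiation (grounding) of the first-order variables S and C.
groundings = list(product(intelligence, difficulty))

def family_values(s, c):
    """Return (node, parent values, child value) for each node of the toy PBN."""
    i, d, r = intelligence[s], difficulty[c], (s, c) in registered
    return [("Intelligence", (), i), ("Difficulty", (), d), ("Registered", (i, d), r)]

def empirical_cpts():
    """Conditional frequencies of child values given parent values over all groundings."""
    joint, parent = Counter(), Counter()
    for s, c in groundings:
        for node, pa, val in family_values(s, c):
            joint[(node, pa, val)] += 1
            parent[(node, pa)] += 1
    return {key: n / parent[key[:2]] for key, n in joint.items()}

def pseudo_log_likelihood(cpts):
    """Expected BN log-likelihood of a uniformly random grounding of (S, C)."""
    total = sum(
        math.log(cpts[key])
        for s, c in groundings
        for key in family_values(s, c)
    )
    return total / len(groundings)

cpts = empirical_cpts()
uniform = {key: 0.5 for key in cpts}  # an arbitrary alternative parameter setting
print(pseudo_log_likelihood(cpts), pseudo_log_likelihood(uniform))
```

On this toy data the empirical conditional frequencies score at least as high as the uniform alternative, consistent with the parameter-learning claim in the abstract. Dividing by the number of groundings is what turns counts into database frequencies.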
This work will be presented at the SIAM SDM data mining conference.