Integrating Multiple Cues in Unsupervised Grammar Induction

by Adam Pauls

I present a modified version of the EMILE algorithm for unsupervised grammar induction that exploits linguistic knowledge, specifically partial bracketing and function words. This knowledge enters the algorithm through two modifications: first, pruning expressions known to cross brackets, and second, actively encouraging EMILE to use grammatical types that contain known phrases or lone proforms, or that begin or end with modifier (i.e. phrase-boundary) function words. The first modification is moderately successful in improving EMILE's accuracy and very successful in decreasing its runtime, while the second is too tied to the specifics of the EMILE algorithm to produce consistent results.
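
To make the first modification concrete, the following is a minimal sketch of what "crossing a bracket" means. It is not EMILE's actual implementation: it assumes expressions and brackets are represented as half-open token spans `(start, end)` within one sentence, and the names `crosses` and `prune_crossing` are hypothetical. Two spans cross when they overlap but neither contains the other.

```python
from typing import List, Tuple

# Assumption: a span is a half-open (start, end) pair of token indices.
Span = Tuple[int, int]


def crosses(a: Span, b: Span) -> bool:
    """Return True if spans a and b overlap without either containing the other."""
    return (a[0] < b[0] < a[1] < b[1]) or (b[0] < a[0] < b[1] < a[1])


def prune_crossing(candidates: List[Span], brackets: List[Span]) -> List[Span]:
    """Keep only candidate expressions that cross none of the known brackets."""
    return [c for c in candidates if not any(crosses(c, b) for b in brackets)]


# Tokens: the(0) dog(1) chased(2) the(3) cat(4)
# Hypothetical partial bracketing: [the dog] chased [the cat]
brackets = [(0, 2), (3, 5)]
candidates = [(0, 2), (1, 3), (2, 5), (1, 4)]
kept = prune_crossing(candidates, brackets)
```

Here the candidates "dog chased" `(1, 3)` and "dog chased the" `(1, 4)` are discarded because each overlaps "the dog" without containing it, while "the dog" `(0, 2)` and "chased the cat" `(2, 5)` are kept. Discarding candidates this way also shrinks the set of expressions to be considered, which is consistent with the runtime improvement reported above.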

