"What are they talking about?" Finding Topics in Email Conversations

By Shafiq Joty

Our ongoing research addresses the task of finding topics at the sentence level in email conversations. For instance, an email thread about arranging a conference can discuss topics such as “location and time”, “registration”, “food menu”, and “workshops”. However, email is an asynchronous collaborative medium, and its characteristics differ from those of written monologues (e.g., textbooks, news articles) and spoken dialogs (e.g., meetings). Hence, existing methods that have been successful on monologue or dialog, such as generative topic models (e.g., Latent Dirichlet Allocation (LDA)) and lexical-chain-based approaches (e.g., LCSeg), may not be successful on their own in asynchronous written conversations like email.

We claim that finding topics requires considering the conversation structure and other conversation-specific features. In our experiments on a small development set, considering the conversation structure significantly improves performance over the existing methods. Building on this observation, we propose a novel graph-theoretic framework that solves the problem using a rich feature set. Crucially, our approach captures discriminative email features and combines the strengths of supervised and unsupervised techniques, while still treating LDA and LCSeg as important factors.
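To make the idea of "considering conversation structure" in a graph more concrete, here is a minimal Python sketch under loudly stated assumptions. It is not the framework described above: it has no supervised component and does not use LDA or LCSeg. Each sentence becomes a node; the edge weight mixes a lexical signal (bag-of-words cosine similarity) with a structural signal (whether the two sentences come from the same email or from emails in a direct reply relation). Sentences are then grouped by taking connected components of the thresholded graph, a deliberately simple stand-in for a proper graph partitioning method. The names `topic_clusters`, `edge_weight`, `reply_to`, `alpha`, `beta`, and `threshold`, as well as the example thread, are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical input format: each sentence records the email it came from;
# reply_to maps an email id to the email it replies to (None for a thread root).


def bag_of_words(text):
    """Lowercased token counts; a crude stand-in for real preprocessing."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def structurally_linked(e1, e2, reply_to):
    """True if both sentences share an email or one email directly replies to the other."""
    return e1 == e2 or reply_to.get(e1) == e2 or reply_to.get(e2) == e1


def edge_weight(s1, s2, reply_to, alpha, beta):
    """Hypothetical linear mix of lexical similarity and a conversation-structure feature."""
    lexical = cosine(bag_of_words(s1["text"]), bag_of_words(s2["text"]))
    structural = 1.0 if structurally_linked(s1["email"], s2["email"], reply_to) else 0.0
    return alpha * lexical + beta * structural


def topic_clusters(sentences, reply_to, alpha=0.7, beta=0.3, threshold=0.35):
    """Group sentence indices into connected components of the thresholded graph (union-find)."""
    parent = list(range(len(sentences)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if edge_weight(sentences[i], sentences[j], reply_to, alpha, beta) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(sentences)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical four-email thread about arranging a conference.
sentences = [
    {"email": "e1", "text": "Where should we hold the conference this year?"},
    {"email": "e2", "text": "The conference could be held in Vancouver in June."},
    {"email": "e3", "text": "Registration fees should stay low for students."},
    {"email": "e4", "text": "Student registration fees were low last year too."},
]
reply_to = {"e1": None, "e2": "e1", "e3": None, "e4": "e3"}

print(topic_clusters(sentences, reply_to))                      # lexical + structure
print(topic_clusters(sentences, reply_to, alpha=1.0, beta=0.0))  # lexical only
```

On this toy thread, the combined weights group the two "location" sentences together and the two "registration" sentences together (`[[0, 1], [2, 3]]`). With the structural term switched off, the first two sentences share too few words to pass the threshold and end up in separate groups (`[[0], [1], [2, 3]]`). This only illustrates how a structural feature can change a graph-based grouping; it says nothing about the actual features, weights, or partitioning used in the research described here.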

Visit the LCI Forum page