Date: Mar 12th, 2009
Room: DMP 310
Speaker: Surajit Chaudhuri
Title: A Programming Framework for Data Cleaning
Abstract:
Data cleaning is a critical component of Business Intelligence software. Specifically, the data cleaning step reconciles multiple representations of the same data, which is what makes accurate data analysis possible. The web has made this problem even more important. For example, data cleaning technology is also used to tolerate typos and differences in representation when users look up addresses through an online address search. Traditionally, data cleaning has been driven by consultants and by software custom-built for specific vertical domains. In this talk, we take a different approach and propose a programming framework for data cleaning. We will identify some of the key aspects of such a framework.