I am a postdoctoral research fellow in the Department of Computer Science at the University of British Columbia. Prior to this, I was a postdoctoral researcher in the Data Science Lab at McMaster University.
My research interests include data management, data quality, data cleaning, and applications of AI in data management. My past research focused on rule-based query languages, ontology-based data access, and multidimensional databases.
Pastwatch is a data summarization, explanation and visualization framework for the provenance of aggregate queries. Data provenance includes any information about the origin of a piece of data and the process that led to its creation. The provenance of a query answer over a database is the data in the database that contributed to that answer. For aggregate queries that apply mathematical functions such as sum and average, the provenance of a query answer usually contains a large number of database records, which makes the provenance difficult for a database user to explore and understand. Pastwatch facilitates database access by summarizing the provenance of such queries, which helps users understand the query answers.
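To make the idea of aggregate-query provenance concrete, here is a minimal Python sketch; it is not Pastwatch itself. For a hypothetical `SALES` table and a SUM query grouped by region, it records each answer's provenance (the records that contributed to it) and then compresses that provenance into one row per value of another attribute, with the record count and the share of the sum. The table, the function names `sum_by_region` and `summarize`, and the single-attribute summary are illustrative assumptions; Pastwatch's own summarization and visualization techniques are not reproduced here.

```python
from collections import defaultdict

# Hypothetical sales table: (record_id, region, product, amount)
SALES = [
    (1, "West", "laptop", 1200.0),
    (2, "West", "phone", 800.0),
    (3, "West", "laptop", 1000.0),
    (4, "East", "phone", 700.0),
    (5, "East", "laptop", 1500.0),
]


def sum_by_region(rows):
    """Answer SELECT region, SUM(amount) ... GROUP BY region and keep,
    for each answer, the records that contributed to it (its provenance)."""
    answers = defaultdict(float)
    provenance = defaultdict(list)
    for rec_id, region, product, amount in rows:
        answers[region] += amount
        provenance[region].append((rec_id, region, product, amount))
    return dict(answers), dict(provenance)


def summarize(prov_rows, attr_index):
    """Compress a provenance set into (value, record count, share of the sum)
    per value of one attribute, instead of listing every record.
    This single-attribute grouping is an illustrative assumption."""
    total = sum(r[3] for r in prov_rows)
    groups = defaultdict(lambda: [0, 0.0])  # value -> [count, summed amount]
    for r in prov_rows:
        groups[r[attr_index]][0] += 1
        groups[r[attr_index]][1] += r[3]
    return sorted(
        (
            (value, count, amount / total if total else 0.0)
            for value, (count, amount) in groups.items()
        ),
        key=lambda t: -t[2],  # largest contributors first
    )


answers, provenance = sum_by_region(SALES)
print(answers["West"])  # 3000.0
print(summarize(provenance["West"], attr_index=2))
```

On this toy data the West total is 3000.0, and its three provenance records collapse into two summary rows (laptop first, then phone). That kind of reduction is what matters when an answer draws on thousands of records.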
CurrentClean is a probabilistic system for the detection and cleaning of stale data. It learns spatio-temporal update patterns for values in a database from past update queries. CurrentClean applies inference rules to model the causal and co-occurrence update patterns observed in real data, estimates the currency of values, and recommends spatio-temporally aware repairs for stale values. We applied several optimization techniques to reduce the system's inference runtime, and we conducted extensive experiments on real data to compare CurrentClean's accuracy in detecting stale values and to evaluate its repair effectiveness.
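The co-occurrence part of this idea can be illustrated with a deliberately simplified, deterministic sketch. It is not CurrentClean's probabilistic inference and it ignores the spatio-temporal dimension. Assuming a hypothetical log of past update queries, it estimates how often attribute `b` is updated together with attribute `a`, and flags `b` as possibly stale when a new update changes `a` but leaves `b` untouched. The log format, the function names `learn_cooccurrence` and `flag_possibly_stale`, and the `threshold` value are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical log of past update queries: (time step, entity, updated attributes).
# This sketch ignores the time step and entity; CurrentClean models
# spatio-temporal patterns, which are not shown here.
UPDATE_LOG = [
    (1, "emp1", {"salary", "title"}),
    (2, "emp2", {"salary", "title"}),
    (3, "emp3", {"salary", "title"}),
    (4, "emp4", {"salary"}),
    (5, "emp5", {"phone"}),
]


def learn_cooccurrence(log):
    """Estimate P(b is updated | a is updated in the same query) for each
    ordered attribute pair (a, b) that was ever updated together."""
    updates = defaultdict(int)  # attribute -> number of queries updating it
    joint = defaultdict(int)    # (a, b) -> number of queries updating both
    for _, _, attrs in log:
        for a in attrs:
            updates[a] += 1
        for a, b in combinations(sorted(attrs), 2):
            joint[(a, b)] += 1
            joint[(b, a)] += 1
    return {(a, b): n / updates[a] for (a, b), n in joint.items()}


def flag_possibly_stale(update_attrs, rules, threshold=0.7):
    """Given the attributes changed by a new update, return attributes that
    usually change with them but were left untouched (candidate stale values)."""
    suspects = set()
    for (a, b), p in rules.items():
        if a in update_attrs and b not in update_attrs and p >= threshold:
            suspects.add(b)
    return suspects


rules = learn_cooccurrence(UPDATE_LOG)
print(rules[("title", "salary")])  # 1.0
print(flag_possibly_stale({"salary"}, rules))  # {'title'}
```

Here `title` is updated in three of the four queries that update `salary` (an estimated 0.75), so an update that changes only `salary` flags `title` as a candidate stale value. A probabilistic system would instead weigh such evidence against other signals before recommending a repair.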
PACAS is a framework for facilitating data cleaning between a client and a service provider. The goal of this framework is to improve data accuracy with respect to a master database owned by the service provider. The client and the service provider interact through a data-pricing scheme in which the service provider charges the client for each disclosed value according to its adherence to the privacy model. In PACAS, we introduced a new privacy model based on data publishing that considers the data semantics while providing stronger privacy protection. We also presented a data-cleaning algorithm that repairs errors by updating erroneous values to their true values in the service provider's data.
O. AlOmeir, E. Y. Lai, M. Milani and R. Pottinger
To appear in IEEE International Conference on Data Engineering, 2020 (ICDE '20)
Z. Zheng, T. Quach, Z. Jin, M. Milani and F. Chiang
ACM International Conference on Information and Knowledge Management, 2019 (CIKM '19)
J. Liu, Z. Zolaktaf, R. Pottinger, M. Milani
International Conference on Scientific and Statistical Database Management, 2019 (SSDBM '19)
M. Milani, Z. Zheng and F. Chiang
IEEE International Conference on Data Engineering, 2019 (ICDE '19)
Y. Huang, M. Milani and F. Chiang
IEEE International Conference on Big Data, 2018 (BigData '18)
L. Bertossi and M. Milani
Journal of Data and Information Quality, 2018 (JDIQ '18)
McMaster University, Department of Computing and Software, Fall 2018