Research Overview
The research in the DMM group centres on helping users manage, mine, understand, and explore their data, whether it lives in a traditional relational database, an information network, or a genomic dataset.

Research by faculty member
Laks Lakshmanan
As the world we live in becomes more and more networked, the need to understand, manage, and harness the data on the web is becoming critical. While data in traditional databases tends to be highly structured, with a clear notion of schema, data on the web is loosely structured (also called semi-structured) or, worse, unstructured, and is often not accompanied by any clear notion of schema. What does it mean to query this data? What do you look for when you mine it? If several data sources contain related information, how do you combine them to answer queries spanning them all? How can you index such data for efficient storage and retrieval? What do you do when the data you want to analyze is not stored anywhere but is streaming past? My research has been concerned with addressing these questions. I am also interested in newer applications that challenge the foundations and technology of databases.
More recently, I have become interested in integrating the paradigms of database-style querying, IR-style search, and RecSys-style recommendations, and in doing so while taking the user's context into account: context both in the sense of the user's social neighbourhood and in the sense of her current information needs or task. The opinions and "intelligence" of the crowd are a natural resource to harness in this setting. Stay tuned for more on what drives my research these days.
Raymond Ng
As the Chief Informatics Officer of the PROOF Centre of Excellence for the prevention of organ failure since 2008, I have been leading a team of computational scientists, statisticians, and systems biologists conducting genomics studies on heart, lung, and kidney failure. The team oversees every aspect of "Big Data", from storage and quality control to data mining, model building, and the discovery and validation of biomarker panels, and has developed state-of-the-art computational pipelines for every step of biomarker discovery and validation. These pipelines have been applied successfully in numerous studies. The flagship biomarker project of the PROOF Centre is the development of biomarker panels for diagnosing acute rejection in heart and kidney transplant patients. Starting in 2004, with total funding in excess of CAD $20 million, we have worked diligently on every step of the process, from discovery to validation and clinical implementation, including an international trial involving hundreds of patients in Canada, the US, Australia, and India. The panel for heart transplants, in particular, has been turned into a new laboratory test, to be given to patients at St Paul's Hospital starting this year.
A quite different direction of my research is a body of work on summarizing and extracting information from written conversations such as emails, blogs, and tweets. Over the past 15 years, the group led by my UBC colleague Carenini and myself has published extensively in all the premier international forums, with projects partially funded by Google, IBM, and SAP. This line of work culminated in our book on summarizing text conversations; since its publication in 2011, it has become the third most downloaded book in the Morgan & Claypool series on data management.
Lastly, I also lead a research program with the following focal areas: (A) aggregate query processing for wireless sensor networks; (B) topic modeling and sentiment extraction for text streams; (C) outlier detection and explanation; and (D) prefix-based forecasting.
Rachel Pottinger
The research that my students and I are doing centres on (1) how to help people understand and explore their data, (2) how to manage data that is currently not well supported by databases, and (3) how data can be managed in situations where there are multiple databases. To that end, my students and I are currently exploring a number of topics, including:
- Data lakes and open data are not easy to navigate. Improving users' ability to find the data they are looking for is essential to making the most of it. In this project we focus on table annotation and discovery in data lakes.
- In many analysis settings, a user may have an aggregation query for which she knows what the correct answer should be in one case. Determining why the answer she is getting differs from the one provided by this "oracle" is a frustrating and error-prone process. This project seeks to give users feedback on why their aggregation queries are not producing the answer they expect.
- When asking queries, it can be useful to predict what the next query will be, based on queries that have been asked in the past. This is part of a series of ongoing projects, most of which have used machine learning techniques.
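To make the aggregation-explanation problem above concrete, here is a minimal sketch, not the project's actual system: all function names and data are hypothetical. It computes a grouped sum and then drills down to the rows behind the one group whose total surprises the user, which is the kind of feedback such a tool would provide.

```python
# Illustrative sketch only (hypothetical names and data, not the DMM
# group's implementation) of drilling into a surprising aggregate.
from collections import defaultdict

def group_sum(rows, group_key, value_key):
    """Compute SUM(value_key) ... GROUP BY group_key over a list of dicts."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)

def explain_discrepancy(rows, group_key, value_key, group, expected):
    """Return the rows contributing to one group's total, plus the gap
    between the computed total and the user's expected value."""
    contributing = [r for r in rows if r[group_key] == group]
    actual = sum(r[value_key] for r in contributing)
    return {"actual": actual, "expected": expected,
            "gap": actual - expected, "rows": contributing}

# Hypothetical data: the user expected region 'west' to total 100.
rows = [
    {"region": "west", "sales": 60.0},
    {"region": "west", "sales": 55.0},  # e.g. an unexpected duplicate entry
    {"region": "east", "sales": 40.0},
]
totals = group_sum(rows, "region", "sales")
report = explain_discrepancy(rows, "region", "sales", "west", expected=100.0)
# report["gap"] quantifies the surprise; report["rows"] shows its sources.
```

A real system must go further, e.g. ranking candidate predicates or tuples that best account for the gap, but the drill-down above is the starting point the project description alludes to.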