Research by faculty member

Laks Lakshmanan

As the world we live in becomes more and more networked, the need to understand, manage, and harness the data on the web is becoming critical. While data in traditional databases tends to be highly structured, with a clear notion of schema, data on the web is loosely structured (also called semi-structured), or worse, unstructured, and is often not accompanied by any clear notion of schema. What does it mean to query this data? What do you look for when you mine this data? If there are several data sources containing related information, how do you combine the information in them to answer queries involving them all? How can you index such data for efficient storage and retrieval? What do you do when the data you want to analyze is not stored anywhere but is streaming through? My research has been concerned with addressing these questions. I am also interested in newer applications which challenge the foundations and technology of databases.

More recently, I have become interested in integrating the paradigms of database-style querying, IR-style search, and RecSys-style recommendations, while taking the user's context into account. Context here means both the social neighborhood of the user and the user's current information needs or current task. The opinions and "intelligence" of the crowd are something to be naturally harnessed in this setting. Stay tuned for more information on what drives my research these days.


Raymond Ng

As the Chief Informatics Officer of the PROOF Centre of Excellence for the prevention of organ failures since 2008, I have been leading a team of computational scientists, statisticians and systems biologists conducting genomics studies on heart, lung and kidney failure. The team oversees every aspect of "Big Data", from storage and quality control to data mining, model building, and the discovery and validation of biomarker panels. The team has developed state-of-the-art computational pipelines for every step of biomarker discovery and validation, and those pipelines have been applied successfully to numerous studies. The flagship biomarker project of the PROOF Centre is the development of biomarker panels for diagnosing acute rejection in heart or kidney transplant patients. Starting in 2004, with total funding in excess of 20 million Canadian dollars, we have worked diligently on every step of the process, from discovery to validation and clinical implementation. There was also an international trial involving hundreds of patients in Canada, the US, Australia and India. The panel for heart transplants, in particular, has been made into a new laboratory test, to be given to patients at St. Paul’s Hospital starting this year.

A totally different direction of my research is the body of studies on summarizing and extracting information from written conversations, such as emails, blogs and tweets. Over the past 15 years, the group led by Carenini, a UBC colleague, and me has published extensively in all the premier international forums. See here for more details. Our projects were partially funded by Google, IBM and SAP. This line of work has culminated in our book on summarizing text conversations. Since its publication in 2011, the book has become the third most downloaded book of the Morgan & Claypool series on data management.

Lastly, I also lead a research program with the following focal areas: (A) aggregate query processing for wireless sensor networks; (B) topic modeling and sentiment extraction for text streams; (C) outlier detection and explanations; and (D) prefix-based forecasting.


Rachel Pottinger

My research centers on (1) how data can be managed in situations where there are multiple databases and (2) how to manage data that is currently not well supported by databases. To that end, my students and I are currently exploring a number of topics, including:
  • Making sense of data that is stored in relational databases or XML is difficult. For example, if civil engineers are trying to extract information about where two pieces of a building intersect, they may need to find 10 different elements in a schema that contains thousands of options. This project seeks to allow users to understand their schemas well enough to query them. This is joint work with Zainab Zolaktaf.
  • In many cases where analysis is being performed, a user may have an aggregation query for which she knows what the correct answer should be in one case. Trying to determine why the answer that the user is getting differs from the one provided by the "Oracle" is a frustrating and error-prone process. This project seeks to give users feedback on why their aggregation queries are not providing the answer that they expect. This is joint work with Omar AlOmeir.

  • Old Projects