It was July 2017 when UBC Professor Laks V.S. Lakshmanan was invited by his colleague abroad — Professor Reynold Cheng — to visit the University of Hong Kong where Cheng works. Laks jumped at the chance. After all, Professor Cheng had asked Laks to be the external examiner for one of his PhD students, Yixiang Fang, and this would also be an opportunity to explore possible collaborative research ideas between them. One thing led to another and the collaboration between the trio resulted in a publication in VLDB (Very Large Data Bases), one of the top conferences in data management the very next year.
Over the following years, the trio expanded their research collaborations to include another PhD student, Chenhao Ma and then two more professors, amounting to six total from three different countries (Canada, Hong Kong and Australia).
Their most recent efforts resulted in a paper published by ACM SIGMOD in 2020 that has now won them the highly-distinguished 2021 SIGMOD Research Highlight Award.
A prestigious award for core database research
By definition, the ACM Special Interest Group on Management of Data (SIGMOD) “is concerned with the principles, techniques and applications of database management systems and data management technology.” The SIGMOD Research Highlight Award showcases research projects that exemplify core database research. In particular, “the project must address an important problem, represent a definitive milestone in solving the problem, and have the potential of significant impact.” The award process is highly selective, as the PC chairs of premier conferences like PODS (Symposium on Principles of Database Systems), SIGMOD, VLDB (Very Large Databases), ICDE (International Conference on Data Engineering), EDBT (International Conference on Extending Database Technology) and ICDT (International Conference on Database Theory) as well as the broader SIGMOD community are approached for nominations. After careful discussion of the nominated papers in great detail, the awards committee selects the papers that best meet the criteria.
“None of us were expecting this award,” said Laks, “It’s a very high bar in terms of one’s work being widely appreciated within the community, so it was a pleasant surprise and also very gratifying. I am especially happy for Chenhao, as it’s such a feather in his cap. This work represents a key chapter of his thesis.”
A full version of their SIGMOD 2020 paper, Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs, is being considered for a special issue: "Best of SIGMOD," published by ACM Transactions on Database Systems (TODS), one of the top journals on data management. A more “popular” version of their article has been published in the ACM SIGMOD Record.
The challenge: finding dense regions in large graphs of data
The subject of the winning paper is about developing algorithms to vastly improve the speed in finding the highest density region within a large graph of data.
Laks said, “In many applications of AI, machine learning and data science, the underlying data often tends to be best modelled as a graph. Within a huge network, there are many connection points, and in fact, there can be tens of millions of nodes on a graph with billions of connections.” He used Facebook as an example. “Within the Facebook network, there are sub-networks or groups, where interactions are densest. In other words, there is more communication between group members or Facebook friends, than with the rest of the Facebook network, so they are like a tightly knit structure – a community.”
Laks explained how community detection has long been a problem occupying the significant attention of researchers. That is, identifying the most-dense subnetworks of a larger network. “Efficiently finding these communities has a lot of concrete applications – whether it’s biological databases, scientific collaboration networks or even fraud detection,” he explained. “Take something like identifying protein complexes within a complex protein interaction network. One of our previous papers used that topic to explore how we could extract the densest past of that data to better understand how protein complexes work.”
In terms of fraud detection, Laks pointed to online reviews as an example. How does one know when buying a product, that multiple reviews are legitimate and each reviewer is genuine? Perhaps the reviewers are fake, have been bribed or perhaps the reviews are from the same person or are from bots.
“If, within all that data, a highly dense subgraph was discovered, it might be indicative of suspicious activity, which could then be examined in greater detail.” he said. However, he also explained that on graphs beyond a certain scale, it can be notoriously complicated and exact solutions can take a long time to find. “To mitigate this, we have developed fast approximation algorithms that scale to very large networks. Approximation is acceptable since for most applications, the neighbourhood of the dense regions found by the algorithms needs to be explored anyway. Also, the networks already embed a certain amount of noise. The results show that our approximation algorithms are up to six orders of magnitude, or one million times, faster than state-of-the-art.”
Going forward, Laks said that his group is interested in exploring more novel applications.
“I'm a big believer in reality: often reality looks chaotic and unstructured at first. But that’s where exciting opportunities lie for unearthing patterns and creating structure out of the mess.”
Efficient Algorithms for Densest Subgraph Discovery on Large Directed Graphs
Chenhao Ma, The University of Hong Kong; Yixiang Fang, University of New South Wales; Reynold Cheng, The University of Hong Kong; Laks V.S. Lakshmanan, University of British Columbia; Wenjie Zhang; University of New South Wales; Xuemin Lin, University of New South Wales.