On Top-k Structural Similarity Search

IEEE 2012 International Conference on Data Engineering (ICDE 2012)

Abstract

Search for objects similar to a given query object in a network has numerous applications including web search and collaborative filtering. We use the notion of structural similarity to capture the commonality of two objects in a network, e.g., if two nodes are referenced by the same node, they may be similar. Meeting-based methods including SimRank and P-Rank capture structural similarity very well. Deriving inspiration from PageRank, SimRank has gained popularity by a natural intuition and domain independence. Since it’s computationally expensive, subsequent work has focused on optimizing and approximating the computation of SimRank. In this paper, we approach SimRank from a top-k querying perspective where given a query node v, we are interested in finding the top-k nodes that have the highest SimRank score w.r.t. v. The only known approaches for answering such queries are either a naive algorithm of computing the similarity matrix for all node pairs or computing the similarity vector by comparing the query node v with each other node independently, and then picking the top-k. None of these approaches can handle top-k structural similarity search efficiently by scaling to very large graphs consisting of millions of nodes. We propose an algorithmic framework called TopSim based on transforming the top-k SimRank problem on a graph G to one of finding the top-k nodes with highest authority on the product graph G×G. We further accelerate TopSim by merging similarity paths and develop a more efficient algorithm called TopSim-SM. Two heuristic algorithms, Trun-TopSim-SM and Prio-TopSim-SM, are also proposed to approximate TopSim- SM on scale-free graphs to trade accuracy for speed, based on truncated random walk and prioritizing propagation respectively. We analyze the accuracy and performance of TopSim family algorithms and report the results of a detailed experimental study.


Materials

[ Paper in PDF ] [ Poster in PDF ] [ Presentation in PDF ]

BibTex

@inproceedings{DBLP:conf/icde/LeeLY12,
  author    = {Pei Lee and
               Laks V. S. Lakshmanan and
               Jeffrey Xu Yu},
  title     = {On Top-k Structural Similarity Search},
  booktitle = {{IEEE} 28th International Conference on Data Engineering {(ICDE} 2012),
               Washington, DC, {USA} (Arlington, Virginia), 1-5 April, 2012},
  year      = {2012},
  pages     = {774--785},
  crossref  = {DBLP:conf/icde/2012},
  url       = {http://dx.doi.org/10.1109/ICDE.2012.109},
  doi       = {10.1109/ICDE.2012.109},
  timestamp = {Tue, 14 Oct 2014 21:42:09 +0200},
  biburl    = {http://dblp.uni-trier.de/rec/bib/conf/icde/LeeLY12},
  bibsource = {dblp computer science bibliography, http://dblp.org}
}