Incremental Cluster Evolution Tracking from Highly Dynamic Network Data

IEEE 2014 International Conference on Data Engineering (ICDE 2014)

Abstract

Dynamic networks are commonly found in the current web age. In scenarios like social networks and social media, dynamic networks are noisy, large-scale, and quickly evolving. In this paper, we focus on the cluster evolution tracking problem on highly dynamic networks, with clear application to event evolution tracking. Several previous works on data stream clustering maintain clusters using a node-by-node approach. However, handling bulk updates, i.e., a subgraph at a time, is critical for achieving acceptable performance over very large, highly dynamic networks. We propose a subgraph-by-subgraph incremental tracking framework for cluster evolution. To illustrate the techniques in our framework, we take the event evolution tracking task in social streams as an application, where a social stream and an event are modeled as a dynamic post network and a dynamic cluster, respectively. By monitoring through a fading time window, we introduce a skeletal graph to summarize the information in the dynamic network, and formalize cluster evolution patterns using a group of primitive evolution operations and their algebra. Two incremental computation algorithms are developed to maintain clusters and track evolution patterns as time rolls on and the network evolves. Our detailed experimental evaluation on large Twitter datasets demonstrates that our framework can effectively track, on the fly, the complete set of cluster evolution patterns over the whole life cycle of clusters in highly dynamic networks.


Materials

[ Paper in PDF ] [ Poster in PNG ] [ Presentation in PPT ]
[ Data Set: Tech-Lite (4.7MB), Tech-Full (439MB, sent upon request) ]

BibTeX

@inproceedings{DBLP:conf/icde/LeeLM14,
  author    = {Pei Lee and
               Laks V. S. Lakshmanan and
               Evangelos E. Milios},
  title     = {Incremental cluster evolution tracking from highly dynamic network
               data},
  booktitle = {{IEEE} 30th International Conference on Data Engineering, Chicago,
               {ICDE} 2014, IL, USA, March 31 - April 4, 2014},
  year      = {2014},
  pages     = {3--14},
  crossref  = {DBLP:conf/icde/2014},
  url       = {http://dx.doi.org/10.1109/ICDE.2014.6816635},
  doi       = {10.1109/ICDE.2014.6816635},
  timestamp = {Tue, 14 Oct 2014 19:44:49 +0200},
  biburl    = {http://dblp.uni-trier.de/rec/bib/conf/icde/LeeLM14},
  bibsource = {dblp computer science bibliography, http://dblp.org}
}