I am Jian Xu
This is Jian Xu's web page at the Computer Science Department, UBC.
Information about my academic life is updated here.
Currently you can find my resume and publications.
About Me
I am a final-year PhD student in
CS at UBC and will graduate in Fall 2011. I am currently working
in the Data Management and Mining group.
My supervisor is Prof Rachel Pottinger.
Before I came to UBC, I studied at CS&E, University of New South Wales,
where I got my master's degree.
I come from Suzhou,
China, which is famous for its classical gardens. It is a very beautiful small town,
and you can find some pictures of it in the Gallery.
My Research
I am interested in many aspects of Computer Science. Currently I am focusing on
topics in data integration. When I was in
Australia, I worked on stream processing.
Click on the topics to get more detailed information.
I worked on the JIIRP project, where I helped researchers manage and
integrate data for disaster management and response. The project also motivates my research on semantic
integration of domain-heterogeneous databases, especially the query answering issues involved.
Take a look at the publications section for my
writings and papers on these topics.
Data Integration
Data integration is the process of bringing together data managed in heterogeneous databases and enabling query
answering over them. It differs from the traditional database merging approach, in which one
creates a single schema and loads data from the various sources into it. Data integration systems avoid duplicating and
reformatting data and allow data managed under different schemas to remain in their original form. Such a system
builds schema mappings between schemas that are likely distributed and translates queries using these mappings. A query
to the data integration system is written in one schema and is translated into queries written in the other schemas
in the system. The translated queries are passed to the corresponding data sources for processing, and the answers
are collected and returned.
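The translation step described above can be illustrated with a minimal Python sketch. It assumes the simplest possible kind of schema mapping, a one-to-one renaming of a relation and its attributes, which is far weaker than the mappings a real data integration system uses. The names Query, SchemaMapping, translate and evaluate, as well as the Shelter and Refuge relations, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """A select-project query with equality filters over one relation."""
    relation: str
    select: list
    where: dict = field(default_factory=dict)


@dataclass
class SchemaMapping:
    """Hypothetical mapping: renames one relation and its attributes."""
    source_relation: str
    target_relation: str
    attributes: dict  # source attribute name -> target attribute name


def translate(query, mapping):
    """Rewrite a query over the source schema into the target schema."""
    if query.relation != mapping.source_relation:
        raise ValueError("mapping does not cover relation " + query.relation)
    rename = mapping.attributes
    return Query(
        relation=mapping.target_relation,
        select=[rename[a] for a in query.select],
        where={rename[a]: v for a, v in query.where.items()},
    )


def evaluate(query, database):
    """Run a query against an in-memory data source (relation -> row dicts)."""
    return [
        tuple(row[a] for a in query.select)
        for row in database[query.relation]
        if all(row[a] == v for a, v in query.where.items())
    ]


# The query is written against the Shelter schema; the data stays in the
# source's own Refuge schema and is never copied or reformatted.
mapping = SchemaMapping("Shelter", "Refuge", {"name": "title", "city": "town"})
source_db = {
    "Refuge": [
        {"title": "Gym A", "town": "Richmond"},
        {"title": "Gym B", "town": "Vancouver"},
    ]
}
query = Query("Shelter", ["name"], {"city": "Vancouver"})
translated = translate(query, mapping)
answers = evaluate(translated, source_db)
```

Only the query is rewritten; the source keeps its schema and its data, which is the property that separates data integration from database merging.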
My research area is relational peer data management systems (RPDMS). An RPDMS is a very flexible data integration model in the
big family of data integration architectures. Its highlights are: (1) the peer-to-peer network used as the
physical networking layer allows flexible organization of data sources; (2) the system uses only pairwise schema
mappings for query rewriting, which eliminates the burden of creating and maintaining federated schemas; (3) all peers
in the system are equal and publish their data sources on the semantic overlay network of the PDMS in the same way,
so there is no centralized, crucial, or hub-like node, which avoids a single point of failure and makes the system robust.
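As a rough sketch of how pairwise mappings alone can carry a query across the overlay network, the Python fragment below forwards a projection query from peer to peer and renames its attributes at each hop. It is a simplification under stated assumptions: mappings are directed one-to-one attribute renamings, each peer holds a single relation, and the function answer_query and the peers A, B and C are hypothetical.

```python
from collections import deque


def answer_query(start, select, peers, mappings):
    """Answer a projection query posed at `start` using pairwise mappings.

    peers:    peer name -> list of row dicts in that peer's own schema
    mappings: (source peer, target peer) -> {source attr: target attr}
    """
    answers = []
    visited = {start}
    queue = deque([(start, list(select))])
    while queue:
        peer, attrs = queue.popleft()
        # Evaluate the query in this peer's own schema.
        answers.extend(tuple(row[a] for a in attrs) for row in peers[peer])
        # Forward the query to unvisited neighbours, rewriting it with the
        # pairwise mapping for that edge of the overlay network.
        for (src, dst), rename in mappings.items():
            if src == peer and dst not in visited and all(a in rename for a in attrs):
                visited.add(dst)
                queue.append((dst, [rename[a] for a in attrs]))
    return answers


peers = {
    "A": [{"name": "Shelter 1"}],
    "B": [{"title": "Shelter 2"}],
    "C": [{"label": "Shelter 3"}],
}
# There is no direct A-to-C mapping and no federated schema: C is reached
# by composing the A-to-B and B-to-C mappings.
mappings = {
    ("A", "B"): {"name": "title"},
    ("B", "C"): {"title": "label"},
}
result = answer_query("A", ["name"], peers, mappings)
```

Any peer can be the starting point, and adding a new data source only requires a mapping to one existing peer.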
The PDMS integration model is an especially good candidate for disaster management, in which an integration system
needs to be set up and deployed very quickly using very limited computational resources while still allowing data
sources to be added at any time during disaster response missions.
I am facing new challenges in relational PDMS research that existing approaches do not yet handle, and I aim to develop
solutions to them. The concrete problems to solve are: (1) to optimize the topology of the semantic overlay network
of a PDMS so that data sources belonging to different domains can co-exist in one system with minimal effect on query
answering performance and with good scalability w.r.t. the number of data sources; (2) to support processing of object
decomposition aggregates, which is required in the affiliated JIIRP project; (3) to support continuous query processing
and reasoning with source uncertainty.
The prospective data integration system will extend existing PDMSs and make them applicable to a much wider class
of data integration scenarios, potentially making database technology more widely used and data sharing and reuse much easier.
Stream Processing
This is for Stream processing. To be added.