Real-Time Human Motion Capture with Multiple Depth Cameras
A. Shafaei, J. J. Little


A. Shafaei, J. J. Little. Real-Time Human Motion Capture with Multiple Depth Cameras. In 13th Conference on Computer and Robot Vision, Victoria, Canada, 2016. (Oral Presentation)
[PDF] [BibTeX]

  author = {Shafaei, Alireza and Little, James J.},
  title = {Real-Time Human Motion Capture with Multiple Depth Cameras},
  booktitle = {Proceedings of the 13th Conference on Computer and Robot Vision},
  year = {2016},
  organization = {Canadian Image Processing and Pattern Recognition Society (CIPPRS)},
  url = {http://www.cs.ubc.ca/~shafaei/homepage/projects/crv16.php}


Commonly used human motion capture systems require intrusive attachment of markers that are visually tracked with multiple cameras. In this work we present an efficient and inexpensive solution to markerless motion capture using only a few Kinect sensors. Unlike previous work on 3D pose estimation using a single depth camera, we relax constraints on the camera location and do not assume a cooperative user. We apply recent image segmentation techniques to depth images and use curriculum learning to train our system on purely synthetic data. Our method accurately localizes body parts without requiring an explicit shape model. The body joint locations are then recovered in real time by combining evidence from multiple views. We also introduce a dataset of ~6 million synthetic depth frames for pose estimation from multiple cameras and exceed state-of-the-art results on the Berkeley MHAD dataset.
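To make the idea of combining evidence from multiple views concrete, here is a minimal Python sketch. It is not the method from the paper: it assumes each depth pixel already carries a body-part label, that each camera's pinhole intrinsics and camera-to-world pose are known, and it uses a plain per-label centroid as a stand-in for the actual joint recovery step. The dictionary keys (`intrinsics`, `R`, `t`, `pixels`) and function names are hypothetical.

```python
from collections import defaultdict


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at the given depth into camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def to_world(p, R, t):
    """Rigid transform p_world = R * p_cam + t, with R given as a 3x3 nested list."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def fuse_views(views):
    """Merge labeled pixels from all views and return one 3D centroid per label.

    Each view is a dict with hypothetical keys:
      'intrinsics': (fx, fy, cx, cy)
      'R', 't':     camera-to-world rotation and translation
      'pixels':     iterable of (u, v, depth, label)
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for view in views:
        fx, fy, cx, cy = view["intrinsics"]
        for u, v, depth, label in view["pixels"]:
            p_cam = backproject(u, v, depth, fx, fy, cx, cy)
            p_world = to_world(p_cam, view["R"], view["t"])
            for i in range(3):
                sums[label][i] += p_world[i]
            counts[label] += 1
    # Average the accumulated world-space points for every label.
    return {label: tuple(s / counts[label] for s in sums[label]) for label in sums}
```

Calling `fuse_views` with one dict per camera returns a mapping from body-part label to a 3D point in the shared world frame. Because the labels distinguish the left and right sides of the body, the same label seen from different cameras refers to the same part, which is what makes the merge well defined.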


We use the CNN shown below to generate densely classified depth images (the idea is based on the FCNs of Long et al. [1]). The classes of interest are presented in Figure 1. Note that, unlike previous work on depth-based pose estimation, we differentiate the left and right sides of the body. We generate over 6 million synthetic depth frames from three random viewpoints with varying complexity. Our network is then trained on purely synthetic data using curriculum learning [2].

Figure 2. Our convolutional network for dense depth image segmentation.
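The curriculum learning idea mentioned above can be illustrated with a small Python sketch: train on progressively harder data, warm-starting each stage from the weights of the previous one. This is only an illustration under stated assumptions. The "model" is a single scalar fitted by SGD, the stage names merely echo the UBC3V subsets, and treating those subsets as the curriculum order is an assumption rather than something stated on this page.

```python
def sgd_stage(weight, samples, lr=0.5, epochs=20):
    """Toy stand-in for one training stage: fit a scalar to the samples by SGD."""
    for _ in range(epochs):
        for x in samples:
            weight -= lr * (weight - x)  # gradient of 0.5 * (weight - x) ** 2
    return weight


def train_with_curriculum(stages, train_stage, weight=0.0):
    """Train on the stages in order, warm-starting each one from the previous result."""
    order = []
    for name, samples in stages:
        weight = train_stage(weight, samples)
        order.append(name)
    return weight, order


# Hypothetical stages, ordered from easiest to hardest.
stages = [
    ("easy-pose", [1.0, 1.0]),
    ("inter-pose", [1.0, 2.0]),
    ("hard-pose", [2.0, 3.0]),
]
final_weight, order = train_with_curriculum(stages, sgd_stage)
```

In a real setting `train_stage` would run a network training loop over one dataset and return the updated parameters. The point of the sketch is only the control flow: no stage starts from scratch.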

For more information, please see the paper.

Code and Data

  1. UBC3V Dataset Matlab Toolkit github/ubc3v
  2. UBC3V Easy-Pose Dataset 36 GB Download
  3. UBC3V Inter-Pose Dataset 39 GB Download
  4. UBC3V Hard-Pose Dataset 14 GB Download
  5. Pre-trained MatConvNet models and Matlab sample code github/dense-depth-body-parts
  6. Pre-trained Caffe models Download
You can find more information about the data on our GitHub page: github/ubc3v.


  1. Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." In CVPR, 2015.
  2. Bengio, Yoshua, et al. "Curriculum learning." In ICML, 2009.