A benchmark for video-based mocap retrieval

V3dr is the first benchmark for quantitative evaluation of mocap retrieval. It uses 2000 files from the CMU mocap dataset as its database and provides a set of video queries. For a more detailed explanation of the benchmark, see our paper below.

Video queries taken from YouTube with their top retrieved mocap sequences.

Mocap Annotation

To evaluate retrieval, we provide mocap sequences annotated with action labels. We choose 8 day-to-day action classes (pick-up, sit-down, get-up, walk, turn, punch, kick, throw) and annotate 4.5 hours of mocap data per frame with these labels.

An example mocap annotation. The annotations are not necessarily temporally exclusive (e.g., a person might walk and turn at the same time).
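Because the labels can overlap in time, a per-frame annotation is naturally a set of actions for each frame rather than a single class. The sketch below illustrates that idea only; the function name, the `(action, start, end)` interval format, and the example sequence are assumptions made for illustration and do not describe the benchmark's actual annotation files.

```python
def intervals_to_frame_labels(num_frames, intervals):
    """Expand (action, start, end) intervals into one label set per frame.

    Hypothetical helper: `end` is exclusive. Intervals may overlap, so a
    frame can carry several actions at once, or none at all.
    """
    labels = [set() for _ in range(num_frames)]
    for action, start, end in intervals:
        # Clamp the interval to the valid frame range.
        for frame in range(max(start, 0), min(end, num_frames)):
            labels[frame].add(action)
    return labels


# A made-up 10-frame sequence: the person walks throughout and turns midway.
example = intervals_to_frame_labels(10, [("walk", 0, 10), ("turn", 4, 7)])
print(sorted(example[0]))  # frame with a single action
print(sorted(example[5]))  # frame where two actions overlap
```

Storing a set per frame keeps overlapping actions explicit; a boolean frames-by-classes matrix would be an equivalent choice.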


We also provide a total of 320 short video sequences as queries that feature the same actions used for annotating mocap. The queries are taken from two sources: the INRIA IXMAS dataset and YouTube.


We randomly pick 160 sequences from the IXMAS dataset. These videos have plain backgrounds and uniform clothing, but the viewpoints vary considerably.


We add another 160 query videos from YouTube. These queries have little variation in viewpoint, but the clothing and backgrounds are realistic.
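Since each query features one of the annotated actions, a ranked list of retrieved mocap clips can be scored against the annotations. The sketch below shows one plausible measure, precision at k, under the assumption that a retrieved clip counts as relevant when the query's action appears among its labels. The function name and the toy ranking are hypothetical, and the protocol used in the paper may differ.

```python
def precision_at_k(query_action, retrieved_label_sets, k):
    """Fraction of the top-k retrieved clips that contain the query's action.

    Hypothetical helper. `retrieved_label_sets` is a ranked list (best match
    first); each entry is the set of action labels annotated in that clip.
    If fewer than k clips were retrieved, the missing ones count as misses.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = retrieved_label_sets[:k]
    hits = sum(1 for labels in top if query_action in labels)
    return hits / k


# Toy ranking for a "kick" query: clips 2 and 4 contain a kick.
ranked = [{"walk"}, {"kick", "walk"}, {"punch"}, {"kick"}]
score = precision_at_k("kick", ranked, 4)
print(score)
```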

Conference Paper

Efficient video-based retrieval of human motion with flexible alignment

Ankur Gupta, John He, Julieta Martinez, James J. Little, Robert J. Woodham
IEEE Winter Conference on Applications of Computer Vision

PDF Slides Bibtex



The CMU mocap dataset can be downloaded from GitHub or from Google Sites.


Our per-frame annotations for 2000 CMU mocap sequences are available here. Code to read the annotations in MATLAB is also available.

IXMAS Queries

The videos from the IXMAS dataset can be downloaded from the INRIA page. The subset of queries that we use in the benchmark is available here.

YouTube Queries

Another set of video queries, downloaded from YouTube, is available here.

