Challenge submissions

The submission website is password-protected to prevent abuse. Please contact the organizers at image-matching@googlegroups.com for the password (please allow for short delays in answering and uploading close to the deadline). Please upload your results as a zip file or tarball containing the JSON file and, if applicable, your features/matches. You can also check the status of your submission via the status tracking spreadsheet.

We have evaluated several methods from our baseline repository as part of an upcoming paper. If you authored one of these methods and would like the submission listed under your name, please let us know.

Constraints

We recommend using built-in methods for the robust estimator in stereo. For practical reasons, we cap the number of iterations:

These are generous limits, close to saturated performance.

The benchmark is computationally expensive, so we will enforce a limit on the number of submissions: each group may send up to two submissions per week. We may adjust this limit in the future. Hyperparameter tuning on the test set is not allowed and may be penalized.

Terms and conditions

The primary focus of this challenge is to provide tools to facilitate the understanding of the wide-baseline image matching problem and advance the state of the art. However, any challenge needs to take measures against unfair practices (and outright cheating). One example: while it would be possible to run RANSAC for tens of millions of iterations, it would not help advance science.

Determining a winner

Each submission will be scored and ranked by the evaluation metric stated on the competition website: the mean Average Accuracy (mAA), averaged across the stereo and multi-view tasks. The organizers reserve the right to update these and any other terms until April 1, 2020, as more tasks may be added to the benchmark.
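
As a small illustration, and assuming the two tasks are weighted equally as stated above, the final ranking score would be computed as in this sketch (the variable names are ours, not part of the benchmark code):

# Illustration only: the final score is the plain average of the per-task mAA,
# assuming equal weighting of the stereo and multi-view tasks.
def final_score(maa_stereo: float, maa_multiview: float) -> float:
    return 0.5 * (maa_stereo + maa_multiview)

print(final_score(0.60, 0.70))  # ~0.65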

Explaining your submission

Participants should provide a short description of their solution, detailing their method and the data used for training and/or hyperparameter selection (if applicable), and guaranteeing that it does not overlap with our test set. To this end, we ask them to upload a short, non-anonymous submission to arXiv within one week after the end of the submission period (check the front page for details). A template may be provided. To account for third-party delays, it is also acceptable to send the PDF by email to image-matching@googlegroups.com before uploading it to arXiv. The organizers reserve the right to disqualify participants from the challenge for failing to meet these requirements.

The organizers reserve the right to disqualify participants if there is reasonable cause to suspect of cheating or unfair practices.

The organizers reserve the right to request code from winning submissions in order to reproduce the results. If necessary, this would be done privately, respecting the license, and the organizers would delete the code after verification. Open-sourcing submissions is welcome but not required.

Hyperparameter tuning on the test set (via multiple submissions or any other means) is not allowed.

The use of multiple accounts is not allowed.

The LaTeX template for this description can be found here:

Anonymous submissions

We allow anonymous submissions, but discourage them. The following fields will be anonymized: authors, contact_email, method_description, link_to_website, link_to_pdf. We still require them to be properly filled in, with enough information to gain a basic understanding of the submission. The following fields will always be public: keypoint, descriptor, method_name. The latter is used as a label for the method and should be self-explanatory (e.g. not a random string). Please note that the organizers will release all information after the challenge deadline. Methods with incomplete descriptions (e.g. description: "Paper under review.") will not be processed. A one- or two-line description is typically not enough to fully reproduce your method. Metadata may be edited after being processed only to add links (paper, code repository) or to fix errors.

Using pretrained models

The use of pretrained models (e.g. on ImageNet, on parts of the MegaDepth dataset, and so on) is allowed, as long as they were not trained on data related to the test set: see the next section. For example, using a model pretrained on the full YFCC100M dataset is prohibited, as it overlaps with our test set. Participants must disclose the pretrained models they used in their description and guarantee that any data they used does not overlap with our test data, in terms of both images and scenes. According to colleagues, this is the list of overlapping scenes from the MegaDepth dataset: 0024 (British Museum), 0021 (Lincoln Memorial Statue), 0025 (London Bridge), 1589 (Mount Rushmore), 0019 (Sagrada Familia), 0008 (Piazza San Marco), 0032 (Florence Cathedral), 0063 (Milan Cathedral). Additionally, 0015 and 0022 contain our validation scenes (which are not banned, but may bias your models). Participants should ensure that this information is correct: the organizers take no responsibility for it.
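
If you train on MegaDepth, a simple guard against accidentally including the scenes listed above might look like the sketch below; the scene IDs are copied from this section, and filter_training_scenes is a hypothetical helper, not part of the benchmark.

# Hypothetical helper (not part of the benchmark): drop MegaDepth scenes that
# overlap with the test set, and warn about validation-set overlap.
BANNED_SCENES = {'0024', '0021', '0025', '1589', '0019', '0008', '0032', '0063'}
VALIDATION_SCENES = {'0015', '0022'}  # not banned, but may bias your models

def filter_training_scenes(training_scenes):
    safe = [s for s in training_scenes if s not in BANNED_SCENES]
    overlap = sorted(set(safe) & VALIDATION_SCENES)
    if overlap:
        print(f'Warning: scenes {overlap} contain our validation scenes')
    return safe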

Regarding our test set

Using the benchmark’s test set in any way other than to produce a submission, as instructed by the organizers, is prohibited. Scraping the test data, or obtaining it from other sources and using it for training, validation, or any other purpose, is prohibited. Using images showing the scenes or landmarks present in the test set is also prohibited, even if those images are different from the ones in the test set. For example, no image depicting London Bridge can be used for training, validation, or any other purpose. This includes pictures or drawings, even if they were made by the participants themselves.

Prizes

Prizes (to be announced later) are available thanks to our sponsors and will be awarded at the conference, or sent by post after it. Due to trade sanctions, we are unable to award prizes to residents of the following countries/regions: Cuba, Iran, North Korea, Sudan, Syria, and Crimea.

Challenge organizers may participate in the challenge but are not eligible for prizes.

Metadata and configuration

The submission format has changed for the second edition of the challenge. Both submission metadata and method settings will be specified as a single JSON file with the following structure:

{
  "metadata": {
      "publish_anonymously": false,  /* Must be public after deadline to claim prize */
      "authors": "Challenge organizers",
      "contact_email": "image-matching@googlegroups.com",
      "method_name": "RootSIFT with DEGENSAC for stereo", /* Leaderboards label */
      "method_description": "An example with vanilla RootSIFT features, with cycle-consistent matching and ratio test. For stereo, we use DEGENSAC (Chum et al, CVPR'05) with optimal settings.",
      "link_to_website": "https://www.myproject.org",
      "link_to_pdf": "",  /* Empty */
  },
  "config_common": {
      "json_label": "rootsift-degensac", /* Results file: use a safe string */
      "keypoint": "siftdef", /* SIFT detector, default */
      "descriptor": "rootsift", /* SIFT descriptors */
      "num_keypoints": 2048,
  },
  "config_phototourism_stereo": {
     (...)

In this example we use (rooted) OpenCV SIFT, for which the benchmark has wrappers. In particular, we specify siftdef as a detector (SIFT with the default detection threshold), and rootsift as a descriptor (see methods/local_feature for options). Modern features can be computed outside the benchmark and imported with the appropriate script: the baselines repository contains several examples.

Results will be saved into <json_label>.json and labeled as <method_name> on the website: short, descriptive names are preferred. Additional information can be placed into method_description.

The following fields contain the configuration for every dataset and task:

  (...)
  "config_phototourism_stereo": {
      "use_custom_matches": false,
      "matcher": {
           "method": "nn",  /* see methods/feature_matching/nn.py for options */
           "distance": "l2",  /* L2 or Hamming */
           "flann": true,  /* Fast Library for Approximate Nearest Neighbours: disable for binary features */
           "num_nn": 1,  /* Number of nearest neighbours */
           "filtering": {
               "type": "snn_ratio_pairwise",  /* Standard ratio test */
               "threshold": 0.90,  /* Ratio test threshold */
           },
           "symmetric": {
               "enabled": true,
               "reduce": "both",  /* Symmetric matching with cycle consistency */
           },
      },
      "outlier_filter": {
          "method": "none",  /* No additional filtering */
      },
      "geom": {
          "method": "cmp-degensac-f",  /* DEGENSAC (Chum et al, CVPR 2005) */
          "threshold": 0.75,  /* Inlier threshold */
          "confidence": 0.999999,  /* Confidence threshold */
          "max_iter": 100000,  /* Maximum number of iterations */
          "error_type": "sampson",
          "degeneracy_check": true,
      },
  },
  "config_phototourism_multiview": {
      "use_custom_matches": false,
      "matcher": {
           "method": "nn",
           "distance": "L2",
           "flann": true,
           "num_nn": 1,
           "filtering": {
               "type": "snn_ratio_pairwise",
               "threshold": 0.95,  /* Relax or tighten the ratio test for SfM */
           },
           "symmetric": {
               "enabled": true,
               "reduce": "both",
           },
      },
      "outlier_filter": {
          "method": "none",
      },
      "colmap": {},  /* Currently unused */
  }

This simple baseline is not far from the current state of the art. For more details, such as the option to bypass robust matchers by providing your own set of matches, please refer to the documentation in the benchmark repository.
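
Before uploading, it may also help to sanity-check your JSON against the fields used in this example. The snippet below is an informal sketch, not an official validator; the required keys simply mirror the configuration shown above.

# Informal sketch (not an official validator): check that each entry in the
# submission JSON contains the metadata and common-config fields shown above.
import json

REQUIRED_METADATA = ['publish_anonymously', 'authors', 'contact_email',
                     'method_name', 'method_description', 'link_to_website',
                     'link_to_pdf']
REQUIRED_COMMON = ['json_label', 'keypoint', 'descriptor', 'num_keypoints']

def check_submission(path):
    with open(path) as f:
        entries = json.load(f)  # the submission file should be a JSON array
    for i, entry in enumerate(entries):
        for key in REQUIRED_METADATA:
            assert key in entry['metadata'], f'Entry {i}: missing metadata "{key}"'
        for key in REQUIRED_COMMON:
            assert key in entry['config_common'], f'Entry {i}: missing config "{key}"'
    print(f'Checked {len(entries)} entries')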

Finally, you can also specify multiple entries in the same JSON file (which should be an array in any case):

[
   { <method_1> },
   { <method_2> },
]

More examples are available in json/method.
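
If you generate several configurations programmatically (for example, to compare ratio-test thresholds chosen on the validation set), you can build the array in a script. The helper below is a hypothetical sketch; base_entry stands for a dict with the structure shown above, and the output filename is just an example.

# Hypothetical sketch: build several entries from a base configuration and
# write them into a single JSON array.
import copy
import json

def make_entries(base_entry, ratio_thresholds):
    entries = []
    for t in ratio_thresholds:
        entry = copy.deepcopy(base_entry)
        entry['config_common']['json_label'] = f'rootsift-degensac-ratio{t}'
        entry['config_phototourism_stereo']['matcher']['filtering']['threshold'] = t
        entries.append(entry)
    return entries

# Usage, assuming `base_entry` follows the structure above:
# with open('submission.json', 'w') as f:
#     json.dump(make_entries(base_entry, [0.85, 0.90]), f, indent=2)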

Formatting your features and matches

The submission above does not require a data upload, as it relies on built-in features and algorithms. However, you will most likely want to provide your own features and, optionally, matches, which are stored in HDF5 files.

Let's consider a toy scene with four images:

$ ls reichstag
06373813_9127207861.jpg
63790741_1504116525.jpg
77023786_7168337563.jpg
92481126_2062182859.jpg

A typical submission will require HDF5 files containing the keypoints, descriptors, and scores (and optionally scales and orientations, which are currently unused), plus pre-computed matches for submissions with custom matches.

Let's look at an example. Feature files (keypoints, descriptors, etc.) must contain one key for each image. The keypoint file must contain, for each image, an array of N keypoints with 0-indexed (x, y) coordinates, with the origin in the top-left corner. We do not currently use scale or orientation, but feel free to include them, as we might in the future. If you sort the keypoints by score, we can run multiple evaluations for the same submission while subsampling the number of keypoints.

>>> import h5py
>>> with h5py.File('keypoints.h5', 'r') as f:
...     for k, v in f.items():
...         print((k, v.shape))

('06373813_9127207861', (512, 2))
('63790741_1504116525', (512, 2))
('77023786_7168337563', (512, 2))
('92481126_2062182859', (513, 2))
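
For reference, a file with this layout could be written as in the sketch below; detect_keypoints is a placeholder for your own detector (the random points are only there to make the sketch run).

# Sketch for writing keypoints.h5: one dataset per image, keyed by the image
# filename without extension, each holding an N x 2 array of (x, y) coordinates.
import glob
import os
import h5py
import numpy as np

def detect_keypoints(image_path):
    # Replace with your detector: return an (N, 2) array of 0-indexed (x, y)
    # coordinates, origin at the top-left corner, ideally sorted by score.
    return np.random.rand(512, 2) * 1024  # random points, just to run the sketch

with h5py.File('keypoints.h5', 'w') as f:
    for image_path in sorted(glob.glob('reichstag/*.jpg')):
        key = os.path.splitext(os.path.basename(image_path))[0]
        f[key] = detect_keypoints(image_path)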

Descriptors must be consistent with the list of keypoints. They must be stored as float32 for floating-point descriptors, and as uint8 for binary descriptors.

>>> import h5py
>>> with h5py.File('descriptors.h5', 'r') as f:
...     for k, v in f.items():
...         print((k, v.shape))

('06373813_9127207861', (512, 128))
('63790741_1504116525', (512, 128))
('77023786_7168337563', (512, 128))
('92481126_2062182859', (513, 128))
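
A quick consistency check between the two files might look like this; it only verifies that the keys and row counts line up and that the dtype follows the convention described above.

# Quick consistency check: one descriptor row per keypoint, stored as float32
# (floating-point descriptors) or uint8 (binary descriptors).
import h5py
import numpy as np

with h5py.File('keypoints.h5', 'r') as kp, h5py.File('descriptors.h5', 'r') as desc:
    assert set(kp.keys()) == set(desc.keys()), 'Image keys must match'
    for key in kp.keys():
        assert kp[key].shape[0] == desc[key].shape[0], f'Row count mismatch for {key}'
        assert desc[key].dtype in (np.float32, np.uint8), f'Unexpected dtype for {key}'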

If you want to specify your own matches, you will have to provide them for every possible pair of images in each scene. The match file should contain one key for every image pair, following the convention LARGEST_KEY-SMALLEST_KEY. For instance, for this toy scene the file would contain six keys, as follows:

>>> import h5py
>>> with h5py.File('matches-stereo-run0.h5', 'r') as f:
...     for k, v in f.items():
...         print((k, v.shape))

('63790741_1504116525-06373813_9127207861', (2, 512))
('77023786_7168337563-06373813_9127207861', (2, 512))
('77023786_7168337563-63790741_1504116525', (2, 512))
('92481126_2062182859-06373813_9127207861', (2, 513))
('92481126_2062182859-63790741_1504116525', (2, 513))
('92481126_2062182859-77023786_7168337563', (2, 513))
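
The key convention can be reproduced with a small helper that sorts the two image names (as strings) and joins them, as in the sketch below, using the four images above.

# Sketch: build the match-file keys for every image pair in a scene, following
# the LARGEST_KEY-SMALLEST_KEY convention (lexicographic comparison of names).
from itertools import combinations

def pair_key(image_a, image_b):
    smallest, largest = sorted([image_a, image_b])
    return f'{largest}-{smallest}'

images = ['06373813_9127207861', '63790741_1504116525',
          '77023786_7168337563', '92481126_2062182859']
keys = [pair_key(a, b) for a, b in combinations(images, 2)]
# -> six keys, e.g. '63790741_1504116525-06373813_9127207861'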

Each entry in the match file stores the matches between the image encoded by the first key and the one encoded by the second key, such that:

>>> with h5py.File('matches-stereo-run0.h5', 'r') as f:
...     print(f['63790741_1504116525-06373813_9127207861'][:])

[[  0   1   2 ... 509 510 511]
 [398  87  18 ... 502 467 458]]

That is, keypoint 0 in 63790741_1504116525.jpg matches keypoint 398 in 06373813_9127207861.jpg, and so on.
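
To produce such a file yourself, you could follow the sketch below; here matches is a hypothetical dict mapping each pair key to two aligned index lists (keypoint indices into the first and second image of the key, respectively).

# Hypothetical sketch: write custom matches as a 2 x K array of keypoint
# indices per image pair (row 0: first image of the key, row 1: second).
import h5py
import numpy as np

def write_matches(path, matches):
    """matches: dict mapping 'LARGEST_KEY-SMALLEST_KEY' -> (idx_first, idx_second)."""
    with h5py.File(path, 'w') as f:
        for key, (idx_first, idx_second) in matches.items():
            f[key] = np.asarray([idx_first, idx_second])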

IMPORTANT NOTE: For stereo, custom matches are treated as the output of a robust matcher (such as RANSAC) and evaluated directly: you should therefore specify cv2-8pt as your geom method in the configuration file (please note that this uses least-squares DLT). For multi-view, the matches are fed to Colmap. As such, you might want to provide different matches or settings for each track. You can submit a single match file per scene, as matches.h5, which will be used for both stereo and multi-view, or use separate files: matches_stereo_0.h5 for stereo and matches_multiview.h5 for multi-view. For stereo, you can submit either a single file or three (suffixed _0, _1, and _2) if your method is stochastic; these will be run separately and the results averaged, similarly to how we handle submissions that use RANSAC variants instead of custom matches.
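
If your stereo matcher is stochastic, the three runs could be written out as follows, reusing the hypothetical write_matches helper from the previous sketch; run_matcher is a placeholder for your own pipeline, and the file names follow the convention described in this note.

# Placeholder sketch: write three stereo runs and a multi-view file per scene,
# reusing write_matches() from the sketch above; run_matcher() stands in for
# your own (stochastic) matching pipeline.
def run_matcher(scene_dir, seed):
    """Return a dict: 'LARGEST_KEY-SMALLEST_KEY' -> (idx_first, idx_second)."""
    raise NotImplementedError  # replace with your pipeline

for run in range(3):
    write_matches(f'matches_stereo_{run}.h5', run_matcher('reichstag', seed=run))
write_matches('matches_multiview.h5', run_matcher('reichstag', seed=0))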

More examples are available in the baseline repository. For up-to-date documentation, please refer to the benchmark documentation here.