Shape Descriptors for Maximally Stable Extremal Regions

By Per-Erik ForssÚn


This work introduces an affine invariant shape descriptor for maximally stable extremal regions (MSER). Affine invariant feature descriptors are normally computed by sampling the original grey-scale image in an invariant frame defined from each detected feature, but we instead use only the shape of the detected MSER itself. This has the advantage that features can be reliably matched regardless of the appearance of the surroundings of the actual region. The descriptor is computed using the scale invariant feature transform (SIFT), with the resampled MSER binary mask as input. We also show that the original MSER detector can be modified to achieve better scale invariance by detecting MSERs in a scale pyramid. We make extensive comparisons of the proposed feature against a SIFT descriptor computed on grey-scale patches, and also explore the possibility of grouping the shape descriptors into pairs to incorporate more context. While the descriptor does not perform as well on planar scenes, we demonstrate various categories of full 3D scenes where it outperforms the SIFT descriptor computed on grey-scale patches. The shape descriptor is also shown to be more robust to changes in illumination. We show that a system can achieve the best performance under a range of imaging conditions by matching both the texture and shape descriptors.

Back to the LCI Forum page