NeurIPS’22: Researchers propose AutoLink for automatic image annotation in machine learning
This is part 4 of a series featuring some of the UBC CS department’s accepted papers for NeurIPS 2022 (the conference runs Nov. 29 – Dec. 9).
Dr. Helge Rhodin, assistant professor in UBC Computer Science, and his co-authors, grad student Xingzhe He and postdoc Bastian Wandt (now assistant professor at Linköping University), have a paper accepted at the premier conference in machine learning and AI: NeurIPS 2022. They have been invited to give a lightning talk (a 1-minute pre-recorded presentation) at the conference.
AutoLink: Self-supervised Learning of Human Skeletons and Object Outlines by Linking Keypoints
Xingzhe said, “We were working on image editing when we formed the idea for AutoLink. Detailed image editing usually requires corresponding keypoints or segmentation masks. For common objects like faces, there are many datasets in existence containing keypoints. However, for many other datasets, like animals, there are few keypoint labels.” Furthermore, Xingzhe points out that training a neural network to detect keypoints requires a large amount of annotated (labelled) data. “We can annotate images by ourselves, but annotating hundreds of images is very tedious. It can take one person weeks or months to complete. So we developed a way to automatically detect the keypoints without any annotation. We call it AutoLink, as it links keypoints to a skeleton as a by-product.”
Keypoints are key
Xingzhe explained that objects from the same category usually share similar parts, such as eyes on a face, or the head of a human. Being able to identify those key parts in an image is a classic and important topic in computer vision. In machine learning, people usually refer to those key parts as keypoints. The keypoints can then be used to transfer the appearance or pose, for telepresence or for motion analysis in sports and medicine.
No annotation required
“Our work, AutoLink, detects keypoints from images without any annotation, by reconstructing the given images. Consider that, if you cover your eyes with your hands, people can hardly tell if your eyes are open or closed. But what if they could ‘see’ the keypoints (locations) of your upper eyelid and lower eyelid? When your eyes are open, the two keypoints are away from each other. When your eyes are closed, the two keypoints are closer to each other. Thus, if one knows the two keypoints, one can easily tell if the eyes are open or closed, even if a hand is covering them.”
“The same logic applies to our approach. If we mask the majority of the object in the image, we can hardly tell what the object looks like. However, if we train a second network to predict the keypoints of the objects, we can reconstruct the original image from the masked image more easily,” he said.
Xingzhe says that in the beginning, the detected keypoints are highly inaccurate and cannot represent the object well. However, after thousands of refinements, the keypoints become meaningful, and the reconstructed image becomes better. One of the team's key findings is that learning not only the locations of the keypoints but also the connections between them improves localization accuracy and provides an object topology model as a by-product, without any annotations on the images.
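For readers who want a feel for the idea described above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the reconstruction network receives two inputs: the image with most of its patches hidden, and a picture of the keypoints joined by weighted line segments. The function names (`mask_patches`, `edge_map`, `build_decoder_input`), the patch size, the keep ratio and the line thickness `sigma` are all hypothetical choices made for illustration. In the approach described in the article the keypoints and their connections are learned; here they are simply passed in by hand, and the networks themselves are left out.

```python
import numpy as np


def mask_patches(image, patch=8, keep_ratio=0.25, seed=0):
    """Hide most square patches of an (H, W) or (H, W, C) image.

    Returns the masked image and a boolean (H, W) map of visible pixels.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    gh, gw = -(-h // patch), -(-w // patch)  # ceiling division
    n_keep = int(round(keep_ratio * gh * gw))
    keep = np.zeros(gh * gw, dtype=bool)
    keep[rng.permutation(gh * gw)[:n_keep]] = True  # exactly n_keep patches stay
    keep = keep.reshape(gh, gw)
    # Blow the patch grid up to pixel resolution and crop to the image size.
    pixel_keep = keep.repeat(patch, axis=0).repeat(patch, axis=1)[:h, :w]
    expand = pixel_keep[..., None] if image.ndim == 3 else pixel_keep
    return np.where(expand, image, 0), pixel_keep


def edge_map(keypoints, edge_weights, height, width, sigma=1.5):
    """Draw soft line segments between every pair of keypoints.

    keypoints:    (K, 2) float array of (x, y) pixel coordinates.
    edge_weights: (K, K) array; entry [i, j] with i < j scales the
                  brightness of the segment between keypoints i and j.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([xs, ys], axis=-1).astype(float)  # (H, W, 2)
    out = np.zeros((height, width))
    for i in range(len(keypoints)):
        for j in range(i + 1, len(keypoints)):
            a, b = keypoints[i], keypoints[j]
            ab = b - a
            # Project every pixel onto the segment a-b, clamped to its ends.
            t = np.clip(((pixels - a) @ ab) / max(ab @ ab, 1e-8), 0.0, 1.0)
            closest = a + t[..., None] * ab
            d2 = ((pixels - closest) ** 2).sum(axis=-1)
            # Brightness falls off with distance from the segment.
            heat = edge_weights[i, j] * np.exp(-d2 / sigma**2)
            out = np.maximum(out, heat)  # keep the strongest edge per pixel
    return out


def build_decoder_input(image, keypoints, edge_weights):
    """Stack the masked (H, W, C) image with the edge map as an extra channel."""
    masked, _ = mask_patches(image)
    edges = edge_map(keypoints, edge_weights, *image.shape[:2])
    return np.concatenate([masked, edges[..., None]], axis=-1)
```

In this sketch an edge with weight zero leaves no trace in the edge map, so a system that learns the weights can in effect decide which keypoints belong together. That is one way to picture how a skeleton-like topology could emerge without labels.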
In terms of real-world practicalities, the researchers imagine AutoLink being helpful in many areas. Take, for example, images and videos of wild animals. Modeling their keypoints could help scientists track their health and habits. In the medical field, people could use AutoLink to label the countless unlabelled images and videos that exist, in order to help draw more and better medical conclusions.
Learn more about Helge Rhodin, Xingzhe He and Bastian Wandt.
Learn more about the UBC Computer Vision group.
In total, the department has 13 accepted papers by 9 professors at the NeurIPS conference. Read more about the papers and their authors.