CodeShovel

New Tool for Digging up Source Code Takes Seconds

A team of UBC computer science researchers has won an ACM SIGSOFT Distinguished Paper award at the International Conference on Software Engineering (ICSE) in May this year.

The project began as a master’s student’s research project and thesis, and evolved into a tool that has now been recognized at the premier software engineering conference.

The paper, “CodeShovel: Constructing Method-Level Source Code Histories,” describes CodeShovel, a tool that helps developers understand how their systems have changed. The research team includes Felix Grund, who spearheaded the work as part of his master’s thesis, Dr. Reid Holmes, postdoctoral research fellow Shaiful Chowdhury, PhD student Nick Bradley, and master’s student (undergrad at the time) Braxton Hall.

A better tool for digging up source code history


“I called it CodeShovel because it seemed like archaeology in a way,” said Grund. “Like digging through history and revealing more information about fragments of code.”

Source code histories are commonly used by developers and researchers to reason about how software evolves. “Code is continually evolving, and if a software developer wants to change their code, it's often helpful to understand the entire history,” explained Dr. Holmes.

Through their research, the team learned that the state-of-the-art tools developers currently use are neither robust enough to capture a method’s whole history nor fast enough to keep up with the rapid, and often large, changes made to source code. CodeShovel is a considerably more efficient tool for uncovering method histories: it quickly produces complete and accurate change histories for 90% of methods (including 97% of all method changes).
 

CodeShovel’s accuracy across different types of source code transformations. CodeShovel does not exhibit weaknesses on any particular type of change.

Change Type            Occurrence    Accuracy (# failures)
BodyChange             527           99.2% (4)
FileRename             167           100.0% (0)
Introduced             100           98.0% (2)
ParameterChange        73            100.0% (0)
MoveFromFile           41            100.0% (0)
Rename                 23            91.3% (2)
ModifierChange         20            100.0% (0)
ReturnTypeChange       17            100.0% (0)
ParameterMetaChange    14            100.0% (0)
ExceptionsChange       8             100.0% (0)
MultiChange            99            97.9% (2)

CodeShovel significantly outperforms the leading tools and helps developers navigate the entire history of a source code method (e.g., modifications, renames, and moves between files and directories) so they can better understand how the method evolved.

Subsequent field studies that the UBC researchers conducted with industrial teams confirmed the empirical findings: CodeShovel improves correctness, incurs low runtime overhead, and can be useful for a wide range of industrial development tasks.

CodeShovel’s success was not earned overnight, but the team (and Grund in particular) refused to give up. “We don't shy away from acknowledging that we submitted before, and that the paper was rejected,” said Holmes. “Because ultimately, it was through the peer review process and the valuable feedback of those experts which enabled us to adapt and integrate solutions into the paper. It wasn't the tool itself that needed work. Felix built an amazing tool. But from the feedback, we improved the front end/user interface significantly and conducted an additional study before resubmitting.”

Of the more than 600 submissions to ICSE this year, only 22% were accepted, and 2% received the coveted Distinguished Paper award. The paper was highly praised by its reviewers, a few of whom had this to say:

“The key novelty of CodeShovel is that it can compute method histories on demand (~2 sec) and without any pre-processing while achieving 90% recall and 99% precision.”

“This work is well-rounded building on a survey of professional developers, does a good comparison to related work, evaluates the tool with a (manually created) oracle, and then closes the loop with a survey of 16 developers (they evaluated whether the tool correctly identified the commit history).”

Output in seconds

Holmes explained that CodeShovel makes it possible to retrieve the history of a source code method extremely quickly. “Past approaches could take hours and Felix has figured out how to do it in less than two seconds,” he said. “The real beauty of CodeShovel is that it provides seamless ways to interact with the tool, so you can gather all the history without having to install anything at all on your computer.”

“I try to explain it as this black box that requires you to specify the method you’re interested in,” Grund said. “Then the output from the black box will be the history you need, and done in a second or two.”

Holmes added, “That’s the very thing about this project that’s certainly novel. You can get high precision results very quickly without pre-computation, whereas other approaches usually need to chew on a program for hours in advance.”
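
The black-box view Grund describes, a method in and a change history out, can be illustrated with a toy sketch. This is not CodeShovel’s algorithm or API. It assumes a hypothetical in-memory list of commits (ordered oldest to newest) in place of a real repository, and every name in it (MethodSnapshot, Commit, find_match, classify, method_history) is invented for illustration. It walks backwards from the newest commit with no pre-processing step, matches the method to its earlier version by file and name or, failing that, by identical body, and labels each step with a change type borrowed from the table above. Any change of file path is labelled MoveFromFile; the sketch does not try to tell a moved method from a renamed file.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MethodSnapshot:
    """One method as it appears in one commit (hypothetical, simplified)."""
    path: str
    name: str
    params: Tuple[str, ...]
    body: str


@dataclass(frozen=True)
class Commit:
    """A commit id plus the methods it contains; stands in for a repository."""
    commit_id: str
    methods: Tuple[MethodSnapshot, ...]


def find_match(target: MethodSnapshot, candidates) -> Optional[MethodSnapshot]:
    """Return the most plausible earlier version of target, or None."""
    # Same file and same name: assume it is the same method.
    for m in candidates:
        if m.path == target.path and m.name == target.name:
            return m
    # Otherwise accept an identical body (the method was renamed or moved).
    for m in candidates:
        if m.body == target.body:
            return m
    return None


def classify(old: MethodSnapshot, new: MethodSnapshot) -> Optional[str]:
    """Label the difference between two versions, or None if unchanged."""
    changes = []
    if old.path != new.path:
        changes.append("MoveFromFile")
    if old.name != new.name:
        changes.append("Rename")
    if old.params != new.params:
        changes.append("ParameterChange")
    if old.body != new.body:
        changes.append("BodyChange")
    if len(changes) > 1:
        return "MultiChange"
    return changes[0] if changes else None


def method_history(commits: List[Commit], path: str, name: str) -> List[Tuple[str, str]]:
    """Trace one method backwards; returns (commit_id, change_type), newest first."""
    current = next(
        (m for m in commits[-1].methods if m.path == path and m.name == name), None
    )
    if current is None:
        raise ValueError(f"{name} not found in {path} at the newest commit")
    history = []
    for i in range(len(commits) - 1, 0, -1):
        previous = find_match(current, commits[i - 1].methods)
        if previous is None:
            # No earlier version exists, so this commit introduced the method.
            history.append((commits[i].commit_id, "Introduced"))
            return history
        label = classify(previous, current)
        if label is not None:
            history.append((commits[i].commit_id, label))
        current = previous
    # The method was already present in the oldest commit we know about.
    history.append((commits[0].commit_id, "Introduced"))
    return history


# Made-up example data: four commits touching one method.
commits = [
    Commit("c1", (MethodSnapshot("A.java", "load", ("path",), "read(path)"),)),
    Commit("c2", (MethodSnapshot("A.java", "load", ("path",), "read(path); log()"),)),
    Commit("c3", (MethodSnapshot("A.java", "loadFile", ("path",), "read(path); log()"),)),
    Commit("c4", (MethodSnapshot("A.java", "loadFile", ("path", "charset"), "read(path); log()"),)),
]
history = method_history(commits, "A.java", "loadFile")
for commit_id, change in history:
    print(commit_id, change)
```

A real tool has to cope with cases this sketch ignores, such as a method whose name and body change in the same commit, which is where matching heuristics and the accuracy figures reported above come into play.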

The joy of success

When Grund learned they’d won the Distinguished Paper award, he was thrilled. “It just felt really great, mostly because I had already had a few rejections.”

Holmes agreed, saying, “Although it shouldn't feel any different to win this award after the paper was rejected previously, it just does. Also, the fact that this is a Master’s student at a Canadian university building this tool and winning international recognition for software engineering is astounding.”

Bradley said, “It's great to see this level of recognition after all the work that went into it. It came as a bit of a surprise, but is so well deserved.”

The team conducted the research in the Software Practices Lab at UBC Computer Science in Vancouver, Canada.

More about the International Conference on Software Engineering