Non-Essential Changes in Version Histories

David Kawrykow and Martin P. Robillard
McGill University, Canada

Numerous techniques involve mining change data captured in software archives to assist engineering efforts, for example to identify components that tend to evolve together. We observed that important changes to software artifacts are sometimes accompanied by numerous non-essential modifications, such as local variable refactorings, or textual differences induced as part of a rename refactoring. We developed a tool-supported technique for detecting nonessential code differences in the revision histories of software systems. We used our technique to investigate code changes in over 24 000 change sets gathered from the change histories of seven long-lived open-source systems. We found that up to 15.5% of a system’s method updates were due solely to non-essential differences. We also report on numerous observations on the distribution of non-essential differences in change history and their potential impact on change-based analyses.