Mining Software Repositories Using Topic Models

Stephen W. Thomas
Queen's University, Canada

Software repositories, such as source code, email archives, and bug databases, contain unstructured and unlabeled text that is difficult to analyze with traditional techniques. We propose the use of statistical topic models to automatically discover structure in these textual repositories. This discovered structure has the potential to be used in software engineering tasks, such as bug prediction and traceability link recovery. Our research goal is to address the challenges of applying topic models to software repositories.