학술논문

Quality Evaluation of Modern Code Reviews Through Intelligent Biometric Program Comprehension
Document Type
Periodical
Source
IEEE Transactions on Software Engineering IIEEE Trans. Software Eng. Software Engineering, IEEE Transactions on. 49(2):626-645 Feb, 2023
Subject
Computing and Processing
Codes
Software
Computer bugs
Biometrics (access control)
Inspection
Feature extraction
Task analysis
Artificial Intelligence
Biometrics
Code inspections and walkthroughs
Human factors
Language
ISSN
0098-5589
1939-3520
2326-3881
Abstract
Code review is an essential practice in software engineering to spot code defects in the early stages of software development. Modern code reviews (e.g., acceptance or rejection of pull requests with Git) have become less formal than classic Fagan's inspections, lightweight, and more reliant on individuals (i.e., reviewers). However, reviewers may encounter mentally demanding challenges during the code review, such as code comprehension difficulties or distractions that might affect the code review quality. This work proposes a novel approach that evaluates the quality of code reviews in terms of bug-finding effectiveness and provides the reviewers with a clear message of whether the review should be repeated, indicating the code regions that may not have been well-reviewed. The proposed approach utilizes biometric information collected from the reviewer during the review process using non-intrusive biofeedback devices (e.g., smartwatches). Biometric measures such as Heart Rate Variability (HRV) and task-evoked pupillary response are captured as a surrogate of the cognitive state of the reviewer (e.g., mental workload) and inexpensive desktop eye-trackers compatible with the software development settings. This work uses Artificial Intelligence techniques to predict the cognitive load from the extracted biomarkers and classify each code region according to a set of features. The final evaluation considers various factors such as code complexity, time of the code review, the experience level of the reviewer, and other factors. Our experimental results show the approach could predict the review quality with 87.77%±4.65 accuracy and a Spearman correlation coefficient of 0.85 (p-value < 0.001) between the predicted and the actual review performance. This evaluation validates the cognitive load measurement using electroencephalography (EEG) signals as ground truth for the HRV and pupil signals.