학술논문

Challenges of Resolving Merge Conflicts: A Mining and Survey Study
Document Type
Periodical
Source
IEEE Transactions on Software Engineering IIEEE Trans. Software Eng. Software Engineering, IEEE Transactions on. 48(12):4964-4985 Dec, 2022
Subject
Computing and Processing
Codes
Correlation
Task analysis
Time measurement
Tools
Computer science
Analytical models
Merge conflict resolution
collaborative software development
three-way merge
Language
ISSN
0098-5589
1939-3520
2326-3881
Abstract
In collaborative software development, merge conflicts arise when developers integrate concurrent code changes. Practitioners seek to minimize the number of merge conflicts because resolving them is difficult, time consuming, and often an error-prone task. Despite a substantial number of studies investigating merge conflicts, the challenges in merge conflict resolution are not well understood. Our goal is to investigate which factors make merge conflicts longer to resolve in practice. To this end, we performed a two-phase study. First, we analyzed 66 projects containing around 81 thousand merge scenarios, involving 2 million files and over 10 million chunks. For this analysis, we use rank correlation, principal component analysis, multiple regression model, and effect-size analysis to investigate which independent variables (e.g., number of conflicting chunks and files) mostly influence our dependent variable (i.e., time to merge). We found that the number of chunks , lines of code , conflicting chunks , developers involved , conflicting lines of code , conflicting files , and the complexity of the conflicting code influence the merge conflict resolution time. Second, we surveyed 140 developers from our subject projects aiming at cross-validating our results from the first phase of our study. As main results, (i) we found that committing small chunks makes merge conflict resolution faster when leaving other independent variables untouched, (ii) we found evidence that merge scenario characteristics (e.g., the number of lines of code or chunks changed in the merge scenario) are stronger correlated with our dependent variable than merge conflict characteristics (e.g., the number of lines of code or chunks in conflict), (iii) we devise a taxonomy of four types of challenges in merge conflict resolution, and (iv) we observed that the inherent dependencies among conflicting and non-conflicting code is one of the main factors influencing the merge conflict resolution time.