Academic Paper

An Exploratory Study of Functional Redundancy in Code Repositories
Document Type
Conference
Source
2017 IEEE 17th International Working Conference on Source Code Analysis and Manipulation (SCAM), pp. 31-40, Sep. 2017
Subject
Computing and Processing
Redundancy
Cloning
Software
Maintenance engineering
Java
Semantics
Syntactics
Language
ISSN
2470-6892
Abstract
In large code repositories, the probability that functions repeat across projects is high. This type of functional redundancy (FR) is desirable for recent code reuse and repair approaches. Yet FR is hard to measure because it is closely related to program equivalence, which is an undecidable problem. This is one reason why most studies that investigate redundancy focus on syntactic rather than semantic replication (e.g., cloning). In this paper we evaluate the extent of FR in a code repository of 68 Java projects selected at random from SourceForge. Our technique approximates function similarity by first searching for methods that have similar interfaces (return type, name, and parameter types). We then execute these methods to verify which candidate pairs produce matching outputs for a given sample of inputs. Some recent studies have also focused on this type of semantic replication, but our detection approach is generally cheaper and more precise, because it focuses on methods and uses interfaces to reduce the search space. Although our scope is restricted to static methods, which makes our results conservative, our findings are promising. In particular, we found 984 pairs of redundant methods, and 28 of the 68 projects (41.17%) in the repository contained redundancy. Moreover, the majority of redundant methods for which we had access to the source code were not textual clones (only one redundant method pair consisted of replicated code). Our study also indicates that the proposed redundancy detection approach has high precision and is generally inexpensive (only four executions per method were required to attain 100% precision).
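The two-step idea described in the abstract (filter candidate pairs by interface, then run both methods on a sample of inputs) can be illustrated with a minimal Java sketch. This is not the authors' tool: the class `FunctionalRedundancySketch`, the nested `ProjectA` and `ProjectB` classes, the helper names, and the four sample inputs are all hypothetical. As a simplifying assumption, the interface filter here compares only return and parameter types exactly and ignores method names, whereas the paper also considers names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class FunctionalRedundancySketch {

    // Hypothetical static methods standing in for code from two projects.
    static class ProjectA {
        static int max(int a, int b) { return a > b ? a : b; }
    }

    static class ProjectB {
        static int maximum(int x, int y) { return Math.max(x, y); }
        static int min(int x, int y) { return Math.min(x, y); }
    }

    // Hypothetical input sample: four argument tuples for (int, int) methods.
    static final List<Object[]> SAMPLES = Arrays.asList(
            new Object[] {1, 2},
            new Object[] {5, 3},
            new Object[] {-4, -4},
            new Object[] {0, 7});

    // Step 1: interface filter. Both methods must be static and share
    // return type and parameter types (names are ignored in this sketch).
    static boolean sameInterface(Method m1, Method m2) {
        return Modifier.isStatic(m1.getModifiers())
                && Modifier.isStatic(m2.getModifiers())
                && m1.getReturnType().equals(m2.getReturnType())
                && Arrays.equals(m1.getParameterTypes(), m2.getParameterTypes());
    }

    // Step 2: execute both candidates on every sample input and compare outputs.
    // Agreement on the sample only approximates equivalence; it does not prove it.
    static boolean agreeOnSamples(Method m1, Method m2, List<Object[]> samples) {
        if (!sameInterface(m1, m2)) {
            return false;
        }
        try {
            m1.setAccessible(true);
            m2.setAccessible(true);
            for (Object[] args : samples) {
                Object r1 = m1.invoke(null, args); // null receiver: static method
                Object r2 = m2.invoke(null, args);
                if (!Objects.equals(r1, r2)) {
                    return false;
                }
            }
            return true;
        } catch (ReflectiveOperationException e) {
            // Treat any failure to invoke (or an exception thrown by a candidate)
            // as "not redundant".
            return false;
        }
    }

    // Looks up a declared (int, int) method by name.
    static Method find(Class<?> owner, String name) {
        try {
            return owner.getDeclaredMethod(name, int.class, int.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Method a = find(ProjectA.class, "max");
        Method b = find(ProjectB.class, "maximum");
        Method c = find(ProjectB.class, "min");
        System.out.println("max vs maximum: " + agreeOnSamples(a, b, SAMPLES));
        System.out.println("max vs min: " + agreeOnSamples(a, c, SAMPLES));
    }
}
```

In this sketch the choice of inputs matters: the tuple `(-4, -4)` alone cannot separate `max` from `min`, so a sample needs inputs that exercise differing behaviour before agreement says much about redundancy.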