Harman Patil (Editor)

Duplicate code

Duplicate code is a computer programming term for a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable for a number of reasons. A minimum requirement is usually applied to the quantity of code that must appear in a sequence for it to be considered duplicate rather than coincidentally similar. Sequences of duplicate code are sometimes known as code clones or just clones; the automated process of finding duplications in source code is called clone detection.

Two code sequences can be duplicates of each other in several ways: character-for-character identical; character-for-character identical with white space characters and comments ignored; token-for-token identical; token-for-token identical with occasional variation; or functionally identical.
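The first few of these levels can be illustrated with a minimal sketch, here applied to Python source using the standard `tokenize` module. The function name `significant_tokens` and the three sample snippets are assumptions made for illustration, and treating renamed identifiers as the "occasional variation" is only one possible reading of that category.

```python
import io
import keyword
import tokenize

# Tokens that carry no meaning for the comparison.
IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
# Layout tokens whose text (e.g. indentation width) should not matter.
LAYOUT = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}


def significant_tokens(source, rename_identifiers=False):
    """Return the token sequence of Python source, ignoring comments and spacing.

    With rename_identifiers=True every non-keyword name becomes "ID", so
    snippets that differ only in variable names compare equal.
    """
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue
        if tok.type in LAYOUT:
            result.append(tokenize.tok_name[tok.type])
        elif (rename_identifiers and tok.type == tokenize.NAME
              and not keyword.iskeyword(tok.string)):
            result.append("ID")
        else:
            result.append(tok.string)
    return result


original = "total = 0\nfor x in xs:\n    total += x\n"
respaced = "total = 0  # running sum\nfor x in xs:\n        total += x\n"
renamed = "acc = 0\nfor item in data:\n    acc += item\n"

same_text = original == respaced
same_tokens = significant_tokens(original) == significant_tokens(respaced)
same_shape = (significant_tokens(original, rename_identifiers=True)
              == significant_tokens(renamed, rename_identifiers=True))
```

Here `same_text` is false while `same_tokens` and `same_shape` are true. Functionally identical code cannot be recognised by a token comparison of this kind, since two sequences can compute the same result with entirely different tokens.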

How duplicates are created

Duplicate code may be created for several reasons. One is copy-and-paste programming, which in academic settings may be done as part of plagiarism, or scrounging, in which a section of code is copied "because it works". In most cases this operation involves slight modifications to the cloned code, such as renaming variables or inserting or deleting code. The language nearly always provides facilities that allow one copy of the code to serve multiple purposes, but a copy is created anyway because the programmer does not know the language well, does not have the time to do it properly, or does not care about the increased software rot.

It may also happen that functionality very similar to that in another part of a program is required, and a developer independently writes code that is very similar to what exists elsewhere. Studies suggest that such independently rewritten code is typically not syntactically similar.

Automatically generated code, where having duplicate code may be desired to increase speed or ease of development, is another reason for duplication. Note that the actual generator will not contain duplicates in its source code, only the output it produces.
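As a rough illustration of that last point, the sketch below shows a small generator whose own source contains a single template, while its output repeats the same pattern once per field. The function name `generate_accessors` and the field names are hypothetical.

```python
def generate_accessors(fields):
    """Return Python source defining one getter function per field name."""
    parts = []
    for name in fields:
        # One template in the generator; one near-identical copy per field
        # in the generated output.
        parts.append(f"def get_{name}(record):\n    return record[{name!r}]\n")
    return "\n".join(parts)


generated_source = generate_accessors(["name", "age"])
```

The string in `generated_source` contains two getters that differ only in the field name, which a clone detector run over the output would report as duplicates.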

Problems associated with duplicate code

Inappropriate code duplication generally makes editing more difficult because it unnecessarily increases complexity and length. This may lead to increased maintenance costs, more human error, forgotten or overlooked pieces of code, and greater file size, and it may indicate a sloppy design. Small differences between clones can indicate missed fault fixes, which has led to the hypothesis that such clones are related to faults. This, however, is still debated in the scientific community. Further factors, such as the developers' awareness of clones, probably play a role in this relationship. Appropriate code duplication may occur for many reasons, including facilitating the development of a device driver for a device that is similar to some existing device.

Detecting duplicate code

A number of different algorithms have been proposed to detect duplicate code. For example:

  • Baker's algorithm
  • Rabin–Karp string search algorithm
  • Abstract syntax trees
  • Visual clone detection
  • Count matrix clone detection
  • Locality-sensitive hashing
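As one illustration, the sketch below applies a Rabin–Karp style rolling hash to windows of `k` consecutive lines rather than to characters. It is a simplified sketch, not a production clone detector: the function names, the choice of `zlib.crc32` as the per-line code, and the constants `BASE` and `MOD` are all assumptions. The window length `k` plays the role of the minimum quantity of code required before a sequence counts as duplicate.

```python
import zlib
from collections import defaultdict

BASE = 257            # assumed polynomial base for the rolling hash
MOD = (1 << 61) - 1   # assumed large prime modulus


def normalize(line):
    """Collapse runs of white space so that spacing differences are ignored."""
    return " ".join(line.split())


def find_duplicate_windows(lines, k):
    """Return sorted groups of start indices whose k-line windows are identical."""
    norm = [normalize(line) for line in lines]
    codes = [zlib.crc32(text.encode("utf-8")) for text in norm]
    if k <= 0 or len(codes) < k:
        return []

    high = pow(BASE, k - 1, MOD)  # weight of the line leaving the window
    h = 0
    for code in codes[:k]:
        h = (h * BASE + code) % MOD

    buckets = defaultdict(list)
    buckets[h].append(0)
    for start in range(1, len(codes) - k + 1):
        # Slide the window: drop the oldest line, append the newest one.
        h = (h - codes[start - 1] * high) % MOD
        h = (h * BASE + codes[start + k - 1]) % MOD
        buckets[h].append(start)

    groups = []
    for starts in buckets.values():
        # Compare the actual text so that hash collisions are not reported.
        by_text = defaultdict(list)
        for start in starts:
            by_text[tuple(norm[start:start + k])].append(start)
        groups.extend(g for g in by_text.values() if len(g) > 1)
    return sorted(groups)


sample = ["a = 1", "b  =  2", "c = 3", "x = 9", "a = 1", "b = 2", "c = 3"]
clones = find_duplicate_windows(sample, 3)  # [[0, 4]]
```

This only finds windows that are identical after white space normalisation; detecting renamed or functionally equivalent clones needs the token-based or tree-based techniques listed above.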

Example of functionally duplicate code

Consider the following code snippet for calculating the average of an array of integers:
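A minimal sketch of such a snippet in Python, with explicit loops and illustrative array contents (the values and variable names are assumptions):

```python
array_a = [2, 4, 6, 8]
array_b = [1, 3, 5, 7]

# First loop: sum and average of array_a.
sum_a = 0
for value in array_a:
    sum_a += value
average_a = sum_a / len(array_a)

# Second loop: the same logic, duplicated for array_b.
sum_b = 0
for value in array_b:
    sum_b += value
average_b = sum_b / len(array_b)
```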

The two loops can be rewritten as the single function:
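One way such a function might look in Python; the name `calc_average` is hypothetical, and the sketch assumes a non-empty sequence:

```python
def calc_average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    total = 0
    for value in values:
        total += value
    return total / len(values)
```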

Using the above function will give source code that has no loop duplication:
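A sketch of the resulting code in Python. The helper `calc_average` is a hypothetical name and is defined inside the block so that it runs on its own; the array contents are illustrative.

```python
def calc_average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    total = 0
    for value in values:
        total += value
    return total / len(values)


array_a = [2, 4, 6, 8]
array_b = [1, 3, 5, 7]

# The loop now exists once, inside calc_average.
average_a = calc_average(array_a)
average_b = calc_average(array_b)
```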

Note that in this trivial case, the compiler may choose to inline both calls to the function, such that the resulting machine code is identical for both the duplicated and non-duplicated examples above. If the function is not inlined, then the additional overhead of the function calls will probably make the code take longer to run (on the order of 10 processor instructions for most high-performance languages). This additional overhead could theoretically be a problem.
