Code duplication goes against the software engineering best practice of code reusability. Some of the major disadvantages of code duplication are the following:

  1. It increases the number of Lines of Code (LOC), which enlarges the codebase and can affect the performance of the software.
  2. Extra unit tests must be written to cover each duplicate method in order to maintain good coverage.
  3. A single change has to be made in multiple files because of the duplication, which increases the maintenance cost.
  4. It highlights a lack of quality in the software team's work.

Different types of Duplication

Code duplication can be broadly classified into four types:
  • Different Methods but with Identical LOC
    Example: methods m1() and m2(), which contain identical LOC.
  • Same Method with Identical LOC but in different class files
    Example: method m1() in class A and in class B, which has the same LOC in both.
  • Identical set of LOC in multiple methods
    Example: two methods, m1() and m2(), in the same or in different files, which contain 30 LOC each, of which 15 LOC are identical.
  • Similar LOC
    Example: two methods, m1() and m2(), which have “almost” identical LOC, but whose parameter and attribute names are different.
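To make the four types concrete, here is a minimal Java sketch. The classes `OrderService` and `InvoiceService`, the methods `totalWithTax()`, `totalWithTaxAndShipping()` and `cost()`, and the flat 10% tax are all hypothetical, chosen only to illustrate each kind of duplication.

```java
// Hypothetical classes illustrating the four kinds of code duplication.
class OrderService {

    // Type 1: different methods (m1, m2) with identical LOC
    int m1(int price, int qty) {
        return price * qty;
    }

    int m2(int price, int qty) {
        return price * qty; // identical to m1()
    }

    // Type 3: an identical set of LOC inside otherwise different methods
    // (a flat 10% tax is assumed purely for illustration)
    int totalWithTax(int price, int qty) {
        int subtotal = price * qty;   // shared LOC
        int tax = subtotal / 10;      // shared LOC
        return subtotal + tax;
    }

    int totalWithTaxAndShipping(int price, int qty, int shipping) {
        int subtotal = price * qty;   // shared LOC
        int tax = subtotal / 10;      // shared LOC
        return subtotal + tax + shipping;
    }

    // Type 4: similar LOC, same behavior as m1() but different parameter names
    int cost(int unitPrice, int count) {
        return unitPrice * count;
    }
}

class InvoiceService {

    // Type 2: the same method m1() with identical LOC in a different class
    int m1(int price, int qty) {
        return price * qty;
    }
}
```

Only the first three types are exact copies that can be detected by comparing text; the fourth needs a comparison that ignores names.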

This article addresses the first three categories of code duplication.

 

Duplication Elimination Procedure

Here are some common methods through which we can remove these duplicates.

 

Different Methods but with Identical LOC & same Method with Identical LOC but in different class files

Example: methods m1() and m2(), which contain identical LOC.

Method-1 : Delete and Redirect approach

  • Maintain one method (e.g. m1()) which will be used throughout the software.
  • Delete the contents of the other methods (e.g. m2()) and replace them with a call to the actual method, m1().

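[code]
public class Example {

    // The single method kept and used throughout the software
    public void m1() {
        // LOC
    }

    // Duplicate of m1(): its content is removed and replaced with an m1() call
    public void m2() {
        m1();
    }

    // Duplicate of m1(): its content is removed and replaced with an m1() call
    public void m3() {
        m1();
    }
}
[/code]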

Method-2 : Delete and modify reference approach
  • Maintain one method (e.g. m1()) which will be used throughout the software.
  • Delete all other identical methods (e.g. delete m2(), m3(), etc., which are all identical copies of m1()).
  • Identify the code locations from which the deleted methods are referenced and replace those calls with the unique method (e.g. all calls to m2() and m3() must be replaced with m1()).
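A minimal sketch of the result of Method-2, using a hypothetical `ReportGenerator` class: the identical copies `m2()` and `m3()` have been deleted, and the callers that used to invoke them (`onDailySchedule()` and `onUserRequest()`, also hypothetical) now call `m1()` directly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class after applying Method-2 (delete and modify reference).
class ReportGenerator {
    private final List<String> log = new ArrayList<>();

    // The one method kept and used throughout the software.
    // m2() and m3(), which were identical copies of it, have been deleted.
    public void m1() {
        log.add("report generated");
    }

    // Caller that previously invoked m2(); it now calls m1() directly.
    public void onDailySchedule() {
        m1(); // was: m2();
    }

    // Caller that previously invoked m3(); it now calls m1() directly.
    public void onUserRequest() {
        m1(); // was: m3();
    }

    public List<String> getLog() {
        return log;
    }
}
```

Compared with Method-1, this leaves no thin wrapper methods behind, at the cost of touching every call site. In Java, deleting `m2()` and `m3()` makes the compiler report every remaining reference, which helps locate the call sites to update.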

 

Identical set of LOC in multiple methods

In this case, not all LOC of the methods are identical, but a good percentage are. Example: two methods, m1() and m2(), in the same or in different files, which contain 30 LOC each, of which 15 LOC are identical. The elimination procedure for this scenario is slightly more complicated than the previous ones.

  1. Identify a less complex method which contains this identical code and make sure that it has unit tests with good coverage, e.g. m1().
  2. Create a new method and copy all the identical LOC to that method, e.g. mn().
  3. Check whether these LOC use any parameters or attributes that were part of the parent method; if so, add them to the new method's signature. E.g. if the LOC in mn() reference an amount parameter, redefine the method signature as mn(int amount).
  4. Replace the LOC in the parent method with a call to the new method, passing the relevant parameters, e.g. mn(100).
  5. Run all the unit tests for the parent method (e.g. m1()) and make sure they all pass.
  6. Now apply steps 4 and 5 to the other duplicate methods sharing the same identical LOC. E.g. if m2() and m3() also contain the same LOC as m1() that were moved to mn(int amount), delete those LOC from m2() and m3() and replace them with a call to mn().
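The steps above can be sketched in Java as follows. The class `PaymentProcessor`, its fields, and the content of the shared LOC are hypothetical; `mn(int amount)` plays the role of the new method from steps 2 and 3, and `m1()` and `m2()` are the former duplicates, each keeping only its own unique LOC plus a call to `mn()`.

```java
// Hypothetical class after extracting the shared LOC into mn(int amount).
class PaymentProcessor {
    private int balance;
    private int transactionCount;
    private String lastOperation = "";

    // Steps 2-3: the new method holding the formerly identical LOC.
    // 'amount' belonged to the parent methods, so it became a parameter of mn().
    private void mn(int amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        balance += amount;
        transactionCount++;
    }

    // Step 4: m1() keeps its own LOC and calls mn() for the shared part.
    public void m1(int amount) {
        mn(amount);
        lastOperation = "deposit"; // LOC unique to m1()
    }

    // Step 6: m2() contained the same LOC, so it now calls mn() as well.
    public void m2(int amount, int bonus) {
        mn(amount);
        balance += bonus;                     // LOC unique to m2()
        lastOperation = "deposit with bonus"; // LOC unique to m2()
    }

    public int getBalance() { return balance; }
    public int getTransactionCount() { return transactionCount; }
    public String getLastOperation() { return lastOperation; }
}
```

Because m1() already has unit tests (step 1), running them right after step 4 confirms the extraction did not change its behavior before m2() is touched.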

Where to create the new methods

In the elimination procedure above, we create a new method. Where to maintain this new method depends on the nature of the method, but here are some generic guidelines:

  1. If it is a common method, like date formatting, it can be maintained in a library or utility class which can be used by all classes.
  2. If it is a method in a derived class (for example, duplicated across sibling subclasses), move it to the base class.
  3. If the methods are in two different classes, check the feasibility of introducing a base class. If the base class is not meaningful, consider moving the method to a utility class.
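A minimal sketch of guidelines 1 and 2, with hypothetical class names (`DateUtils`, `Account`, `SavingsAccount`, `CurrentAccount`) and an assumed display pattern of `dd-MM-yyyy`. The date helper lives in a utility class built on the standard `java.time.format.DateTimeFormatter`, and `describeOpening()`, assumed to have been duplicated in both subclasses, has been moved up into their common base class.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Guideline 1: a common helper kept in a utility class usable by all classes.
final class DateUtils {
    private static final DateTimeFormatter DISPLAY_FORMAT =
            DateTimeFormatter.ofPattern("dd-MM-yyyy");

    private DateUtils() { } // static helpers only, no instances

    static String formatForDisplay(LocalDate date) {
        return date.format(DISPLAY_FORMAT);
    }
}

// Guideline 2: the method once duplicated in both subclasses now lives in the base class.
abstract class Account {
    protected final LocalDate openedOn;

    Account(LocalDate openedOn) {
        this.openedOn = openedOn;
    }

    String describeOpening() {
        return "Opened on " + DateUtils.formatForDisplay(openedOn);
    }
}

class SavingsAccount extends Account {
    SavingsAccount(LocalDate openedOn) { super(openedOn); }
}

class CurrentAccount extends Account {
    CurrentAccount(LocalDate openedOn) { super(openedOn); }
}
```

If the two classes had no sensible common parent (guideline 3), only the `DateUtils` part would apply and each class would call the utility method directly.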

There is one more kind of code duplication: Similar LOC. Such code is not identical, but its behavior is similar. This will be addressed in a separate article.