Support technician: Will the formatting be fixed in excel export too?
Developer: I don’t know. Will have to check… No, I will have to change the code there too.
Support technician: … OK, that will have to be done, they use that all the time.
Developer: Yeah sure, I will check it in as soon as possible. Will get back to you when I am done.
(later, in a planning meeting with whole team)
Product owner: What about the formatting issue, was that problem solved?
Developer: I had to update the code at several different places, so excel export worked in all scenarios.
Product owner: So is it fixed in all places now?
Developer: Well I hope so. I did look for similar strings in the whole code base…
Product owner: I sure hope so. This is getting embarrassing—we keep hitting the same bug about formatting in several different places.
Lead developer: The basic problem is that the code is copy-pasted all over the place. If we had the formatting in one class, all other places could use that class instead of reproducing the same thing everywhere.
Product owner: Even I could understand that…
Lead developer: We really could do better than this. We could adhere to the Don’t Repeat Yourself principle.
DRY is the opposite of copy-paste programming. It tells us to avoid duplicating code. Instead we should use the abstraction mechanisms our programming language provides to generalize code.
For example, when we feel the urge to copy an old function, because it is so similar to what we need right now, we should try and modify the old function to cover the new case too.
DRY is hard…
Modifying an old function for a new use case may sound simple in theory, but it is tricky to get right in practice. One reason is that modifying an old method in the most intuitive way is by adding a parameter. This will make the method harder to read and maintain, something we do not want to contribute to.
But modifying the method by any more long term mechanism than adding a parameter increases the risk of breaking the old use cases. Especially if we need to employ something a little less ubiquitous than plain-old-code, things like abstractions, design patterns or non-trivial language features to achieve a DRY-enough end result.
One possible answer to this “I am afraid to change the old code”-problem is unit tests. If they are not already in place for the method in question, this is an opportunity to add some. After you have done that, you will feel a lot more confident in doing the change to the code.
I recommend Michael Feathers book “Working Effectively with Legacy Code” for inspiration on retro-fitting unit tests to what he calls legacy code. By legacy code he means any code that is not covered by unit tests.
If you add unit tests before you refactor the old code, make sure you write readable unit tests. Otherwise you are going to break another principle of mine – Delete unreadable unit tests (upcoming topic).
Another issue I often struggle with when DRY:ing up code is that of naming and physical location of the new code. When you factor out common code, and thus avoid duplication, you have to name and place the new code somewhere appropriate, following the convention of the rest of the code base and so on. It is easy to slip into avoiding a structured and good-named solution by creating small utility classes with common functionality. Even if I see names ending with “Util” or “Tools” as a symptom that I still could do better when it comes to naming and structure, I prefer having such classes as a temporary measure just to avoid breaking the DRY principle.
On a very abstract level, DRY is important because developer time is the most expensive aspect of developing software. Programmers are not cheap.
Why does not following the DRY principle increase development costs? Because the cost of changing code roughly equals the number of places you have to change something in order to modify its behavior:
Cost of change ~= cost/h * number of necessary modifications
To understand why this equation is plausible, go back to the initial example. If the formatting string was DRY enough, that is, if it was residing in a single class at a specific place in the source code, the change would be trivial. Let’s say it would be done in one hour.
However, if the formatting was spread out in N places in the source, the time to change formatting would be at least N hours. The reason I say it may be more than N hours is because the programmer making the change at most finds the N places. If he does not find all places, each such additional place will require 1) someone else to find it for him 2) reporting the bug 3) the bug creeping all the way from support into the team 4) the bug being fixed. That process will take more time than an hour.
DRY at different scales
Duplication can occur on all scales of software development.
On the micro scale, it can be duplication within a method, like a series of very similar strings being repeated over and over again in a long method. Please read about PITS and remove that micro duplication, it will be fun!
Another example of micro duplication, which is much more severe, is the one of a small piece of code scattered all over the code base. The example described in the beginning of this text is that kind of micro duplication.
On a small to medium scale, the duplication can be similar, or even identical, with algorithms scattered all over the place. It may even be whole subsystems being duplicated at different layers of the system: search logic being repeated in the UI layer, the logic layer and then also in the data access layer.
In the worst case, duplication can be a copy-pasted version of the whole solution for different clients. I am not kidding you here—this is something I have seen done. If you really do not care about the long-term relationship with customers—or remaining sane—I recommend this method of developing software.
DRY is one of my ideals. I say ideal because it is not always easy to follow the principle. I quite often fall down to being pragmatic and living with a less than 100% DRY system. I first read about the DRY principle in The Pragmatic Programmer, a good read for more reasons than this principle.
This was the first blog post about DRY. I have more to write on the topic, but this post is getting long enough already. What is left to write about is some more philosophical aspects of DRY, as well as a big risk with following DRY too strictly: YAGNI (also a separate, upcoming topic of its own).