I’ve been writing code for over ten years, and I’ve seen the entire spectrum of code quality.
During that time, I’ve seen plenty of projects that are in a constant state of decay, and I’ve seen projects that are in a state of constant growth.
What’s the difference? The difference is that the companies that are constantly pushing their code forward are the ones that can compete in the market. On the other hand, companies that don’t improve their code are stuck with legacy code, which slows them down.
What is legacy code? Legacy code is code that has the following characteristics:
- hard to understand and change
- doesn’t have tests
- written by someone else
- everyone is afraid to touch it
- has business value
Let’s face the facts: legacy code is everywhere. That’s why it is important to understand how to tackle it and how to improve it.
What is legacy code?
You might have heard the term “legacy code” before, but what does it mean?
Well, there are multiple answers to that.
In his book, Working Effectively with Legacy Code, Michael Feathers states that:
To me, legacy code is simply code without tests.
– Michael Feathers, Working Effectively with Legacy Code
The problem is that this definition doesn’t fully capture the characteristics of legacy code that you are fighting against. Let’s explore them a bit more.
Legacy code is code without tests. You can’t safely refactor code without tests, which means you are more likely to make a mistake. As a result, changes to this code can be very risky. Changing a legacy system without tests and without extensive knowledge of the code is like walking blindfolded through a maze full of traps.
Legacy code doesn’t have tests
Testing is a form of documentation, so the lack of tests is a lack of documentation. Without documentation, developers must inspect the code, debug it, and try to understand what it does.
But not all code is easy to understand.
Legacy code is hard to understand and change
Legacy code is a term for existing code that is difficult to understand. Often this is because it was written years ago, and programming practices have changed since then.
Look, developers don’t write bad code because they are evil. They write it because they lack the time to write better code or don’t know how to do it better.
And like any other code, legacy code once started as a greenfield project. In the beginning, it was in pretty decent shape. But over time, as more changes pile up, the shape of the code changes for the worse.
The code becomes outdated. And now you have to deal with it.
Legacy code was (mostly) written by someone else
Developers often need to maintain a legacy project written by developers who are now gone. Often, it is hard to figure out how to fix the code because of its complexity and age. To make matters worse, many developers on the current team were never part of the team that originally wrote the code.
Though the developer’s goal is to change the legacy application to suit their needs, this is often easier said than done. Legacy code can be hard to decipher since another developer wrote it, and the next person doesn’t have the knowledge and experience that came from writing that code. And if you don’t understand something, it becomes more difficult to change it.
You might think that the best thing would be to rewrite the old system from scratch. But on large codebases, that can be costly and dangerous since the legacy code has business value. It makes money.
Legacy code has business value
Legacy code is code with business value. It’s the code that has been tweaked over the years and has served customers well in the past. And it will continue to serve them.
Many companies have bad legacy code because they have been in business for a long time or developed a project a long time ago. But age alone doesn’t mean that the code is bad. Some codebases are not that old, and still the code is a mess.
There is an age-old debate about what is more important, code maintainability or business value. However, the more you know about software, the more you will realize that the business side usually wins that debate.
For example, if you go to your boss and say that you want to spend a few weeks improving the code, the answer you will most likely get is no. Why? Because business people don’t care about code health. They care about company health. They care that you implement new features and fix existing bugs so that the customers are happy.
So, who should care about code health? You. How? By continuously refactoring your code. You will learn more about refactoring in a later section.
Legacy code – updated definition
So the updated definition could be:
Legacy code is code that works and makes money, has no tests, and everyone is afraid to touch it.
– Kristijan Kralj, How to Have a Healthier Relationship With Legacy Code
How does normal code become legacy code?
You know how all those fairytales start the same: Once upon a time…?
Well, once upon a time, there was new code that some programmer had just written. After that, things started to happen to the code:
- There was not enough time to improve it
- The new features had to be added quickly
- A junior developer who worked on it didn’t understand it completely
- The original developers who worked on it moved to another company
- An excuse was made that there was no time to write tests
- Technology and software trends have changed
As a result, code that was once new becomes legacy code. One common source of legacy code is something known as technical debt.
Technical debt
Technical debt is the future cost you take on when you save time and money now by taking shortcuts with code and design. The shortcut creates a heavier burden later, which can be difficult and expensive to manage. Instead of investing the time and resources to do it properly, an individual or organization uses a cheap hack to get the job done and moves on. Unfortunately, each time someone uses a hack, a debt is created, and it adds up quickly. Eventually, the debt hurts the individual or organization because more and more time and resources go into keeping up with it.
Technical debt issues arise when development teams want to release a product as quickly as possible. As a result, they might cut certain corners or simply write the code as they go, and while this might work in the short term, if they try to change that code later, they will face delays.
For example, if a programmer writes code that is difficult to maintain but easy to develop, they will need to spend much more time and effort keeping it updated and bug-free.
There are two types of technical debt: explicit and implicit. Explicit is when you make a conscious decision to go ahead with the shortcut, and implicit happens when you go ahead with the shortcut without realizing it.
How to fix the issue?
This leaves a developer with two options: improve the code or rewrite it. To improve legacy code, it is best to apply incremental changes to the codebase. You can make small changes without drastically redesigning the entire project. It will take longer, but it is much more manageable and reduces the need for a full rewrite.
Another way is to rewrite the code. There are some downsides to rewrites. Rewrites take a lot of time and resources that you may not have. Another concern is that once the team completes the rewrite, the new code may not be as stable or have as many test cases as the legacy code. The other danger is that the new code can itself become legacy code by the time you finish rewriting.
What is refactoring?
Refactoring is the process of updating the structure of code without changing its functionality. It’s a way of improving your code without changing how it works, but rather improving how it looks.
A good analogy is house renovation: the house keeps the same rooms and serves the same purpose, but you update the paint, install a new kitchen, and put in a new bathroom so that it is easier to live in.
Your code may still have all of the same functionality but with improvements in its organization or readability. Every time you need to change your code, you can think of the refactoring process as an opportunity to make it easier to change.
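As a minimal sketch of that idea, here is a hypothetical pricing function before and after a refactoring. The function names, the discount rule, and the numbers are all invented for illustration. The only point is that both versions return the same result for the same input.

```python
# Before: works, but the rules are tangled together with magic numbers.
def price_before(items, member):
    t = 0
    for i in items:
        t = t + i[0] * i[1]
    if member:
        if t > 100:
            t = t - t * 0.1
    return round(t, 2)


# After: same behavior, but each rule has a name and a single home.
MEMBER_DISCOUNT_RATE = 0.1
DISCOUNT_THRESHOLD = 100


def subtotal(items):
    """Sum of unit price times quantity for every (price, quantity) pair."""
    return sum(price * quantity for price, quantity in items)


def qualifies_for_discount(amount, member):
    return member and amount > DISCOUNT_THRESHOLD


def price_after(items, member):
    amount = subtotal(items)
    if qualifies_for_discount(amount, member):
        amount -= amount * MEMBER_DISCOUNT_RATE
    return round(amount, 2)


cart = [(40.0, 2), (25.5, 1)]
# Both versions must agree, otherwise the change was not a refactoring.
assert price_before(cart, member=True) == price_after(cart, member=True)
assert price_before(cart, member=False) == price_after(cart, member=False)
```

The two assertions at the end are the important part: they compare the old and new versions on the same input instead of trusting that the rewrite "looks equivalent".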
Why can refactoring be dangerous?
Refactoring is an excellent thing, as it yields big improvements to code, which makes it easier to read and understand. Code is much more likely to be used and maintained if it is easy to understand. That being said, refactoring can also lead to problems. Refactoring is dangerous if you don’t have enough code coverage for the code you would like to change.
Code coverage is a metric in software development for measuring the testing coverage of an application. It measures what percentage of lines in methods and modules are covered by tests. In other words, this is the percentage of source code lines executed by running the tests. This metric is also known as test coverage.
If you don’t have enough code coverage for the code you will modify, you run the risk of introducing side effects or bugs that might go unnoticed.
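To make the metric concrete, here is a rough sketch that measures line coverage by hand with Python's standard sys.settrace hook. The shipping_cost function and its pricing rules are hypothetical, and the set of executable lines is written out manually. A real project would use a dedicated coverage tool instead of code like this.

```python
import sys


# Hypothetical function under test. The body is kept on consecutive lines so
# that the offsets below (1, 2, 3 lines after "def") stay valid.
def shipping_cost(weight_kg, express):
    if express:
        return 10 + 2 * weight_kg
    return 5 + weight_kg


EXECUTABLE_OFFSETS = {1, 2, 3}  # the "if" line and the two "return" lines


def executed_offsets(func, calls):
    """Run func once per argument tuple and record which body lines executed."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        # Only record "line" events that belong to the function under test.
        if event == "line" and frame.f_code is code:
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        for args in calls:
            func(*args)
    finally:
        sys.settrace(previous)
    return seen & EXECUTABLE_OFFSETS


def coverage_percent(executed):
    return 100.0 * len(executed) / len(EXECUTABLE_OFFSETS)


# A "test suite" that only exercises the standard-shipping path.
partial = executed_offsets(shipping_cost, [(3, False)])
print(sorted(partial), round(coverage_percent(partial), 1))

# Adding an express case executes the remaining line.
full = executed_offsets(shipping_cost, [(3, False), (3, True)])
print(sorted(full), round(coverage_percent(full), 1))
```

With only the standard-shipping call, the express branch never runs, so a bug on that line would stay invisible to the tests. That unexecuted line is exactly the risk the coverage number is warning you about.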
That’s why the first step in changing any legacy code is to write some tests for it. These tests are known as characterization tests or golden master tests.
What is a characterization test?
Characterization tests are tests that focus on the actual behavior of an existing piece of software. You use these tests to pin down what the code currently does, so that you notice immediately when a change alters that behavior.
For example, let’s say that there is a requirement that says a file must be deleted when the application closes. One possible characterization test would be to ensure that the application deletes the file without error. In this way, you can see whether or not the requirement is being met by determining whether the file is deleted.
Remember, in this step, you are not chasing existing bugs or trying to change how the feature works. You create a test that records the output the code currently produces for a given input. Characterization tests are your safety net.
Tests like these are called “characterization tests” because they characterize an aspect of the system. You should use these tests as a starting point before making any refactoring changes to the legacy code. In this way, you protect yourself and the code from changes that introduce bugs.
Characterization tests usually cover a bigger part of the legacy codebase. For example, they can be integration or UI tests. Characterization tests are not unit tests because it’s usually hard to start writing unit tests before you refactor the code a little bit.
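Here is a deliberately small sketch of the pattern. The format_customer_code function and its quirks are hypothetical. In a real legacy system the same approach would usually wrap a larger entry point, such as a report generator or an HTTP endpoint.

```python
# Hypothetical legacy function. Nobody remembers why a blank name becomes
# "UNKNOWN" or why the result is cut off at 10 characters, but callers may
# depend on both quirks.
def format_customer_code(name, customer_id):
    cleaned = name.strip().upper().replace(" ", "")
    if cleaned == "":
        cleaned = "UNKNOWN"
    return (cleaned + "-" + str(customer_id))[:10]


# Expected values were captured by running the current code once.
# They are not derived from a specification.
RECORDED = [
    (("Ada Lovelace", 7), "ADALOVELAC"),
    (("  bob ", 42), "BOB-42"),
    (("", 5), "UNKNOWN-5"),
    (("Al", 123456789), "AL-1234567"),
]


def test_format_customer_code_matches_recorded_behavior():
    for args, expected in RECORDED:
        actual = format_customer_code(*args)
        assert actual == expected, f"{args!r}: expected {expected!r}, got {actual!r}"


test_format_customer_code_matches_recorded_behavior()
```

Notice that the first recorded value loses the customer ID entirely because of the truncation. That may well be a bug, but the test records it anyway. Fixing it is a separate, deliberate change that comes after the safety net is in place.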
Action plan
The first thing you need to remember is that changing legacy code is a long-term process. This is not something you can do in a few weeks. Even if the whole team is focused, it can take several months before you start seeing actual results.
Introduce a continuous integration system
Remember all those characterization tests you wrote earlier? Good. Now it’s time to put them into action. First, you need to set up a continuous integration system that will run the tests often.
What is a continuous integration system? Continuous integration (CI) is a software development practice where developers push their code to the main branch as often as possible, ideally at least once a day. Before the code ends up in the main branch, the automated build checks that the code still works correctly. CI also executes the tests you have to detect problems as soon as possible.
The goal of tests is to ensure that the code continues to behave as expected and that you don’t introduce regression bugs.
Start with easy or hard legacy code?
Do you start by changing the easy or the hard legacy code? You need to consider several things, but the most important criterion is how experienced your team is, both with testing and in general.
A team with very little testing experience may struggle with testing hard and complex legacy code. This is where easy (and ideally testable) code comes in. It lets the team start with something familiar and non-threatening, in a controlled manner.
On the other hand, if your team is experienced, and knows how to write unit and integration tests, starting with hard and complex code may be more rewarding because the complex code usually gets changed a lot. But, without tests, it tends to break more. That’s why there are more benefits in refactoring the complex legacy code. It will be easier to change it in the future. And less painful once you write some tests for it.
Changing the legacy code
Once you have the continuous integration system up and running and decide whether you want to refactor simple or complex logic first, the next thing is doing the necessary work to improve the code.
Here are some general tips:
- Know how to refactor it – To change the legacy code in the best possible way, you need to spend some time beforehand learning about refactoring. Good books on this topic include Working Effectively with Legacy Code, Refactoring, and Refactoring to Patterns.
- Don’t try to change too many things at once – Once you start refactoring the old legacy code, it can be tempting to refactor as much as possible in one go. After all, the sooner you change all the code, the sooner all your problems will disappear, right? Well, not exactly. When you change too many things at once, you will have a bigger chance of breaking something. Especially if your tests don’t cover all scenarios.
- Don’t add new features to old code – If possible, create new testable classes that contain the new business logic, and call the new classes from the old code. That way, you don’t keep piling new code on top of the existing legacy code.
- Encapsulate old code that’s doing too much work into a class you can reuse – Legacy code often comes with static classes and static methods. If you need to call a static class in the new code, the best way to do it is by wrapping the static class into a thin wrapper that implements an interface. Then in the new code, inject the interface and use the functionality you need.
- Don’t forget to run the test suite – After spending countless hours refactoring code, it can be tempting to skip the testing phase. But that can often result in bugs that could have been caught early in the process. To avoid these mistakes, it’s important to spend the time to test your code before proceeding with the next task.
- Use the test-driven development (TDD) approach – An excellent technique for preventing additional legacy code is test-driven development. With TDD, you first write a test for the expected behavior of the system and then write the code that makes the test pass. Done consistently, TDD can improve code quality, save time and costs, and reduce the risk of creating additional legacy code.
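Two of the tips above, putting new logic in new classes and hiding static legacy calls behind a thin wrapper, fit together naturally. The following sketch shows one possible shape. Every name in it (LegacyTaxTables, TaxRateProvider, InvoiceCalculator, and so on) is hypothetical, and the tax rates are made up.

```python
from typing import Protocol


# Hypothetical legacy code: a class used only through static methods,
# which makes it hard to replace in tests.
class LegacyTaxTables:
    @staticmethod
    def lookup_rate(country_code):
        # Imagine database access or global state hiding in here.
        return {"DE": 0.19, "FR": 0.20}.get(country_code, 0.0)


# The interface the new code depends on.
class TaxRateProvider(Protocol):
    def rate_for(self, country_code: str) -> float: ...


# Thin wrapper: no logic of its own, it only forwards to the legacy call.
class LegacyTaxRateProvider:
    def rate_for(self, country_code: str) -> float:
        return LegacyTaxTables.lookup_rate(country_code)


# New, testable class that holds the new business logic.
class InvoiceCalculator:
    def __init__(self, tax_rates: TaxRateProvider):
        self._tax_rates = tax_rates

    def gross_amount(self, net_amount: float, country_code: str) -> float:
        rate = self._tax_rates.rate_for(country_code)
        return round(net_amount * (1 + rate), 2)


# The old code only gains a single call into the new class.
def legacy_print_invoice(net_amount, country_code):
    calculator = InvoiceCalculator(LegacyTaxRateProvider())
    return "TOTAL: " + str(calculator.gross_amount(net_amount, country_code))


# In a test, a fake provider replaces the legacy dependency entirely.
class FixedTaxRate:
    def __init__(self, rate):
        self._rate = rate

    def rate_for(self, country_code: str) -> float:
        return self._rate


assert InvoiceCalculator(FixedTaxRate(0.5)).gross_amount(100.0, "XX") == 150.0
print(legacy_print_invoice(100.0, "DE"))
```

The new class never mentions LegacyTaxTables, so it can be tested without touching the legacy code at all. If the static class is replaced one day, only the thin wrapper has to change.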
Conclusion
Legacy code is the kind of code that makes developers jump ship. Creating a plan of action to replace it will make your teams happier and more productive, and it will improve the quality of your product.
It often seems that the word “legacy” has a negative connotation, especially when applied to software. But what if we’re talking about the recipes your grandma handed down to you? They’re completely invaluable, and your family is glad you still use them every Christmas. Both legacy code and recipes are time-tested and serve a purpose.
Legacy code: Is it important? Yes. Is it hard to change? Yes. But you can make legacy code easier to maintain by taking some time to refactor and write tests. Refactoring and tests will improve your delivery time and keep legacy bugs from popping up at a later date.