You should be able to touch any part of your code
There’s a popular saying that goes:
The First Law of Programming: If it works, don’t touch it
It might sound wise to some, but over time it can rot your entire system.
Courtesy: ProgrammerHumor subreddit
Have you ever joined a company, been assigned to a task or project, and had to touch a codebase that everyone is afraid of? The last commit was a year ago, it runs on a 10-year-old runtime that has never been updated, the dependencies have not been updated since they were first installed, unit test coverage is only 10%, there are 0 integration tests, and the last person who worked on the codebase left the company 6 months ago. I have been there a few times. It wasn’t a pleasant place to be. I believe this kind of situation happens when engineers take the saying “If it works, don’t touch it” too literally.
“We propose that “software engineering” encompasses not just the act of writing code, but all of the tools and processes an organization uses to build and maintain that code over time” - Software Engineering at Google
It’s easy to write throwaway code, or code for a school assignment. But writing code for an organization, code that you want to last for years or decades, takes a totally different mindset.
Consequences
Here are some of the reasons why I think “If it works, don’t touch it” is not a good mindset:
You’ll likely fail to innovate
Building software is a never-ending process. Your code and your system should be able to evolve over time.
The inspiring video below shows how the proprietary database used internally at New Relic evolved: from local storage to cloud storage, from local processes on VMs to a cloud-native architecture, from a coupled architecture to a decoupled one, then on to a cell architecture and more.
This ability to evolve gives New Relic a big advantage in terms of cost savings and reliability. Now imagine they had kept the first iteration of the system (all data on local disk) as it was, because “If it works, don’t touch it”. They wouldn’t be able to compete with Datadog. Storing exabytes of data on local disks is not realistic in many ways. In short, they would not be able to compete in the industry if they failed to innovate.
It becomes a liability
When nobody wants to touch the codebase, it slowly rots. Dependencies full of CVEs never get updated. Whenever you have to work on the codebase, you have a limited choice of libraries, and you cannot use new language features. For example, Java 21 made virtual threads a standard feature. If you are on Java 18, you cannot use this new concurrency feature or any library that depends on it. This slows down developer productivity or, worse, leaves you unable to keep up with your competitors.
Symptoms & their cures
From my experience, there are a few symptoms that let you foresee a component or system ending up in this bad situation.
Bad test coverage
Shipping without good tests feels like playing whack-a-mole. Writing tests should build your confidence in shipping stable code. When you have good test coverage, you can be sure that, however you change the code, the existing flows and the new features work as expected, and that if someone touches it in the future, it will still work the same.
The Story of Google Web Server is the classic example of this scenario:
Back in 2005, as the project swelled in size and complexity, productivity had slowed dramatically. Releases were becoming buggier, and it was taking longer and longer to push them out. Team members had little confidence when making changes to the service, and often found out something was wrong only when features stopped working in production …. To address these problems, the tech lead (TL) of GWS decided to institute a policy of engineer-driven, automated testing. As part of this policy, all new code changes were required to include tests, and those tests would be run continuously. Within a year of instituting this policy, the number of emergency pushes dropped by half - Software Engineering at Google
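To make the testing point concrete, here is a minimal sketch of a characterization test in Python: a test that pins down what a piece of code does today, so that whoever touches it later finds out right away if the behavior changed. The `apply_discount` function and its discount rules are made up for illustration; in a real codebase you would point the tests at the scary legacy function instead.

```python
import unittest


def apply_discount(total_cents: int, loyalty_years: int) -> int:
    """Hypothetical legacy function: 5% off per loyalty year, capped at 20%."""
    percent = min(loyalty_years * 5, 20)
    return total_cents - (total_cents * percent) // 100


class ApplyDiscountCharacterizationTest(unittest.TestCase):
    """Records the current behavior so that a refactor which changes it fails loudly."""

    def test_no_loyalty_means_no_discount(self):
        self.assertEqual(apply_discount(10_000, 0), 10_000)

    def test_discount_grows_with_loyalty(self):
        # 2 years -> 10% off
        self.assertEqual(apply_discount(10_000, 2), 9_000)

    def test_discount_is_capped_at_twenty_percent(self):
        # 10 years would be 50%, but the cap keeps it at 20%
        self.assertEqual(apply_discount(10_000, 10), 8_000)
```

Saved as a test module, this runs with `python -m unittest`. Once tests like these are in place, you can rewrite the body of the function however you like and still know the existing flow behaves the same.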
Bad code organization or programming interface
Writing good, clean code is not easy. Some people just slap all the logic into the application controller, create circular imports all over the place, duplicate code blocks, and write poor interfaces and abstractions.
The cure? Enforce code quality and review PRs thoroughly. Make sure the coding rules are actually followed.
Bad deployment process
In my previous team, we also agreed to schedule a build every week to ensure that our pipeline was always runnable. You don’t want to wait until an incident happens and then discover, when you need to deploy a hotfix, that the pipeline last ran 6 months ago. In those 6 months, the CI vendor might have introduced a breaking change, or your command might no longer work because you did not pin the versions of the tools used in the pipeline job. A hotfix deployment that usually takes 10 minutes could then take 2 more hours, because you have to fix the CI pipeline first.
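One cheap guard against the unpinned-tool problem is a check that fails the build when a dependency is not pinned to an exact version. The sketch below assumes the pipeline's tools are listed in a pip-style requirements file; the `find_unpinned` helper is my own illustration, not part of pip or any CI product, and it is deliberately strict (anything that is not a plain `name==version` line gets reported).

```python
import re

# A requirement pinned to one exact version, e.g. "requests==2.31.0"
# or "uvicorn[standard]==0.30.1".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[A-Za-z0-9._!+-]+$")


def find_unpinned(requirements_text: str) -> list[str]:
    """Return every requirement line that is not pinned with '=='."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and options like "-r base.txt"
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


example = """\
requests==2.31.0
flask>=2.0   # too loose, a new release can break the job
pytest
"""
print(find_unpinned(example))  # ['flask>=2.0', 'pytest']
```

Run as part of the weekly scheduled build, a check like this turns "the command silently changed under us" into a failure you see long before the hotfix.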
Smoke testing is a very valuable practice you can adopt in your team. In short, a smoke test ensures that a newly deployed version of your service passes basic checks before any traffic is routed to it. If the smoke test fails, no real user traffic is affected, since none is routed to the broken version. In my previous team, we ran smoke tests extensively as part of the deployment process, and because of this we were able to deploy at any time of the day, any day of the week. Nobody really cared if you accidentally shipped broken code to production, because if the smoke test failed, no real user was impacted; you just reverted your changes later.
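The flow described above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the endpoint paths, the idea that "healthy" means HTTP 200, and the `switch_traffic` callback, which stands in for whatever your platform uses to route users to the new version. It is not how my previous team implemented it, just the shape of the idea.

```python
from typing import Callable
from urllib.error import HTTPError
from urllib.request import urlopen


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the server is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:  # 4xx/5xx responses are raised as exceptions
        return err.code
    except OSError:  # connection refused, DNS failure, timeout, ...
        return 0


def smoke_test(base_url: str, paths: list[str],
               fetch: Callable[[str], int] = http_status) -> list[str]:
    """Return the paths that did not answer with HTTP 200."""
    return [path for path in paths if fetch(base_url + path) != 200]


def deploy(base_url: str, paths: list[str], switch_traffic: Callable[[], None],
           fetch: Callable[[str], int] = http_status) -> bool:
    """Route traffic to the new version only if every smoke check passes."""
    if smoke_test(base_url, paths, fetch):
        return False  # users stay on the old version; nothing was switched
    switch_traffic()
    return True
```

A call would look like `deploy("http://new-version.internal", ["/healthz", "/login"], switch_traffic=promote)`, where the URL, the paths, and `promote` are placeholders. The `fetch` parameter exists so the gate itself can be tested with a fake instead of a real server.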
Dependencies/runtime are not updated
I have often encountered projects stuck on a framework from 5 years ago, a runtime from 3 years ago, and a language version from 10 years ago. When they approach EOL, everyone starts planning massive upgrades. Some succeed, some don’t.
In the same Software Engineering at Google book (I have forgotten which chapter), the authors suggest updating your dependencies and runtime continuously. Don’t wait until many of your dependencies have fallen far behind. Upgrading your framework from Django 3 -> Django 5 is very different from migrating from Django 3.1 -> Django 3.2. Frequently upgrading your dependencies, framework, and runtime/language gives you the confidence to do it over and over again incrementally.
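To show how much smaller each step gets when you upgrade incrementally, here is a small sketch that lists the releases sitting between where you are and where you want to be. The `upgrade_path` and `format_path` helpers are hypothetical; the release list is Django's feature-release series from 3.0 to 5.0, hard-coded here rather than fetched from anywhere.

```python
Version = tuple[int, int]

# Django feature releases from 3.0 to 5.0, hard-coded for the example.
DJANGO_RELEASES: list[Version] = [(3, 0), (3, 1), (3, 2), (4, 0), (4, 1), (4, 2), (5, 0)]


def upgrade_path(current: Version, target: Version, releases: list[Version]) -> list[Version]:
    """Return every release after `current` up to and including `target`, oldest first."""
    return sorted(r for r in set(releases) if current < r <= target)


def format_path(current: Version, path: list[Version]) -> str:
    """Render the path as '3.1 -> 3.2 -> ...' for a ticket or a changelog."""
    return " -> ".join(f"{major}.{minor}" for major, minor in [current, *path])


steps = upgrade_path((3, 1), (5, 0), DJANGO_RELEASES)
print(format_path((3, 1), steps))  # 3.1 -> 3.2 -> 4.0 -> 4.1 -> 4.2 -> 5.0
```

Five small hops, each with its own release notes and its own test run, are a very different job from one jump across two major versions.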
Conclusion
In conclusion, always make sure that your code base is touch-able :)