The Tech Debt Snowball Method

There's this thing that happens where software teams get "stuck."

You know the kinds of teams I'm talking about. There are still features the business wants from them, but all their time gets sucked away by support and maintenance and bug fixes. Changes that "should" be simple take forever. The tests suck. The build takes forever. They're stuck with weird architectural decisions made by people who left the company years ago. The code is crammed with complexity from one-off features that only one or two clients use.

When I land on these teams I often find the developers on them fixated on making big changes. They need to write an entirely new system using event sourcing. They need to break the monolith into microservices. They need permission to stop all feature development for a few months. Above all, they need more staff allocated to the project.

But big changes rarely help a team that's really stuck. What really helps is small changes, and a lot of them. Inline that misleading abstraction. Streamline the configuration. Document how to set up a test environment. Fix the flakiest test.

Then, maybe, you're ready for the big rewrite that will fundamentally address the architectural problems at the heart of your project. But you've got to be in the habit of cleaning your workspace first.

A fool-proof method for solving technical debt forever

Make a list of all the stuff you want to change about the project
Order that list from "easiest to finish" to "hardest to finish"
Work on the list for at least a few hours every week
Use the time you save by fixing the easy things to work on the harder things

You start as a tiny refactoring snowball, and eventually become a mighty, tech-debt eliminating avalanche.
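
If it helps to see the method as code, here is a minimal sketch of the ordering step and one week of work. The names (DebtItem, snowball_order, plan_week) and the hour estimates are made up for illustration; a sticky note and a rough guess work just as well.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    name: str
    effort_hours: float  # rough guess at how long it takes to finish


def snowball_order(items):
    """Return the items sorted from easiest to hardest to finish."""
    return sorted(items, key=lambda item: item.effort_hours)


def plan_week(items, budget_hours):
    """Pick items, easiest first, until the next one no longer fits the budget."""
    picked = []
    remaining = budget_hours
    for item in snowball_order(items):
        if item.effort_hours > remaining:
            break
        picked.append(item)
        remaining -= item.effort_hours
    return picked


# Hypothetical backlog; the estimates are invented for the example.
backlog = [
    DebtItem("Break the monolith into services", 400),
    DebtItem("Fix the flakiest test", 2),
    DebtItem("Document test environment setup", 3),
    DebtItem("Inline that misleading abstraction", 1),
]

this_week = plan_week(backlog, budget_hours=4)
print([item.name for item in this_week])
```

With a four-hour budget this picks the abstraction and the flaky test, and leaves the documentation and the big rewrite for later weeks.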

Why highest impact first fails

People tend to focus on making big changes for a reason. Often the folks planning that big project are right about where the leverage is. Say the plan is to break all the hot-path components out of the MySQL-driven monolith that's deployed with a pile of fragile Chef scripts, and get them into DynamoDB-backed microservices on Kubernetes. That may well be the biggest lever you can pull in the system.

The problem with making big changes is that a team that's stuck like this is not very good at making them, and often the company around them isn't either. They're not very good at shipping. They're out of practice. They have a hard time aligning as a team on a single direction. They're not good at making the political case for the work. They probably have a hard time breaking it down into distinct pieces that can survive interruptions by inevitable emergencies. They probably struggle with learned helplessness: they may not believe that it's really possible to make improvements to their environment. Management won't "let" them.

So starting with hard, high-impact work means setting themselves up to fail, which then makes them even more stuck. "We tried to fix something. It failed. Why bother?"

Small improvements reverse this. It can be surprisingly satisfying to stop and fix a few paper cuts in your workflow, especially if you pick things that you deal with all the time. Those stupid warning messages in the tests. That half-implemented repository pattern that confuses every new person who joins the team. That handful of tests that are sensitive to the order they run in, so you can't speed the suite up by running a bunch of tests in parallel.

My personal favorite place to start is flaky tests. Most teams have at least a few of these. Instrument them, figure out which ones fail the most, and then work down the list. (Or, if you're brave, set up an agent session to keep running and fixing the tests until the entire suite can pass, say, 1000 times in a row.) Once you have tests you can rely on, everything else you can think of doing gets easier.
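
Here's a rough sketch of the "figure out which ones fail the most" step. It assumes you've already collected pass/fail results from several runs of the suite against the same commit; the input format and the name rank_flaky_tests are assumptions for the example, not the output of any particular test runner.

```python
from collections import Counter


def rank_flaky_tests(runs):
    """Rank flaky tests by failure rate, highest first.

    `runs` is a list of dicts, one per run of the suite on the same code,
    mapping test name -> True (passed) or False (failed).
    """
    failures = Counter()
    totals = Counter()
    for run in runs:
        for test_name, passed in run.items():
            totals[test_name] += 1
            if not passed:
                failures[test_name] += 1

    # Flaky means the test both passed and failed on unchanged code.
    # A test that fails every single time is just broken, so it is skipped.
    flaky = [
        (name, failures[name] / totals[name])
        for name in totals
        if 0 < failures[name] < totals[name]
    ]
    return sorted(flaky, key=lambda pair: pair[1], reverse=True)


# Hypothetical results from four runs of the same commit.
runs = [
    {"test_login": True, "test_checkout": False, "test_export": True, "test_broken": False},
    {"test_login": True, "test_checkout": True, "test_export": False, "test_broken": False},
    {"test_login": True, "test_checkout": False, "test_export": True, "test_broken": False},
    {"test_login": True, "test_checkout": True, "test_export": True, "test_broken": False},
]

for name, rate in rank_flaky_tests(runs):
    print(f"{name}: fails {rate:.0%} of runs")
```

For the sample data this prints test_checkout at 50% and then test_export at 25%. That's the list to work down, top to bottom.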

You do this kind of work, a little bit every day, for long enough, and you stop being a team that's helpless in the face of forces outside of its control, and you start being a team that takes care of itself.
