The Rewrite Impulse
Every engineer who has spent time in a legacy codebase has felt it: the conviction that if we just rewrote this from scratch, correctly this time, everything would be better. Joel Spolsky wrote about this impulse in 2000 and called it "the single worst strategic mistake that any software company can make." Twenty-five years later, engineering teams are still making it.
The reason rewrites fail is not that engineers are incompetent. It is that the existing codebase contains years of accumulated business rules, edge case handling, and implicit requirements that are not in any specification document. A rewrite starts from scratch and rediscovers all of them, expensively, usually after going live.
When Refactoring Is the Right Answer
Refactoring is always the default. The real question is not "should we refactor or rewrite?" but "is this one of the rare cases where a rewrite is genuinely justified?" In most cases, it is not.
Refactoring is the right answer when:
- The application works correctly but the code is hard to change. The business logic is correct, the edge cases are handled, users are not experiencing failures. The problem is developer experience and velocity, not correctness.
- The architectural problems are localized. A messy payment module, a hard-to-test authentication layer, or a tightly coupled data access layer can each be refactored independently without touching the rest of the codebase.
- The team understands the existing behavior. If the team can describe what the application does and why, incremental changes can be made safely with tests verifying behavior is preserved.
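When the existing behavior is not written down anywhere, a characterization test is one way to get that verification: record what the legacy code does for a spread of inputs, then check that the refactored code does exactly the same. The Python sketch below is a minimal illustration under stated assumptions. `legacy_shipping_fee` and the other functions are hypothetical stand-ins, not code from any real system, and the zero-weight rule plays the part of an undocumented edge case.

```python
import itertools


def legacy_shipping_fee(weight_kg: float, country: str) -> float:
    """Hypothetical legacy function; the zero-weight rule is documented nowhere."""
    if country == "US":
        fee = 5.0 if weight_kg <= 1 else 5.0 + (weight_kg - 1) * 2.0
    else:
        fee = 15.0 + weight_kg * 3.0
    if weight_kg == 0:
        fee = 0.0  # implicit rule: zero-weight orders (e.g. digital goods) ship free
    return round(fee, 2)


def refactored_shipping_fee(weight_kg: float, country: str) -> float:
    """Cleaner structure, same behavior, including the implicit rule."""
    if weight_kg == 0:
        return 0.0
    if country == "US":
        return round(5.0 + max(weight_kg - 1, 0) * 2.0, 2)
    return round(15.0 + weight_kg * 3.0, 2)


def naive_rewrite_shipping_fee(weight_kg: float, country: str) -> float:
    """Written from the 'spec' alone, so the zero-weight rule is missing."""
    if country == "US":
        return round(5.0 + max(weight_kg - 1, 0) * 2.0, 2)
    return round(15.0 + weight_kg * 3.0, 2)


def characterize(old, new, cases):
    """Return every input on which `new` disagrees with `old`."""
    return [args for args in cases if old(*args) != new(*args)]


# Inputs chosen to cover the known boundaries (0 kg, 1 kg) plus typical values.
CASES = list(itertools.product([0, 0.5, 1, 1.5, 2, 10, 23.7], ["US", "DE", "JP"]))

print(characterize(legacy_shipping_fee, refactored_shipping_fee, CASES))     # → []
print(characterize(legacy_shipping_fee, naive_rewrite_shipping_fee, CASES))  # → [(0, 'US'), (0, 'DE'), (0, 'JP')]
```

The naive rewrite agrees with the legacy code on every input except zero-weight orders, which is exactly the kind of implicit requirement a from-scratch rewrite tends to rediscover late.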
When a Rewrite Is Genuinely Justified
Rewrites are justified in a small number of specific situations:
Fundamental architecture mismatch
The existing system was built for a problem that has changed so significantly that the architecture cannot accommodate the new requirements without constant workarounds. A synchronous batch processing system being asked to support real-time streaming. A single-tenant application being asked to support multi-tenancy. These are genuine architectural mismatches where refactoring cannot bridge the gap at a reasonable cost.
Security requirements impossible to retrofit
Some security requirements cannot be added to an existing system without rebuilding it. Encrypting sensitive data at rest, when the existing data model stores those fields in plaintext, requires a migration that is functionally a rebuild of the data layer. Retrofitting zero-trust architecture onto a system built on implicit trust between services is a similar case. If the security requirement is fundamental and the existing architecture cannot accommodate it, a targeted rewrite of the affected components is justified.
Cost of incremental change exceeds rewrite cost
Track the velocity trend. If features are taking 4x longer than they did 18 months ago because every change requires understanding and working around accumulated technical debt, and this trend is continuing, the compound cost of living with the codebase will eventually exceed the cost of replacing it. This is a calculation, not a feeling. Run the numbers.
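One way to run those numbers is a deliberately crude model: infer a monthly growth rate from the observed slowdown, assume the monthly cost of working around the debt keeps compounding at that rate, and find the month in which the cumulative overhead reaches the estimated rewrite cost. The Python sketch below does that. The function names and inputs are hypothetical, and the model ignores the rewrite's own risk and the cost of running two systems in parallel, both of which push the real break-even point later.

```python
from typing import Optional


def monthly_growth_rate(slowdown: float, months: int) -> float:
    """Compound monthly growth implied by a slowdown observed over `months`.

    Features taking 4x longer than 18 months ago implies (1 + g) ** 18 == 4,
    so g is roughly 0.08 (about 8% per month).
    """
    return slowdown ** (1 / months) - 1


def breakeven_month(
    overhead_now: float,   # this month's cost of working around the debt (your estimate)
    growth: float,         # monthly growth rate of that overhead
    rewrite_cost: float,   # one-off estimate for building the replacement
    horizon: int = 60,     # how many months to project before giving up
) -> Optional[int]:
    """First month in which cumulative debt overhead reaches the rewrite cost.

    Returns None if that never happens within `horizon` months.
    """
    cumulative = 0.0
    overhead = overhead_now
    for month in range(1, horizon + 1):
        cumulative += overhead
        if cumulative >= rewrite_cost:
            return month
        overhead *= 1 + growth  # the debt keeps compounding
    return None


growth = monthly_growth_rate(slowdown=4, months=18)
print(breakeven_month(overhead_now=40_000, growth=growth, rewrite_cost=1_500_000))
```

A `None` result within a realistic horizon suggests the numbers still favor incremental change; an early break-even month is a prompt to look harder at the options below, not a verdict on its own.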
The Strangler Fig: The Pragmatic Middle Ground
The strangler fig pattern lets you replace a legacy system incrementally without a big-bang cutover. New functionality is built in the new system. Existing functionality is migrated one piece at a time. The legacy system gradually shrinks as the new system grows around it.
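At the routing layer, the pattern usually comes down to a facade that sends each request to the new system if that piece of functionality has been migrated, and to the legacy system otherwise. Here is a minimal Python sketch, with hypothetical handler functions standing in for real HTTP backends and path prefixes as the unit of migration (an assumption; teams also split by tenant, feature flag, or message type):

```python
from typing import Callable, Dict

Handler = Callable[[str], str]  # hypothetical: takes a request path, returns a response body


class StranglerFacade:
    """Sends migrated paths to the new system and everything else to the legacy one."""

    def __init__(self, legacy: Handler) -> None:
        self.legacy = legacy
        self.migrated: Dict[str, Handler] = {}

    def migrate(self, prefix: str, handler: Handler) -> None:
        """Move one slice of functionality (a path prefix) to the new system."""
        self.migrated[prefix] = handler

    @staticmethod
    def _covers(prefix: str, path: str) -> bool:
        # Match whole path segments so "/invoices" does not capture "/invoices-archive".
        return path == prefix or path.startswith(prefix.rstrip("/") + "/")

    def handle(self, path: str) -> str:
        matches = [p for p in self.migrated if self._covers(p, path)]
        if matches:
            # The most specific migrated prefix wins.
            return self.migrated[max(matches, key=len)](path)
        return self.legacy(path)


facade = StranglerFacade(legacy=lambda path: f"legacy:{path}")
facade.migrate("/invoices", lambda path: f"new:{path}")
print(facade.handle("/invoices/42"))        # → new:/invoices/42
print(facade.handle("/orders/7"))           # → legacy:/orders/7
print(facade.handle("/invoices-archive"))   # → legacy:/invoices-archive
```

Migrating a slice is one `migrate` call, and the legacy system is ready to retire once no request falls through to it.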
This is how we approached the UN-DGACM migration: new REST endpoints ran alongside the existing WCF services, with a small percentage of traffic routed to the new layer for validation. The WCF services were decommissioned only after the REST layer had proven itself in production over months of parallel operation.
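The migration description above does not say how the traffic split was implemented. One common approach, shown here as an assumption rather than as the UN-DGACM setup, is to hash a stable request key such as a user ID into a bucket, so the same caller consistently lands on the same layer while the percentage is ramped up. `routes_to_new_layer` is a hypothetical name:

```python
import hashlib


def routes_to_new_layer(request_key: str, percent: float) -> bool:
    """Deterministically send about `percent`% of keys to the new layer.

    `request_key` is assumed to be something stable, such as a user or tenant ID.
    """
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999, roughly uniform
    return bucket < percent * 100  # 5% -> buckets 0..499


# Ramping from 5% to 10% only adds callers; nobody routed at 5% is moved back.
keys = [f"user-{i}" for i in range(20_000)]
at_5 = {k for k in keys if routes_to_new_layer(k, 5)}
at_10 = {k for k in keys if routes_to_new_layer(k, 10)}
print(len(at_5) / len(keys), at_5 <= at_10)
```

Hashing on a stable key is a design choice. Random per-request sampling would also split traffic, but it would bounce a single user between the two implementations, which makes discrepancies harder to diagnose during parallel operation.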
The strangler fig is the right approach for most situations where a rewrite looks tempting. It delivers business value incrementally, reduces the risk of a big-bang cutover, and allows course corrections before the full investment is committed.
Making the Business Case
For stakeholders, frame the decision in business terms: what the technical debt currently costs per sprint (in delayed features and incidents), what it is projected to cost over the next 12 to 24 months if nothing changes, and what investment is required to address it. Technical debt is a liability with a carrying cost. The business case is the difference between the projected carrying cost over that horizon and the remediation cost.
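The arithmetic fits in a few lines. The Python sketch below uses hypothetical field names and made-up figures, holds the per-sprint carrying cost flat, and assumes two-week sprints; if velocity is still declining, the flat figure understates the real cost.

```python
from dataclasses import dataclass


@dataclass
class DebtCase:
    delay_cost_per_sprint: float      # value lost to delayed features each sprint
    incident_cost_per_sprint: float   # average incident cost attributable to the debt
    remediation_cost: float           # estimated investment to address the debt
    sprint_weeks: int = 2             # assumption: two-week sprints

    def carrying_cost(self, months: int) -> float:
        """Projected cost of leaving the debt in place for `months`, held flat."""
        sprints = months * 52 / 12 / self.sprint_weeks
        return sprints * (self.delay_cost_per_sprint + self.incident_cost_per_sprint)

    def net_benefit(self, months: int) -> float:
        """Carrying cost minus remediation cost; positive favors fixing the debt."""
        return self.carrying_cost(months) - self.remediation_cost


case = DebtCase(delay_cost_per_sprint=20_000, incident_cost_per_sprint=5_000,
                remediation_cost=400_000)
print(case.net_benefit(12))  # → 250000.0
print(case.net_benefit(24))  # → 900000.0
```

Presenting both horizons side by side shows stakeholders how the case strengthens the longer the debt is carried.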
FriendsBit has executed both refactoring engagements and targeted rewrites using the strangler fig pattern for enterprise clients. If you are facing this decision and need a senior engineer's perspective on the right approach for your specific codebase, get in touch.