Test Wars – Episode VII: Test Coverage Rebels
TL;DR: Test-coverage numbers feel comforting, but they can hide mission-critical gaps. What you test matters far more than how much you test, so shift your focus to end-to-end, revenue-driving scenarios: exactly the kind of tests AI can automate.
Introduction
Ask any engineering team how “good” their tests are and someone will quote a percentage. Line-coverage dashboards offer an easy-to-grasp score that dazzles CEOs and calms CTOs. But coverage is only a proxy for quality, not a guarantee. When vanity metrics become goals, they distort priorities, leaving the very flows your customers (and your top line) depend on unchecked.
Picture the scene: a CTO is in a board meeting, presenting a dashboard that glows with a reassuring "Test Coverage: 95%." Confidence is high. Hours later, the illusion shatters. A key enterprise customer is unable to complete a core transaction, and that beautiful 95% coverage means absolutely nothing. This isn't hypothetical. The recent CrowdStrike software bug that cost Delta Air Lines an estimated $380 million wasn't a failure of a single, isolated component; it was a failure of the system as a whole—the exact kind of disaster that code coverage metrics are fundamentally incapable of predicting.
1. Why Coverage Fetishes Persist
The allure of a single number is powerful, but it often masks a dangerous reality. Leaders naturally crave simple, quantifiable metrics, but this simplicity is a flaw when it abstracts away the immense complexity of modern software.
| The Allure | The Reality |
| --- | --- |
| Simple scorecard executives can track. | A number is not evidence that the right behaviours are tested. |
| Gamification: teams race to 80% or 90%. | Metrics can be inflated by trivial tests that execute code without asserting behaviour. |
| Tooling built into IDEs and CI pipelines. | Easy instrumentation ≠ meaningful assurance. |
As Marc-G. G. warns: “Having a high coverage gives you a sense of security, possibly making you blind to deeper issues.”
Martin Fowler echoes the warning: coverage is “useful for finding untested parts… but of little use as a numeric statement of how good your tests are.”
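The "trivial tests" trap is easy to see in code. Here is a minimal, hypothetical sketch: both tests below give `apply_discount` 100% line coverage, but only one of them would ever catch a regression.

```python
def apply_discount(price: float, percent: float) -> float:
    """Return the price after applying a percentage discount."""
    return price * (1 - percent / 100)

def test_executes_but_asserts_nothing():
    # Runs every line of apply_discount, so the coverage tool reports
    # 100% -- yet this passes even if the function returns nonsense.
    apply_discount(100.0, 20.0)

def test_asserts_actual_behaviour():
    # Identical coverage number, but this one pins down the behaviour.
    assert apply_discount(100.0, 20.0) == 80.0
```

Both tests count equally toward the dashboard; only the second one is worth anything.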
2. Evidence: High Coverage ≠ Fewer Bugs
The data is clear: a high coverage percentage does not reliably lead to more stable software.
- Empirical research: A 2015 study of two large open-source systems found only a weak correlation between higher statement coverage and real-bug detection.
- Eclipse crash data: Analysing 2 million crash reports, researchers discovered that unit-tested code crashed about as often as untested code.
- Industry post-mortems: GitLab’s infamous 2017 database outage wiped six hours of customer data; unit and component tests all passed, yet an end-to-end restore path was never rehearsed.
- Real-world outages: The July 2024 CrowdStrike software update grounded airlines worldwide. The failure wasn't in a single, isolated component but in the system as a whole—a blind spot for unit test coverage that cost some companies hundreds of millions.
3. Where Coverage Falls Short
Code coverage fails because it cannot see the most critical sources of failure in modern, distributed systems.
- Inter-service seams: Microservices magnify the “gaps between the boxes.” Most unit tests stub network calls, so failures in contracts, timeouts, or auth headers go unnoticed until production. Test Wars Episode V dives into this problem in detail.
- Critical user journeys: The checkout path that drives 80% of revenue may involve five services, two external APIs, and a feature flag. Line coverage doesn’t know how valuable that path is. Test Wars Episode VI was all about this. For an e-commerce business, this is: "Can a new user find a product, add it to their cart, and pay?" For a SaaS platform: "Can an enterprise admin invite a new team member and see the charge on their bill?" A metric that can't distinguish between a cosmetic flaw and a revenue-killing catastrophe is not a metric that can guide strategic decisions.
- Data-dependent edge cases: Tests that merely touch a line cannot assert behaviour across weird encodings, leap-second timestamps, or country-specific tax rules.
Steve Grunwell sums it up: “100% code coverage, but no protection against the things most likely to cause issues.”
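Grunwell's point about data-dependent blind spots can be shown in a few lines. The VAT lookup below is hypothetical, but the pattern is common: a single happy-path test touches every line, while the behaviour that actually bites in production only surfaces with country-specific data.

```python
# Hypothetical VAT table; rates and country codes are illustrative only.
VAT_RATES = {"DE": 0.19, "FR": 0.20, "IE": 0.23}

def price_with_vat(net: float, country: str) -> float:
    # Latent bug: unknown countries silently default to 0% VAT.
    return round(net * (1 + VAT_RATES.get(country, 0.0)), 2)

# One happy-path test executes every line above: 100% coverage...
assert price_with_vat(100.0, "DE") == 119.0

# ...but only a data-driven case exposes the unknown-country default:
assert price_with_vat(100.0, "XX") == 100.0  # 0% VAT: is that intended?
```

Line coverage is identical with or without the second assertion; only the data varies, and the data is where the risk lives.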
4. The Rebel Strategy: From Code Coverage to Business Risk Coverage
The rebellion against code coverage is not a call for anarchy; it is about having the right metrics. We must shift our focus from a purely technical, inside-out view to a business-centric, outside-in perspective.
| From | To |
| --- | --- |
| Counting executed lines | Covering end-to-end scenarios users rely on |
| Chasing an arbitrary % target | Mapping business risk to test depth |
| Writing more unit mocks | Generating production-like data & environments |
To lead this rebellion, you must replace the old, misleading dashboard with one that reflects business reality. The following table provides a clear guide for translating the language of the old empire into the language of the rebel alliance.
The Executive's Guide to QA Metrics: From Vanity to Value
| Vanity Metric (The Old Empire) | Value Metric (The Rebel Alliance) | The Business Question It Answers |
| --- | --- | --- |
| Code Coverage Percentage | Critical User Journey (CUJ) Pass Rate | "Is our code being exercised by tests?" vs. "Can our most important customers achieve their most important goals right now?" |
| Total Number of Tests | Bug Escape Rate to Production | "Is our team busy writing tests?" vs. "Is our QA process effective at preventing customer-facing pain?" |
| Unit Test Pass Rate | Change Failure Rate (DORA Metric) | "Do individual components work in a lab?" vs. "Can we ship new value to customers quickly and safely?" |
| Test Execution Speed | Mean Time to Resolution (MTTR) for a CUJ Failure | "How fast do our tests run?" vs. "When a critical workflow breaks, how fast can we fix it to protect revenue?" |
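The value metrics in the table are simple to compute once the underlying events are recorded. The sketch below uses made-up records and field names (they are assumptions, not a real schema) to show how CUJ pass rate and DORA change failure rate fall out of plain counts.

```python
# Hypothetical records; journey names and fields are illustrative only.
journey_runs = [
    {"journey": "checkout", "passed": True},
    {"journey": "checkout", "passed": True},
    {"journey": "invite_member", "passed": False},
    {"journey": "invite_member", "passed": True},
]
deployments = [
    {"id": 1, "caused_incident": False},
    {"id": 2, "caused_incident": True},
    {"id": 3, "caused_incident": False},
    {"id": 4, "caused_incident": False},
]

def cuj_pass_rate(runs: list[dict]) -> float:
    """Share of critical-user-journey runs that passed (0.0 to 1.0)."""
    return sum(r["passed"] for r in runs) / len(runs)

def change_failure_rate(deploys: list[dict]) -> float:
    """DORA change failure rate: share of deployments causing an incident."""
    return sum(d["caused_incident"] for d in deploys) / len(deploys)

print(f"CUJ pass rate:       {cuj_pass_rate(journey_runs):.0%}")
print(f"Change failure rate: {change_failure_rate(deployments):.0%}")
```

The point is not the arithmetic but the inputs: these numbers are driven by customer-visible outcomes, not by which lines happened to execute.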
Desplega’s approach auto-generates real-world flows spanning APIs, UIs, data stores, and message queues, so you validate what actually happens in production. Instead of asking “How many lines did we hit?”, we aim to show you which revenue-critical journeys are green.
5. Practical Steps for Tech Leaders
- Inventory value journeys. Identify the top five user journeys or ops processes that directly impact revenue or carry SLA penalties; this is where your QA effort should concentrate.
- Instrument observability first. Trace each journey; the trace becomes the spec for an end-to-end test.
- Automate scenario generation. Figure out how to replay traces with production-like data, spinning up ephemeral environments when needed.
- Track “Scenario Coverage.” Report the percentage of high-value flows exercised on every build and fail pipelines when coverage slips.
- Use code coverage as a smoke detector, not a KPI. Let low coverage highlight untouched code, but never treat the number itself as success.
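Step 4 above ("fail pipelines when coverage slips") can be a very small CI gate. This is a minimal sketch under assumptions of my own: the journey names, the 80% threshold, and the idea that each build reports which journeys it exercised are all illustrative.

```python
# Hypothetical list of high-value flows; replace with your own inventory.
CRITICAL_JOURNEYS = {"signup", "checkout", "invite_member",
                     "export_report", "cancel_plan"}
THRESHOLD = 0.8  # illustrative: require 80% of critical journeys per build

def scenario_coverage(exercised: set[str]) -> float:
    """Fraction of critical journeys exercised in this build."""
    return len(CRITICAL_JOURNEYS & exercised) / len(CRITICAL_JOURNEYS)

def gate(exercised: set[str]) -> bool:
    """Return True if the pipeline may proceed, False if it should fail."""
    cov = scenario_coverage(exercised)
    print(f"Scenario coverage: {cov:.0%} (threshold {THRESHOLD:.0%})")
    return cov >= THRESHOLD

# In CI you would translate the boolean into an exit code, e.g.:
# sys.exit(0 if gate(journeys_ran_this_build) else 1)
ok = gate({"signup", "checkout", "invite_member", "export_report"})
```

Because the gate counts journeys rather than lines, it fails loudly when a revenue-critical flow stops being tested, even if line coverage stays flat.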
A final note
Coverage percentages belonged to a simpler time. Modern systems fail at the integrations and edge cases that numbers can’t see. By pivoting from how much you test to what you test, you protect customer trust, prevent seven-figure outages, and keep revenue flowing.
As a leader, you set the strategy. The next time you review your engineering dashboard, resist the allure of the simple percentage. Instead, ask the questions that truly matter: "What percentage of our critical, revenue-generating user journeys are we testing with every release?" and "When a critical journey fails, how quickly can we resolve it?"
The rebels have spoken: may your tests be with the user, not the metric.
References
- Marc-G. G., “Code Coverage Is a Vanity Metric,” 2015 — marcgg.com
- Martin Fowler, “Test Coverage,” 2012 — martinfowler.com
- P. S. Kochhar, F. Thung, D. Lo, “Code Coverage and Test Suite Effectiveness: Empirical Study with Real Bugs,” SANER 2015 — doi.org
- E. Chioteli, I. Batas, D. Spinellis, “Does Unit-Tested Code Crash? A Case Study of Eclipse,” arXiv 2019 — arxiv.org
- GitLab Engineering, “Postmortem of the 31 Jan 2017 Database Outage,” 2017 — about.gitlab.com
- Steve Grunwell, “The True Meaning of Code Coverage,” 2025 — stevegrunwell.com
- Delta Air Lines September Quarter 2024 Results. Delta Investor Relations, Oct 2024 — delta.com