December 30, 2025

The Hidden Cost of Flaky Tests: Why Your CI/CD Pipeline is Lying to You

That green checkmark doesn't mean what you think it means

TL;DR: Flaky tests erode developer trust, slow deployments, and hide real bugs. They're not inevitable—they're symptoms of poor wait strategies, test isolation failures, and brittle selectors. Learn practical strategies to eliminate flakiness in Playwright, Selenium, and Cypress, and build a culture where test reliability is a core quality metric.


[MS Paint illustration: a cracked CI/CD pipeline with confused developers]

You've seen it happen. The test passes. Then it fails. You re-run it without changing a single line of code, and it passes again. Your CI/CD pipeline shows green, you deploy to production, and then everything breaks. Sound familiar?

Flaky tests are the silent killers of software quality. They're like that friend who says "I'll be there in 5 minutes" but shows up whenever they feel like it. Unreliable. Unpredictable. And slowly destroying your team's trust in the entire testing process.

The Real Cost: It's Not Just About Time

When developers see a failing test, they have to make a choice: is this a real bug, or is the test just being flaky again? If your team has lost confidence in your test suite, they'll start clicking that "re-run" button instead of investigating. That's when real bugs slip through. This is exactly the kind of non-determinism menace that plagues modern development teams.

The hidden costs compound quickly:

  • Lost developer time: Every flaky test failure costs 15-30 minutes of investigation time, multiplied by every developer who encounters it
  • Deployment delays: Teams start ignoring test failures or requiring manual approval for every deploy
  • Eroded trust: When tests can't be trusted, developers stop writing them or stop caring about test quality
  • Hidden bugs: Real issues get dismissed as "probably just flaky" until they reach production

The Root Causes: Why Tests Become Flaky

Understanding why tests become flaky is the first step to fixing them. Here are the most common culprits:

1. Timing Issues and Race Conditions

The classic "it works on my machine" problem. Your local dev environment is fast, but CI runs on slower infrastructure. Network requests take longer. Animations don't complete. Elements aren't ready when your test expects them.

In Playwright, Selenium, and Cypress, this often looks like:

  • Clicking elements before they're interactive
  • Asserting on data before API calls complete
  • Reading DOM content while it's still updating
  • Not waiting for animations or transitions to finish
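To make that concrete, here is a minimal Playwright sketch contrasting the flaky pattern with a stable one (the /search route, placeholder text, and data-testid value are made up for illustration):

  import { test, expect } from '@playwright/test';

  test('search shows results', async ({ page }) => {
    await page.goto('/search');
    await page.getByPlaceholder('Search').fill('flaky tests');
    await page.getByRole('button', { name: 'Search' }).click();

    // Flaky: a hard-coded pause is a guess about how long the request takes.
    // await page.waitForTimeout(3000);
    // expect(await page.locator('.result').count()).toBeGreaterThan(0);

    // Stable: a web-first assertion retries until the element appears or the
    // test times out, so it adapts to slow CI runners.
    await expect(page.getByTestId('search-results')).toBeVisible();
  });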

2. Environmental Dependencies

Tests that depend on external state are tests waiting to fail. This includes:

  • Shared test databases without proper cleanup
  • Tests that run in different orders producing different results
  • Reliance on third-party APIs that occasionally timeout
  • Date/time dependencies that fail when run at different times
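For the date/time case, pinning the clock removes a whole class of "fails only at midnight" surprises. Here's a minimal sketch using Playwright's clock API (available in recent Playwright versions; the route and test id are hypothetical):

  import { test, expect } from '@playwright/test';

  test('daily report shows today', async ({ page }) => {
    // Pin the browser clock so the test behaves the same at midnight,
    // on weekends, and across time zones. (Requires a Playwright version
    // that ships the clock API.)
    await page.clock.setFixedTime(new Date('2025-06-15T10:00:00Z'));

    await page.goto('/reports/daily');
    await expect(page.getByTestId('report-date')).toHaveText('June 15, 2025');
  });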

3. Brittle Selectors

Selectors tied to implementation details produce failures that look random but are actually triggered by CSS class renames, dynamically generated IDs, or DOM structure shifts after minor UI changes.

Strategies to Eliminate Flakiness

Strategy 1: Use Auto-Waiting and Proper Wait Strategies

Modern testing frameworks have built-in auto-waiting, but you need to use it correctly:

Playwright (best auto-waiting):

  • Automatically waits for elements to be actionable before interacting
  • Use waitForSelector for explicit waits
  • Leverage waitForLoadState('networkidle') for complex page loads
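A minimal sketch of those waits in practice (the dashboard URL and test ids are placeholders):

  import { test, expect } from '@playwright/test';

  test('dashboard loads its widgets', async ({ page }) => {
    await page.goto('https://example.com/dashboard');

    // Wait for in-flight requests to settle; use sparingly, since pages
    // with polling or analytics beacons may never reach "networkidle".
    await page.waitForLoadState('networkidle');

    // Explicit wait for a specific element when you need one.
    await page.waitForSelector('[data-testid="revenue-widget"]');

    // click() and expect() auto-wait for the element to be visible,
    // stable, and enabled before acting or asserting.
    await page.getByRole('button', { name: 'Refresh' }).click();
    await expect(page.getByTestId('revenue-widget')).toContainText('$');
  });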

Selenium:

  • Always use explicit waits over implicit waits or hard-coded sleeps
  • Use WebDriverWait with expected conditions
  • Never use Thread.sleep() - it's a flakiness generator
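WebDriverWait and Thread.sleep() are the Java names; the same idea in the selenium-webdriver JavaScript/TypeScript bindings uses driver.wait with an until condition. A minimal sketch (the URL and selector are placeholders):

  import { Builder, By, until } from 'selenium-webdriver';

  async function waitForResults(): Promise<void> {
    const driver = await new Builder().forBrowser('chrome').build();
    try {
      await driver.get('https://example.com/search');

      // Explicit wait: poll for a condition with a bounded timeout,
      // instead of a blind sleep that is either too short or too slow.
      const results = await driver.wait(
        until.elementLocated(By.css('[data-testid="search-results"]')),
        10_000
      );
      await driver.wait(until.elementIsVisible(results), 10_000);
    } finally {
      await driver.quit();
    }
  }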

Cypress:

  • Automatically retries assertions until they pass or timeout
  • Use cy.intercept() to wait for specific network requests
  • Leverage cy.wait('@aliasName') for explicit API waits
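A minimal sketch of the intercept-then-wait pattern (the /api/users route and test id are hypothetical):

  describe('user list', () => {
    it('renders after the API responds', () => {
      // Register the intercept before the request can fire.
      cy.intercept('GET', '/api/users').as('getUsers');

      cy.visit('/users');

      // Wait for the aliased request instead of guessing with cy.wait(ms).
      cy.wait('@getUsers');

      // Cypress retries this assertion until it passes or times out.
      cy.get('[data-testid="user-list"]').should('be.visible');
    });
  });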

Strategy 2: Implement Test Isolation

Every test should be completely independent. No shared state. No order dependencies. No leftover data from previous tests.

  • Use database transactions that rollback after each test
  • Clear browser storage, cookies, and cache between tests
  • Use unique test data generators (timestamps, UUIDs) to avoid conflicts
  • Mock external dependencies so tests don't rely on third-party availability
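Here's a minimal Playwright sketch combining those ideas: unique test data, state cleanup, and a mocked third-party call (the routes, labels, and quota endpoint are invented for illustration):

  import { test, expect } from '@playwright/test';
  import { randomUUID } from 'node:crypto';

  test.beforeEach(async ({ context }) => {
    // Playwright gives each test a fresh context by default, but if you
    // reuse one (e.g. a saved login state), clear what could leak between tests.
    await context.clearCookies();
  });

  test('creates a project', async ({ page }) => {
    // Unique data avoids collisions when tests run in parallel or get retried.
    const projectName = `ci-demo-${randomUUID()}`;

    // Mock the external dependency so the test never waits on a third party.
    await page.route('**/api/quota', route =>
      route.fulfill({
        status: 200,
        contentType: 'application/json',
        body: JSON.stringify({ remaining: 10 }),
      })
    );

    await page.goto('/projects/new');
    await page.getByLabel('Project name').fill(projectName);
    await page.getByRole('button', { name: 'Create' }).click();
    await expect(page.getByText(projectName)).toBeVisible();
  });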

Strategy 3: Build Resilient Selectors

Stop relying on CSS classes and brittle XPath. Use data attributes specifically for testing:

  • Add data-testid attributes to important elements
  • Use role-based selectors when possible (getByRole('button'))
  • Prefer text content selectors for human-readable tests
  • Avoid dynamic IDs or nth-child selectors that break with UI changes
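A short Playwright sketch of the difference (the button name and test id are placeholders):

  import { test, expect } from '@playwright/test';

  test('checkout shows the order total', async ({ page }) => {
    await page.goto('/checkout');

    // Role + accessible name: survives CSS refactors and reads like the UI.
    await page.getByRole('button', { name: 'Place order' }).click();

    // data-testid: a stable hook added to the markup specifically for tests.
    await expect(page.getByTestId('order-total')).toContainText('$');

    // Avoid selectors like '.btn-primary:nth-child(3)' or '#submit-1a2b3c':
    // generated classes and IDs change without warning.
  });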

Strategy 4: Detect and Quarantine Flaky Tests

You can't fix what you can't measure. Set up systems to identify flaky tests automatically:

  • Track test failure rates over time - any test with inconsistent results is a candidate
  • Use tools like pytest-flakefinder or CI analytics to identify patterns
  • Quarantine flaky tests (mark them with @flaky tags) so they don't block deployments
  • Create a dedicated "fix flaky tests" rotation or sprint goal
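One lightweight way to quarantine in Playwright is a tag in the test title plus a grep filter in CI, so the deploy gate skips quarantined tests while a separate job keeps running them (the test body and ticket note are hypothetical):

  import { test, expect } from '@playwright/test';

  // Blocking CI run:     npx playwright test --grep-invert @flaky
  // Quarantine-only run: npx playwright test --grep @flaky
  test('imports a large CSV @flaky', async ({ page }) => {
    test.info().annotations.push({
      type: 'issue',
      description: 'Intermittent timeout on slow CI runners (tracking ticket goes here)',
    });

    await page.goto('/imports');
    await page.getByRole('button', { name: 'Upload' }).click();
    await expect(page.getByTestId('import-status')).toHaveText('Complete');
  });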

Strategy 5: Use Retry Logic Strategically

Retries are a double-edged sword. They can mask problems, but when used correctly, they can handle legitimate environmental variance:

  • Good use: Retry network requests that might timeout due to infrastructure issues
  • Bad use: Retrying entire tests to hide timing problems
  • Rule of thumb: If a test needs more than 2 retries to pass, it's flaky and needs fixing

In Playwright, you can configure retries per test:

  • Set retries in playwright.config.ts
  • Use test.describe.configure({ retries: 2 }) at the top of a spec file to give specific flaky tests extra retries while you fix them
  • Monitor retry rates - if tests consistently need retries, that's a red flag
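A minimal playwright.config.ts sketch of that setup; the CI-only retries and first-retry traces follow the pattern from the Playwright docs:

  import { defineConfig } from '@playwright/test';

  export default defineConfig({
    // Retry only on CI, where infrastructure variance is real; locally,
    // a failure should fail loudly so you investigate it.
    retries: process.env.CI ? 2 : 0,

    // Record a trace when a test is retried, so you can see why the
    // first attempt failed instead of just shrugging at the green re-run.
    use: { trace: 'on-first-retry' },
  });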

Building a Culture of Test Reliability

Technical solutions only work if your team prioritizes test quality. Here's how to build that culture:

  • Make flaky tests a priority: Treat them like production bugs, not technical debt to ignore
  • Add test reliability to your definition of done: A feature isn't complete until its tests are stable
  • Review test code like production code: Tests deserve the same scrutiny as application logic
  • Track and celebrate improvements: Measure test reliability metrics and recognize teams that improve them

The Bottom Line

Flaky tests are not inevitable. They're a symptom of technical decisions, architectural choices, and team priorities. Every flaky test in your suite is quietly eroding trust, slowing deployments, and hiding real bugs.

The good news? You can fix this. Start by identifying your flakiest tests, understanding their root causes, and applying the strategies above. Use proper wait strategies, isolate your tests, build resilient selectors, and create systems to detect and quarantine flakiness before it spreads.

Your CI/CD pipeline should be a source of confidence, not anxiety. When that green checkmark appears, it should mean something. Make your tests reliable, and you'll ship faster, deploy with confidence, and catch real bugs before they reach production.

Stop accepting flaky tests as a cost of automation. Start treating test reliability as a core quality metric. Your future self (and your team) will thank you. For more on building reliable test suites, see Test Wars Episode VII: Test Coverage Rebels.