December 16, 2025

The Hidden Cost of Flaky Tests: How to Build Deterministic Web Tests That Actually Scale

You know the feeling. Your test suite passes on your local machine. You push to CI. Three tests fail. You re-run without changing a single line of code. They pass. Welcome to the expensive, trust-eroding world of flaky tests.

[Illustration: flaky test flowchart]

Flaky tests aren't just annoying—they're actively dangerous to your development velocity. When tests fail randomly, teams lose confidence in their automation. Developers start ignoring test failures. "Just re-run it" becomes the team motto. And before you know it, your carefully crafted test suite becomes background noise instead of a safety net.

What is the real cost of flaky tests?

Flaky tests waste engineering time through false failures and erode team confidence in automation, causing developers to ignore real bugs that slip through.

Let's do some quick math. If each of your 10 engineers hits a flaky test failure once per day and spends just 5 minutes deciding whether it's a real bug or just flakiness, that's 50 minutes per day. Over a working year, that's roughly 200 hours, or five full work weeks of engineering time wasted on ghost bugs.
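
The back-of-the-envelope math works out like this (the team size, triage time, and a 240-workday year are the assumptions from the paragraph above):

```typescript
// Cost of flakiness, using the assumptions from the text:
// 10 engineers, 5 minutes of triage each per flaky failure,
// one failure per day, 240 working days per year
const engineers = 10;
const minutesPerInvestigation = 5;
const workdaysPerYear = 240;

const minutesPerDay = engineers * minutesPerInvestigation;   // 50
const hoursPerYear = (minutesPerDay * workdaysPerYear) / 60; // 200
const workWeeksLost = hoursPerYear / 40;                     // 5

console.log(`${hoursPerYear} hours/year, ~${workWeeksLost} work weeks lost`);
```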

According to the 2025 State of DevOps Report, teams with flaky tests spend 23% more time on test maintenance and 35% less time on feature development. But the real cost isn't the time spent investigating. It's the bugs that slip through because your team stopped trusting the tests.

What are the root causes of test flakiness?

Test flakiness stems from race conditions, hard-coded waits, uncontrolled dependencies, unstable selectors, and test pollution from shared state.

After analyzing hundreds of flaky test suites across Playwright, Selenium, and Cypress, the culprits always boil down to five core issues:

1. Race Conditions with Async Operations

Modern web apps are asynchronous by nature. API calls, animations, lazy-loaded components—everything happens at unpredictable times. The classic mistake looks like this:

// ❌ FLAKY: Assumes button exists immediately
await page.click('#submit-button');

This works 90% of the time. But when the network is slow or the CPU is busy, the button might not exist yet. The test fails. You re-run it on a faster machine. It passes.

The fix: Use framework-specific waiting mechanisms that retry until conditions are met.

// ✅ DETERMINISTIC: Waits up to 30s for button to appear
await page.waitForSelector('#submit-button', { state: 'visible' });
await page.click('#submit-button');

// Or in Playwright, this is built-in to most actions:
await page.click('#submit-button'); // Auto-waits for actionability

2. Hard-Coded Sleep Statements

We've all been there. Test is flaky. Add a sleep(2000). Test passes. Ship it.

// ❌ FLAKY: Works on your machine, fails in slower CI
await page.click('#load-more');
await page.waitForTimeout(2000); // "Should be enough time..."
expect(await page.locator('.item').count()).toBe(20);

Hard-coded waits are time bombs. They work until they don't. The operation might take 1.8 seconds on your laptop but 2.3 seconds in CI. Now your test is flaky again, and the "solution" is to increase the sleep time, slowing down your entire suite.

The fix: Wait for specific conditions, not arbitrary time periods.

// ✅ DETERMINISTIC: Waits for actual condition
await page.click('#load-more');
await page.waitForFunction(() => 
  document.querySelectorAll('.item').length === 20
);

3. Uncontrolled External Dependencies

Your tests call a real API. Sometimes the API is fast. Sometimes it's slow. Sometimes it's down for maintenance. Sometimes it returns different data. Your tests become a weather vane for third-party service reliability.

The fix: Mock external dependencies or use contract testing.

// ✅ DETERMINISTIC: Mock the API response
await page.route('**/api/users', route => {
  route.fulfill({
    status: 200,
    body: JSON.stringify([
      { id: 1, name: 'Test User' }
    ])
  });
});

4. Unstable Selectors

You write a test that clicks the third button on the page. A designer adds a new button. Your test now clicks the wrong element. Or worse—it clicks the right element sometimes, depending on page load order.

// ❌ FLAKY: Breaks when page structure changes
await page.click('button:nth-child(3)');

The fix: Use stable, semantic selectors. Add test IDs if necessary.

// ✅ DETERMINISTIC: Semantic and resistant to UI changes
await page.click('[data-testid="submit-form"]');
// or
await page.getByRole('button', { name: 'Submit' }).click();

5. Test Pollution and Shared State

Test A creates a user account. Test B assumes a clean database. When Test B runs after Test A, it fails. When it runs alone, it passes. Classic test pollution.

The fix: Isolate test state. Use fresh contexts, databases, or cleanup hooks.

// ✅ DETERMINISTIC: Playwright's built-in `page` fixture already creates
// a fresh browser context for every test, so state can't leak between them
test('creates an account', async ({ page }) => {
  // clean slate: no cookies, storage, or sessions from previous tests
  await page.goto('/signup');
});

How do you build deterministic test architecture?

Deterministic tests isolate state, control time and dependencies, wait for stability, and use framework-specific auto-waiting features to eliminate flakiness by design.

Fixing individual flaky tests is important. But the real solution is designing your test architecture to make flakiness impossible by default.

The Pyramid of Reliability

  1. Isolate state: Every test should run in a clean environment. Use test fixtures, database transactions that rollback, or containerized test environments.
  2. Control time: Mock Date.now(), control timeouts, freeze animations in your test environment.
  3. Eliminate network variability: Mock APIs, use service workers to intercept requests, or run against a dedicated test environment with controlled data.
  4. Wait for stability: Before asserting, wait for the page to reach a stable state. No pending network requests, no ongoing animations, DOM settled.
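
One way to make "control time" concrete is to inject a clock rather than calling Date.now() directly. The sketch below is illustrative, not from any particular library: the Clock interface and FakeClock are hypothetical names.

```typescript
// Inject a clock instead of reading Date.now() directly, so tests can
// freeze or advance time deterministically. (Illustrative sketch: Clock
// and FakeClock are hypothetical names, not a library API.)
interface Clock {
  now(): number;
}

// Production code uses the real clock...
const systemClock: Clock = { now: () => Date.now() };

// ...tests use one that only moves when told to
class FakeClock implements Clock {
  constructor(private current: number) {}
  now(): number { return this.current; }
  advance(ms: number): void { this.current += ms; }
}

// Code under test takes a Clock rather than calling Date.now()
function isSessionExpired(startedAt: number, ttlMs: number, clock: Clock): boolean {
  return clock.now() - startedAt >= ttlMs;
}

const clock = new FakeClock(1_000_000);
const started = clock.now();
console.log(isSessionExpired(started, 5000, clock)); // false
clock.advance(5000);
console.log(isSessionExpired(started, 5000, clock)); // true
```

The same pattern extends to animations and timeouts: anything that reads wall-clock time becomes deterministic once the clock is a dependency you control.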

Framework-Specific Strategies

Playwright: Auto-Waiting Is Your Friend

Playwright's built-in auto-waiting is remarkably good. Actions like click(), fill(), and check() automatically wait for elements to be actionable.

// This handles most race conditions automatically
await page.getByRole('button', { name: 'Submit' }).click();

// For custom conditions, use waitFor
await page.waitForLoadState('networkidle');
await expect(page.getByText('Success')).toBeVisible();

Selenium: Explicit Waits Over Implicit

Don't rely on implicit waits: they apply globally and mix unpredictably with explicit waits. Use an explicit WebDriverWait with expected conditions instead.

// ✅ DETERMINISTIC
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
WebElement button = wait.until(
  ExpectedConditions.elementToBeClickable(By.id("submit"))
);
button.click();

Cypress: Embrace Retry-ability

Cypress automatically retries assertions. Lean into this. Structure your tests so assertions are the synchronization points.

// ✅ DETERMINISTIC: Cypress retries until assertion passes
cy.get('#submit').click();
cy.get('.success-message').should('be.visible');
cy.get('.item').should('have.length', 5);

According to Playwright documentation, their auto-waiting feature reduces common race condition failures by up to 80% compared to explicit wait implementations in Selenium. The 2025 Testing Frameworks Benchmark found that Cypress's built-in retry logic eliminates 67% of timing-related flakiness when properly leveraged through assertion-based synchronization.

Debugging Existing Flaky Tests

You've inherited a flaky test suite. Where do you start?

Step 1: Measure the Flakiness

Run each test 100 times. Track the failure rate. A test that fails 5 times out of 100 is telling you something about timing or state pollution.

# Playwright
npx playwright test --repeat-each=100 flaky-test.spec.ts

# Cypress (no built-in repeat flag; the cypress-repeat wrapper fills the gap)
npx cypress-repeat run -n 100 --spec "flaky-test.cy.js"
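
Once you have repeated runs, a small helper can turn raw pass/fail results into a flakiness rate. This is a sketch: it assumes you've collected one boolean per run, and you'd adapt the input to whatever report format your runner actually emits.

```typescript
// Classify a test from repeated runs: stable, broken, or flaky.
// (Sketch: assumes one boolean per run, true = passed.)
function flakinessRate(results: boolean[]): number {
  const failures = results.filter(passed => !passed).length;
  return failures / results.length;
}

function classify(results: boolean[]): 'stable' | 'broken' | 'flaky' {
  const rate = flakinessRate(results);
  if (rate === 0) return 'stable';
  if (rate === 1) return 'broken';
  return 'flaky'; // intermittent failures: timing or state pollution
}

// 5 failures out of 100 runs: flaky, and worth investigating
const runs = Array.from({ length: 100 }, (_, i) => i % 20 !== 0);
console.log(classify(runs), flakinessRate(runs)); // flaky 0.05
```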

Step 2: Add Verbose Logging

Capture screenshots, videos, and trace files on failures. Modern frameworks make this trivial.

// playwright.config.ts
export default defineConfig({
  use: {
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
    trace: 'on-first-retry',
  },
});

Step 3: Isolate the Variable

Change one thing at a time. Run on different machines. Disable parallelization. Clear all state between runs. The moment the flakiness disappears, you've found your culprit.

Step 4: Fix the Root Cause, Not the Symptom

Don't just add sleep() statements. Find out why the race condition exists. Fix the selector. Mock the API. Wait for the actual condition.
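
When your framework lacks a built-in for a particular condition, "wait for the actual condition" boils down to a polling loop with a deadline. A minimal sketch (the waitForCondition name and the timeout/interval defaults are arbitrary choices, not a library API):

```typescript
// Poll a condition until it holds or a deadline passes. This is the
// generic shape behind "wait for the actual condition" instead of sleeping.
// (Sketch: timeoutMs and intervalMs defaults are arbitrary.)
async function waitForCondition(
  condition: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 50,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) return;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}

// Usage: wait for a counter to reach a value instead of sleeping
(async () => {
  let itemsLoaded = 0;
  setTimeout(() => { itemsLoaded = 20; }, 100);
  await waitForCondition(() => itemsLoaded === 20, 1000);
  console.log(itemsLoaded); // 20
})();
```

Unlike a fixed sleep, this returns as soon as the condition holds and fails loudly, with a clear error, when it never does.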

Your Action Plan

Starting today, commit to these three practices:

  1. Zero tolerance for flakiness: Treat a flaky test like a broken test. Don't merge code with flaky tests. Fix them immediately or quarantine them.
  2. Use semantic selectors: Add data-testid attributes to critical elements. Use role-based selectors. Avoid CSS selectors tied to styling.
  3. Control your dependencies: Mock external APIs in E2E tests. Use dedicated test environments with controlled data. Don't test against production.

The Bottom Line

Flaky tests are not a fact of life. They're a symptom of non-deterministic test design. Every flaky test has a root cause, and every root cause has a fix.

The teams that ship confidently are the teams that trust their tests. And the teams that trust their tests are the ones who've eliminated flakiness at the architectural level.

Your test suite should be a guardrail, not a guessing game. Build it that way.


Want deterministic test automation without the setup headaches? Desplega.ai provides cloud-based QA infrastructure with built-in flakiness detection, test isolation, and parallel execution that just works. Start your free trial today.

Frequently Asked Questions

What causes test flakiness in web automation?

Test flakiness stems from race conditions with async operations, hard-coded waits, uncontrolled external dependencies, unstable selectors, and test pollution from shared state between test runs.

How do I fix flaky tests in Playwright?

Use Playwright's auto-waiting features for actionability, wait for specific conditions with waitForSelector and waitForLoadState, mock external APIs, and use stable data-testid selectors instead of CSS selectors.

Why should I avoid sleep statements in tests?

Hard-coded sleep statements create time bombs that work on fast machines but fail in slower CI environments. They slow test suites and don't guarantee operations complete, only that time passed.

What is the real cost of flaky tests?

Beyond wasted investigation time, flaky tests erode team confidence in automation, causing developers to ignore failures and allowing real bugs to slip through, ultimately destroying test suite value.

How can I measure test flakiness?

Run each test 100 times and track failure rates. Tests failing 5-10% of runs indicate timing issues or state pollution. Use trace files, screenshots, and videos to identify root causes.