December 16, 2025 • Foundation

The Hidden Cost of Flaky Tests: How to Build Deterministic Web Tests That Actually Scale

You know the feeling. Your test suite passes on your local machine. You push to CI. Three tests fail. You re-run without changing a single line of code. They pass. Welcome to the expensive, trust-eroding world of flaky tests.

The Hidden Cost of Flaky Tests - MS Paint illustration showing flaky test flowchart

Flaky tests aren't just annoying—they're actively dangerous to your development velocity. When tests fail randomly, teams lose confidence in their automation. Developers start ignoring test failures. "Just re-run it" becomes the team motto. And before you know it, your carefully crafted test suite becomes background noise instead of a safety net.

The Real Cost of Flakiness

Let's do some quick math. If each of your 10 engineers hits a flaky test failure once per day and spends just 5 minutes figuring out whether it's a real bug or just flakiness, that's 50 minutes per day. Over roughly 250 working days, that adds up to about 200 hours, or five full work weeks of engineering time wasted on ghost bugs.

But the real cost isn't the time spent investigating. It's the bugs that slip through because your team stopped trusting the tests.

The 5 Root Causes of Test Flakiness

After analyzing hundreds of flaky test suites across Playwright, Selenium, and Cypress, we've found the culprits almost always boil down to five core issues:

1. Race Conditions with Async Operations

Modern web apps are asynchronous by nature. API calls, animations, lazy-loaded components—everything happens at unpredictable times. The classic mistake looks like this:

// ❌ FLAKY: Assumes button exists immediately
await page.click('#submit-button');

This works 90% of the time. But when the network is slow or the CPU is busy, the button might not exist yet. The test fails. You re-run it on a faster machine. It passes.

The fix: Use framework-specific waiting mechanisms that retry until conditions are met.

// ✅ DETERMINISTIC: Waits up to 30s for button to appear
await page.waitForSelector('#submit-button', { state: 'visible' });
await page.click('#submit-button');

// Or in Playwright, this is built-in to most actions:
await page.click('#submit-button'); // Auto-waits for actionability

2. Hard-Coded Sleep Statements

We've all been there. Test is flaky. Add a sleep(2000). Test passes. Ship it.

// ❌ FLAKY: Works on your machine, fails in slower CI
await page.click('#load-more');
await page.waitForTimeout(2000); // "Should be enough time..."
expect(await page.locator('.item').count()).toBe(20);

Hard-coded waits are time bombs. They work until they don't. The operation might take 1.8 seconds on your laptop but 2.3 seconds in CI. Now your test is flaky again, and the "solution" is to increase the sleep time, slowing down your entire suite.

The fix: Wait for specific conditions, not arbitrary time periods.

// ✅ DETERMINISTIC: Waits for actual condition
await page.click('#load-more');
await page.waitForFunction(() => 
  document.querySelectorAll('.item').length === 20
);

3. Uncontrolled External Dependencies

Your tests call a real API. Sometimes the API is fast. Sometimes it's slow. Sometimes it's down for maintenance. Sometimes it returns different data. Your tests become a weather vane for third-party service reliability.

The fix: Mock external dependencies or use contract testing.

// ✅ DETERMINISTIC: Mock the API response
await page.route('**/api/users', route => {
  route.fulfill({
    status: 200,
    contentType: 'application/json',
    body: JSON.stringify([
      { id: 1, name: 'Test User' }
    ])
  });
});

4. Unstable Selectors

You write a test that clicks the third button on the page. A designer adds a new button. Your test now clicks the wrong element. Or worse—it clicks the right element sometimes, depending on page load order.

// ❌ FLAKY: Breaks when page structure changes
await page.click('button:nth-child(3)');

The fix: Use stable, semantic selectors. Add test IDs if necessary.

// ✅ DETERMINISTIC: Semantic and resistant to UI changes
await page.click('[data-testid="submit-form"]');
// or
await page.getByRole('button', { name: 'Submit' }).click();

5. Test Pollution and Shared State

Test A creates a user account. Test B assumes a clean database. When Test B runs after Test A, it fails. When it runs alone, it passes. Classic test pollution.

The fix: Isolate test state. Use fresh contexts, databases, or cleanup hooks.

// ✅ DETERMINISTIC: each test gets its own browser context and page
let context, page;

test.beforeEach(async ({ browser }) => {
  context = await browser.newContext(); // fresh cookies, storage, and cache
  page = await context.newPage();       // each test starts with a clean slate
});

test.afterEach(async () => await context.close()); // discard all per-test state

Building Deterministic Test Architecture

Fixing individual flaky tests is important. But the real solution is designing your test architecture to make flakiness impossible by default.

The Pyramid of Reliability

  1. Isolate state: Every test should run in a clean environment. Use test fixtures, database transactions that rollback, or containerized test environments.
  2. Control time: Mock Date.now(), control timeouts, freeze animations in your test environment (see the sketch after this list).
  3. Eliminate network variability: Mock APIs, use service workers to intercept requests, or run against a dedicated test environment with controlled data.
  4. Wait for stability: Before asserting, wait for the page to reach a stable state. No pending network requests, no ongoing animations, DOM settled.
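
For the second point, controlling time, a minimal Playwright sketch is to freeze Date.now() before the app loads. The fixed timestamp below is an arbitrary example:

// ✅ DETERMINISTIC: freeze the clock before any app code runs
await page.addInitScript(() => {
  const fixedNow = new Date('2025-01-01T00:00:00Z').getTime();
  Date.now = () => fixedNow; // every call sees the same "now"
});

Recent Playwright releases also ship a dedicated clock API that serves the same purpose.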

Framework-Specific Strategies

Playwright: Auto-Waiting Is Your Friend

Playwright's built-in auto-waiting is remarkably good. Actions like click(), fill(), and check() automatically wait for elements to be actionable.

// This handles most race conditions automatically
await page.getByRole('button', { name: 'Submit' }).click();

// For custom conditions, use waitFor
await page.waitForLoadState('networkidle');
await expect(page.getByText('Success')).toBeVisible();

Selenium: Explicit Waits Over Implicit

Avoid implicit waits: they apply one global timeout to every element lookup and interact unpredictably with explicit waits. Use explicit WebDriverWait with expected conditions instead.

// ✅ DETERMINISTIC
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
WebElement button = wait.until(
  ExpectedConditions.elementToBeClickable(By.id("submit"))
);
button.click();

Cypress: Embrace Retry-ability

Cypress automatically retries assertions. Lean into this. Structure your tests so assertions are the synchronization points.

// ✅ DETERMINISTIC: Cypress retries until assertion passes
cy.get('#submit').click();
cy.get('.success-message').should('be.visible');
cy.get('.item').should('have.length', 5);

Debugging Existing Flaky Tests

You've inherited a flaky test suite. Where do you start?

Step 1: Measure the Flakiness

Run each test 100 times. Track the failure rate. A test that fails 5 times out of 100 is telling you something about timing or state pollution.

# Playwright
npx playwright test --repeat-each=100 flaky-test.spec.ts

# Cypress (no built-in repeat flag; the cypress-repeat plugin is one option)
npx cypress-repeat run -n 100 --spec "flaky-test.cy.js"

Step 2: Add Verbose Logging

Capture screenshots, videos, and trace files on failures. Modern frameworks make this trivial.

// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    screenshot: 'only-on-failure',
    video: 'retain-on-failure',
    trace: 'on-first-retry',
  },
});

Step 3: Isolate the Variable

Change one thing at a time. Run on different machines. Disable parallelization. Clear all state between runs. The moment the flakiness disappears, you've found your culprit.
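
In Playwright, for example, a quick experiment is to turn off parallelism and retries while you investigate. This is a temporary config sketch, not a recommendation for your main run:

// playwright.config.ts: temporary settings while hunting a flaky test
import { defineConfig } from '@playwright/test';

export default defineConfig({
  workers: 1,  // run tests serially to rule out cross-test interference
  retries: 0,  // surface every failure instead of papering over it
});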

Step 4: Fix the Root Cause, Not the Symptom

Don't just add sleep() statements. Find out why the race condition exists. Fix the selector. Mock the API. Wait for the actual condition.

Your Action Plan

Starting today, commit to these three practices:

  1. Zero tolerance for flakiness: Treat a flaky test like a broken test. Don't merge code with flaky tests. Fix them immediately or quarantine them (see the sketch after this list).
  2. Use semantic selectors: Add data-testid attributes to critical elements. Use role-based selectors. Avoid CSS selectors tied to styling.
  3. Control your dependencies: Mock external APIs in E2E tests. Use dedicated test environments with controlled data. Don't test against production.
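
For the quarantine option above, one lightweight approach in Playwright is tagging: mark the known-flaky test in its title and exclude that tag from the blocking CI run. The test name and the @flaky tag below are just illustrative conventions:

// Tag the known-flaky test so it can be excluded from the blocking run
import { test } from '@playwright/test';

test('checkout applies discount code @flaky', async ({ page }) => {
  // known-flaky steps live here until the root cause is fixed
});

// Main suite without quarantined tests:
// npx playwright test --grep-invert @flaky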

The Bottom Line

Flaky tests are not a fact of life. They're a symptom of non-deterministic test design. Every flaky test has a root cause, and every root cause has a fix.

The teams that ship confidently are the teams that trust their tests. And the teams that trust their tests are the ones who've eliminated flakiness at the architectural level.

Your test suite should be a guardrail, not a guessing game. Build it that way.


Want deterministic test automation without the setup headaches? Desplega.ai provides cloud-based QA infrastructure with built-in flakiness detection, test isolation, and parallel execution that just works. Start your free trial today.