The Hidden Cost of Test Flakiness: How to Build Deterministic Web Tests
Stop fighting unreliable tests and start building automation you can trust

You've been there: the test suite passes locally, fails in CI, then passes on retry. Your team starts adding "just run it again" to their vocabulary. Developers stop trusting the pipeline. Deployments slow down. This is test flakiness, and it's costing your organization more than you think.
According to Google's research, even a 1% per-test flake rate means a 10-test suite has roughly a 10% chance of a false failure on any given run. Scale that to 1,000 tests and you're looking at false failures on nearly every build. The hidden cost? Developer time, delayed releases, and eroded confidence in your QA process.
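The arithmetic is easy to verify: with a per-test false-failure probability p, a run of n tests produces at least one false failure with probability 1 - (1 - p)^n. A quick sketch:
// Chance that at least one of n tests false-fails, given per-test flake rate p
const falseFailureChance = (n: number, p: number) => 1 - (1 - p) ** n;
console.log(falseFailureChance(10, 0.01)); // ≈ 0.096, roughly 10% of runs
console.log(falseFailureChance(1000, 0.01)); // ≈ 0.99996, nearly every build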
The Five Root Causes of Test Flakiness
Before we fix flakiness, we need to understand where it comes from. Here are the top five culprits that plague modern web testing:
1. Race Conditions and Timing Issues
The most common cause: your test runs faster than your application. You click a button, the test immediately checks for a result, but the API call hasn't completed yet. Sometimes it's fast enough, sometimes it's not.
// ❌ FLAKY: Race condition
await page.click('#submit-button');
const result = await page.textContent('#result'); // Might not be ready yet
expect(result).toBe('Success');
// ✅ DETERMINISTIC: Wait for the specific condition
await page.click('#submit-button');
await page.waitForSelector('#result:has-text("Success")');
const result = await page.textContent('#result');
expect(result).toBe('Success');
2. Non-Deterministic State
Tests that depend on external state (databases, APIs, cached data) inherit that state's unpredictability. If your test assumes user@example.com exists in the database, it works until someone else's test deletes it.
// ❌ FLAKY: Depends on unknown database state
test('should display user profile', async ({ page }) => {
  await page.goto('/profile/user@example.com');
  await expect(page.locator('h1')).toContainText('John Doe');
});
// ✅ DETERMINISTIC: Create the state you need
test('should display user profile', async ({ page, request }) => {
  // Setup: Create test user via API, then parse the response body
  const response = await request.post('/api/users', {
    data: { email: 'test-user@example.com', name: 'Test User' }
  });
  const user = await response.json(); // request.post returns an APIResponse, not the user
  await page.goto(`/profile/${user.email}`);
  await expect(page.locator('h1')).toContainText('Test User');
  // Cleanup: Delete test user
  await request.delete(`/api/users/${user.id}`);
});
3. Resource Contention
Parallel test execution is great for speed, but terrible when tests compete for the same resources. Two tests trying to modify the same database record or file simultaneously create unpredictable outcomes.
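One way out is to give each parallel worker its own data. A minimal sketch in Playwright, where the /api/inventory endpoint is a hypothetical stand-in for any shared resource:
import { test, expect } from '@playwright/test';

test('updates stock without colliding with parallel workers', async ({ request }, testInfo) => {
  // Key the record by workerIndex so no two parallel workers ever share it
  const sku = `test-sku-w${testInfo.workerIndex}-${Date.now()}`;
  await request.post('/api/inventory', { data: { sku, quantity: 10 } });
  const response = await request.patch(`/api/inventory/${sku}`, { data: { quantity: 5 } });
  expect(response.ok()).toBeTruthy();
  // Clean up our own record only; other workers' data is untouched
  await request.delete(`/api/inventory/${sku}`);
});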
4. Environment Variability
Different environments produce different results. Network latency varies between local and CI. Clock skew affects time-based logic. Screen resolution changes element visibility. Browser versions introduce subtle behavior differences.
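Some of this variability can be pinned per test file. A minimal sketch, assuming a hypothetical /reports page that renders localized dates:
import { test, expect } from '@playwright/test';

// Pin timezone and locale for this file so date and number formatting
// cannot differ between a developer laptop and a CI runner
test.use({ timezoneId: 'UTC', locale: 'en-US' });

test('renders report dates consistently', async ({ page }) => {
  await page.goto('/reports');
  // Formatting inside the app now resolves against en-US/UTC on every machine
  await expect(page.locator('.report-date').first()).toBeVisible();
});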
5. Weak Waiting Strategies
The classic mistake: using fixed delays (sleep) instead of dynamic waits. await page.waitForTimeout(3000) is a code smell that screams "I don't know when this element appears, so I'm guessing."
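The fix is usually to state the condition you were guessing at. A minimal sketch (the route and selectors are assumptions):
import { test, expect } from '@playwright/test';

test('loads data without guessing at timing', async ({ page }) => {
  await page.goto('/dashboard');
  await page.click('#load-data');
  // Instead of page.waitForTimeout(3000), wait for the real conditions:
  await expect(page.locator('#spinner')).toBeHidden(); // loading finished
  await expect(page.locator('#result')).toBeVisible(); // data rendered
});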
Strategy 1: Master Proper Waiting
Modern testing frameworks provide sophisticated waiting mechanisms. Use them. Here's the hierarchy from worst to best:
- Fixed delays (worst): sleep(5000) - Slow and unreliable
- Implicit waits: Framework waits for elements automatically - Better, but not specific enough
- Explicit waits: Wait for specific conditions - Good for most cases
- Smart waits (best): Wait for network idle, animations complete, etc. - Most reliable
// Playwright: Wait for network to be idle
await page.goto('/dashboard', { waitUntil: 'networkidle' });
// Playwright: Wait for specific network response
await Promise.all([
  page.waitForResponse(resp => resp.url().includes('/api/data')),
  page.click('#load-data')
]);
// Cypress: Wait for API call
cy.intercept('GET', '/api/data').as('getData');
cy.get('#load-data').click();
cy.wait('@getData');
// Selenium: Explicit wait with custom condition
WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10));
wait.until(d -> d.findElement(By.id("result")).getText().length() > 0);
Strategy 2: Design for Determinism
Deterministic tests produce the same result every time. Here's how to architect for that:
Isolate Test Data
Each test should create its own data and clean up afterward. Use unique identifiers (UUIDs, timestamps) to prevent collisions:
test('should create new project', async ({ page, request }) => {
  const uniqueId = `test-${Date.now()}-${Math.random().toString(36).slice(2)}`;
  const projectName = `Project ${uniqueId}`;
  // Create isolated test data
  await request.post('/api/projects', {
    data: { name: projectName, owner: `owner-${uniqueId}` }
  });
  // Test against that specific data
  await page.goto('/projects');
  await expect(page.locator(`text=${projectName}`)).toBeVisible();
});
Use Test Fixtures and Factories
Build reusable fixtures that guarantee consistent starting state:
// fixtures.ts
import { test as base, expect } from '@playwright/test';
export { expect };

export const test = base.extend<{ authenticatedUser: { id: string; email: string } }>({
  authenticatedUser: async ({ page, request }, use) => {
    // Setup: Register a user via the API and parse the response body
    const response = await request.post('/api/auth/register', {
      data: { email: `test-${Date.now()}@example.com`, password: 'password123' }
    });
    const user = await response.json(); // request.post returns an APIResponse, not the user
    await page.goto('/login');
    await page.fill('#email', user.email);
    await page.fill('#password', 'password123');
    await page.click('#submit');
    await page.waitForURL('/dashboard');
    // Provide authenticated context to test
    await use(user);
    // Cleanup: Delete user
    await request.delete(`/api/users/${user.id}`);
  }
});
// Use in tests
test('should access protected dashboard', async ({ page, authenticatedUser }) => {
  // Page is already authenticated (referencing the fixture triggers its setup)
  await expect(page.locator('h1')).toContainText('Dashboard');
});
Mock Time and Random Values
Tests that depend on the current time or random values are inherently non-deterministic. Mock them:
// Playwright: Mock time (addInitScript takes a function or a path/content object, not both)
await page.addInitScript(() => {
  Date.now = () => 1609459200000; // Fixed timestamp (2021-01-01T00:00:00Z)
  Math.random = () => 0.5; // Fixed random value
});
// Cypress: Use clock
cy.clock(new Date('2026-01-04').getTime());
cy.visit('/dashboard');
// Time is now frozen at 2026-01-04
Strategy 3: Control Your Environment
Environment differences are a major source of flakiness. Make your test environment as consistent as possible:
- Use containerization: Docker ensures identical environments across local and CI
- Pin browser versions: Don't let auto-updates surprise you mid-sprint
- Set viewport size explicitly: Don't depend on default window dimensions
- Disable animations: They add unpredictable timing
- Mock external services: Third-party APIs introduce latency and failures (see the sketch below)
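The last item deserves its own example. A minimal sketch that replaces a third-party API with a canned response (the endpoint and payload are hypothetical):
await page.route('https://api.payments.example.com/**', route =>
  route.fulfill({
    status: 200,
    contentType: 'application/json',
    body: JSON.stringify({ status: 'approved', id: 'fixed-test-id' })
  })
);
The remaining items can be pinned once, globally, in the Playwright config: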
// playwright.config.ts
export default defineConfig({
  use: {
    viewport: { width: 1280, height: 720 }, // Fixed viewport
    deviceScaleFactor: 1,
    // Reduce motion for determinism
    reducedMotion: 'reduce',
    // Set consistent timezone
    timezoneId: 'America/Los_Angeles',
    // Set consistent locale
    locale: 'en-US',
    // Block service workers, which can serve stale cached responses
    serviceWorkers: 'block',
  },
});
Strategy 4: Implement Retry Logic Carefully
Automatic retries mask problems rather than fix them, but they're sometimes necessary for legitimately flaky external dependencies. Use them strategically:
// ❌ BAD: Retry everything
test.describe.configure({ retries: 3 }); // Hides all flakiness
// ✅ GOOD: Retry only when interacting with external services
test('should fetch data from third-party API', async ({ page }) => {
  test.setTimeout(45000); // Longer timeout for external calls
  // Add specific retry logic for this operation
  await page.route('**/api/external/**', async route => {
    const maxAttempts = 3;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        // route.continue() hands the request off and cannot be retried,
        // so fetch the response ourselves and fulfill the route with it
        const response = await route.fetch();
        await route.fulfill({ response });
        return;
      } catch (error) {
        if (attempt === maxAttempts) throw error;
        await new Promise(r => setTimeout(r, 1000 * 2 ** (attempt - 1))); // Exponential backoff
      }
    }
  });
  await page.goto('/dashboard');
});
Measuring and Monitoring Flakiness
You can't improve what you don't measure. Track these metrics to understand your flakiness problem:
- Flake rate: (Tests that pass on retry / Total test runs) × 100
- Top flaky tests: Which specific tests fail most often?
- Flakiness by environment: Does CI have more flakes than local?
- Time to stability: How long until a new test becomes reliably deterministic?
Most CI/CD platforms and test frameworks provide flakiness reporting. In Playwright, enable the HTML reporter to see detailed flake analysis:
// playwright.config.ts
export default defineConfig({
  reporter: [
    ['html', { open: 'never' }],
    ['json', { outputFile: 'test-results.json' }]
  ],
  fullyParallel: true,
  // Retry failing tests up to 2 times in CI; a test that then passes is reported as flaky
  retries: process.env.CI ? 2 : 0,
});
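To turn that JSON output into the flake-rate metric above, a small script can tally the report's stats block (the expected/unexpected/flaky/skipped counts in Playwright's JSON report):
// flake-rate.ts
import { readFileSync } from 'fs';

const report = JSON.parse(readFileSync('test-results.json', 'utf-8'));
const { expected, unexpected, flaky, skipped } = report.stats;
const total = expected + unexpected + flaky + skipped;
// "Flaky" here means: failed at least once, then passed on retry
const flakeRate = total > 0 ? (flaky / total) * 100 : 0;
console.log(`Flake rate: ${flakeRate.toFixed(2)}% (${flaky}/${total} tests)`);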
The Path to 99%+ Reliability
Achieving deterministic tests isn't a one-time fix; it's a continuous practice. Here's your action plan:
- Audit your current flakiness: Run your suite 10 times and track which tests fail inconsistently (see the config sketch after this list)
- Fix the worst offenders first: Apply the 80/20 rule—fix the 20% of tests causing 80% of failures
- Establish a zero-tolerance policy: Don't merge code that introduces new flaky tests
- Make flakiness visible: Add flake rate to your team dashboard
- Review test design patterns: Create team guidelines for deterministic testing
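For the audit step, Playwright can run every test several times in one go; the same option exists on the CLI as --repeat-each. A minimal sketch of a temporary audit config:
// playwright.config.ts (temporary audit settings)
import { defineConfig } from '@playwright/test';

export default defineConfig({
  repeatEach: 10, // Run every test 10 times back to back
  retries: 0, // Disable retries so every flake surfaces as a failure
});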
The investment pays off quickly. Teams that eliminate flakiness report:
- 50% reduction in time spent investigating test failures
- Increased developer confidence in CI/CD pipeline
- Faster deployment cycles (no more "run it again" delays)
- Better test coverage (developers write more tests when they trust them)
Your Next Steps
Start small. Pick one flaky test this week and apply these strategies. Document what you learn. Share it with your team. Build a culture where deterministic testing is the standard, not the exception.
Test flakiness isn't inevitable. With proper waiting strategies, deterministic design, and controlled environments, you can build test suites that run reliably every single time. Your future self (and your team) will thank you.
Building QA automation for your team in Spain?
Desplega.ai helps engineering teams in Barcelona, Madrid, Valencia, and beyond implement rock-solid test automation strategies. We specialize in eliminating flakiness and building CI/CD pipelines you can trust.
Let's talk about your testing challenges →