Cypress vs Playwright: Battle-Tested Architectures for Eliminating Flaky Tests
Flakiness is not a tooling bug — it is a topology problem. Each framework's internal model decides what classes of flake you will fight in CI.

It is 02:47 UTC. The same test that has passed 412 times this month fails on a clean rerun: expected element to be visible, found 0 elements. By 09:30 someone has scheduled a “Flaky Test Office Hours.” By Friday, your CI dashboard has a permanent yellow band labelled @flaky, and your team has stopped trusting red builds.
Every QA team eventually arrives at the Cypress vs Playwright debate, and almost every comparison piece online frames it as a feature checklist — who supports Firefox properly, whose fixture API is nicer, who has prettier traces. That framing misses the real predictor of your flake budget, which is architectural: where does the test runner sit relative to the browser, and what does that distance do to your assertions?
This is a Foundation-series deep dive aimed at QA and software engineers who already write E2E tests and want to stop guessing why they fail. We will go past the marketing pages into the protocols, the wait models, and three production-grade patterns that survive contact with real CI. If you want a primer on stabilising one specific suite first, see our how-to on debugging flaky E2E tests before continuing.
The Three Root Causes of Flakiness
Before naming frameworks, it helps to taxonomise what “flaky” actually means at the protocol level. In our experience triaging E2E suites for SaaS and fintech teams across Barcelona, Madrid, Valencia and Malaga, virtually every flake we have debugged falls into one of three buckets:
- Race conditions between the runner and the application under test (AUT): the test asserts before the application has committed the DOM mutation, network response, or animation transition.
- Network-layer nondeterminism: third-party scripts, analytics beacons, CDN edge variance, or unmocked APIs that occasionally take longer than the implicit timeout.
- Environmental drift: CI machine load, headless rendering differences, time zones, locales, and the perennial “works on my laptop” pathology.
Bucket 1 is mostly a tooling problem. Bucket 2 is mostly a discipline problem. Bucket 3 is mostly an infrastructure problem. Cypress and Playwright disagree most sharply on how to solve bucket 1, and that disagreement starts at their process model.
How does Playwright's auto-waiting actually work under the hood?
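Before each action, Playwright repeatedly checks that the target element is attached, visible, stable, able to receive events, and enabled, and only then performs the action. The runner drives these checks from outside the page, over CDP for Chromium and over Playwright's own patched protocols for Firefox and WebKit.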
Cypress runs inside the browser. Your spec file is bundled and injected into a hidden iframe alongside the application under test. When you call cy.get('.submit'), there is no inter-process call — the lookup resolves against the live DOM in the same JavaScript runtime that owns the page. Cypress's architecture page on cypress.io documents this design choice explicitly: in-browser execution is the foundation that powers their automatic retry-ability.
Playwright runs out-of-process. Your test code lives in Node.js (or Python, Java, or .NET). Browser interactions travel through the Chrome DevTools Protocol (CDP) for Chromium, and through Playwright's patched remote-debugging protocols for Firefox and WebKit, over a pipe or WebSocket connection. Every page.click() is an asynchronous RPC.
This single architectural choice is the source of almost every “Cypress can do X but Playwright cannot” (and vice versa) claim you have ever read. Here it is on one screen:
| Dimension | Cypress | Playwright |
|---|---|---|
| Runner location | Inside the browser (iframe alongside AUT) | Out-of-process Node/Python/Java/.NET |
| Wire protocol | Direct JS calls + privileged automation bridge | CDP (Chromium) + patched protocols (Firefox/WebKit) over WebSocket |
| Wait model | Implicit retry-ability on assertions and queries | Actionability checks before every action |
| Multiple tabs / origins | Single tab; cross-origin via cy.origin since v12 | First-class via BrowserContext and page.context().newPage() |
| Network interception | cy.intercept at Cypress's HTTP proxy layer (HTTP requests only, no WebSocket frames) | page.route at the browser's network layer; WebSocket frames observable via page.on('websocket') |
| Parallelism model | Per-spec across containers (Cypress Cloud or sharding) | Worker processes within a single run, plus sharding |
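To make the "asynchronous RPC" row concrete, here is a minimal sketch of an id-correlated request/response client in the style of CDP, whose messages are JSON objects carrying an `id`, a `method`, and `params`. The names `Transport` and `RpcClient` and the fake transport are illustrative assumptions, not Playwright internals; `Input.dispatchMouseEvent` is used only as an example method name.

```typescript
// Hypothetical sketch: a tiny id-correlated RPC client in the style of CDP.
type Transport = { send(raw: string): void };
type Pending = { resolve: (result: unknown) => void; reject: (err: Error) => void };

class RpcClient {
  private nextId = 1;
  private pending = new Map<number, Pending>();
  private transport: Transport;

  constructor(transport: Transport) {
    this.transport = transport;
  }

  // Serialise the call, remember who is waiting for it, and return a promise.
  call(method: string, params: Record<string, unknown> = {}): Promise<unknown> {
    const id = this.nextId++;
    const promise = new Promise<unknown>((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    this.transport.send(JSON.stringify({ id, method, params }));
    return promise;
  }

  // Called whenever the browser side sends a message back.
  onMessage(raw: string): void {
    const msg = JSON.parse(raw) as { id?: number; result?: unknown; error?: { message: string } };
    if (msg.id === undefined) return; // events carry no id; ignored in this sketch
    const waiter = this.pending.get(msg.id);
    if (!waiter) return;
    this.pending.delete(msg.id);
    if (msg.error) waiter.reject(new Error(msg.error.message));
    else waiter.resolve(msg.result);
  }

  get inFlight(): number {
    return this.pending.size;
  }
}

// Fake transport that records outgoing frames instead of opening a socket.
const sent: string[] = [];
const client = new RpcClient({ send: (raw) => { sent.push(raw); } });

const click = client.call("Input.dispatchMouseEvent", {
  type: "mousePressed", x: 120, y: 48, button: "left", clickCount: 1,
});
click.then(() => { /* the test body resumes here once the browser has answered */ });

const inFlightBefore = client.inFlight; // the call is parked until a reply arrives
client.onMessage(JSON.stringify({ id: 1, result: {} }));
const inFlightAfter = client.inFlight;
```

The point of the sketch is the gap between `inFlightBefore` and `inFlightAfter`: in an out-of-process runner, every action spends some time parked on a promise while the page keeps changing, which is exactly the window that actionability checks are designed to cover.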
Cypress's Retry-Ability: The Implicit Loop
Cypress's retry-ability is not a feature you opt into: it is how every query and assertion works by default. Per the official Cypress retry-ability docs, Cypress retries queries such as cy.get() together with the assertions attached to them, bounded by a 4-second default timeout (defaultCommandTimeout); action commands such as .click() are not retried, because they could change application state. When you write cy.get('.row').should('have.length', 3), Cypress re-runs the query and the assertion repeatedly until both pass or the timeout expires.
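The catch: retries only cover queries and assertions. Before Cypress 12, only the last query before an assertion was retried; since v12 the whole chain of queries is retried, but a .then() callback still runs exactly once. The moment you move a lookup or an expect() into .then(() => ...), or await a non-Cypress promise, the retry window collapses. Here is the canonical Cypress pattern, the broken version, and the fix: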
// cypress/e2e/checkout.cy.ts
// Realistic scenario: a checkout flow that re-renders the cart row
// after the price API resolves. The test asserts the total updates.
describe('Checkout total updates after async price refresh', () => {
  beforeEach(() => {
    // Stub both endpoints with fixtures so the data never varies between runs
    cy.intercept('GET', '/api/cart', { fixture: 'cart-eur.json' }).as('cart')
    cy.intercept('POST', '/api/price-refresh', { fixture: 'price-refresh.json' })
      .as('refresh')
    cy.visit('/checkout', {
      onBeforeLoad(win) {
        // Edge case: pin the locale to avoid currency-format flake
        Object.defineProperty(win.navigator, 'language', { value: 'es-ES' })
      },
    })
    cy.wait('@cart') // gate the test on the network event we care about
  })

  it('BROKEN: retry window collapses inside .then()', () => {
    cy.get('[data-testid=refresh-prices]').click()
    // Anti-pattern: the jQuery .find() and the expect() inside .then() run once;
    // if .total has not re-rendered by the time .then() executes, the test
    // fails even though the application is correct.
    cy.get('[data-testid=cart-row]').then(($row) => {
      expect($row.find('.total').text()).to.eq('€42.00')
    })
  })

  it('FIXED: chain assertions so Cypress retries them', () => {
    cy.get('[data-testid=refresh-prices]').click()
    cy.wait('@refresh') // explicit network-event gate
    cy.get('[data-testid=cart-row] .total', { timeout: 8000 })
      .should('have.text', '€42.00')
      .and('be.visible')
  })
})

The fix has three load-bearing pieces. First, cy.wait('@refresh') turns an implicit time-based race into an explicit network event. Second, the selector retargets the deepest element so each retry re-queries from the live DOM. Third, the chained .should().and() keeps both assertions inside the retry envelope. Anything you can put on the right side of .should() is retried; anything you stuff inside .then() is not.
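The difference between a retried assertion and a one-shot check inside .then() can be modelled without Cypress at all. The sketch below is an illustration under stated assumptions, not Cypress's implementation: `retryUntilPass`, its options, and the simulated `readTotal` "DOM" are hypothetical names, and the clock is advanced manually so the example stays synchronous.

```typescript
// Illustrative model only; not Cypress internals.
type RetryOptions = { timeoutMs: number; intervalMs: number };

// Re-runs query + assertion together until the assertion stops throwing or time runs out.
// `advance` stands in for "wait intervalMs"; a real runner would yield to the event loop.
function retryUntilPass<T>(
  query: () => T,
  assertion: (subject: T) => void,
  advance: (ms: number) => void,
  { timeoutMs, intervalMs }: RetryOptions,
): { attempts: number } {
  let elapsed = 0;
  let attempts = 0;
  for (;;) {
    attempts += 1;
    try {
      assertion(query()); // fresh query on every attempt, never a stale subject
      return { attempts };
    } catch (err) {
      if (elapsed + intervalMs > timeoutMs) throw err; // out of budget: surface the last failure
      advance(intervalMs);
      elapsed += intervalMs;
    }
  }
}

// Simulated DOM: the total re-renders 150 ms after the click.
let now = 0;
const readTotal = () => (now >= 150 ? "€42.00" : "€0.00");
const expectTotal = (text: string) => {
  if (text !== "€42.00") throw new Error(`expected €42.00, got ${text}`);
};

// Retried: passes on the first attempt after the re-render lands.
const retried = retryUntilPass(readTotal, expectTotal, (ms) => { now += ms; }, {
  timeoutMs: 4000,
  intervalMs: 50,
});

// One-shot, like an expect() inside .then(): it sees whatever the DOM held at that instant.
now = 0;
let oneShotFailed = false;
try {
  expectTotal(readTotal());
} catch {
  oneShotFailed = true;
}
```

With a 50 ms interval the retried version needs four attempts (at 0, 50, 100, and 150 ms), while the one-shot check fails immediately against the same application behaviour.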
Playwright's Actionability Checks: The Explicit Gate
Playwright puts its emphasis elsewhere. Its web-first assertions retry too, but the distinctive part is that it gates the action itself on preconditions. Per the Playwright docs on actionability, before a click Playwright auto-waits until the element is attached and visible, stable (no in-flight animation), receiving pointer events (not covered by another element), and enabled. Only then does the action fire. Other actions check a subset: fill, for example, waits for visible, enabled, and editable, while press performs no actionability checks.
The docs define “stable” as keeping the same bounding box for at least two consecutive animation frames, so Playwright does not need to know which animation library is running; it only needs the element to stop moving. That is why a Playwright click() on a Framer-Motion modal that is still sliding in does not race the way a Selenium WebElement.click() does.
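The gate described above can be modelled in a few lines. This is a hypothetical sketch, not Playwright's implementation: `Frame`, `firstActionableFrame`, and the element ids are illustrative names. It treats "stable" the way the Playwright actionability docs define it, as an unchanged bounding box across two consecutive animation frames, and "receives events" as the target being the topmost element at the click point.

```typescript
// Hypothetical model of an actionability gate for a click.
type Box = { x: number; y: number; width: number; height: number };

// One observation of the target element per animation frame.
type Frame = {
  visible: boolean;
  enabled: boolean;
  box: Box;
  topElementAtClickPoint: string; // id of whatever would actually receive the click
};

const sameBox = (a: Box, b: Box) =>
  a.x === b.x && a.y === b.y && a.width === b.width && a.height === b.height;

// Returns the index of the first frame at which a click on `targetId` is safe,
// or -1 if the element never becomes actionable within the recorded frames.
function firstActionableFrame(targetId: string, frames: Frame[]): number {
  for (let i = 1; i < frames.length; i++) {
    const prev = frames[i - 1];
    const cur = frames[i];
    const stable = sameBox(prev.box, cur.box); // unchanged across two consecutive frames
    const receivesEvents = cur.topElementAtClickPoint === targetId;
    if (cur.visible && cur.enabled && stable && receivesEvents) return i;
  }
  return -1;
}

// A button sliding in from the right, briefly covered by a cookie banner.
const boxAt = (x: number): Box => ({ x, y: 40, width: 120, height: 32 });
const frames: Frame[] = [
  { visible: true, enabled: true, box: boxAt(300), topElementAtClickPoint: "cookie-banner" },
  { visible: true, enabled: true, box: boxAt(200), topElementAtClickPoint: "cookie-banner" },
  { visible: true, enabled: true, box: boxAt(100), topElementAtClickPoint: "sign-in" },
  { visible: true, enabled: true, box: boxAt(100), topElementAtClickPoint: "sign-in" },
];

const clickAt = firstActionableFrame("sign-in", frames);
const neverClickable = firstActionableFrame("sign-in", frames.slice(0, 2));
```

In this model the click fires at frame index 3, the first frame where the box has stopped moving and nothing covers the button. If only the first two frames ever happen, the result is -1, which corresponds to the action timing out rather than clicking the wrong thing.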
// tests/login.spec.ts
import { test, expect } from '@playwright/test'

test.describe('Animated login modal', () => {
  test.beforeEach(async ({ page, context }) => {
    // Edge case: a deterministic clock prevents subtle date-dependent flakes
    await context.addInitScript(() => {
      const fixed = new Date('2026-01-15T09:00:00Z').valueOf()
      Date.now = () => fixed
    })
    await page.route('**/api/auth/login', async (route) => {
      // Realistic scenario: mocked backend with a forced 250ms delay to expose races
      await new Promise((r) => setTimeout(r, 250))
      await route.fulfill({
        status: 200,
        contentType: 'application/json',
        body: JSON.stringify({ token: 'eyJ...', user: { id: 'u_42' } }),
      })
    })
    await page.goto('/login')
  })

  test('logs in even while the modal is still animating', async ({ page }) => {
    // Click triggers a 350ms slide-in. Locator.click() will wait for the
    // overlay to settle before firing, so no manual sleep is required.
    await page.getByRole('button', { name: /sign in/i }).click()
    const dialog = page.getByRole('dialog', { name: /sign in/i })
    await expect(dialog).toBeVisible()
    await dialog.getByLabel('Email').fill('jane@example.com')
    await dialog.getByLabel('Password').fill('correct horse battery')
    // Defensive: cap the action with an explicit timeout so the test fails fast
    // if something blocks the button (e.g., a stray cookie banner).
    await dialog.getByRole('button', { name: /^sign in$/i })
      .click({ timeout: 5000 })
    // Auto-retrying expect, equivalent to Cypress's chained .should().and()
    await expect(page.getByTestId('user-menu')).toContainText('Jane')
    await expect(page).toHaveURL(/\/dashboard/)
  })

  test('does NOT click a covered button (edge case)', async ({ page }) => {
    // A cookie banner appears 200ms after load and covers the CTA.
    // Playwright's "receives events" actionability check refuses to click
    // until the banner is dismissed: we want a clear failure, not a flake.
    await page.getByRole('button', { name: 'Accept cookies' }).click()
    await page.getByRole('button', { name: /sign in/i }).click()
    await expect(page.getByRole('dialog', { name: /sign in/i })).toBeVisible()
  })
})

Two details worth surfacing. First, addInitScript runs before any page script, so freezing Date.now removes a class of date-dependent flakes that would otherwise vary with the CI clock. Second, the cookie-banner test deliberately exercises Playwright's “receives events” check: without that gate, a click aimed at a covered “Sign in” button would land on the banner instead. With it, Playwright keeps waiting until nothing intercepts the pointer, and if the banner is never dismissed the click times out with a clear error, which is what you want.
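The clock-freezing trick is small enough to isolate and reason about on its own. The sketch below assumes a hypothetical helper named `freezeNow` (not a Playwright or Cypress API) and shows both what the override buys you and one limit worth knowing about.

```typescript
// Minimal sketch; `freezeNow` is a hypothetical helper.

// Pins Date.now() to a fixed instant and returns a function that restores the original.
function freezeNow(iso: string): () => void {
  const originalNow = Date.now;
  const fixed = new Date(iso).valueOf();
  Date.now = () => fixed;
  return () => {
    Date.now = originalNow;
  };
}

const restore = freezeNow("2026-01-15T09:00:00Z");
const first = Date.now();
const second = Date.now();

// Formatting the frozen timestamp with an explicit locale and time zone is
// deterministic, whatever the machine's own settings are.
const label = new Intl.DateTimeFormat("en-GB", {
  timeZone: "UTC",
  year: "numeric",
  month: "2-digit",
  day: "2-digit",
}).format(new Date(first));

restore();

// Caveat: overriding Date.now does not change what `new Date()` (no arguments)
// returns, so application code that builds dates that way still sees the real clock.
```

If the application under test calls `new Date()` directly, a fuller fake clock is needed; cy.clock() in Cypress serves that purpose, and the override above should be read as the minimum rather than the complete solution.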
Network Interception: Where the Architectures Diverge Hardest
Network mocking is where the in-browser vs out-of-process split becomes operationally visible. Although Cypress commands run in the browser, cy.intercept does its matching in the proxy that Cypress places between the browser and the network, so it covers HTTP traffic such as fetch, XHR, page loads, and static resources, but not WebSocket frames. Playwright's page.route hooks request interception in the browser itself (CDP's Fetch.requestPaused events on Chromium). In both tools, a response that a service worker serves from its cache never reaches the interception point.
Here is the same scenario — a paginated search endpoint that occasionally returns a 503 and must be retried — written in both frameworks. Mock both the happy path and the failure path so the test exercises the retry logic deterministically:
// cypress/e2e/search-retry.cy.ts
describe('Search retries once on transient 503', () => {
  it('shows results after a 503 then 200', () => {
    let calls = 0
    cy.intercept('GET', '/api/search*', (req) => {
      calls += 1
      if (calls === 1) {
        req.reply({ statusCode: 503, body: { error: 'try again' } })
      } else {
        req.reply({ fixture: 'search-results.json' })
      }
    }).as('search')
    cy.visit('/search?q=playwright')
    cy.wait('@search') // first call: 503
    cy.wait('@search') // second call: 200
    cy.get('[data-testid=result-row]').should('have.length.greaterThan', 0)
    cy.get('[data-testid=retry-banner]').should('not.exist')
  })
})

// tests/search-retry.spec.ts (Playwright)
import { test, expect } from '@playwright/test'

test('Search retries once on transient 503', async ({ page }) => {
  let calls = 0
  await page.route('**/api/search*', async (route) => {
    calls += 1
    if (calls === 1) {
      await route.fulfill({
        status: 503,
        contentType: 'application/json',
        body: JSON.stringify({ error: 'try again' }),
      })
    } else {
      await route.fulfill({
        status: 200,
        contentType: 'application/json',
        body: JSON.stringify({ results: [{ id: 1, title: 'Playwright' }] }),
      })
    }
  })
  await page.goto('/search?q=playwright')
  await expect(page.getByTestId('result-row')).toHaveCount(1)
  await expect(page.getByTestId('retry-banner')).toBeHidden()
  expect(calls, 'expected exactly two calls: one 503, one 200').toBe(2)
})

The shape is similar, but the failure modes are not. Cypress's cy.wait('@search') consumes the matching requests in order, which is convenient until the application issues a third call you did not expect, at which point the test passes but a real bug ships; asserting on cy.get('@search.all') closes that gap. The Playwright version has no alias queue to lean on, so it asserts the call count explicitly, which is more typing but harder to fool.
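Both tests above hand-roll the same `calls` counter. One way to make the "unexpected third call" case visible in either framework is to drive the mock from a fixed script and count anything that falls outside it. The helper below is a framework-agnostic sketch; `scriptedResponder` and its method names are hypothetical, and wiring it into cy.intercept or page.route is left to the handler.

```typescript
// Illustrative helper; not part of Cypress or Playwright.
type MockResponse = { status: number; body: unknown };

// Replays a fixed script of responses in order. Calls beyond the script get a
// 500 and are counted, so the test can assert that none happened.
function scriptedResponder(script: MockResponse[]) {
  let calls = 0;
  let unexpected = 0;
  return {
    next(): MockResponse {
      calls += 1;
      if (calls > script.length) {
        unexpected += 1;
        return { status: 500, body: { error: `unscripted call #${calls}` } };
      }
      return script[calls - 1];
    },
    callCount: () => calls,
    unexpectedCalls: () => unexpected,
  };
}

const search = scriptedResponder([
  { status: 503, body: { error: "try again" } },
  { status: 200, body: { results: [{ id: 1, title: "Playwright" }] } },
]);

// The two calls the application is expected to make.
const statuses = [search.next().status, search.next().status];
const unexpectedAfterTwo = search.unexpectedCalls();

// A third call, which the retry logic should never issue.
const extra = search.next();
const unexpectedAfterThree = search.unexpectedCalls();
```

A route handler would call `search.next()` and fulfil the request with the returned status and body; the test then ends with an assertion that `search.unexpectedCalls()` is zero, which fails in exactly the scenario where a bare pair of waits would pass.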
Can you really eliminate flaky tests entirely?
No. Auto-wait races vanish with both frameworks, but environmental drift and unmocked third parties still need network gating to stay green.
Be honest about the scope of the win. Switching from Selenium to Cypress or Playwright will erase race-condition flakes (bucket 1) almost completely, because both frameworks build auto-waiting and retrying assertions into their queries and actions, whereas Selenium leaves that to explicit waits you write yourself. Buckets 2 and 3 require discipline, not tooling: mock every third-party call by default, freeze the clock, pin locales, and isolate browser context per test.
Debugging the Flakes That Survive
Once you have adopted retry-ability or actionability, the flakes that remain are almost always one of these five patterns. Treat them as a diagnostic checklist:
- Animation racing the assertion. Symptom: passes in headed, fails in headless. Fix: in Cypress, chain .should('be.visible').and('not.have.class', 'is-animating'); in Playwright, rely on the Locator.click() stability check or add page.emulateMedia({ reducedMotion: 'reduce' }).
- Network call you did not mock. Symptom: passes locally, fails in CI. Fix: in both frameworks, turn unintercepted calls into errors. Cypress: cy.intercept('**', (req) => req.reply(500)) as a catch-all; Playwright: page.route('**', r => r.abort()) at the top of the test, then explicitly allow-list each call. Register the catch-all before the specific mocks, because both frameworks give later-registered handlers precedence, and scope the pattern so it does not block the application's own pages.
- Time-zone or locale drift. Symptom: passes in Europe, fails in US-East CI runners. Fix: freeze the clock (see the Playwright example above) and pin the locale and time zone for every test. For Cypress, use cy.clock() with a fixed Unix epoch.
- Worker isolation leaks. Symptom: pass when run alone, fail when run with the rest of the suite. Fix: in Playwright, ensure tests do not share a BrowserContext unless explicitly intended; in Cypress, avoid module-scoped variables that survive between specs and confirm testIsolation: true is on.
- Selector overspecification. Symptom: fails after a UI refactor that did not change behaviour. Fix: prefer role and label selectors (page.getByRole('button', { name: /save/i }) or, with @testing-library/cypress installed, cy.findByRole('button', { name: /save/i })) over CSS class chains. The Accessible Name and Description Computation algorithm (W3C ARIA spec) is what backs these selectors, and it is far more stable than DOM structure.
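The "allow-list, then abort everything else" rule from the checklist comes down to one decision per request. Here is a minimal sketch of that decision as a pure function; `decide`, the `Decision` type, and the example hosts are hypothetical. It lets same-origin traffic through so the catch-all does not block the application's own HTML and assets.

```typescript
// Sketch of an allow-list decision function for a catch-all network handler.
type Decision = "continue" | "abort";

// Same-origin requests (the app's own HTML, JS, CSS, API) pass through;
// anything else must be named explicitly, so new third-party calls fail loudly.
function decide(requestUrl: string, appOrigin: string, allowedHosts: string[]): Decision {
  const url = new URL(requestUrl);
  if (url.origin === appOrigin) return "continue";
  return allowedHosts.includes(url.hostname) ? "continue" : "abort";
}

const appOrigin = "https://shop.example.test";
const allowed = ["cdn.example.test"];

const decisions = [
  decide("https://shop.example.test/checkout", appOrigin, allowed),
  decide("https://cdn.example.test/fonts/inter.woff2", appOrigin, allowed),
  decide("https://analytics.example.net/beacon", appOrigin, allowed),
];
```

In Playwright this could be called from a catch-all page.route handler, using route.request().url() as the input and route.continue() or route.abort() as the outcome; in Cypress the equivalent hook is the cy.intercept request callback. Keeping the rule in a plain function also makes it easy to unit-test the allow-list separately from any browser.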
When something flakes despite the checklist, your next move is the same in both frameworks: capture a trace. Playwright's --trace on-first-retry records a self-contained .zip with DOM snapshots, network log, console output, and screenshots; open it with npx playwright show-trace trace.zip. Cypress's closest equivalent is Test Replay in Cypress Cloud (recorded with --record), plus local screenshots and, when video is enabled in the config, video files. Do not stare at the failing assertion. Stare at the frame before the failure, because that is where the race lives.
Edge Cases and Gotchas We See in the Wild
- Cross-origin redirects in Cypress. Even with cy.origin() in v12+, we have seen chains of three or more origin hops trip state-machine bugs in Cypress's spec bridge. If your auth flow bounces SSO → IdP → callback → app, prefer Playwright or programmatic login via API.
- WebSocket assertions. Cypress cannot intercept WebSocket frames directly; Playwright can observe them via page.on('websocket', ws => ws.on('framesent', ...)). If you test a chat or trading app, this is decisive.
- iframe-heavy editors. CKEditor, TinyMCE, and Stripe Elements all live inside iframes. Cypress requires the cypress-iframe plugin or manual cy.get('iframe').its('0.contentDocument.body') dancing; Playwright treats iframes as first-class via page.frameLocator('iframe[name=stripe]').
- Service workers. If your PWA caches API responses through a service worker, the cached responses never reach the network, so cy.intercept will miss them. Playwright's page.route has the same blind spot for requests a service worker handles; the Playwright docs recommend setting serviceWorkers: 'block' on the context when you rely on request interception.
- File uploads with drag-and-drop. Both frameworks support file inputs: setInputFiles in Playwright, and .selectFile() in Cypress since 9.3, which covers most of what the cypress-file-upload plugin was used for. Drop zones are harder. .selectFile() offers action: 'drag-drop', while in Playwright you typically build a DataTransfer in the page and dispatch the drop events yourself. Either approach can still struggle with drop zones that inspect the full drag event sequence.
A Framework-Agnostic Stability Checklist
No matter which framework you pick, these five rules are the minimum bar for a green CI. The Cypress and Playwright communities agree on most of them; the State of JS 2024 survey results — published by Devographics and freely available on stateofjs.com — show Playwright leading in retention and satisfaction while Cypress still leads in name recognition, which roughly tracks our experience advising teams that adopted one and then the other.
- Mock every third-party network call by default; allow-list, do not deny-list.
- Freeze Date.now and pin navigator.language per test.
- Use role- or label-based selectors backed by the accessibility tree.
- Cap every action with an explicit timeout — never the default — so failures are fast and legible.
- Capture a trace or video on first retry and review it before re-running.
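For Playwright, most of this checklist can be expressed once in configuration rather than repeated per test. The object below is a sketch, not a recommended production config: the specific values (one retry, 5-second timeouts, es-ES, Europe/Madrid) are illustrative assumptions. In a real project it would be the default export of playwright.config.ts, usually wrapped in defineConfig() from '@playwright/test'.

```typescript
// Sketch of a Playwright config object applying the stability checklist.
// Export this as the default export of playwright.config.ts in a real project.
const config = {
  retries: 1, // allow one retry so "on-first-retry" artifacts get captured
  expect: { timeout: 5000 }, // explicit budget for auto-retrying assertions
  use: {
    locale: "es-ES", // pins navigator.language and default formatting rules
    timezoneId: "Europe/Madrid", // same time zone on laptops and CI runners
    actionTimeout: 5000, // explicit cap for actions such as click and fill
    serviceWorkers: "block", // keep service workers from bypassing route mocks
    trace: "on-first-retry", // record a trace only when a test has to be retried
    video: "on-first-retry", // same policy for video
  },
};
```

Network allow-listing and clock freezing are not covered here, since they depend on the application; those still belong in a shared fixture or a beforeEach hook.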
If you want to see how the same checklist looks applied to a real e-commerce checkout flow, walk through our Playwright network-mocking deep dive for the exhaustive version with fixture loaders and CI artifact retention.
So Which One Should You Pick?
If you are starting a new project today and your application talks to multiple origins, drives multiple tabs, runs a service worker, or needs to test WebSocket payloads — pick Playwright. The out-of-process model gives you protocol-level access that Cypress's in-page model cannot match without invasive workarounds.
If your team is already shipping with Cypress, your application is a single same-origin SPA, and your test authors lean on the live Test Runner UI and time-travel debugger to onboard junior engineers — keep Cypress. The retry-ability semantics are excellent, the developer experience is unmatched for that shape of app, and switching costs are real.
Either way: the framework is not what makes your tests reliable. Your discipline around network mocking, clock control, and selector stability is. The framework just chooses which class of flake you no longer have to think about.
Frequently Asked Questions
Is Playwright always faster than Cypress in CI?
Not always. Playwright parallelizes across worker processes by default and is usually faster on multi-spec suites, but a single long Cypress spec on one container can match or beat it.
Does Cypress still struggle with iframes and multiple tabs?
Cypress improved cross-origin support in v12, and same-origin iframes work via plugins, but it still cannot drive a second tab. Playwright drives multiple tabs and contexts natively.
Can I share test fixtures between Cypress and Playwright?
Yes. Extract fixtures into a framework-agnostic layer — plain JSON, factories returning POJOs, or HTTP seed endpoints — and avoid coupling them to cy.* or page.* APIs.
How do I pick between Cypress and Playwright for a new project today?
Prefer Playwright for multi-browser, multi-tab, and API-heavy flows. Prefer Cypress when your team values the Test Runner UI, time-travel debugger, and live-reload feedback loop.
Will switching frameworks actually fix my flaky tests?
Sometimes. If your flakes are auto-wait races, both frameworks erase that class. If they are environmental drift or unmocked third parties, switching only renames the incidents.