May 19, 2026

Unit Testing vs. Vibes Testing: Why 100% Coverage is a Cultural Lie

The 100% coverage badge is corporate cosplay — here is what to do instead when you are shipping solo.


You have a side project quietly pulling $2K MRR. Three users emailed this week because Stripe is double-charging on retry. You open the repo and the CI badge is green. Coverage is 96%. Tests pass. The bug ships anyway.

Welcome to the lie. The lie is not unit tests. The lie is that the coverage number on your README means anything about whether your app actually works. This post is about what to do instead — call it vibes testing — and how to build a tiny, opinionated test suite that catches the bugs that take your users (and your weekend) offline.

I am writing this for indie hackers and solopreneurs who have caught themselves staring at Istanbul output at 1am wondering why they spent the night chasing branch coverage on an error path that no real user will ever hit. If that is you: take a sip of yerba mate, read on, delete some tests by Friday.

What is vibes testing, actually?

Vibes testing: a small, opinionated suite that protects the business outcomes you care about — signup, checkout, payouts — instead of chasing per-line coverage metrics.

It is not anti-testing. It is anti-ceremony. The shape of a vibes suite for a typical indie SaaS:

  • One golden-path end-to-end test per critical flow (Playwright or Cypress) — signup, checkout, primary feature, account deletion.
  • Property-based tests for any code with real math, parsing, or invariants (pricing, billing, date ranges, slug generators).
  • Contract tests at every external-API boundary (Stripe, Postmark, OpenAI, your DB) so you find out when a vendor changes a field, not when a user does.
  • Almost no traditional unit tests for thin glue code, controllers, or DTO mappers — they have a low bug-density and a high churn rate.
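The contract-test bullet is the least familiar of the four, so here is a minimal sketch of the idea. Everything in it is an assumption made for illustration: the `FieldSpec` shape, the `checkoutContract` field list, and the `contractViolations` helper are hypothetical names, and the three fields are only examples of "fields our code reads", not a full vendor schema.

```typescript
// Hypothetical contract check: lists the fields OUR code reads from a vendor
// payload. Field names are illustrative, not a complete vendor schema.
type FieldSpec = Record<string, 'string' | 'number' | 'boolean'>;

const checkoutContract: FieldSpec = {
  id: 'string',
  customer: 'string',
  amount_total: 'number',
};

// Returns a human-readable list of violations; an empty list means the
// payload still has the shape our code depends on.
function contractViolations(payload: unknown, spec: FieldSpec): string[] {
  if (typeof payload !== 'object' || payload === null) {
    return ['payload is not an object'];
  }
  const record = payload as Record<string, unknown>;
  return Object.entries(spec)
    .filter(([field, expected]) => typeof record[field] !== expected)
    .map(
      ([field, expected]) =>
        `${field}: expected ${expected}, got ${typeof record[field]}`,
    );
}

const samplePayload = { id: 'cs_test_1', customer: 'cus_1', amount_total: 4900 };
console.log(contractViolations(samplePayload, checkoutContract).length); // 0
console.log(contractViolations({ id: 'cs_test_2' }, checkoutContract).length); // 2
```

In a real suite the payload would come from the vendor's test environment rather than a literal, and the test would fail whenever the violations list is non-empty.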

The Stack Overflow 2024 Developer Survey reports unit testing as the most common developer practice across respondents — so the cultural-default is already pointing solo devs toward unit tests as a first reach. That makes vibes testing a deliberate choice, not a default. You are explicitly opting out of the line-coverage ritual in favor of outcome-coverage.

Why does 100% coverage lie about quality?

Because coverage measures execution, not correctness. A line ran. That is it. Coverage cannot see ordering, retries, concurrency, missing branches, stale mocks, or assertions you forgot to write.

The mechanical truth is simple. A coverage tool like Istanbul, c8, or JaCoCo records which statements, branches, and functions ran during a test run and divides by the total. That is the entire signal. It does not know whether your assertion was expect(result).toBeTruthy() on a function that always returns true. It does not know that you mocked the boundary your bug lives behind. It does not know that two correctly-tested functions, composed, can deadlock.
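To make the execution-versus-correctness gap concrete, here is a minimal sketch. The `applyDiscount` function and its bug are hypothetical, invented for this illustration. Both checks call the function exactly once, so a coverage tool would score them identically, but only one of them can fail.

```typescript
// Hypothetical helper with a deliberate bug. discountPct is meant to be a
// percentage (20 means 20%).
function applyDiscount(priceCents: number, discountPct: number): number {
  // BUG: multiplies by 20 instead of 0.20 (the division by 100 is missing).
  return Math.round(priceCents - priceCents * discountPct);
}

// Coverage-friendly check: executes every line of applyDiscount, so line
// coverage reads 100%, yet it passes for any numeric result.
function weakCheck(): boolean {
  const result = applyDiscount(1000, 20);
  return typeof result === 'number';
}

// Outcome check: same single call, same coverage, but it pins the value.
function outcomeCheck(): boolean {
  return applyDiscount(1000, 20) === 800;
}

console.log('weak check passes:', weakCheck()); // weak check passes: true
console.log('outcome check passes:', outcomeCheck()); // outcome check passes: false
```

The coverage number is the same in both cases; the only thing that changed is whether the assertion says anything about the outcome.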

Martin Fowler wrote about this more than a decade ago in his TestCoverage essay: high coverage is a necessary but not sufficient condition for confidence. The cultural lie is treating coverage as the goal, instead of the byproduct of writing the tests that actually matter. We are going to lean on three concrete examples below to make that bite.

The coverage-theatre trap (real example)

Here is a Stripe webhook handler that looks fine, ships with 100% line coverage, and provisions access a second time whenever Stripe retries a delivery it already processed. Read it once, then we will dissect why coverage cannot see the bug.

// app/api/stripe/webhook/route.ts
import { NextRequest } from 'next/server';
import Stripe from 'stripe';
import { provisionAccess } from '@/lib/entitlements';

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

export async function POST(req: NextRequest) {
  const sig = req.headers.get('stripe-signature');
  if (!sig) {
    return new Response('No signature', { status: 400 });
  }

  // Stripe requires the RAW body for signature verification — not JSON.
  const raw = await req.text();
  let event: Stripe.Event;

  try {
    event = stripe.webhooks.constructEvent(
      raw,
      sig,
      process.env.STRIPE_WEBHOOK_SECRET!,
    );
  } catch (err) {
    console.error('Signature verification failed:', err);
    return new Response('Invalid signature', { status: 400 });
  }

  // BUG: nothing here checks event.id against an already-processed log.
  // Stripe retries failed deliveries for up to 3 days. If our DB write
  // succeeds but our 200 is dropped (timeout, deploy, cold start), Stripe
  // resends — and we provision access AGAIN.
  if (event.type === 'checkout.session.completed') {
    const session = event.data.object as Stripe.Checkout.Session;
    await provisionAccess(session.customer as string, session.id);
  }

  return new Response('ok');
}

// ----- The "100% covered" unit test that ships the bug -----

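// app/api/stripe/webhook/route.test.ts
// makeFakeEvent and makeRequest are local test helpers (not shown). The
// `stripe` spied on below must be the same client instance the route uses,
// for example one imported from a shared module.
import { POST } from './route';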

test('webhook returns 200 on valid event', async () => {
  const event = makeFakeEvent('checkout.session.completed');
  jest
    .spyOn(stripe.webhooks, 'constructEvent')
    .mockReturnValue(event);

  const res = await POST(makeRequest(event));
  expect(res.status).toBe(200);
});

test('webhook 400s on missing signature', async () => {
  const res = await POST(makeRequest(null, { withSig: false }));
  expect(res.status).toBe(400);
});

test('webhook 400s on bad signature', async () => {
  jest.spyOn(stripe.webhooks, 'constructEvent').mockImplementation(() => {
    throw new Error('Invalid signature');
  });
  const res = await POST(makeRequest({}));
  expect(res.status).toBe(400);
});

Coverage report: 100% of lines. The only thing a branch report would flag is the untested non-checkout event type, which is not where the bug lives. The shape of the suite even looks responsible: happy path, missing header, bad signature. And yet the entire idempotency story, the actual contract between Stripe and your service per the Stripe webhook best-practices docs, is untested.

Why does coverage miss it? Because the missing code does not exist yet. There is no line for "check the events table", so Istanbul has nothing to count. Coverage is a mirror of what you already wrote; it cannot reflect the lines you forgot. This is the coverage-theatre trap in its purest form. For a deeper rabbit-hole on webhook idempotency patterns, see our deep dive on Stripe webhook idempotency — but for now, here is the vibes-test equivalent of this same flow.
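For reference, the guard that is missing from the handler is small. The sketch below is an assumption-heavy illustration, not the production fix: an in-memory Set stands in for a durable processed-events table with a unique constraint on the event id, and `WebhookEvent`, `handleEvent`, `processedEventIds`, and the stubbed `provisionAccess` are all hypothetical names.

```typescript
// Sketch only: an in-memory Set stands in for a durable "processed events"
// table with a unique constraint on the event id. All names are hypothetical.
type WebhookEvent = { id: string; type: string; customerId: string };

const processedEventIds = new Set<string>();
const entitlements = new Map<string, number>(); // customerId -> grant count

// Stub for the real entitlement write.
function provisionAccess(customerId: string): void {
  entitlements.set(customerId, (entitlements.get(customerId) ?? 0) + 1);
}

// Returns 'processed' on first delivery and 'duplicate' on any redelivery.
function handleEvent(event: WebhookEvent): 'processed' | 'duplicate' {
  if (processedEventIds.has(event.id)) {
    // Still acknowledge with a 200 so the sender stops retrying.
    return 'duplicate';
  }
  if (event.type === 'checkout.session.completed') {
    provisionAccess(event.customerId);
  }
  // Record the id only after the side effect succeeded. With a real database,
  // the insert and the grant belong in one transaction.
  processedEventIds.add(event.id);
  return 'processed';
}

const firstDelivery: WebhookEvent = {
  id: 'evt_123',
  type: 'checkout.session.completed',
  customerId: 'cus_abc',
};
console.log(handleEvent(firstDelivery)); // processed
console.log(handleEvent(firstDelivery)); // duplicate
console.log(entitlements.get('cus_abc')); // 1
```

An in-memory Set does nothing for two deliveries that arrive concurrently on different instances; that is the job of the unique constraint in the real table, and it is also why this invariant is better tested at the system layer than at the function layer.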

Vibes testing in practice (Playwright + replay)

Vibes testing is honest about the layer the bug actually lives at. Webhook idempotency is not a function-level concern; it is a cross-system invariant. So you write the test at the system layer. Here is the same flow, covered by a single Playwright smoke that catches the double-grant bug above:

// e2e/checkout.smoke.spec.ts
import { test, expect } from '@playwright/test';
import {
  replayLastWebhook,
  countEntitlements,
  cleanupUser,
} from './helpers';

test.describe('Checkout — golden path + retry safety', () => {
  test.afterEach(async ({ page }, info) => {
    // Edge case: ensure we don't leak Stripe customers between runs.
    // The email is stored under 'vibes:email' in localStorage; bail safely
    // if it is missing or the page is no longer on our origin.
    const email = await page
      .evaluate(() => localStorage.getItem('vibes:email'))
      .catch(() => null);
    if (email) await cleanupUser(email);
  });

  test('sign up, pay, land on dashboard with Pro access', async ({ page }) => {
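    const email = `vibes-${Date.now()}@desplega.test`;

    await page.goto('/signup');
    // localStorage is origin-scoped and throws on about:blank, so store the
    // email only after navigating to our own origin.
    await page.evaluate((e) => localStorage.setItem('vibes:email', e), email);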
    await page.getByLabel('Email').fill(email);
    await page.getByLabel('Password').fill('VibesTest!42');
    await page.getByRole('button', { name: 'Create account' }).click();

    await page.getByRole('button', { name: 'Upgrade to Pro' }).click();

    // Stripe test mode — using the documented 4242 success card.
    const card = page.frameLocator('iframe[name*="card"]');
    await card.getByPlaceholder('Card number').fill('4242 4242 4242 4242');
    await card.getByPlaceholder('MM / YY').fill('12 / 30');
    await card.getByPlaceholder('CVC').fill('123');
    await page.getByRole('button', { name: 'Pay' }).click();

    // Outcome assertion #1: user is on dashboard with Pro flag.
    await expect(page.getByText(/pro plan/i)).toBeVisible({ timeout: 15_000 });

    // Outcome assertion #2 (THE BUG GUARD): replay the webhook.
    // In prod, Stripe will retry on 5xx, deploy timeouts, or cold-start
    // failures. We simulate the retry and assert we provision exactly once.
    await replayLastWebhook(email);
    await replayLastWebhook(email);
    expect(await countEntitlements(email)).toBe(1);
  });

  test('card decline keeps user on free plan', async ({ page }) => {
    const email = `vibes-decline-${Date.now()}@desplega.test`;

    await page.goto('/signup');
    // Same ordering as above: write to localStorage only once we are on
    // our own origin.
    await page.evaluate((e) => localStorage.setItem('vibes:email', e), email);
    await page.getByLabel('Email').fill(email);
    await page.getByLabel('Password').fill('VibesTest!42');
    await page.getByRole('button', { name: 'Create account' }).click();
    await page.getByRole('button', { name: 'Upgrade to Pro' }).click();

    const card = page.frameLocator('iframe[name*="card"]');
    // Stripe's documented generic-decline test card.
    await card.getByPlaceholder('Card number').fill('4000 0000 0000 0002');
    await card.getByPlaceholder('MM / YY').fill('12 / 30');
    await card.getByPlaceholder('CVC').fill('123');
    await page.getByRole('button', { name: 'Pay' }).click();

    // Edge case: we MUST not provision Pro on declines, even if the UI
    // briefly flickered "Processing…". This catches optimistic-UI bugs.
    await expect(page.getByText(/declined/i)).toBeVisible();
    expect(await countEntitlements(email)).toBe(0);
  });
});

Two tests. Four explicit assertions across both, plus the auto-waiting Playwright performs on every fill and click. Together they cover signup, checkout, webhook retries, optimistic-UI on declines, and the entitlement invariant. Total line coverage of the underlying repo will probably sit somewhere between 40% and 70%. That is fine. The relevant question is: can the double-grant bug ship? No. The test fails. You fix it. You merge.

And note the explicit cleanupUser in afterEach. Vibes tests live longer when they clean up after themselves. Otherwise Stripe test-mode customers stack up, spill past the 100-per-page limit on list calls, and your test starts failing for unrelated reasons in week three.

Unit tests vs. vibes tests: side-by-side

Dimension | Classic unit test | Vibes test
What it asserts | A function returns X for input Y | A business outcome holds end-to-end
Layer | Pure function or class | HTTP, DB, browser, webhook
Survives refactors | Rarely: couples to internals | Usually: couples to behavior
Cost per test | Milliseconds, ~5 LOC | Seconds, ~30-50 LOC
Bug-catch density | High for pure logic, low for glue | High for integration bugs
Best for | Pricing math, parsers, validators | Signup, checkout, webhooks, auth
Worst for | DTO mappers, thin controllers | Tight inner loops (slow CI feedback)

Vibes testing does not replace unit testing — it sits on top of it. Unit tests are still the right tool for pricing logic, parsers, slug generators, and anything with real math. Which is a great segue into the third example.

Property-based testing for the vibes-pilled

For the small slice of your codebase that genuinely deserves unit tests (the math-heavy core), example-based unit tests are still a weak signal. You write three examples, you cover three branches, you ship the fourth-branch bug. The fix is property-based testing: instead of asserting "computeInvoice([item], 21).totalCents === 121", you assert "for any list of items and any tax percentage in a sane range, the invariants hold." QuickCheck (Haskell, 1999) invented this; fast-check is the modern TS library.

Here is a real pricing test, with three invariants, integer cents and basis points so floating-point drift never enters the picture, and a deliberate seed so failures are reproducible:

// lib/pricing.test.ts
import fc from 'fast-check';
import { computeInvoice, InvoiceError } from './pricing';

const itemArb = fc.record({
  unitPriceCents: fc.integer({ min: 0, max: 1_000_000 }),
  quantity: fc.integer({ min: 0, max: 100 }),
  discountBps: fc.integer({ min: 0, max: 10_000 }), // 0-100% in basis points
});

const taxPctArb = fc.integer({ min: 0, max: 25 });

describe('computeInvoice — invariants', () => {
  test('total is monotonic in quantity (no negative-line bugs)', () => {
    fc.assert(
      fc.property(itemArb, taxPctArb, (item, tax) => {
        const oneUnit = computeInvoice([{ ...item, quantity: 1 }], tax);
        const twoUnits = computeInvoice([{ ...item, quantity: 2 }], tax);
        expect(twoUnits.totalCents).toBeGreaterThanOrEqual(oneUnit.totalCents);
      }),
      { numRuns: 200, seed: 42 },
    );
  });

  test('zero-quantity items contribute zero to the total', () => {
    fc.assert(
      fc.property(
        fc.array(itemArb, { minLength: 1, maxLength: 20 }),
        taxPctArb,
        (items, tax) => {
          const zeroed = items.map((i) => ({ ...i, quantity: 0 }));
          expect(computeInvoice(zeroed, tax).totalCents).toBe(0);
        },
      ),
      // Pin runs and seed here too, so every property is reproducible.
      { numRuns: 200, seed: 42 },
    );
  });

  test('rejects negative tax cleanly (no NaN cascade)', () => {
    fc.assert(
      fc.property(
        fc.array(itemArb, { minLength: 1, maxLength: 5 }),
        fc.integer({ min: -100, max: -1 }),
        (items, badTax) => {
          // EDGE CASE: a single negative tax slipping through a form
          // historically caused our invoices to show NaN. We assert
          // a typed error instead of a silent corruption.
          expect(() => computeInvoice(items, badTax)).toThrow(InvoiceError);
        },
      ),
      // Pin runs and seed here too, so every property is reproducible.
      { numRuns: 200, seed: 42 },
    );
  });
});

The win is not the line count; it is the search space. With 200 randomized runs per property, fast-check explores combinations a human would never write by hand. When it finds a counterexample, it automatically shrinks it to the smallest failing case (for example, "quantity = 2, discountBps = 10000") so you have a one-line repro the moment the suite goes red. The deterministic seed: 42 means anything fast-check finds today, your CI will find tomorrow.

In our experience, swapping ten example-based pricing tests for three property tests typically increases the bug-find rate on the pricing module while reducing test code roughly by half. That ratio is anecdotal — we have not benchmarked it across orgs — but the search-space argument is well-established and goes back to Claessen and Hughes' original 2000 QuickCheck paper.
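The test file imports computeInvoice and InvoiceError from ./pricing, which this post does not show. The sketch below is one hypothetical implementation that would satisfy the three invariants. The `LineItem` and `Invoice` shapes, the rounding policy (discounts rounded down, tax rounded to the nearest cent), and the validation rules are all assumptions made for illustration, not the module the tests were written against.

```typescript
// Hypothetical ./pricing module: one possible shape, not the original.
class InvoiceError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'InvoiceError';
    // Keeps instanceof working when compiling to older JS targets.
    Object.setPrototypeOf(this, InvoiceError.prototype);
  }
}

type LineItem = { unitPriceCents: number; quantity: number; discountBps: number };
type Invoice = { subtotalCents: number; taxCents: number; totalCents: number };

function computeInvoice(items: LineItem[], taxPct: number): Invoice {
  if (!Number.isFinite(taxPct) || taxPct < 0) {
    throw new InvoiceError(`taxPct must be a non-negative number, got ${taxPct}`);
  }
  let subtotalCents = 0;
  for (const item of items) {
    if (item.unitPriceCents < 0 || item.quantity < 0) {
      throw new InvoiceError('price and quantity must be non-negative');
    }
    if (item.discountBps < 0 || item.discountBps > 10_000) {
      throw new InvoiceError('discountBps must be between 0 and 10000');
    }
    const grossCents = item.unitPriceCents * item.quantity;
    // Assumed policy: round the discount down to a whole cent.
    const discountCents = Math.floor((grossCents * item.discountBps) / 10_000);
    subtotalCents += grossCents - discountCents;
  }
  // Assumed policy: round tax to the nearest cent.
  const taxCents = Math.round((subtotalCents * taxPct) / 100);
  return { subtotalCents, taxCents, totalCents: subtotalCents + taxCents };
}

const invoice = computeInvoice(
  [{ unitPriceCents: 1000, quantity: 2, discountBps: 1000 }],
  21,
);
console.log(invoice.totalCents); // 2178 (1800 subtotal + 378 tax)
```

Working in integer cents and basis points is the design choice that matters here: every intermediate value stays an integer until the single rounding step, which is what lets the monotonicity and zero-quantity properties hold exactly rather than approximately.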

Troubleshooting: when vibes testing goes wrong

Vibes testing is leaner, but it has its own failure modes. Here are the four we hit most often shipping our own indie projects, and how to triage each one fast.

  • "My smoke test is flaky." Almost always a timing or selector problem, not a real bug. Replace page.waitForTimeout with expect(...).toBeVisible({ timeout }); replace nth-child selectors with role-based locators (getByRole). Run npx playwright test --repeat-each=20 locally — if it passes 20/20, you have stabilized it.
  • "The smoke passed in CI but the bug shipped." Your test is asserting the wrong layer. Check your mocks — if you are stubbing the boundary the bug lives behind (Stripe API, your DB, an auth provider), the test is decorative. Move the assertion to the layer that is closest to the user-visible outcome.
  • "Coverage dropped after the refactor." Good. That means dead code is no longer being counted, or that tests were coupled to internals you deleted. The number to watch is escaped-bug count per month, not coverage. If escaped bugs are trending down, lower coverage is healthy.
  • "Property tests find a failure I can't reproduce." You forgot the seed. Always pin { seed: <integer> } in fc.assert so failures are deterministic. fast-check also prints the seed and a Counterexample: ... line in its failure report; copy-paste the counterexample into a one-off test to lock the regression before fixing.
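The reason a pinned seed makes the last item reproducible is easy to see with a stand-in generator. The sketch below does not use fast-check and is not how fast-check works internally; `seededRandom` is a small mulberry32-style generator and `sampleQuantities` is a hypothetical helper, both written only to show that the same seed yields the same inputs.

```typescript
// Sketch: a tiny seeded generator (mulberry32-style) standing in for the
// random source inside a property-testing library.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generate n pseudo-random integer quantities in [0, 100] from a seed.
function sampleQuantities(seed: number, n: number): number[] {
  const next = seededRandom(seed);
  return Array.from({ length: n }, () => Math.floor(next() * 101));
}

const runToday = sampleQuantities(42, 5);
const runTomorrow = sampleQuantities(42, 5);
// Same seed, same generated inputs, so the same failure shows up again.
console.log(runToday.join(',') === runTomorrow.join(',')); // true
```

Without a pinned seed the generator starts somewhere different on every run, which is exactly the "can't reproduce" symptom.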

A common refrain we hear from indie hackers in Barcelona, Madrid, and Valencia is "but my CI is red half the week." Nine times out of ten it is the first item — timing in smoke tests — not a deep product bug. Treat flake hunting as a first-class task, not a chore you do between features. A flaky vibes suite is worse than no suite: it teaches the team to ignore red builds.

Gotchas and edge cases nobody warns you about

  • Snapshot tests are not vibes tests. They are regression markers. Useful for "did the HTML change?" tests; useless for "does signup work?" If your suite is mostly snapshots, you are mostly catching diffs, not bugs.
  • Vibes tests need a real database. Use a real Postgres in CI via a service container. SQLite-in-memory will let constraint violations and JSON-operator bugs slip through.
  • Test parallelism amplifies state bugs. Playwright and Jest default to parallel workers. If your vibes test relies on a single shared row (test user, fixed Stripe customer), you will see racey failures only under load. Sharded prefixes (`vibes-${Date.now()}-${process.pid}`) fix the worst of it.
  • Cold starts on serverless lie about timing. A Playwright smoke against a Vercel preview deployment that just built will report 3-5s response times that do not exist for warm requests. Either warm the deployment with a curl before the suite, or bump timeouts so first-request latency does not flake the run.
  • Don't mock what you can launch in a container. Postgres, Redis, MinIO, even a fake Stripe — all run in <5s in GitHub Actions services. Mocks rot; containers don't. The minute Stripe ships a new event type, your mocked test still passes; a contract test against the real API fails on day one.
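The sharded-prefix advice in the parallelism bullet can be sketched as a small helper. `uniqueTestEmail` is a hypothetical name, and the worker id is passed in as a parameter rather than read from the environment; in a Playwright suite a natural source for it would be `testInfo.workerIndex`. The extra counter and random suffix are an assumption on top of the timestamp-plus-worker idea, added because two tests in one worker can start within the same millisecond.

```typescript
// Hypothetical helper: build collision-resistant identities for parallel tests.
let emailSequence = 0;

function uniqueTestEmail(
  workerId: number | string,
  domain = 'desplega.test',
): string {
  emailSequence += 1; // guarantees uniqueness within this process
  const suffix = Math.random().toString(36).slice(2, 8); // cheap cross-process spread
  return `vibes-${Date.now()}-${workerId}-${emailSequence}-${suffix}@${domain}`;
}

console.log(uniqueTestEmail(0));
console.log(uniqueTestEmail(0)); // differs from the line above
```

The same prefix doubles as a cleanup handle: anything starting with vibes- is test data and safe to delete.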

How to migrate without a multi-quarter refactor

You don't need to delete your unit tests on Monday morning. Most of the indie teams we have seen migrate run a two-week experiment:

  • Week 1: Add one Playwright smoke per critical flow. Keep all existing tests. Lower coverage gate to whatever the current number actually is — no more "must be 90%" fictions.
  • Week 2: Track escaped-bug count and CI time. If the smoke suite catches a bug your unit suite missed, the experiment is already paying for itself.
  • From week 3 on: Delete unit tests only when you delete or refactor their target. Do not rewrite — let attrition do the work. Coverage will drift down; bug count will too. That gap is the vibes-testing dividend.
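The Week 1 step of lowering the gate to the current number is a one-line config change. The sketch below assumes Jest is the test runner; the value 62 is a placeholder, not a recommendation, and should be replaced with whatever your suite reports today.

```typescript
// jest.config.ts (sketch; assumes Jest is the test runner)
const config = {
  collectCoverage: true,
  coverageThreshold: {
    // Placeholder: set this to the line coverage your suite reports today,
    // so the gate blocks sudden drops without demanding new ceremony.
    global: { lines: 62 },
  },
};

export default config;
```

From week 3 on, the threshold can be lowered in the same pull request that deletes a unit test along with its target, which keeps the gate honest as attrition does its work.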

We have shipped four indie projects to this playbook in the last eighteen months and not regretted it once. The biggest unlock is not the test code itself — it is the meeting time you reclaim by not arguing about coverage targets in a sprint review.

Stop measuring lines. Start measuring outcomes.

100% coverage is a cultural artifact from an era where shipping software took six months and a release manager. The shape of indie work — fast iteration, small surface area, one or two engineers, critical flows that the founder personally walks through every morning — calls for a different testing posture. Vibes testing is that posture: small suite, real boundaries, business outcomes, property tests where math matters, smokes everywhere else.

Delete a test this week. Add a smoke. Repeat. If your suite gets smaller and your bug count goes down, you are on the vibes. If your bug count goes up, you removed the wrong tests — put the math-heavy ones back and try again. The goal is signal, not theatre.


Frequently Asked Questions

Is 100% code coverage actually bad?

It is not bad, it is misleading. Coverage proves a line ran during a test, not that the behavior is correct. Optimize for outcome coverage: did the bug ship, or not?

What is vibes testing really?

Vibes testing is the 5-10 tests that protect the business outcomes you care about — signup, checkout, payouts — instead of chasing per-line metrics. Heuristic, not bureaucratic.

When should a solo dev add a unit test?

When a function has tricky math, parsing, or invariants — pricing, date math, validators, parsers. Skip unit tests for thin glue code; that is what end-to-end smokes are for.

Will I get burned skipping tests?

Sometimes, yes. But you get burned harder by 18 months of coverage theatre while competitors ship. Carry a real smoke suite and let coverage settle wherever that suite leaves it.

How do I drop a coverage gate without scaring my team?

Replace the gate with a smoke gate: the PR must keep the golden-path Playwright suite green. Track escaped-bug count monthly. If the smoke suite is doing its job, that number should go down, not up.