Should Sinon replace Playwright or Cypress assertions?

No. Use Sinon inside unit or component seams to control collaborators. Keep Playwright and Cypress for browser behavior, user journeys, and integration confidence.

Why do Sinon stubs leak between tests?

Leaks usually happen when a test replaces a method and never restores it. Create fakes through a sandbox and call restore in afterEach, which runs even when a test throws.

When is a Sinon mock too brittle?

A mock is too brittle when it verifies incidental calls instead of a contract. Prefer stubs plus result assertions unless the interaction is the behavior under test.

Can Sinon work with ESM modules?

Yes, but it cannot rewrite read-only live bindings. Stub injectable objects, adapters, or dependency parameters instead of imported constants or frozen exports.

Advanced Sinon.js: Architecting Resilient Test Spies, Stubs, and Mocks in Modern JS Suites

Sinon.js is not just a library for making assertions prettier. Used well, it is a way to isolate unstable boundaries, document collaboration contracts, and make failures easier to interpret. Used casually, it can create tests that pass for the wrong reason, fail after unrelated refactors, or poison the next file in the suite because a stub was never restored.

That distinction matters in modern JavaScript teams. The Stack Overflow 2024 Developer Survey reports JavaScript at 64.6% usage among professional developers, and the State of JavaScript 2024 testing section shows 11,667 respondents for its professional testing-tools question, with Jest, Storybook, Vitest, Playwright, Cypress, Testing Library, Mocha, and Selenium all appearing in the same ecosystem. In other words: JavaScript testing is broad, mixed, and full of seams where unit, component, API, and browser automation need different kinds of doubles.

This guide focuses on the concrete problem behind many flaky or low-signal suites: how do you use Sinon spies, stubs, and mocks without overfitting to implementation details? If your team already runs browser suites, pair this with our flaky test debugging deep dive and the framework selection notes in our test automation strategy guide.

What Problem Does Sinon.js Actually Solve?

Use spies for observation, stubs for controlled behavior, and mocks only when the interaction itself is the contract under test.

A test double stands in for a real collaborator. Sinon gives you several flavors, but the architectural decision is not about naming. It is about the risk you are trying to control. A spy observes calls while letting behavior continue. A stub replaces behavior so the test can force a branch or prevent an external side effect. A mock combines replacement with predeclared expectations and fails when the expected interaction does not occur.

The deepest mistake is using a stronger double than the test needs. If the observable outcome is a returned value, DOM state, persisted command, or emitted domain event, asserting that an internal method was called twice is usually noise. If the behavior is the interaction itself, such as a payment gateway being charged exactly once with an idempotency key, the call contract is the signal and a mock or strict stub assertion can be justified.

Double	Use it when	Avoid it when	Failure signal
Spy	You need to observe calls without changing behavior.	The real collaborator is slow, random, destructive, or remote.	Call history: arguments, order, return values, thrown errors.
Stub	You need deterministic collaborator behavior.	You are asserting every internal step instead of the outcome.	Configured branch plus post-act assertions.
Mock	The expected interaction is part of the public contract.	A refactor could change call shape while preserving behavior.	Expectation failure on an unexpected call, or at verify / verifyAndRestore.

Example 1: Spying on an Observable Contract Without Mutating Behavior

A production spy should prove a meaningful contract while preserving the original behavior. The example below tests an analytics dispatcher that must notify every subscriber, continue after one subscriber throws, and report subscriber failures to an error logger. The edge case is important: a brittle implementation would stop at the first exception and hide downstream telemetry.

// analytics-dispatcher.spec.js
import { expect } from 'chai';
import sinon from 'sinon';

function createAnalyticsDispatcher({ logger }) {
  const subscribers = new Set();
  return {
    subscribe(fn) {
      if (typeof fn !== 'function') throw new TypeError('subscriber must be a function');
      subscribers.add(fn);
      return () => subscribers.delete(fn);
    },
    publish(event) {
      if (!event || typeof event.name !== 'string') throw new TypeError('event.name is required');
      for (const subscriber of subscribers) {
        try {
          subscriber(event);
        } catch (error) {
          logger.error('analytics subscriber failed', { eventName: event.name, error });
        }
      }
    },
  };
}

describe('analytics dispatcher', () => {
  const sandbox = sinon.createSandbox();
  let logger;

  afterEach(() => sandbox.restore());

  it('continues notifying subscribers and logs failures', () => {
    logger = { error: sandbox.spy() };
    const dispatcher = createAnalyticsDispatcher({ logger });
    const first = sandbox.spy(() => {
      throw new Error('crm timeout');
    });
    const second = sandbox.spy();

    dispatcher.subscribe(first);
    dispatcher.subscribe(second);
    dispatcher.publish({ name: 'checkout.completed', orderId: 'ord_123' });

    sinon.assert.calledOnce(first);
    sinon.assert.calledOnce(second);
    sinon.assert.calledWith(second, sinon.match({ name: 'checkout.completed' }));
    sinon.assert.calledOnce(logger.error);
    expect(logger.error.firstCall.args[1].error.message).to.equal('crm timeout');
  });

  it('throws a useful validation error for malformed events', () => {
    logger = { error: sandbox.spy() };
    const dispatcher = createAnalyticsDispatcher({ logger });

    expect(() => dispatcher.publish({ orderId: 'missing-name' })).to.throw(TypeError, 'event.name');
    sinon.assert.notCalled(logger.error);
  });
});

Notice what the test does not assert: it does not verify loop mechanics, Set usage, or exact log object shape beyond the error that matters. Sinon records call arguments, exceptions, and order, but you still choose which observations are contractual. That choice is what keeps a spy from turning into a refactor alarm.

Example 2: Stubbing Async Dependencies for Retry and Idempotency

Stubs earn their keep when real collaborators are nondeterministic, expensive, or destructive. Payment, email, queue, and identity providers are classic examples. The key is to stub the dependency boundary, not the method under test. The production code should still execute the retry, idempotency, error classification, and response mapping logic.

// payment-service.spec.js
import { expect } from 'chai';
import sinon from 'sinon';

async function chargeOrder({ gateway, audit }, order) {
  if (!order || !(order.totalCents > 0)) throw new RangeError('order total must be positive');
  const idempotencyKey = 'order:' + order.id;
  for (let attempt = 1; attempt <= 2; attempt += 1) {
    try {
      const result = await gateway.charge({
        amount: order.totalCents,
        currency: order.currency || 'EUR',
        idempotencyKey,
      });
      await audit.record({ orderId: order.id, gatewayId: result.id, attempt });
      return { status: 'paid', gatewayId: result.id };
    } catch (error) {
      if (attempt === 2 || error.code !== 'ETIMEDOUT') {
        await audit.record({ orderId: order.id, failed: true, reason: error.message });
        throw error;
      }
    }
  }
}

describe('chargeOrder', () => {
  const sandbox = sinon.createSandbox();
  afterEach(() => sandbox.restore());

  it('retries transient gateway timeouts without changing idempotency key', async () => {
    const timeout = Object.assign(new Error('gateway timeout'), { code: 'ETIMEDOUT' });
    const gateway = { charge: sandbox.stub() };
    const audit = { record: sandbox.stub().resolves() };
    gateway.charge.onFirstCall().rejects(timeout);
    gateway.charge.onSecondCall().resolves({ id: 'ch_456' });

    const result = await chargeOrder({ gateway, audit }, { id: 'ord_123', totalCents: 4299 });

    expect(result).to.deep.equal({ status: 'paid', gatewayId: 'ch_456' });
    sinon.assert.calledTwice(gateway.charge);
    expect(gateway.charge.firstCall.args[0].idempotencyKey).to.equal('order:ord_123');
    expect(gateway.charge.secondCall.args[0].idempotencyKey).to.equal('order:ord_123');
    sinon.assert.calledWith(audit.record, sinon.match({ orderId: 'ord_123', gatewayId: 'ch_456', attempt: 2 }));
  });

  it('does not retry non-transient gateway errors and still audits failure', async () => {
    const gateway = {
      charge: sandbox.stub().rejects(Object.assign(new Error('card declined'), { code: 'CARD_DECLINED' })),
    };
    const audit = { record: sandbox.stub().resolves() };

    try {
      await chargeOrder({ gateway, audit }, { id: 'ord_124', totalCents: 1999, currency: 'EUR' });
      throw new Error('expected chargeOrder to throw');
    } catch (error) {
      expect(error.message).to.equal('card declined');
    }

    sinon.assert.calledOnce(gateway.charge);
    sinon.assert.calledWith(audit.record, sinon.match({ failed: true, reason: 'card declined' }));
  });
});

Sinon stubs support consecutive behavior with onFirstCall, onSecondCall, and onCall, which maps naturally to retry logic. The gotcha is that repeated calls to returns, throws, resolves, or rejects overwrite behavior unless you use the consecutive-call API. That is why the example explicitly models the first timeout and the second success.

Why Should Every Sinon Suite Use a Sandbox?

A sandbox restores every fake from one test boundary, preventing leaked methods, clocks, and call history from poisoning later tests.

Sinon's official sandbox documentation says sandboxes remove the need to track every fake manually. The design reason is simple: fakes mutate objects. A stubbed method is not a harmless local variable; it is a replacement installed on a collaborator. If a test throws before restore, the replacement can survive and change later tests. In parallel runners, random file order can make that leak look like a race condition.

Treat the sandbox as the ownership boundary for the test. Create fakes through it, restore in afterEach, and prefer verifyAndRestore when mocks are involved. Use resetHistory only when the same fake intentionally spans multiple assertions inside one test. Do not use resetHistory as a substitute for restoring between tests; it clears call records while leaving behavior installed.

Example 3: Fake Timers Without Freezing the Wrong Clock

Fake timers are powerful because they replace time APIs. They are also dangerous for the same reason. If the system under test schedules with setTimeout but resolves work in microtasks, you need to advance timers and allow promises to settle. If another library depends on real timers, faking the global clock can break it. Keep timer tests narrow and restore aggressively.

// backoff-worker.spec.js
import { expect } from 'chai';
import sinon from 'sinon';

function createBackoffWorker({ queue, logger, baseDelayMs = 100 }) {
  return {
    async process(job) {
      if (!job || !job.id) throw new TypeError('job.id is required');
      let lastError;
      for (let attempt = 1; attempt <= 3; attempt += 1) {
        try {
          return await queue.handle(job);
        } catch (error) {
          lastError = error;
          logger.warn('job attempt failed', { jobId: job.id, attempt, message: error.message });
          if (attempt < 3) {
            await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
          }
        }
      }
      throw lastError;
    },
  };
}

describe('backoff worker', () => {
  const sandbox = sinon.createSandbox();
  let clock;

  beforeEach(() => {
    clock = sandbox.useFakeTimers();
  });

  afterEach(() => sandbox.restore());

  it('advances linear backoff and resolves after transient failures', async () => {
    const queue = { handle: sandbox.stub() };
    const logger = { warn: sandbox.spy() };
    queue.handle.onCall(0).rejects(new Error('locked'));
    queue.handle.onCall(1).rejects(new Error('still locked'));
    queue.handle.onCall(2).resolves({ ok: true });
    const worker = createBackoffWorker({ queue, logger, baseDelayMs: 50 });

    const promise = worker.process({ id: 'job-7' });
    await Promise.resolve();
    await clock.tickAsync(50);
    await clock.tickAsync(100);

    expect(await promise).to.deep.equal({ ok: true });
    sinon.assert.calledThrice(queue.handle);
    sinon.assert.calledTwice(logger.warn);
  });

  it('surfaces validation errors without scheduling timers', async () => {
    const queue = { handle: sandbox.stub() };
    const logger = { warn: sandbox.spy() };
    const worker = createBackoffWorker({ queue, logger });

    try {
      await worker.process({});
      throw new Error('expected validation failure');
    } catch (error) {
      expect(error.message).to.equal('job.id is required');
    }

    sinon.assert.notCalled(queue.handle);
    sinon.assert.notCalled(logger.warn);
    expect(clock.countTimers()).to.equal(0);
  });
});

The subtle part is tickAsync. Synchronous tick fires timer callbacks but does not yield to the event loop, so promise continuations scheduled by those callbacks are still pending when tick returns. In modern async code, a fake clock test that never flushes those microtasks can assert too early or hang waiting on a promise that cannot settle. If your runner has its own fake timers, do not mix them with Sinon timers in the same test file.

Example 4: Using a Mock Only When the Interaction Is the Contract

Sinon mocks are intentionally stricter. The official docs describe mocks as fake methods with pre-programmed behavior and pre-programmed expectations, and warn that they enforce implementation details. That is not a reason to ban them. It is a reason to reserve them for interactions that are externally meaningful.

// email-consent.spec.js
import { expect } from 'chai';
import sinon from 'sinon';

async function updateMarketingConsent({ crm, consentLog }, user, accepted) {
  if (!user.email || !user.id) throw new TypeError('user id and email are required');
  const payload = { userId: user.id, email: user.email, accepted, source: 'settings-page' };
  await crm.setMarketingConsent(payload);
  await consentLog.append({ userId: user.id, accepted, at: new Date().toISOString() });
  return payload;
}

describe('marketing consent contract', () => {
  const sandbox = sinon.createSandbox();
  afterEach(() => sandbox.verifyAndRestore());

  it('sends exactly one consent update to the CRM with the required identity fields', async () => {
    const crm = { setMarketingConsent: async () => undefined };
    const consentLog = { append: sandbox.stub().resolves() };
    const crmMock = sandbox.mock(crm);
    crmMock
      .expects('setMarketingConsent')
      .once()
      .withExactArgs(
        sinon.match({
          userId: 'usr_42',
          email: 'qa@example.com',
          accepted: true,
          source: 'settings-page',
        })
      )
      .resolves();

    const result = await updateMarketingConsent(
      { crm, consentLog },
      { id: 'usr_42', email: 'qa@example.com' },
      true
    );

    expect(result.accepted).to.equal(true);
    sinon.assert.calledOnce(consentLog.append);
  });

  it('rejects incomplete users before touching CRM or consent log', async () => {
    const crm = { setMarketingConsent: sandbox.stub().resolves() };
    const consentLog = { append: sandbox.stub().resolves() };

    try {
      await updateMarketingConsent({ crm, consentLog }, { id: 'usr_42' }, false);
      throw new Error('expected validation failure');
    } catch (error) {
      expect(error.message).to.equal('user id and email are required');
    }

    sinon.assert.notCalled(crm.setMarketingConsent);
    sinon.assert.notCalled(consentLog.append);
  });
});

This is a justified mock because duplicate or malformed consent updates can create compliance and customer-preference risk. The exact interaction matters. By contrast, if you were testing a formatting helper that happens to call normalizeEmail internally, a mock would likely be too strict.

Design Rules for Resilient Sinon Architecture

Stub at process boundaries: network clients, queues, clocks, file systems, email, payment, identity, and analytics adapters.
Do not stub the function you are trying to test. If you must, the unit boundary is probably wrong.
Assert outcomes first, interactions second. Interaction assertions should explain a contract, not mirror implementation.
Prefer dependency injection for ESM code. Imported live bindings can be read-only, while object methods and adapter parameters are replaceable.
Keep one sandbox per test or suite boundary and restore it in afterEach.
Use exact matching (a plain object passed to withExactArgs or calledWithExactly) only when extra fields would be harmful. Otherwise, wrap the expected fields in sinon.match so tests stay robust to harmless payload growth.

These rules also help when combining Sinon with browser automation. Playwright and Cypress should usually validate user-visible effects, network routing, and browser integration. Sinon belongs closer to unit and component boundaries, where you can isolate collaborators without launching a full browser. Selenium suites can still benefit indirectly: many slow browser checks disappear when lower-level Sinon tests cover retry, formatting, queue, and adapter logic with more precise failure messages.

Troubleshooting and Debugging Common Sinon Failures

When Sinon tests fail, inspect the failure mode before changing assertions. Most issues fall into a small set of patterns.

Already wrapped method: A previous test stubbed the same method and did not restore it. Search for direct sinon.stub calls outside a sandbox and add afterEach restore.
Expected call never happened: Confirm the code path reached the collaborator. Add a result assertion or temporary spy on the branch guard before tightening the interaction assertion.
Stub returns undefined: The argument matcher may not match. Compare stub.getCalls().map(call => call.args) with the withArgs configuration on the stub (or withExactArgs on a mock expectation).
Async test finishes too early: Return or await the promise under test. A stubbed rejection after the test exits becomes an unhandled rejection instead of a useful assertion.
Fake timer test hangs: Advance timers with tickAsync when promises are involved and verify there are no pending timers after the edge case.
ESM export will not stub: Do not fight immutable module namespace objects. Wrap the dependency in an adapter object or pass it as a parameter.
Mock fails after harmless refactor: Replace the mock with a stub and assert the final state unless the call shape is truly a public contract.

Debugging tip: before changing production code, print or inspect fake.callCount, fake.getCalls(), fake.firstCall.args, fake.exceptions, and fake.returnValues. Sinon already captured the timeline; use it to decide whether the test setup, the branch condition, or the implementation is wrong.

Edge Cases and Gotchas Senior Testers Watch For

The sharp edges show up in real suites, not tutorials. Stubbing Date.now but not new Date creates split-brain time. Replacing a method on a shared singleton makes test order matter. Using calledWith on a mutable object can be misleading if production mutates the object after the call. Matching full payloads makes harmless metadata additions look like regressions. And using mocks everywhere creates tests that know more about the implementation than the user or API contract does.

There is also a module-system gotcha. CommonJS exports are often mutable object properties. ESM bindings are live and can be read-only from the importing module. Sinon cannot stub what JavaScript itself will not let you assign. The resilient architecture is to keep side-effecting services behind injectable adapters. That design improves testability and makes production dependencies clearer.

A Practical Sinon Review Checklist

Can the test fail for a real user-facing or contract-facing reason?
Does each fake have a clear owner and restore path?
Are stubs modeling realistic success, failure, and edge-case behavior?
Are mocks limited to interactions that would matter outside the implementation?
Are async failures awaited and timer queues fully advanced?
Would a harmless refactor break the assertion? If yes, assert a higher-level outcome.

Sinon is most valuable when it makes your test architecture more honest. A good spy tells you whether an observable collaboration happened. A good stub makes an unstable dependency deterministic while preserving the behavior under test. A good mock documents a strict contract and nothing more. Build around those boundaries and your JavaScript suite becomes easier to debug, less flaky, and more useful to the engineers who have to trust it during release pressure.



Stop treating spies, stubs, and mocks as interchangeable tricks and start using them as resilient design pressure for modern JavaScript suites.