Stop Wasting CI Credits: Building Idempotent Test Data Factories with Ephemeral Containers
Every flaky seed script is a small tax on your CI bill — let's stop paying it.

It's 11pm in a flat in Valencia. You push a tiny refactor. CI starts. Twelve minutes later it fails — not your code, the seed script. A `duplicate key value violates unique constraint` error stares back at you from the logs. You click rerun. Twelve more minutes. Same error. You realize the previous job leaked rows into your "test" database because the cleanup hook never ran. Your weekly CI bill ticks up another few euros for nothing.
This is the silent tax of bad test data hygiene, and it scales nonlinearly with team size. Today we're leveling up. We're going to replace your fragile seed_test_data.sql with idempotent test data factories running against ephemeral containers. The result: tests you can run a hundred times in a row without manual intervention, CI runs that hold their wall-clock budget, and a billing line that stops creeping.
By the end of this post you'll have a production-ready factory module, a Testcontainers harness that survives reruns, a working comparison of the four patterns most teams use, and a debugging cheat sheet for when things still go sideways.
Why Does Your CI Bill Bleed When Tests Touch a Real Database?
Because most seed scripts are write-only and order-sensitive — any rerun trips a unique constraint and your suite quietly fails open or fails closed.
That's the short answer. The longer one starts with how CI providers actually charge. GitHub Actions, GitLab CI, CircleCI, and Buildkite all bill per minute of compute on hosted runners — the public pricing pages spell it out. Every retry, every "just rerun flaky" click, every cold-start of a 4 GB Postgres image is a billable minute. The Stack Overflow 2024 Developer Survey reported that more than half of professional developers use Docker as part of their daily workflow, which means most of us are already paying these minutes — we just don't see the bleed line by line.
In our experience, the three habits that quietly eat CI budgets are:
- Non-idempotent seed scripts. Pure `INSERT INTO ...` lines that explode on the second run, forcing engineers to nuke the database and start over — sometimes by deleting the container and starting it again.
- Shared mutable databases in CI. One Postgres for the whole pipeline. Tests pollute each other. Cleanup is a single `TRUNCATE` that ignores foreign keys and breaks silently.
- Container cold-starts on every test file. Spinning a fresh Postgres container per test file feels "clean" but adds 3–8 seconds of startup × N files. On a 60-file suite that's an extra ~4 minutes per run.
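The cold-start arithmetic in that last bullet is easy to sanity-check. A quick back-of-the-envelope in Python, using the illustrative figures quoted above (not measurements):

```python
# Back-of-the-envelope: per-file container cold-starts vs one shared container.
FILES = 60          # test files in the suite
COLDSTART_S = 4.0   # seconds per Postgres cold-start (3-8 s is typical)

per_file_total_min = FILES * COLDSTART_S / 60  # fresh container per file
shared_total_min = COLDSTART_S / 60            # one container per session

print(f"per-file: {per_file_total_min:.1f} min, shared: {shared_total_min:.2f} min")
# per-file: 4.0 min, shared: 0.07 min
```

Four minutes of pure container startup on every run, versus a few seconds paid once per session.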
The cure is two complementary ideas: write factories that are safe to call twice, and run them against containers that are safe to share within a test process. Get both right and the "rerun the pipeline" reflex stops being a daily ritual.
What Is an Idempotent Test Data Factory?
A function that returns the same logical row whether you call it once or a thousand times, keyed on a stable identifier with upsert semantics underneath.
Concretely: when you ask the factory for a user named alice@example.com, it either returns the existing row or creates one — never errors. Same for the accounts, posts, and webhooks that hang off it. The implementation underneath is usually PostgreSQL's INSERT ... ON CONFLICT (added in PostgreSQL 9.5, released January 2016), MySQL's INSERT ... ON DUPLICATE KEY UPDATE, or an SQLite INSERT OR REPLACE. The factory contract is: call me as many times as you need; I converge.
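The contract is easy to see in miniature with Python's bundled sqlite3, since SQLite supports the same `ON CONFLICT ... DO UPDATE` syntax (3.24+) and `RETURNING` (3.35+). A sketch against a hypothetical in-memory `users` table, no server required:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, name TEXT)"
)

def make_user(conn: sqlite3.Connection, email: str, name: str) -> int:
    # Upsert keyed on the unique email column: converges instead of erroring.
    cur = conn.execute(
        "INSERT INTO users (email, name) VALUES (?, ?) "
        "ON CONFLICT (email) DO UPDATE SET name = excluded.name "
        "RETURNING id",
        (email, name),
    )
    return cur.fetchone()[0]

first = make_user(conn, "alice@example.com", "Alice")
second = make_user(conn, "alice@example.com", "Alice v2")
assert first == second  # same logical row both times, no unique violation
```

Calling `make_user` a thousand times would still hand back the same id. That convergence is the whole contract.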
Let's start with what an actual factory looks like in Python — because if you can read Python you can translate this to any language you ship in.
Example 1: The Naive Factory That Will Bite You
Here's the pattern most teams write the first time. It looks clean, ships fast, and breaks on the second CI run.
```python
# factories_v1_naive.py
# What NOT to do. Included for diagnostic value.
import psycopg
from dataclasses import dataclass


@dataclass
class User:
    id: int
    email: str
    name: str


def make_user(conn: psycopg.Connection, email: str, name: str) -> User:
    """Naive: a plain INSERT. Fine on a virgin DB; explodes on rerun."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO users (email, name) VALUES (%s, %s) RETURNING id",
            (email, name),
        )
        row = cur.fetchone()
    if row is None:
        raise RuntimeError("insert returned no row")
    user_id = row[0]
    return User(id=user_id, email=email, name=name)


# Demo: this works once and only once.
if __name__ == "__main__":
    with psycopg.connect("postgresql://app:app@localhost:5432/app") as conn:
        # First call: creates row id=1.
        u1 = make_user(conn, "alice@example.com", "Alice")
        print("first call ok:", u1)
        # Second call in the same process — or same DB across reruns — raises:
        # psycopg.errors.UniqueViolation: duplicate key value violates
        # unique constraint "users_email_key"
        try:
            u2 = make_user(conn, "alice@example.com", "Alice")
        except psycopg.errors.UniqueViolation as e:
            print("second call BROKE:", e.diag.message_primary)
```

Run this twice against the same database and the second call throws a `UniqueViolation`. In CI, that translates to: every developer who pushed before you left their alice row behind. The naive factory is fundamentally incompatible with shared or persistent databases. Every rerun becomes a coin flip.
The error gets worse with foreign keys. If your posts table requires a user_id and a test dies halfway through seeding without a clean rollback, it leaves orphaned child rows and missing parents whose foreign-key violations confuse subsequent test runs for hours.
Example 2: The Idempotent Factory That Saves You
Now the level-up version: an upsert-based factory with deterministic generation, foreign key support, and explicit error handling. This is the file you actually want in your repo.
```python
# factories_v2_idempotent.py
# The level-up version. Safe to call any number of times.
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Optional

import psycopg
from psycopg.rows import dict_row


@dataclass(frozen=True)
class User:
    id: int
    email: str
    name: str


@dataclass(frozen=True)
class Post:
    id: int
    user_id: int
    slug: str
    title: str


class FactoryError(RuntimeError):
    """Raised when a factory cannot converge — usually a schema mismatch."""


def _stable_int(seed: str, modulus: int = 10_000_000) -> int:
    """Deterministic integer from a stable seed. Useful for FK keys."""
    return int(hashlib.sha256(seed.encode()).hexdigest(), 16) % modulus


def make_user(
    conn: psycopg.Connection,
    email: str,
    name: Optional[str] = None,
) -> User:
    """Idempotent: upsert on email. Returned row guaranteed to exist."""
    name = name or email.split("@")[0].title()
    sql = """
        INSERT INTO users (email, name)
        VALUES (%(email)s, %(name)s)
        ON CONFLICT (email) DO UPDATE
            SET name = EXCLUDED.name
        RETURNING id, email, name
    """
    try:
        with conn.cursor(row_factory=dict_row) as cur:
            cur.execute(sql, {"email": email, "name": name})
            row = cur.fetchone()
    except psycopg.errors.UndefinedColumn as e:
        raise FactoryError(
            f"schema mismatch on users.email/name: {e}. "
            f"Did you forget to run migrations?"
        ) from e
    if row is None:
        raise FactoryError("upsert returned no row — broken unique index?")
    return User(id=row["id"], email=row["email"], name=row["name"])


def make_post(
    conn: psycopg.Connection,
    user: User,
    slug: str,
    title: Optional[str] = None,
) -> Post:
    """Idempotent post tied to a user. Composite uniqueness on (user_id, slug)."""
    title = title or slug.replace("-", " ").title()
    sql = """
        INSERT INTO posts (user_id, slug, title)
        VALUES (%(user_id)s, %(slug)s, %(title)s)
        ON CONFLICT (user_id, slug) DO UPDATE
            SET title = EXCLUDED.title
        RETURNING id, user_id, slug, title
    """
    with conn.cursor(row_factory=dict_row) as cur:
        cur.execute(sql, {"user_id": user.id, "slug": slug, "title": title})
        row = cur.fetchone()
    if row is None:
        raise FactoryError("upsert returned no row for post")
    return Post(id=row["id"], user_id=row["user_id"], slug=row["slug"], title=row["title"])


# Demo: run it twice. No errors. Same row IDs.
if __name__ == "__main__":
    with psycopg.connect("postgresql://app:app@localhost:5432/app") as conn:
        for run in (1, 2):
            alice = make_user(conn, "alice@example.com")
            hello = make_post(conn, alice, slug="hello-world")
            conn.commit()
            print(f"run {run}: user={alice.id} post={hello.id}")
    # Output (both runs identical):
    # run 1: user=1 post=1
    # run 2: user=1 post=1
```

Why this works: PostgreSQL's `ON CONFLICT` turns the operation into an atomic upsert backed by the unique index — there's no read-then-write race. The `EXCLUDED` pseudo-table refers to the row that would have been inserted, which is how you update fields like `name` without losing the existing `id`. The `RETURNING` clause gives you back the canonical row whether it was inserted or updated.
Two subtler choices matter. The composite `(user_id, slug)` conflict target means two users can each have a post with slug `hello-world` — exactly what production needs. The `FactoryError` wrapping turns cryptic Postgres errors (an undefined column, for instance) into a message that points at the actual fix ("run migrations"), which saves hours of head-scratching when the schema and the factory drift apart.
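The composite conflict target deserves a tiny standalone demo. This sketch uses in-memory SQLite rather than Postgres (hypothetical schema, same upsert shape; `RETURNING` needs SQLite 3.35+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users (id),
        slug TEXT,
        title TEXT,
        UNIQUE (user_id, slug)  -- the composite conflict target
    );
""")
conn.execute(
    "INSERT INTO users (id, email) VALUES (1, 'alice@example.com'), (2, 'bob@example.com')"
)

def make_post(conn: sqlite3.Connection, user_id: int, slug: str, title: str) -> int:
    cur = conn.execute(
        "INSERT INTO posts (user_id, slug, title) VALUES (?, ?, ?) "
        "ON CONFLICT (user_id, slug) DO UPDATE SET title = excluded.title "
        "RETURNING id",
        (user_id, slug, title),
    )
    return cur.fetchone()[0]

p1 = make_post(conn, 1, "hello-world", "Hello")
p2 = make_post(conn, 2, "hello-world", "Hello")      # same slug, other user: new row
p1_again = make_post(conn, 1, "hello-world", "v2")   # same user + slug: converges
assert p1 != p2 and p1 == p1_again
```

Two users, one slug, three factory calls, two rows: the composite key is what lets the upsert distinguish "duplicate" from "different owner".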
Example 3: Wrapping It in Testcontainers for Truly Ephemeral CI Runs
Idempotent factories are necessary but not sufficient. The other half is making sure each CI run starts from a known database state. That's where Testcontainers earns its place — and yes, Docker Inc. acquired AtomicJar (Testcontainers' core maintainer company) in December 2023, so the project has serious backing now.
The pattern below uses testcontainers-python with pytest, but the same idea applies to testcontainers-go, testcontainers-node, and the Java original. Note the transaction-per-test rollback pattern — it's what makes container reuse between tests safe.
```python
# tests/conftest.py
# pytest harness: one Postgres container per pytest session, one
# transaction per test, automatic rollback. Combine with idempotent
# factories for full safety.
from __future__ import annotations

import logging
import os
from collections.abc import Generator
from pathlib import Path

import psycopg
import pytest
from testcontainers.postgres import PostgresContainer

log = logging.getLogger(__name__)

# Pin the image — floating tags break reproducibility.
PG_IMAGE = "postgres:16.4-alpine"

# Edge case: in CI, set TESTCONTAINERS_RYUK_DISABLED=true ONLY if your
# runner already cleans up containers (e.g. GitHub Actions ephemeral
# runners do). Locally always leave Ryuk enabled or you'll leak containers.
RYUK_DISABLED = os.environ.get("TESTCONTAINERS_RYUK_DISABLED", "false")


@pytest.fixture(scope="session")
def pg_container() -> Generator[PostgresContainer, None, None]:
    """One container for the whole test session. ~3s startup amortized."""
    container = (
        # driver=None (testcontainers-python 4.x) keeps the connection URL a
        # plain postgresql:// DSN that psycopg 3 can parse directly.
        PostgresContainer(PG_IMAGE, driver=None)
        .with_env("POSTGRES_INITDB_ARGS", "--data-checksums")
    )
    try:
        container.start()
    except Exception as e:
        # Common failure modes: Docker daemon not running, port collision,
        # image pull blocked by network policy.
        raise RuntimeError(
            f"could not start {PG_IMAGE}; is Docker running and reachable? "
            f"underlying error: {e}"
        ) from e
    # Run schema migrations once per session. Replace with your tool of choice.
    migrations_dir = Path(__file__).parent.parent / "migrations"
    with psycopg.connect(container.get_connection_url()) as conn:
        for path in sorted(migrations_dir.glob("*.sql")):
            log.info("applying migration %s", path.name)
            conn.execute(path.read_text())
        conn.commit()
    yield container
    container.stop()


@pytest.fixture()
def db(pg_container: PostgresContainer) -> Generator[psycopg.Connection, None, None]:
    """One transaction per test. Rollback gives per-test isolation."""
    conn = psycopg.connect(pg_container.get_connection_url(), autocommit=False)
    try:
        yield conn
    finally:
        # Critical: rollback even on test failure, so the next test
        # sees a clean database. Idempotent factories survive either way;
        # this just keeps assertion data clean.
        conn.rollback()
        conn.close()
```

```python
# tests/test_user_flow.py
import psycopg

from factories_v2_idempotent import make_post, make_user


def test_user_can_publish_post(db: psycopg.Connection) -> None:
    alice = make_user(db, "alice@example.com")
    post = make_post(db, alice, slug="hello-world")
    assert post.user_id == alice.id
    # No teardown needed — the db fixture rolls back automatically.


def test_two_users_share_slug(db: psycopg.Connection) -> None:
    # Demonstrates the composite-unique behavior. Without idempotency,
    # this test would conflict with the previous one across reruns.
    alice = make_user(db, "alice@example.com")
    bob = make_user(db, "bob@example.com")
    p1 = make_post(db, alice, slug="welcome")
    p2 = make_post(db, bob, slug="welcome")
    assert p1.id != p2.id
    assert p1.slug == p2.slug == "welcome"
```

The three design choices here are worth lingering on. First, the container has `scope="session"` — one start per pytest invocation, not per test. A typical Postgres 16 alpine container cold-starts in roughly 2–4 seconds on a modern runner; multiply that by 60 tests and you save real time. Second, each test gets a fresh connection in a transaction that is always rolled back at the end; since nothing commits, the rollback discards every row the test wrote, giving you true per-test isolation without dropping and recreating tables. Third, Ryuk — Testcontainers' sidecar reaper container — guarantees cleanup if your process dies mid-suite, which is the failure mode that leaks containers on dev laptops.
Four Approaches to Test Data: Side-by-Side
If you're still weighing whether to invest in idempotent factories plus Testcontainers, this comparison should make the trade-offs crisp:
| Approach | Setup cost | Rerun safe? | Isolation | Best for |
|---|---|---|---|---|
| Static SQL fixtures (seed.sql) | Low | No (INSERT conflicts) | Poor | Toy demos only |
| Naive factories (plain INSERT) | Low | No | Poor | Greenfield local dev |
| Idempotent factories on shared DB | Medium | Yes | Weak (other engineers mutate) | Small solo projects |
| Idempotent factories + Testcontainers | Medium | Yes | Strong (per-process) | Production CI suites |
Notice that idempotent factories alone are a meaningful upgrade — even on a shared dev database, they make local reruns survivable. Pairing them with Testcontainers is what unlocks parallel CI shards without manual cleanup. That combination is the actual level-up.
Troubleshooting and Debugging
When this stack misbehaves, the symptoms are almost always one of these five categories. Here's how to diagnose each in under five minutes.
- ON CONFLICT does nothing / silently inserts duplicates. Almost always a missing or misnamed unique index. Run `\d+ users` in psql and verify there's a UNIQUE constraint matching your ON CONFLICT target. Without one, Postgres raises `42P10: there is no unique or exclusion constraint matching the ON CONFLICT specification`.
- Testcontainers hangs on startup. Usually a host-side networking issue. Set `TESTCONTAINERS_HOST_OVERRIDE=localhost` when running inside another container (Docker-in-Docker, GitHub Actions matrix jobs). Watch `docker logs $(docker ps -q -f ancestor=testcontainers/ryuk)` for clues.
- Rollback leaves rows behind. You committed inside a factory by accident — usually a stray `conn.commit()` in the factory body. Factories must never commit; the calling test owns the transaction boundary. Grep your factory module for commit calls.
- Connection refused mid-suite. The container restarted because of OOM. Postgres on a 1 GB runner is tight. Bump runner memory, or set `shared_buffers=128MB` and `max_connections=20` via `.with_command(...)` on the container.
- Slow first test in CI. The image is being pulled. Pre-pull in a setup step (`docker pull postgres:16.4-alpine`) before tests start, or use a GitHub Actions cache action on `~/.docker`. First-test latency drops from ~10s to ~2s in our experience.
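The first failure mode is cheap to reproduce locally without Postgres: SQLite rejects an ON CONFLICT target that matches no unique constraint with an analogous error, so a sketch like this makes a handy regression test for factory/schema drift (it is not the Postgres 42P10 itself, just the same class of mistake):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Bug under test: email has NO unique constraint, so the ON CONFLICT
# target below cannot be matched to any index.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")

caught = None
try:
    conn.execute(
        "INSERT INTO users (email, name) VALUES (?, ?) "
        "ON CONFLICT (email) DO UPDATE SET name = excluded.name",
        ("alice@example.com", "Alice"),
    )
except sqlite3.OperationalError as e:
    caught = e  # SQLite's analogue of Postgres error 42P10

print("upsert rejected:", caught)
```

Add the missing `UNIQUE` on `email` and the same statement succeeds, which is exactly the fix the psql `\d+` check above is steering you toward.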
Edge Cases and Gotchas
These are the things that will bite you between week 2 and week 8 of running this in production. File them away now and save the late-night debugging sessions.
- Sequences keep climbing. ON CONFLICT still increments the sequence on the conflict path in Postgres — so your `id` column will skip values rapidly across reruns. Don't assert on absolute IDs in tests; assert on relationships (`post.user_id == user.id`) instead.
- Partial unique indexes. If your unique constraint has a `WHERE` clause (e.g. only-active users), Postgres requires the matching predicate in the ON CONFLICT target via `ON CONFLICT (email) WHERE deleted_at IS NULL`. Get this wrong and the upsert silently becomes an insert.
- Time-sensitive data. Factories that stamp `created_at = now()` on the update path will rewrite the original timestamp every rerun. Use `COALESCE(users.created_at, now())` in the DO UPDATE clause to preserve creation time.
- Foreign key cascade in cleanup. Even with rollback, if a test commits mid-flow (some frameworks auto-commit migrations), cascade deletes can clobber rows another test relies on. Run migrations once per session, never inside a test.
- Docker rate limits on hub.docker.com. Anonymous pulls are capped at 100 per 6 hours per IP. Hosted runners often share IPs. Authenticate the pull in CI even for public images — it's a one-line fix that prevents weird Friday-afternoon failures.
- Faker seeding. Seeded fakers (`Faker('es_ES').seed_instance(42)`) only stay deterministic within a single process. Cross-process tests (Playwright + backend) will diverge unless you pass the seed explicitly.
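The created_at gotcha is easy to demonstrate in miniature. An even simpler variant of the COALESCE fix is to leave `created_at` out of the DO UPDATE set list entirely, so only the insert path stamps it. A SQLite sketch (hypothetical table; `RETURNING` needs SQLite 3.35+):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, "
    "name TEXT, created_at REAL)"
)

def make_user(conn: sqlite3.Connection, email: str, name: str) -> float:
    # created_at appears only in the INSERT column list, never in DO UPDATE,
    # so reruns update the name but preserve the original timestamp.
    cur = conn.execute(
        "INSERT INTO users (email, name, created_at) VALUES (?, ?, ?) "
        "ON CONFLICT (email) DO UPDATE SET name = excluded.name "
        "RETURNING created_at",
        (email, name, time.time()),
    )
    return cur.fetchone()[0]

t1 = make_user(conn, "alice@example.com", "Alice")
time.sleep(0.01)
t2 = make_user(conn, "alice@example.com", "Alice v2")
assert t1 == t2  # original timestamp survives the rerun
```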
The leveling-up mindset: a fixture is a noun; a factory is a verb. Fixtures decay because schemas change. Factories evolve with your code because they live next to it. Once you ship one well-built factory module, you wonder how you ever tested without it.
Where to Go From Here
You now have everything you need to retire your seed scripts: a naive baseline to learn from, an idempotent factory with proper error handling, a Testcontainers harness, a comparison table to convince your tech lead, and a list of gotchas. Start small. Pick the test file you rerun the most. Convert its setup to a factory. Watch the rerun click stop being a habit.
From here, the next moves stack naturally. Add a --reuse flag so local devs can keep one container across multiple pytest invocations. Build a snapshot/restore helper for the rare tests that genuinely need a precomputed dataset (analytics pipelines, mostly). Wire the factory module into your Playwright fixtures so end-to-end and integration tests share the same data layer. None of these require a platform team; they just require treating test data as a first-class module of your codebase.
Ship the first factory this week. Your future self — and every engineer in Barcelona, Madrid, Valencia, and Malaga who'll never have to click "rerun flaky" again — will quietly thank you on payday when the CI bill comes in flat.
Ready to level up your dev toolkit?
Desplega.ai helps developers transition to professional tools smoothly — let us help you ship faster with test infrastructure that actually scales.
Get Started

Frequently Asked Questions
What is the difference between a fixture file and a factory?
A fixture is a static snapshot loaded once before tests run. A factory builds rows on demand with sensible defaults. Factories scale; fixtures rot the moment your schema changes.
Do I really need Testcontainers if I already use docker-compose?
Compose is great for local dev. Testcontainers wins in CI because it gives each test process a private container, with health-gated startup and automatic cleanup via the Ryuk sidecar.
Can I run idempotent factories against a shared dev database?
Technically yes, but you lose isolation. Idempotency keeps writes safe to repeat; it does not stop other engineers from mutating the same rows mid-test. Use ephemeral containers in CI.
How do I keep factory data realistic without bloating my repo?
Generate values with Faker seeded by a fixed key per test. Realism comes from variety in shapes, not gigabytes of fixtures. Snapshot only what you need to diff in assertions.
When should I cache a container vs spin up a fresh one?
Reuse the container across tests in one process for speed, but truncate or rollback inside a transaction between cases. Spin a brand-new container only when schemas mutate.
Related Posts
Hot Module Replacement: Why Your Dev Server Restarts Are Killing Your Flow State | desplega.ai
Stop losing 2-3 hours daily to dev server restarts. Master HMR configuration in Vite and Next.js to maintain flow state, preserve component state, and boost coding velocity by 80%.
The Flaky Test Tax: Why Your Engineering Team is Secretly Burning Cash | desplega.ai
Discover how flaky tests create a hidden operational tax that costs CTOs millions in wasted compute, developer time, and delayed releases. Calculate your flakiness cost today.
The QA Death Spiral: When Your Test Suite Becomes Your Product | desplega.ai
An executive guide to recognizing when quality initiatives consume engineering capacity. Learn to identify test suite bloat, balance coverage vs velocity, and implement pragmatic quality gates.