Stop Wasting CI Credits: Building Idempotent Test Data Factories with Ephemeral Containers
Every flaky seed script is a small tax on your CI bill — let's stop paying it.

It's 11pm in a flat in Valencia. You push a tiny refactor. CI starts. Twelve minutes later it fails — not your code, the seed script. A `duplicate key value violates unique constraint` error stares back at you from the logs. You click rerun. Twelve more minutes. Same error. You realize the previous job leaked rows into your "test" database because the cleanup hook never ran. Your weekly CI bill ticks up another few euros for nothing.
This is the silent tax of bad test data hygiene, and it scales nonlinearly with team size. Today we're leveling up. We're going to replace your fragile seed_test_data.sql with idempotent test data factories running against ephemeral containers. The result: tests you can run a hundred times in a row without manual intervention, CI runs that hold their wall-clock budget, and a billing line that stops creeping.
By the end of this post you'll have a production-ready factory module, a Testcontainers harness that survives reruns, a working comparison of the four patterns most teams use, and a debugging cheat sheet for when things still go sideways.
Why Does Your CI Bill Bleed When Tests Touch a Real Database?
Because most seed scripts are write-only and order-sensitive — any rerun trips a unique constraint and your suite quietly fails open or fails closed.
That's the short answer. The longer one starts with how CI providers actually charge. GitHub Actions, GitLab CI, CircleCI, and Buildkite all bill per minute of compute on hosted runners — the public pricing pages spell it out. Every retry, every "just rerun flaky" click, every cold-start of a 4 GB Postgres image is a billable minute. The Stack Overflow 2024 Developer Survey reported that more than half of professional developers use Docker as part of their daily workflow, which means most of us are already paying these minutes — we just don't see the bleed line by line.
In our experience, the three habits that quietly eat CI budgets are:
- Non-idempotent seed scripts. Pure `INSERT INTO ...` lines that explode on the second run, forcing engineers to nuke the database and start over — sometimes by deleting the container and starting it again.
- Shared mutable databases in CI. One Postgres for the whole pipeline. Tests pollute each other. Cleanup is a single `TRUNCATE` that ignores foreign keys and breaks silently.
- Container cold-starts on every test file. Spinning a fresh Postgres container per test file feels "clean" but adds 3–8 seconds of startup × N files. On a 60-file suite that's an extra ~4 minutes per run.
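The cold-start arithmetic in that last bullet is easy to sanity-check. A quick back-of-the-envelope in Python, using the illustrative figures quoted above (not measurements):

```python
# Back-of-the-envelope: per-file container cold-starts vs one shared container.
FILES = 60          # test files in the suite
COLDSTART_S = 4.0   # seconds per Postgres cold-start (3-8 s is typical)

per_file_total_min = FILES * COLDSTART_S / 60  # fresh container per file
shared_total_min = COLDSTART_S / 60            # one container per session

print(f"per-file: {per_file_total_min:.1f} min, shared: {shared_total_min:.2f} min")
# per-file: 4.0 min, shared: 0.07 min
```

Four minutes of pure container startup on every run, versus a few seconds paid once per session.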
The cure is two complementary ideas: write factories that are safe to call twice, and run them against containers that are safe to share within a test process. Get both right and the "rerun the pipeline" reflex stops being a daily ritual.
What Is an Idempotent Test Data Factory?
A function that returns the same logical row whether you call it once or a thousand times, keyed on a stable identifier with upsert semantics underneath.
Concretely: when you ask the factory for a user named alice@example.com, it either returns the existing row or creates one — never errors. Same for the accounts, posts, and webhooks that hang off it. The implementation underneath is usually PostgreSQL's INSERT ... ON CONFLICT (added in PostgreSQL 9.5, released January 2016), MySQL's INSERT ... ON DUPLICATE KEY UPDATE, or an SQLite INSERT OR REPLACE. The factory contract is: call me as many times as you need; I converge.
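The contract is easy to see in miniature with Python's bundled sqlite3, since SQLite supports the same `ON CONFLICT ... DO UPDATE` syntax (3.24+) and `RETURNING` (3.35+). A sketch against a hypothetical in-memory `users` table, no server required:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, name TEXT)"
)

def make_user(conn: sqlite3.Connection, email: str, name: str) -> int:
    # Upsert keyed on the unique email column: converges instead of erroring.
    cur = conn.execute(
        "INSERT INTO users (email, name) VALUES (?, ?) "
        "ON CONFLICT (email) DO UPDATE SET name = excluded.name "
        "RETURNING id",
        (email, name),
    )
    return cur.fetchone()[0]

first = make_user(conn, "alice@example.com", "Alice")
second = make_user(conn, "alice@example.com", "Alice v2")
assert first == second  # same logical row both times, no unique violation
```

Calling `make_user` a thousand times would still hand back the same id. That convergence is the whole contract.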
Let's start with what an actual factory looks like in Python — because if you can read Python you can translate this to any language you ship in.
Example 1: The Naive Factory That Will Bite You
Here's the pattern most teams write the first time. It looks clean, ships fast, and breaks on the second CI run.
```python
# factories_v1_naive.py
# What NOT to do. Included for diagnostic value.
import psycopg
from dataclasses import dataclass


@dataclass
class User:
    id: int
    email: str
    name: str


def make_user(conn: psycopg.Connection, email: str, name: str) -> User:
    """Naive: a plain INSERT. Fine on a virgin DB; explodes on rerun."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO users (email, name) VALUES (%s, %s) RETURNING id",
            (email, name),
        )
        row = cur.fetchone()
    if row is None:
        raise RuntimeError("insert returned no row")
    user_id = row[0]
    return User(id=user_id, email=email, name=name)


# Demo: this works once and only once.
if __name__ == "__main__":
    with psycopg.connect("postgresql://app:app@localhost:5432/app") as conn:
        # First call: creates row id=1.
        u1 = make_user(conn, "alice@example.com", "Alice")
        print("first call ok:", u1)
        # Second call in the same process — or same DB across reruns — raises:
        # psycopg.errors.UniqueViolation: duplicate key value violates
        # unique constraint "users_email_key"
        try:
            u2 = make_user(conn, "alice@example.com", "Alice")
        except psycopg.errors.UniqueViolation as e:
            print("second call BROKE:", e.diag.message_primary)
```

Run this twice against the same database and the second call throws a `UniqueViolation`. In CI, that translates to: every developer who pushed before you left their alice row behind. The naive factory is fundamentally incompatible with shared or persistent databases. Every rerun becomes a coin flip.
The error gets worse with foreign keys. If your posts table requires a user_id and a test dies halfway through seeding without a clean rollback, it leaves orphaned child rows and missing parents whose foreign-key violations confuse subsequent test runs for hours.
Example 2: The Idempotent Factory That Saves You
Now the level-up version: an upsert-based factory with deterministic generation, foreign key support, and explicit error handling. This is the file you actually want in your repo.
```python
# factories_v2_idempotent.py
# The level-up version. Safe to call any number of times.
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Optional

import psycopg
from psycopg.rows import dict_row


@dataclass(frozen=True)
class User:
    id: int
    email: str
    name: str


@dataclass(frozen=True)
class Post:
    id: int
    user_id: int
    slug: str
    title: str


class FactoryError(RuntimeError):
    """Raised when a factory cannot converge — usually a schema mismatch."""


def _stable_int(seed: str, modulus: int = 10_000_000) -> int:
    """Deterministic integer from a stable seed. Useful for FK keys."""
    return int(hashlib.sha256(seed.encode()).hexdigest(), 16) % modulus


def make_user(
    conn: psycopg.Connection,
    email: str,
    name: Optional[str] = None,
) -> User:
    """Idempotent: upsert on email. Returned row guaranteed to exist."""
    name = name or email.split("@")[0].title()
    sql = """
        INSERT INTO users (email, name)
        VALUES (%(email)s, %(name)s)
        ON CONFLICT (email) DO UPDATE
            SET name = EXCLUDED.name
        RETURNING id, email, name
    """
    try:
        with conn.cursor(row_factory=dict_row) as cur:
            cur.execute(sql, {"email": email, "name": name})
            row = cur.fetchone()
    except psycopg.errors.UndefinedColumn as e:
        raise FactoryError(
            f"schema mismatch on users.email/name: {e}. "
            f"Did you forget to run migrations?"
        ) from e
    if row is None:
        raise FactoryError("upsert returned no row — broken unique index?")
    return User(id=row["id"], email=row["email"], name=row["name"])


def make_post(
    conn: psycopg.Connection,
    user: User,
    slug: str,
    title: Optional[str] = None,
) -> Post:
    """Idempotent post tied to a user. Composite uniqueness on (user_id, slug)."""
    title = title or slug.replace("-", " ").title()
    sql = """
        INSERT INTO posts (user_id, slug, title)
        VALUES (%(user_id)s, %(slug)s, %(title)s)
        ON CONFLICT (user_id, slug) DO UPDATE
            SET title = EXCLUDED.title
        RETURNING id, user_id, slug, title
    """
    with conn.cursor(row_factory=dict_row) as cur:
        cur.execute(sql, {"user_id": user.id, "slug": slug, "title": title})
        row = cur.fetchone()
    if row is None:
        raise FactoryError("upsert returned no row for post")
    return Post(id=row["id"], user_id=row["user_id"], slug=row["slug"], title=row["title"])


# Demo: run it twice. No errors. Same row IDs.
if __name__ == "__main__":
    with psycopg.connect("postgresql://app:app@localhost:5432/app") as conn:
        for run in (1, 2):
            alice = make_user(conn, "alice@example.com")
            hello = make_post(conn, alice, slug="hello-world")
            conn.commit()
            print(f"run {run}: user={alice.id} post={hello.id}")
    # Output (both runs identical):
    # run 1: user=1 post=1
    # run 2: user=1 post=1
```

Why this works: PostgreSQL's `ON CONFLICT` turns the operation into an atomic upsert backed by the unique index — there's no read-then-write race. The `EXCLUDED` pseudo-table refers to the row that would have been inserted, which is how you update fields like `name` without losing the existing `id`. The `RETURNING` clause gives you back the canonical row whether it was inserted or updated.
Two subtler choices matter. The composite `(user_id, slug)` conflict target means two users can each have a post with slug `hello-world` — exactly what production needs. The `FactoryError` wrapping turns cryptic Postgres errors (an undefined column, for instance) into a message that points at the actual fix ("run migrations"), which saves hours of head-scratching when the schema and the factory drift apart.
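The composite conflict target deserves a tiny standalone demo. This sketch uses in-memory SQLite rather than Postgres (hypothetical schema, same upsert shape; `RETURNING` needs SQLite 3.35+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users (id),
        slug TEXT,
        title TEXT,
        UNIQUE (user_id, slug)  -- the composite conflict target
    );
""")
conn.execute(
    "INSERT INTO users (id, email) VALUES (1, 'alice@example.com'), (2, 'bob@example.com')"
)

def make_post(conn: sqlite3.Connection, user_id: int, slug: str, title: str) -> int:
    cur = conn.execute(
        "INSERT INTO posts (user_id, slug, title) VALUES (?, ?, ?) "
        "ON CONFLICT (user_id, slug) DO UPDATE SET title = excluded.title "
        "RETURNING id",
        (user_id, slug, title),
    )
    return cur.fetchone()[0]

p1 = make_post(conn, 1, "hello-world", "Hello")
p2 = make_post(conn, 2, "hello-world", "Hello")      # same slug, other user: new row
p1_again = make_post(conn, 1, "hello-world", "v2")   # same user + slug: converges
assert p1 != p2 and p1 == p1_again
```

Two users, one slug, three factory calls, two rows: the composite key is what lets the upsert distinguish "duplicate" from "different owner".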
Example 3: Wrapping It in Testcontainers for Truly Ephemeral CI Runs
Idempotent factories are necessary but not sufficient. The other half is making sure each CI run starts from a known database state. That's where Testcontainers earns its place — and yes, Docker Inc. acquired AtomicJar (Testcontainers' core maintainer company) in December 2023, so the project has serious backing now.
The pattern below uses testcontainers-python with pytest, but the same idea applies to testcontainers-go, testcontainers-node, and the Java original. Note the transaction-per-test rollback pattern — it's what makes container reuse between tests safe.
```python
# tests/conftest.py
# pytest harness: one Postgres container per pytest session, one
# transaction per test, automatic rollback. Combine with idempotent
# factories for full safety.
from __future__ import annotations

import logging
import os
from collections.abc import Generator
from pathlib import Path

import psycopg
import pytest
from testcontainers.postgres import PostgresContainer

log = logging.getLogger(__name__)

# Pin the image — floating tags break reproducibility.
PG_IMAGE = "postgres:16.4-alpine"

# Edge case: in CI, set TESTCONTAINERS_RYUK_DISABLED=true ONLY if your
# runner already cleans up containers (e.g. GitHub Actions ephemeral
# runners do). Locally always leave Ryuk enabled or you'll leak containers.
RYUK_DISABLED = os.environ.get("TESTCONTAINERS_RYUK_DISABLED", "false")


@pytest.fixture(scope="session")
def pg_container() -> Generator[PostgresContainer, None, None]:
    """One container for the whole test session. ~3s startup amortized."""
    container = (
        # driver=None (testcontainers-python 4.x) keeps the connection URL a
        # plain postgresql:// DSN that psycopg 3 can parse directly.
        PostgresContainer(PG_IMAGE, driver=None)
        .with_env("POSTGRES_INITDB_ARGS", "--data-checksums")
    )
    try:
        container.start()
    except Exception as e:
        # Common failure modes: Docker daemon not running, port collision,
        # image pull blocked by network policy.
        raise RuntimeError(
            f"could not start {PG_IMAGE}; is Docker running and reachable? "
            f"underlying error: {e}"
        ) from e
    # Run schema migrations once per session. Replace with your tool of choice.
    migrations_dir = Path(__file__).parent.parent / "migrations"
    with psycopg.connect(container.get_connection_url()) as conn:
        for path in sorted(migrations_dir.glob("*.sql")):
            log.info("applying migration %s", path.name)
            conn.execute(path.read_text())
        conn.commit()
    yield container
    container.stop()


@pytest.fixture()
def db(pg_container: PostgresContainer) -> Generator[psycopg.Connection, None, None]:
    """One transaction per test. Rollback gives per-test isolation."""
    conn = psycopg.connect(pg_container.get_connection_url(), autocommit=False)
    try:
        yield conn
    finally:
        # Critical: rollback even on test failure, so the next test
        # sees a clean database. Idempotent factories survive either way;
        # this just keeps assertion data clean.
        conn.rollback()
        conn.close()
```

```python
# tests/test_user_flow.py
import psycopg

from factories_v2_idempotent import make_post, make_user


def test_user_can_publish_post(db: psycopg.Connection) -> None:
    alice = make_user(db, "alice@example.com")
    post = make_post(db, alice, slug="hello-world")
    assert post.user_id == alice.id
    # No teardown needed — the db fixture rolls back automatically.


def test_two_users_share_slug(db: psycopg.Connection) -> None:
    # Demonstrates the composite-unique behavior. Without idempotency,
    # this test would conflict with the previous one across reruns.
    alice = make_user(db, "alice@example.com")
    bob = make_user(db, "bob@example.com")
    p1 = make_post(db, alice, slug="welcome")
    p2 = make_post(db, bob, slug="welcome")
    assert p1.id != p2.id
    assert p1.slug == p2.slug == "welcome"
```

The three design choices here are worth lingering on. First, the container has `scope="session"` — one start per pytest invocation, not per test. A typical Postgres 16 alpine container cold-starts in roughly 2–4 seconds on a modern runner; multiply that by 60 tests and you save real time. Second, each test gets a fresh connection in a transaction that is always rolled back at the end; since nothing commits, the rollback discards every row the test wrote, giving you true per-test isolation without dropping and recreating tables. Third, Ryuk — Testcontainers' sidecar reaper container — guarantees cleanup if your process dies mid-suite, which is the failure mode that leaks containers on dev laptops.
Four Approaches to Test Data: Side-by-Side
If you're still weighing whether to invest in idempotent factories plus Testcontainers, this comparison should make the trade-offs crisp:
| Approach | Setup cost | Rerun safe? | Isolation | Best for |
|---|---|---|---|---|
| Static SQL fixtures (seed.sql) | Low | No (INSERT conflicts) | Poor | Toy demos only |
| Naive factories (plain INSERT) | Low | No | Poor | Greenfield local dev |
| Idempotent factories on shared DB | Medium | Yes | Weak (other engineers mutate) | Small solo projects |
| Idempotent factories + Testcontainers | Medium | Yes | Strong (per-process) | Production CI suites |
Notice that idempotent factories alone are a meaningful upgrade — even on a shared dev database, they make local reruns survivable. Pairing them with Testcontainers is what unlocks parallel CI shards without manual cleanup. That combination is the actual level-up.
Troubleshooting and Debugging
When this stack misbehaves, the symptoms are almost always one of these five categories. Here's how to diagnose each in under five minutes.
- ON CONFLICT does nothing / silently inserts duplicates. Almost always a missing or misnamed unique index. Run `\d+ users` in psql and verify there's a UNIQUE constraint matching your ON CONFLICT target. Without one, Postgres raises `42P10: there is no unique or exclusion constraint matching the ON CONFLICT specification`.
- Testcontainers hangs on startup. Usually a host-side networking issue. Set `TESTCONTAINERS_HOST_OVERRIDE=localhost` when running inside another container (Docker-in-Docker, GitHub Actions matrix jobs). Watch `docker logs $(docker ps -q -f ancestor=testcontainers/ryuk)` for clues.
- Rollback leaves rows behind. You committed inside a factory by accident — usually a stray `conn.commit()` in the factory body. Factories must never commit; the calling test owns the transaction boundary. Grep your factory module for commit calls.
- Connection refused mid-suite. The container restarted because of OOM. Postgres on a 1 GB runner is tight. Bump runner memory, or set `shared_buffers=128MB` and `max_connections=20` via `.with_command(...)` on the container.
- Slow first test in CI. The image is being pulled. Pre-pull in a setup step (`docker pull postgres:16.4-alpine`) before tests start, or use a GitHub Actions cache action on `~/.docker`. First-test latency drops from ~10s to ~2s in our experience.
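The first failure mode is cheap to reproduce locally without Postgres: SQLite rejects an ON CONFLICT target that matches no unique constraint with an analogous error, so a sketch like this makes a handy regression test for factory/schema drift (it is not the Postgres 42P10 itself, just the same class of mistake):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Bug under test: email has NO unique constraint, so the ON CONFLICT
# target below cannot be matched to any index.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")

caught = None
try:
    conn.execute(
        "INSERT INTO users (email, name) VALUES (?, ?) "
        "ON CONFLICT (email) DO UPDATE SET name = excluded.name",
        ("alice@example.com", "Alice"),
    )
except sqlite3.OperationalError as e:
    caught = e  # SQLite's analogue of Postgres error 42P10

print("upsert rejected:", caught)
```

Add the missing `UNIQUE` on `email` and the same statement succeeds, which is exactly the fix the psql `\d+` check above is steering you toward.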
Edge Cases and Gotchas
These are the things that will bite you between week 2 and week 8 of running this in production. File them away now and save the late-night debugging sessions.
- Sequences keep climbing. ON CONFLICT still increments the sequence on the conflict path in Postgres — so your `id` column will skip values rapidly across reruns. Don't assert on absolute IDs in tests; assert on relationships (`post.user_id == user.id`) instead.
- Partial unique indexes. If your unique constraint has a `WHERE` clause (e.g. only-active users), Postgres requires the matching predicate in the ON CONFLICT target via `ON CONFLICT (email) WHERE deleted_at IS NULL`. Get this wrong and the upsert silently becomes an insert.
- Time-sensitive data. Factories that stamp `created_at = now()` on the update path will rewrite the original timestamp every rerun. Use `COALESCE(users.created_at, now())` in the DO UPDATE clause to preserve creation time.
- Foreign key cascade in cleanup. Even with rollback, if a test commits mid-flow (some frameworks auto-commit migrations), cascade deletes can clobber rows another test relies on. Run migrations once per session, never inside a test.
- Docker rate limits on hub.docker.com. Anonymous pulls are capped at 100 per 6 hours per IP. Hosted runners often share IPs. Authenticate the pull in CI even for public images — it's a one-line fix that prevents weird Friday-afternoon failures.
- Faker seeding. Seeded fakers (`Faker('es_ES').seed_instance(42)`) only stay deterministic within a single process. Cross-process tests (Playwright + backend) will diverge unless you pass the seed explicitly.
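The created_at gotcha is easy to demonstrate in miniature. An even simpler variant of the COALESCE fix is to leave `created_at` out of the DO UPDATE set list entirely, so only the insert path stamps it. A SQLite sketch (hypothetical table; `RETURNING` needs SQLite 3.35+):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, "
    "name TEXT, created_at REAL)"
)

def make_user(conn: sqlite3.Connection, email: str, name: str) -> float:
    # created_at appears only in the INSERT column list, never in DO UPDATE,
    # so reruns update the name but preserve the original timestamp.
    cur = conn.execute(
        "INSERT INTO users (email, name, created_at) VALUES (?, ?, ?) "
        "ON CONFLICT (email) DO UPDATE SET name = excluded.name "
        "RETURNING created_at",
        (email, name, time.time()),
    )
    return cur.fetchone()[0]

t1 = make_user(conn, "alice@example.com", "Alice")
time.sleep(0.01)
t2 = make_user(conn, "alice@example.com", "Alice v2")
assert t1 == t2  # original timestamp survives the rerun
```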
The leveling-up mindset: a fixture is a noun; a factory is a verb. Fixtures decay because schemas change. Factories evolve with your code because they live next to it. Once you ship one well-built factory module, you wonder how you ever tested without it.
Where to Go From Here
You now have everything you need to retire your seed scripts: a naive baseline to learn from, an idempotent factory with proper error handling, a Testcontainers harness, a comparison table to convince your tech lead, and a list of gotchas. Start small. Pick the test file you rerun the most. Convert its setup to a factory. Watch the rerun click stop being a habit.
From here, the next moves stack naturally. Add a --reuse flag so local devs can keep one container across multiple pytest invocations. Build a snapshot/restore helper for the rare tests that genuinely need a precomputed dataset (analytics pipelines, mostly). Wire the factory module into your Playwright fixtures so end-to-end and integration tests share the same data layer. None of these require a platform team; they just require treating test data as a first-class module of your codebase.
Ship the first factory this week. Your future self — and every engineer in Barcelona, Madrid, Valencia, and Malaga who'll never have to click "rerun flaky" again — will quietly thank you on payday when the CI bill comes in flat.
Ready to level up your dev toolkit?
Desplega.ai helps developers transition to professional tools smoothly — let us help you ship faster with test infrastructure that actually scales.
Get Started

Frequently Asked Questions
What is the difference between a fixture file and a factory?
A fixture is a static snapshot loaded once before tests run. A factory builds rows on demand with sensible defaults. Factories scale; fixtures rot the moment your schema changes.
Do I really need Testcontainers if I already use docker-compose?
Compose is great for local dev. Testcontainers wins in CI because it gives each test process a private container, with health-gated startup and automatic cleanup via the Ryuk sidecar.
Can I run idempotent factories against a shared dev database?
Technically yes, but you lose isolation. Idempotency keeps writes safe to repeat; it does not stop other engineers from mutating the same rows mid-test. Use ephemeral containers in CI.
How do I keep factory data realistic without bloating my repo?
Generate values with Faker seeded by a fixed key per test. Realism comes from variety in shapes, not gigabytes of fixtures. Snapshot only what you need to diff in assertions.
When should I cache a container vs spin up a fresh one?
Reuse the container across tests in one process for speed, but truncate or rollback inside a transaction between cases. Spin a brand-new container only when schemas mutate.
Related Posts
Hot Module Replacement: Why Your Dev Server Restarts Are Killing Your Flow State | desplega.ai
Stop losing 2-3 hours daily to dev server restarts. Master HMR configuration in Vite and Next.js to maintain flow state, preserve component state, and boost coding velocity by 80%.
The Flaky Test Tax: Why Your Engineering Team is Secretly Burning Cash | desplega.ai
Discover how flaky tests create a hidden operational tax that costs CTOs millions in wasted compute, developer time, and delayed releases. Calculate your flakiness cost today.
The QA Death Spiral: When Your Test Suite Becomes Your Product | desplega.ai
An executive guide to recognizing when quality initiatives consume engineering capacity. Learn to identify test suite bloat, balance coverage vs velocity, and implement pragmatic quality gates.