Beyond Localhost: Architecting Ephemeral Test Environments with Testcontainers and K8s
Your shared staging server is the bottleneck. Replace it with disposable infrastructure that spins up per pull request and self-destructs when the tests finish.

You started where most of us started: docker-compose up, a Postgres container, maybe a Redis, and a folder of integration tests that "pass on my machine." For a weekend project, that setup is enough. For a team merging dozens of pull requests a day, it is the slowest, flakiest part of your engineering org — and you have not noticed yet because the pain is distributed across everyone who is waiting on green.
This guide is for the vibe coder ready to make the jump. We are going to walk through three production patterns for spinning up disposable test environments — first with Testcontainers at the service level, then with per-PR Kubernetes namespaces for full-stack confidence, and finally with vcluster when namespaces stop being enough. Every example is runnable, every example handles failure modes you will actually hit, and we close with a debugging section for the days when your "self-cleaning" environment forgets to clean itself.
If you want a complementary read on the test side of the same problem, the deep dive on Playwright vs Cypress parallelization covers the worker and shard layer that sits on top of these environments.
Why does shared staging keep breaking your tests?
Shared staging breaks because every commit and every test run mutates the same long-lived state. Isolation, not retries, is the durable fix.
Picture the typical "staging" server. One Postgres. One Redis. One deployed copy of every service. Five developers push branches at the same time. Two of them ran a script that seeded users with overlapping IDs. CI fires, integration tests pass for the first branch, fail for the second, and pass again on retry once the first merge has cleared. Nobody can reproduce the failure locally because the failure is not in the code — it is in the shared state.
Kubernetes is the most-used container orchestration platform among respondents to the CNCF Annual Survey, and Docker is consistently one of the most-used developer tools in the Stack Overflow Developer Survey. The interesting question is not whether to use them, but where. The answer in this post: use containers and Kubernetes to give every test, every PR, every run its own infrastructure — born, used, and destroyed inside a single CI job.
Pattern 1: Testcontainers for Per-Test Service Isolation
Testcontainers is the lowest-friction step up from docker-compose. Instead of declaring services in a YAML file and hoping you remember to docker-compose down, you instantiate them programmatically in the test code itself. The container lifecycle is bound to the test lifecycle. If the test process crashes, the containers are reaped by Ryuk, a small helper container that the library starts automatically; it removes the resources Testcontainers created once its connection to the test process drops.
The example below is what we ship for a Node.js service that talks to Postgres. Note the failure handling: the container start can hang on a slow image pull, the database can take longer than the default health check window, and the test runner can be killed by CI mid-test.
```typescript
// tests/integration/users.spec.ts
import { PostgreSqlContainer, type StartedPostgreSqlContainer } from '@testcontainers/postgresql';
import { Wait } from 'testcontainers';
import { Client } from 'pg';
import { migrate } from '../../src/db/migrate';
import { UserRepo } from '../../src/repo/user';

// 5 minutes: image pulls on a cold CI runner can be slow.
// Without an explicit Jest timeout, you get an opaque "exceeded 5000ms".
jest.setTimeout(5 * 60 * 1000);

describe('UserRepo (integration)', () => {
  let pg: StartedPostgreSqlContainer;
  let client: Client;

  beforeAll(async () => {
    pg = await new PostgreSqlContainer('postgres:16-alpine')
      .withDatabase('app_test')
      .withUsername('app')
      .withPassword('app')
      // Don't rely on a "ready" log line. The pg image logs
      // "ready to accept connections" twice during startup: the FIRST
      // time is a temporary server running init scripts, not the real
      // one. That temporary server listens only on the Unix socket, so
      // we probe over TCP (-h 127.0.0.1) with pg_isready instead.
      .withWaitStrategy(
        Wait.forSuccessfulCommand('pg_isready -h 127.0.0.1 -U app -d app_test'),
      )
      .withStartupTimeout(120_000)
      .start();

    client = new Client({ connectionString: pg.getConnectionUri() });
    await client.connect();
    await migrate(client);
  });

  afterAll(async () => {
    // Order matters. Disconnect the client BEFORE stopping the container,
    // or you'll see "Connection terminated unexpectedly" noise in logs.
    try {
      await client?.end();
    } finally {
      await pg?.stop({ timeout: 10_000 });
    }
  });

  it('creates and finds a user', async () => {
    const repo = new UserRepo(client);
    const user = await repo.create({ email: 'lola@example.com' });
    const found = await repo.findById(user.id);
    expect(found?.email).toBe('lola@example.com');
  });

  // Edge case: simulate a connection dropping mid-query.
  // Without this, you'll never catch the bug where your repo
  // swallows the "connection terminated" error and returns null.
  it('surfaces transient connection errors', async () => {
    const repo = new UserRepo(client);
    const original = client.query.bind(client);
    client.query = (async () => {
      throw Object.assign(new Error('Connection terminated unexpectedly'), {
        code: '57P01',
      });
    }) as typeof client.query;
    try {
      await expect(repo.create({ email: 'x@x' })).rejects.toThrow(/terminated/);
    } finally {
      // Restore the real query even if the assertion fails, so later
      // tests don't inherit the stub.
      client.query = original;
    }
  });
});
```

Three things are worth understanding about what is happening under the hood here. First, Wait.forSuccessfulCommand executes inside the container via docker exec. The Testcontainers docs note that wait strategies are critical for stability: with an unsuitable one, the library can treat the container as started long before Postgres is actually accepting queries. Second, the Ryuk reaper is what makes the "ephemeral" promise hold even when afterAll never runs (because Jest was kill -9d by GitHub Actions). Third, image pulls dominate cold-start latency. In our experience, layer caching on the CI host, either through self-hosted runners or a registry mirror, is the single highest-leverage optimization you can make.
Pattern 2: Per-PR Kubernetes Namespace with Helm and TTL
Testcontainers solves the service-level isolation problem. It does not solve the "does my whole stack work together" problem. Microservices with their own ingress, network policies, and inter-service calls need an actual cluster slice. The standard pattern is to create a fresh namespace per pull request, deploy the full chart into it, run end-to-end tests against it, then garbage-collect.
The script below runs inside a GitHub Actions job. It assumes kubectl and helm are pre-installed and that a service account with create/delete on namespaces is already bound. The shape generalizes to GitLab CI or any other runner. Notice the trap-based cleanup — without it, a CI timeout leaves the namespace orphaned and costs you money.
```bash
#!/usr/bin/env bash
# .ci/ephemeral-env.sh
# Usage: ./ephemeral-env.sh <pr-number> <image-tag>
# Run e2e tests against a fresh namespace, then tear it down.
set -euo pipefail

PR="${1:?missing PR number}"
TAG="${2:?missing image tag}"
NS="pr-${PR}"
TTL_SECONDS="${TTL_SECONDS:-1800}" # 30 min default

# Cleanup runs whenever bash gets the chance to exit: a failed test
# step, a timeout, or a cancel that delivers SIGINT/SIGTERM. It cannot
# run after SIGKILL; the TTL controller below covers that case.
cleanup() {
  local rc=$?
  echo "::group::Cleanup namespace ${NS}"
  # --wait=false avoids blocking CI on slow finalizers (e.g. PVCs with
  # delete protection). The TTL controller will sweep stragglers.
  # "|| true" keeps a failed delete from masking the real exit code.
  kubectl delete namespace "${NS}" --wait=false --ignore-not-found || true
  echo "::endgroup::"
  exit "$rc"
}
trap cleanup EXIT

echo "Creating namespace ${NS} with TTL ${TTL_SECONDS}s"
kubectl create namespace "${NS}"

# The TTL annotation is picked up by a cluster-side controller (e.g.
# kube-janitor or a tiny operator) that GCs namespaces past their
# deadline. This is the safety net for the day your trap fails too.
kubectl annotate namespace "${NS}" \
  "janitor/ttl=${TTL_SECONDS}s" \
  "pr=${PR}" \
  "managed-by=ci"

# Helm install with --atomic + --wait: if any resource fails to come
# up, helm rolls back so we don't leak partial deploys. Without
# --atomic you get half-deployed namespaces that pass smoke and fail
# spec tests for confusing reasons.
helm upgrade --install "app" ./charts/app \
  --namespace "${NS}" \
  --set image.tag="${TAG}" \
  --set ingress.host="${NS}.preview.example.com" \
  --atomic \
  --wait \
  --timeout 8m

# Confirm the rollout explicitly. This re-checks the same readiness
# signal Helm waited on, so it is a cheap sanity check with clearer
# output on failure, not a substitute for an end-to-end probe.
kubectl -n "${NS}" rollout status deployment/app --timeout=5m

# Run e2e suite pointed at the unique URL.
BASE_URL="https://${NS}.preview.example.com" \
  npm run test:e2e -- --reporter=junit

echo "E2E passed for PR ${PR}"
```

Two details make this work. First, helm --atomic rolls the release back if any object fails to converge (on a first install, that means uninstalling the failed release). Helm v3 tracks owned resources via release secrets in the namespace, so cleanup on failure is symmetric with cleanup on success. Second, the TTL annotation is only as good as the controller that honors it. Without something like kube-janitor, k8s-cleaner, or your own small operator, the annotation is metadata that nobody reads. Always pair the trap with a server-side garbage collector: your CI will time out, crash, or get cancelled, and on that day the trap will not run.
A gotcha worth flagging: Kubernetes namespace names must match the DNS-1123 label spec (lowercase alphanumerics and hyphens, starting and ending with an alphanumeric, up to 63 characters). Pull request numbers are safe; pull request titles or branch names are not. If you derive the namespace from anything that is not strictly numeric, slugify it ruthlessly.
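If you do need to derive the name from a branch or title, the slugify step can be sketched roughly as follows. This is a minimal illustration, not part of the pipeline above: toNamespaceName is a hypothetical helper, and the pr- prefix is only assumed to match the NS variable in the script.

```typescript
// Hypothetical helper: derive a DNS-1123-safe namespace name from a
// branch name or PR title. The "pr-" default mirrors the CI script.
const MAX_LABEL_LENGTH = 63;

function toNamespaceName(raw: string, prefix = 'pr-'): string {
  const slug = raw
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse each run of invalid chars into one hyphen
    .replace(/^-+|-+$/g, ''); // a label must start and end with an alphanumeric

  if (slug.length === 0) {
    throw new Error(`cannot derive a namespace name from "${raw}"`);
  }

  // Truncate to the label limit, then re-trim in case the cut landed on a hyphen.
  return (prefix + slug).slice(0, MAX_LABEL_LENGTH).replace(/-+$/g, '');
}
```

With this sketch, toNamespaceName('Feature/Add_Login!') yields pr-feature-add-login. Truncation can make two long branch names collide, so appending the PR number or a short hash is worth considering if you go this route.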
Pattern 3: vcluster for Cluster-Scoped Isolation
Namespaces are a strong soft boundary, but they are not the whole cluster. Custom Resource Definitions, ClusterRoles, validating webhooks, and operators are cluster-scoped — and the moment two PRs need different versions of a CRD or two tests need different webhook configs, namespaces stop being enough. vcluster (from Loft Labs) addresses this by running a complete, lightweight Kubernetes control plane inside a namespace of your host cluster. To the tests, it looks like a real cluster. To your platform team, it is a few pods.
The example below boots a vcluster per CI job, applies the chart inside it, and tears it down. The interesting part is the kubeconfig dance — you want subsequent commands to target the vcluster, not the host.
```bash
#!/usr/bin/env bash
# .ci/vcluster-env.sh
# Spin up an isolated virtual cluster, run tests, destroy.
set -euo pipefail

JOB_ID="${GITHUB_RUN_ID:?must run inside GitHub Actions}"
VC_NAME="ci-${JOB_ID}"
HOST_NS="vclusters"
KUBECONFIG_OUT="$(mktemp)"
# Remember the host kubeconfig; KUBECONFIG is repointed further down.
HOST_KUBECONFIG="${KUBECONFIG:-}"

cleanup() {
  local rc=$?
  # Point back at the HOST cluster first. By the time cleanup runs,
  # KUBECONFIG may target the vcluster we are about to delete.
  if [ -n "${HOST_KUBECONFIG}" ]; then
    export KUBECONFIG="${HOST_KUBECONFIG}"
  else
    unset KUBECONFIG
  fi
  # "|| true" keeps a failed delete from masking the real exit code.
  vcluster delete "${VC_NAME}" --namespace "${HOST_NS}" || true
  rm -f "${KUBECONFIG_OUT}"
  exit "$rc"
}
trap cleanup EXIT

# Create the vcluster. --connect=false because we'll wire up
# kubeconfig manually: the default port-forward is fine locally
# but flaky in CI where the runner network is sometimes restricted.
# Note: Helm value names differ across vcluster versions; check yours.
vcluster create "${VC_NAME}" \
  --namespace "${HOST_NS}" \
  --connect=false \
  --upgrade \
  --set "syncer.extraArgs={--tls-san=${VC_NAME}.${HOST_NS}.svc}"

# Wait until the vcluster API server is reachable. The vcluster
# create command returns when the StatefulSet is scheduled, NOT
# when the API answers. Poll the readiness directly.
for i in {1..60}; do
  if kubectl -n "${HOST_NS}" exec "${VC_NAME}-0" -- \
    kubectl get --raw=/readyz >/dev/null 2>&1; then
    echo "vcluster ready after ${i} attempt(s)"
    break
  fi
  if [ "$i" -eq 60 ]; then
    echo "vcluster failed to become ready after 60 attempts" >&2
    kubectl -n "${HOST_NS}" describe pod "${VC_NAME}-0" >&2
    kubectl -n "${HOST_NS}" logs "${VC_NAME}-0" -c syncer --tail=200 >&2
    exit 1
  fi
  sleep 1
done

# Export the kubeconfig for downstream steps. --server matches the
# TLS SAN set above; this assumes the runner can resolve in-cluster
# Service DNS (e.g. a runner that itself runs inside the host cluster).
vcluster connect "${VC_NAME}" \
  --namespace "${HOST_NS}" \
  --server "https://${VC_NAME}.${HOST_NS}.svc" \
  --print > "${KUBECONFIG_OUT}"
export KUBECONFIG="${KUBECONFIG_OUT}"

# Now every kubectl/helm command targets the virtual cluster.
kubectl apply -f ./crds/
helm install app ./charts/app --wait --timeout 5m
npm run test:e2e -- --reporter=junit
```

Why this works: vcluster does not actually run a full kubelet or its own node pool. It runs a virtual control plane, with its own API server and controller manager (a virtual scheduler is optional), plus a syncer component that mirrors resources from the virtual cluster into the host namespace where they actually execute. Pods created in the virtual cluster land in the host cluster as regular pods (with rewritten names), are scheduled by the host scheduler unless you enable the virtual one, and are run by the host kubelet. That detail explains both the upside (cheap, fast spin-up) and the gotcha: anything that requires real node-level isolation, such as kernel modules, real CRI sandboxes, or exotic CNI plugins, still leaks across vclusters because they share the host nodes.
Comparison: Picking the Right Pattern
These three patterns are not competitors. They cover different isolation levels and different cost profiles. The table below maps the trade-offs we walk through with teams.
| Pattern | Isolation Boundary | Spin-up Time | Best For | Watch Out For |
|---|---|---|---|---|
| docker-compose (the baseline) | Project network on one host | Seconds (cached) | Local dev, single-service work | Shared volumes, port conflicts in CI |
| Testcontainers | Per-test (or per-suite) container | Seconds–tens of seconds | Integration tests, repo-local CI | Docker-in-Docker, Ryuk on rootless |
| K8s namespace per PR | Namespace + RBAC + NetworkPolicy | 1–5 minutes | E2E across multiple services | Cluster-scoped resources, TTL drift |
| vcluster | Virtual control plane | 30s–2 minutes | Operators, CRDs, cluster-admin tests | Host node leakage, syncer quirks |
Troubleshooting and Debugging Ephemeral Environments
Ephemeral environments fail in predictable ways. Here are the symptoms we see most often when onboarding teams and how to diagnose them.
- Testcontainers hangs at "Waiting for container to be ready". Almost always a wait strategy mismatch. Run docker logs <container-id> while the test is stuck. If the service is running but the log line the library is watching for never appears (e.g. you bumped Postgres major and the log format changed), switch to a SQL or HTTP probe via Wait.forSuccessfulCommand or Wait.forHttp.
- Ryuk fails to start on rootless Docker or Podman. The reaper container needs access to the Docker socket. On rootless setups, mount the user-scoped socket explicitly and set TESTCONTAINERS_RYUK_DISABLED=true only as a last resort: disabling Ryuk means orphaned containers if your test runner dies.
- Namespace stuck in "Terminating". A finalizer is blocking. Run kubectl get ns <ns> -o yaml and inspect the finalizers field. Most often it is a custom resource whose controller is offline. Patch the CR to remove its finalizer before forcing the namespace.
- helm --atomic rolled back but resources remain. Helm only owns resources it created via the release. Anything deployed out-of-band (e.g. by an operator that watches the namespace) needs separate cleanup. This is why a namespace-scoped TTL controller is non-negotiable.
- vcluster pods scheduled but CrashLoopBackOff inside the vcluster. Check both the virtual pod log and the host pod log — the underlying host pod has the real container output. Image pull secrets, in particular, do not auto-propagate from the virtual namespace to the host; you have to copy them via vcluster syncer config.
- Cold image pulls dominate CI wall-clock. Anecdotally, most slow ephemeral-environment pipelines are 60–80% image-pull time. Cache layers on the runner, use a registry mirror inside the cluster, or pre-warm a base image with the heavy layers already present.
How do I clean up environments when CI cancels mid-run?
Defense-in-depth: a bash trap EXIT for the happy path, plus a TTL or janitor controller in-cluster that sweeps stragglers when the runner dies hard.
The trap covers Ctrl-C and most failure modes. It does not cover the runner VM being killed by the cloud provider, the agent process being OOM-killed, or someone hitting "Cancel workflow" in a way that delivers SIGKILL. For those cases you need a controller in the cluster that periodically deletes namespaces whose janitor/ttl annotation has expired. Two production-grade options we have used: kube-janitor (which honors a configurable TTL annotation) and writing a small custom operator. Either way, the rule is: never trust a single cleanup mechanism. Always have a backup sweeper.
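To make the sweeper idea concrete, here is a rough sketch of the per-namespace decision such a controller makes. It is an illustration under our own assumptions, not kube-janitor's implementation: NamespaceInfo, parseTtlSeconds, and isExpired are hypothetical names, and the sketch only understands the s, m, h, and d suffixes.

```typescript
// Sketch of the expiry check a TTL sweeper runs for each namespace.
// All names here are hypothetical; a real controller would read these
// fields from the Kubernetes API.
interface NamespaceInfo {
  name: string;
  createdAt: Date;
  annotations: Record<string, string>;
}

function unitSeconds(unit: string): number {
  switch (unit) {
    case 's': return 1;
    case 'm': return 60;
    case 'h': return 3600;
    case 'd': return 86400;
    default: throw new Error(`unsupported TTL unit: "${unit}"`);
  }
}

// Parse values such as "1800s" or "30m" into seconds.
function parseTtlSeconds(ttl: string): number {
  const match = /^(\d+)([a-z])$/.exec(ttl.trim());
  if (!match) {
    throw new Error(`unsupported TTL value: "${ttl}"`);
  }
  return Number(match[1]) * unitSeconds(match[2] ?? '');
}

function isExpired(ns: NamespaceInfo, now: Date): boolean {
  const ttl = ns.annotations['janitor/ttl'];
  // Never touch namespaces that did not opt in via the annotation.
  if (ttl === undefined) return false;
  const ageSeconds = (now.getTime() - ns.createdAt.getTime()) / 1000;
  return ageSeconds > parseTtlSeconds(ttl);
}
```

The important design choice is the opt-in: a namespace without the annotation is never considered expired, so a misconfigured sweeper cannot delete kube-system or anything else CI did not create.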
Edge Cases and Gotchas Worth Naming
- Container lifetime vs Postgres data lifetime. With Testcontainers, a per-test container means a fresh database every test. Migrations run on every start. If your migration suite takes minutes, your tests are now minutes longer. Consider a per-suite (beforeAll) container, plus a fast TRUNCATE-based reset between tests.
- DNS inside ephemeral namespaces. If your services hardcode hostnames like my-api.staging.internal, they will not resolve inside a per-PR namespace. Use the cluster's Service DNS (i.e. my-api, which resolves within the namespace) or pass the resolved URL via env at deploy time.
- Resource requests, not just limits. Pods with no requests get scheduled wherever the scheduler feels like. Pods with no limits can OOM neighbors on the same node. Set both, and set a ResourceQuota on the namespace as a guardrail.
- Secrets in ephemeral environments. Do not bake long-lived secrets into preview environments. Use short-lived credentials minted by your CI for the duration of the run, and delete them in the same trap that deletes the namespace. See our how-to on secrets in CI test environments for a worked example.
- Concurrent PR namespaces and the IP pool. Every namespace with an Ingress consumes an IP allocation (in the LB pool, the cluster CIDR, or both). At 50+ concurrent previews, plan capacity or use path-based routing under a single LB.
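For the per-suite container with a TRUNCATE-based reset mentioned in the first item, one possible shape is a helper that builds a single statement covering every table. This is a sketch under assumptions: quoteIdent and buildTruncateSql are hypothetical helpers, and the table names in the usage note are placeholders.

```typescript
// Quote a Postgres identifier, doubling any embedded quote so a table
// name cannot break out of the identifier.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Build one TRUNCATE statement that resets every listed table.
function buildTruncateSql(tables: readonly string[]): string {
  if (tables.length === 0) {
    throw new Error('no tables to truncate');
  }
  // RESTART IDENTITY resets sequences owned by the tables;
  // CASCADE also truncates tables that reference them via foreign keys.
  return `TRUNCATE TABLE ${tables.map(quoteIdent).join(', ')} RESTART IDENTITY CASCADE`;
}
```

In a Jest suite this would typically run in beforeEach, for example client.query(buildTruncateSql(['users', 'orders'])), so each test starts from empty tables without paying for a new container or a fresh migration run.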
The Mindset Shift: Infrastructure Is Code Is Test Data
The deepest change here is not a library or a CRD. It is the realization that infrastructure is part of your test data. A test that depends on shared staging is implicitly asserting properties of a server that nobody on the team controls. A test that spins up its own Postgres, its own namespace, its own virtual cluster, asserts properties of code you can read in the diff. That difference compounds: deterministic environments make deterministic failures, which make fixable bugs.
The leveling-up path is well-trodden: start with Testcontainers in the service that hurts the most (probably the one with the most flakes and the most database dependence). Add per-PR namespaces when integration starts crossing service boundaries. Reach for vcluster only when operators or CRDs make namespaces insufficient. Each step builds on the previous one, and each step retires a category of "works on my machine" bug for good.
You do not need to migrate the entire org at once. Pick one painful test file. Convert it to Testcontainers. Watch the "why did Alice's PR break Bob's tests" tickets drop. That is the moment you stop being a vibe coder fighting infra and start being an engineer who shapes it.
Frequently Asked Questions
When should I move from docker-compose to Testcontainers?
When integration tests start touching real-world services and need to run in CI without shared infra. Testcontainers gives per-test lifecycle, auto-cleanup, and prod-image parity.
Do I really need Kubernetes for ephemeral environments?
Only when you need full-stack confidence — multiple services, real ingress, network policies. For service-level tests, Testcontainers is enough and cheaper to operate than a per-PR cluster.
How expensive are per-PR Kubernetes namespaces?
Cheaper than permanent staging if you tear them down fast. The trick is requests/limits and TTL controllers — a namespace that lives 30 minutes costs roughly the resources of those 30 minutes.
What is vcluster and when do I reach for it?
vcluster runs a virtual Kubernetes control plane inside a namespace. Reach for it when CRDs, RBAC, or cluster-scoped resources clash between tests, or when teams need admin without blast radius.
Will my tests be slower with ephemeral environments?
First run is slower because of image pulls and provisioning. Steady state, image caching and parallel namespaces usually beat a shared staging that everyone queues against. Cache aggressively.