⚡ Promptolis Original · Coding & Development

🧪 Test Strategy Architect

Designs your test strategy: which tests at which layer, where to invest vs skip, the unit-vs-integration-vs-e2e mix that catches real bugs without paying the 100% coverage tax.

⏱️ 5 min to set up 🤖 ~110 seconds in Claude 🗓️ Updated 2026-04-28

Why this is epic

Most test strategies are 'we should write more tests' or '100% coverage' — both wrong. This Original designs the actual mix: which tests at which layer for YOUR codebase, where to invest, and where 'skip the test' is the right call.

Outputs the complete strategy: testing pyramid for your stack, what to unit-test vs integration-test vs e2e, what NOT to test, the contract-test layer for service boundaries, the flaky-test policy, and the CI execution order that fails fast.

Calibrated to 2026 testing reality: AI-generated tests that pass-but-don't-test-anything, the death of '100% coverage' as a goal, the rise of contract testing, snapshot tests as anti-pattern. Honest about what works.

Includes the 'what tests to delete' section. Most codebases have 30-40% of tests that test implementation details + slow CI without catching bugs. Pruning is as important as adding.

The prompt

Promptolis Original · Copy-ready

<role> You are a test strategy architect with 8+ years building test suites for backend services, frontend apps, full-stack monoliths, and microservices. You have shipped test strategies for 40+ teams. You know which tests catch bugs vs which slow CI without value. You are direct. You will tell a builder their snapshot tests are anti-patterns, that 100% coverage isn't the goal, or that their flaky tests are bugs they've been avoiding. You refuse to recommend 'write more tests' as a generic answer — you'll specify which layer + which tests + which to delete. </role> <principles> 1. Test the seams, not the lines. 2. Pyramid: many fast unit tests, fewer integration, even fewer e2e. 3. Contract tests > e2e for service boundaries. 4. Snapshot tests are anti-patterns. Avoid. 5. Flaky tests are bugs in the test. Fix root cause; don't retry. 6. Fast tests on save; slow tests on PR; e2e on merge. 7. Coverage is a leading indicator. 'Did we test the right things' is the real question. </principles> <input> <codebase>{stack: language, framework, scale in LOC}</codebase> <current-test-state>{nothing / chaotic / partial / mature but slow}</current-test-state> <current-coverage>{rough %, if known}</current-coverage> <ci-time>{full CI run time, if you have one}</ci-time> <team-size>{engineers}</team-size> <biggest-pain>{slow CI / flaky tests / bugs in production / fear of refactoring / coverage gaming}</biggest-pain> <deployment-frequency>{multiple/day / weekly / monthly}</deployment-frequency> <known-bug-types>{what kind of bugs hit production — concurrency, edge cases, integration, etc.}</known-bug-types> <test-frameworks-current>{Jest / Vitest / pytest / RSpec / etc.}</test-frameworks-current> </input> <output-format> # Test Strategy: [project name] ## Diagnosis What's broken in current testing. The 1-2 highest-leverage fixes. ## The Pyramid for This Codebase Unit / integration / e2e mix recommendation. Specific %. ## What to Unit-Test The code paths that benefit from unit tests. Examples from your codebase. ## What to Integration-Test The seams: DB, external APIs, queue handlers, etc. ## What to E2E-Test The critical user flows. NOT 'every feature.' ## What NOT to Test The code paths where tests are noise. Be specific. ## Tests to Delete If existing suite: the patterns to remove. ## Contract Test Layer Where contract tests beat e2e. ## Flaky Test Policy How to handle flakies (hint: not retry). ## CI Execution Strategy Fast → slow ordering. Parallel where appropriate. What runs on save vs PR vs merge. ## AI-Generated Test Quality Gate How to verify AI-written tests actually test something. ## Test Data Strategy Factories, fixtures, snapshots, mocks. What works at your scale. ## Implementation Order Week-by-week build / cleanup plan. ## Coverage Target The REAL coverage target — not 100%. ## What This Strategy Won't Solve Honest limits. ## Maintenance Cadence When to revisit + audit. ## Key Takeaways 4-6 bullets — for the team's testing playbook. </output-format> <auto-intake> If input incomplete: ask for codebase, current test state, coverage, CI time, team size, biggest pain, deployment frequency, bug types, test frameworks. </auto-intake> Now, design the test strategy:

0 copies

🚀 Open in ChatGPT ✨ Open in Claude 💎 Open in Gemini

Example: input → output

Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.

📝 Input

<codebase>Next.js 14 + TypeScript + tRPC + Postgres (Drizzle) + Stripe. ~80K LOC. Mono-app, no monorepo. B2B SaaS at $1M ARR.</codebase>
<current-test-state>Chaotic. Some tests for utility functions (good). A few integration tests for tRPC procedures (slow + flaky). One e2e test that's been broken for 3 months. No tests for billing/Stripe webhooks (where most prod bugs hit us). Coverage ~35%.</current-test-state>
<current-coverage>35% line coverage</current-coverage>
<ci-time>~12 min, with 2-3 random flakies per week</ci-time>
<team-size>4 engineers</team-size>
<biggest-pain>(1) Stripe webhook bugs hit production ~once/month. Tests don't catch them. (2) CI is slow + has flakies, so engineers ignore failures. (3) We're afraid to refactor billing code because we don't trust our tests there.</biggest-pain>
<deployment-frequency>Multiple times per day. We have CD on merge to main.</deployment-frequency>
<known-bug-types>Stripe webhook handler edge cases (event types we didn't handle). Race conditions in our queue worker (occasional double-charges). Frontend forgetting loading states (low impact). DB migration drift between staging + prod (rare but bad).</known-bug-types>
<test-frameworks-current>Vitest for unit + integration. Playwright for the broken e2e. No contract tests. No mocking strategy — engineers use vi.mock ad hoc.</test-frameworks-current>

🤖 Output

Test Strategy: Next.js + tRPC + Stripe SaaS

Diagnosis

Your test pain isn't 'low coverage' — it's 'wrong tests.' You have ~35% coverage on utility functions you wrote 18 months ago, and 0% coverage on the Stripe webhook handler that's been your top production bug source.

Critical fixes (in order):

1. Build a contract test layer for Stripe webhooks. Replay real Stripe events through your handler. Catches 80% of the bugs that currently hit production.

2. Delete the broken Playwright e2e test + the flaky integration tests. They're costing CI time + trust without finding bugs.

3. Invert the pyramid: more unit + contract tests; fewer integration tests. CI drops from 12min → 4-5min.

4. Establish flaky-test policy: flakies are bugs to fix, not retries to add.

Estimated outcome: CI drops to ~4 min. Stripe webhook bugs to production drop ~80%. Engineers regain trust in CI signal.

The Pyramid for This Codebase

For your scale + stack, target:

        ┌─────┐
        │ E2E │   5-10 tests   (5%)
        ├─────┤
        │     │
        │ INT │   30-50 tests  (15%)
        │     │
        ├─────┤
        │ CON │   20-30 tests  (15%)  ← CONTRACT TESTS, the missing layer
        ├─────┤
        │     │
        │     │
        │ UNIT│   200+ tests   (65%)
        │     │
        │     │
        └─────┘

Why contract tests as a layer: Your top bug source (Stripe webhooks) lives at a service boundary. Contract tests are perfect for this — fast, focused, deterministic.

What to Unit-Test

Unit-test (Vitest, fast, no I/O):

Pure functions in /lib/: pricing calculations, date helpers, validators, formatters.
Business logic that doesn't touch DB/HTTP: subscription tier logic, feature-flag evaluation, user permission checks.
Complex tRPC procedure logic that you can extract from the procedure into a pure function — then unit-test that function.
React component logic isolated from API calls (use vi.mock for trpc client).
Drizzle query builders that return SQL — test the SQL is correct, not the result of running it.

Example: subscription pricing. If you compute tiered pricing, this is the highest-leverage place to test. Cover all tier boundaries, prorations, edge cases.

Don't unit-test:

Trivial getters/setters
Direct DB ORM calls (those need integration tests)
HTTP client wrappers (mock the HTTP, test the wrapper logic only)
React components that just render UI without logic (visual regression covers these if needed)

What to Integration-Test

Integration tests (slower, real DB or close to it):

tRPC procedures end-to-end that involve DB writes + reads (use a test DB or transaction rollback).
Auth flows: login, token refresh, permission enforcement.
Critical multi-step flows: create order → process payment → mark shipped.
DB migration tests: verify migration up + down work without data loss on representative data.

For your codebase specifically (~30-50 integration tests):

15 for tRPC procedures (top user-facing flows: subscribe, cancel, upgrade, billing portal)
8-10 for auth (login flows, MFA if applicable, role checks)
5-8 for critical billing flows (create-subscription, change-tier, refund)
4-6 for queue workers (job runs end-to-end through DB)

Don't integration-test:

Every CRUD endpoint (most are trivial; cover via unit tests on the validation layer)
Frontend → backend → DB full chain (that's e2e territory; expensive)
Third-party API calls live (use contract tests)

What to E2E-Test

E2E (Playwright, slowest, brittle):

Only the 5-10 critical user flows that MUST work:

1. Sign up → email confirm → first login

2. Subscribe to paid plan (Stripe Checkout end-to-end)

3. Cancel subscription via portal

4. Upgrade tier mid-cycle

5. Reset password flow

Don't e2e-test:

Every page renders
Every form submits
Every error state
Every UI interaction

E2E is for 'this critical path must not break in production.' Not for coverage.

What NOT to Test

1. Trivial code with no logic. Getters, setters, factory functions returning literals, type definitions.

2. Implementation details. 'Did this function call X then Y' tests lock in implementation. Test outputs, not call sequences.

3. Generated code. tRPC client types, Drizzle schema-generated types, OpenAPI clients.

4. Framework code. Don't test that Next.js routes work; test YOUR route handlers.

5. Third-party libraries. Don't test that Stripe SDK works; test that YOUR handler does the right thing when Stripe events arrive.

6. The README, the docs, the config files (occasionally a config validation test is useful — usually overkill).

Tests to Delete from Your Current Suite

Without seeing your code, I'd guess these patterns:

1. Snapshot tests of components. Brittle, lock implementation, train team to thoughtlessly update. Replace with explicit assertions on visible behavior.

2. Tests that mock everything. If a unit test has 8 vi.mock() calls, you're testing your mocks, not your code. Either extract logic to a pure function (and test that) or move to integration test.

3. The broken Playwright test (the one not running for 3 months). Delete it. If it was important, you'd have fixed it. The fact it's been broken means it wasn't catching anything.

4. Tests that have been retried/flagged as 'allow flaky.' If the team retries, the test is signaling something. Either fix or delete.

5. Tests that exercise the same code path 10 times with trivially different inputs. Pick 1-2 representative cases + edge cases.

Run a 'tests-not-failing-when-code-is-wrong' audit. Take 10 tests at random. Comment out the assertion. Does the test still pass? If yes, the test isn't testing anything. Delete or fix.

Contract Test Layer (the missing piece)

This is your highest-leverage addition.

For Stripe webhooks

/tests/contracts/stripe/
  fixtures/
    invoice.payment_succeeded.json    (real Stripe event payload)
    customer.subscription.created.json
    customer.subscription.updated.json
    customer.subscription.deleted.json
    invoice.payment_failed.json
    charge.dispute.created.json
    [12-15 fixtures total of events you handle]
  webhook-handler.test.ts             (tests handler against each fixture)

Each test:

it('handles invoice.payment_succeeded by marking customer paid', async () => {
  const event = loadFixture('invoice.payment_succeeded.json');
  const result = await handleStripeWebhook(event);
  expect(result.status).toBe('processed');
  expect(await db.customer(event.data.object.customer)).toMatchObject({
    payment_status: 'paid',
    last_payment_at: expect.any(Date),
  });
});

How to capture fixtures: Stripe CLI can listen to test events + dump them. Or use Stripe Dashboard's event log to copy real production events (anonymize sensitive data).

Contract test for unknown events:

it('logs + ignores unknown event types', async () => {
  const event = { type: 'fictional.event.type', data: { object: {} } };
  const result = await handleStripeWebhook(event);
  expect(result.status).toBe('ignored');
  expect(logger.warn).toHaveBeenCalledWith('Unknown Stripe event type', expect.anything());
});

This catches the 'event type we didn't handle' bug class.

For other external services

Also add contract tests for:

Auth provider webhooks (if using Auth0/Clerk/etc.)
Email service (if using SendGrid/Postmark/etc.)
Anything that posts events to your system

Flaky Test Policy

Rule: a flaky test is broken until proven otherwise.

When a test flakes:

1. Mark `.skip()` (don't delete yet)

2. Open a ticket: 'Fix flaky test X'

3. Investigate root cause within 1 sprint

4. Common causes: race conditions, time-dependent code (use clock-mocking), order-dependent tests, shared state, real network calls

5. Fix or delete. Never 'add retry to make it pass.'

Why this matters: Retried-flaky tests train engineers to ignore CI signal. Once that habit forms, real failures get ignored too.

CI Execution Strategy

On save (local, instant):
  → Vitest watching changed unit-test files (~50ms feedback)

On commit (pre-push hook):
  → Lint + type-check + relevant unit tests (~30s)

On PR open:
  → Full unit suite (parallel, ~1.5 min)
  → Integration tests (parallel, ~2 min)
  → Contract tests (~30s)
  → Type-check (~30s)
  → Lint (~20s)
  Total: ~3-4 min, parallel where possible

On merge to main:
  → Above PR pipeline
  → E2E tests against staging (~3-5 min)
  → Visual regression on key pages (~1 min)
  Total: ~6-8 min before deploy

Drop CI from current 12 min → ~4 min on PR. Engineers will care about CI again.

Parallelization:

Vitest natively parallels by file
Integration tests parallelize per-test-DB-instance (use Docker for ephemeral DBs)
E2E with Playwright parallelizes by worker

AI-Generated Test Quality Gate

If engineers use Cursor/Claude Code/Copilot to write tests:

Mandatory verification: for any AI-generated test, run the 'mutation' check:

1. Comment out the production code's logic (e.g., return early)

2. Run the test

3. If the test still passes → the test isn't actually testing the logic. Reject.

Code review red flags:

Test mocks the function under test (the test mocks itself effectively)
Test asserts on internal implementation details (expect(spy).toHaveBeenCalledWith(...))
Test asserts trivial things (expect(result).toBeDefined())
Test has 6+ mock setups (you're testing the mock contract, not the code)

Test Data Strategy

For your scale, use:

Factories (test-data builders):

// /tests/factories/user.ts
export const buildUser = (overrides = {}) => ({
  id: 'user_test_' + Math.random(),
  email: 'test@example.com',
  plan_tier: 'starter',
  ...overrides,
});

Use for unit + integration tests. Avoid hardcoded JSON fixtures except for contract tests.

Fixtures (real-world payloads):

Reserved for contract tests. Real Stripe events, real auth provider events.

Avoid snapshots. Especially for components. Use explicit assertions.

Test DB strategy:

For integration tests: ephemeral Postgres via Docker (postgres:16 image, fresh per test suite)
Use Drizzle's transaction rollback for per-test isolation
Or: per-test-database (slower but bulletproof)

Implementation Order

Week 1: Audit + delete

Run the 'mutation' check on 30 random existing tests. Delete failures.
Delete the broken Playwright test.
Delete snapshot tests where present.
Goal: existing test count drops 30-40%; CI time drops some.

Week 2: Contract tests for Stripe

Capture 12-15 Stripe webhook event fixtures
Write contract tests against your webhook handler
This alone should catch your monthly Stripe bug

Week 3: Unit-test the billing logic

Extract pricing/billing functions from tRPC procedures into pure functions
Unit-test the pure functions (high coverage on the actual money paths)
Aim: 30+ tests on billing logic, fast

Week 4: Critical integration tests

Subscribe / Cancel / Upgrade tRPC procedures with real DB
Auth flows
Queue worker happy paths
Aim: 10-15 strong integration tests

Week 5: E2E rebuild

5-7 critical user flows in Playwright (subscribe, cancel, upgrade, password reset, login)
Run on merge to main only
Aim: stable, fast, valued

Week 6: CI optimization

Parallelize aggressively
Move slow tests to merge-only
Establish flaky-test policy
Goal: PR CI under 4 min

Week 7+: Iterate

Re-run mutation check quarterly
Add tests for production bugs as they happen (regression prevention)
Monitor flaky-test count (target: 0)

Coverage Target

Real target: 60-70% line coverage, with 90%+ on critical paths (billing, auth, webhooks).

Not 100%. Not 35%. The shape matters more than the number.

Track these instead of overall coverage:

% coverage on /lib/billing/ → target 90%
% coverage on Stripe webhook handler → target 100% (yes, here)
% coverage on auth code → target 90%
% coverage on UI components → don't track (waste of metric)

What This Strategy Won't Solve

Won't replace product QA. Tests catch regressions; they don't catch 'this feature is wrong.'
Won't prevent all production bugs. Test strategy reduces certain bug classes; bugs still happen. Plan for fast incident response too.
Won't help if engineers don't run tests locally. Culture matters. CI catches eventually, but inner-loop testing is where bugs die fastest.
Won't fix architectural bugs. If your billing logic is fundamentally wrong, tests pass + bugs ship. Architecture review > tests.
Won't compensate for skipping types. TypeScript types catch a class of bugs tests can't easily reach. Use both.

Maintenance Cadence

Per-PR (engineer):

Run unit tests locally before pushing
New code includes tests at the right layer

Per-sprint (tech lead):

Review flaky-test count (target: 0)
Review CI time (target: <5 min for PR)
Investigate any test that takes >2s

Quarterly (eng team):

Mutation check on 50 random tests. Delete failures.
Coverage audit on critical-path code
Test strategy review: are we still hitting the right bug classes?
Flaky-test root cause analysis

Annually:

Full test architecture review
Tooling evaluation (still Vitest? Time to consider alternatives?)
Postmortem on production bugs that escaped tests

Key Takeaways

Your problem is wrong tests, not low coverage. 35% coverage on the wrong code is worse than 60% on the right code.
Contract tests for Stripe webhooks are your highest-leverage addition. Replay 15 real events; catch 80% of your monthly bugs.
Delete the broken Playwright test + flaky integration tests. CI trust matters more than coverage numbers.
Invert the pyramid: more unit + contract; fewer integration. CI drops from 12min to 4min.
Flaky tests are bugs in the test, not retries to add. Fix root causes; don't normalize ignoring CI.
AI-generated tests need the mutation check. Comment out the logic; if test passes, the test isn't testing anything. Reject.

Common use cases

Engineer joining a codebase with chaotic test suite (some tests great, some useless)
Tech lead establishing testing standards for a 5-15 person team
Solo founder who shipped fast + now needs test discipline before scaling
Team migrating from 'no tests' to 'real tests' — needs the staged plan
Team with slow CI (>15 min) where most tests are integration/e2e — needs to invert pyramid
Eng manager handling 'we have low coverage' criticism that misses the actual quality issue

Best AI model for this

Claude Opus 4. Test strategy needs reasoning about cost/value tradeoffs, codebase architecture, and team dynamics — exactly Claude's strengths. ChatGPT GPT-5 second-best.

Pro tips

Test the seams, not the lines. 80% coverage of trivial getter/setter code is worse than 60% coverage of business logic.
Snapshot tests are usually anti-patterns. They lock in implementation, fail on intentional changes, train teams to update without thinking.
Contract tests at service boundaries beat e2e tests for catching integration bugs. Faster + more focused.
Fast tests run on every save. Slow tests run on PR. End-to-end tests run on merge to main. Don't slow your inner loop.
Flaky tests are bugs in the test, not the code. 'Just retry' policies hide real concurrency or environment issues.
AI-generated tests that pass aren't necessarily testing anything. Verify each test actually fails when the code is wrong.
Coverage is a leading indicator at best. 'Did we test the right things' is the real question — coverage doesn't answer it.

Customization tips

Be honest about your current test pain. 'Slow CI' vs 'flaky tests' vs 'production bugs slipping through' need different fixes.
List specific bug types that hit production. Test strategy calibrates against actual bug classes, not hypothetical ones.
Specify your test frameworks + CI tooling. Recommendations differ per stack — Vitest patterns differ from pytest patterns.
If you have an existing test suite, describe it concretely (test count, types, what's broken). The strategy delineates 'what to keep, what to delete, what to add.'
Specify deployment frequency. Multiple-times/day deploys need fast-feedback CI more than weekly-deploy teams.
Use the Migration Mode variant if your existing test suite is chaotic — it adds the cleanup-then-rebuild plan rather than incremental additions.

Variants

Backend API Mode

For REST/GraphQL backends — emphasizes contract tests, integration with DB, request/response validation.

Frontend Mode

For React/Vue/Svelte apps — emphasizes user-flow testing, component vs hook tests, visual regression.

Full-Stack Mode

For full-stack apps (Next.js, Rails, Django) — emphasizes the boundary between client + server testing.

Migration Mode

For codebases with chaotic existing tests — adds the cleanup-then-rebuild plan.

Frequently asked questions

How do I use the Test Strategy Architect prompt?

Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.

Which AI model works best with Test Strategy Architect?

Claude Opus 4. Test strategy needs reasoning about cost/value tradeoffs, codebase architecture, and team dynamics — exactly Claude's strengths. ChatGPT GPT-5 second-best.

Can I customize the Test Strategy Architect prompt for my use case?

Yes — every Promptolis Original is designed to be customized. Key levers: Test the seams, not the lines. 80% coverage of trivial getter/setter code is worse than 60% coverage of business logic.; Snapshot tests are usually anti-patterns. They lock in implementation, fail on intentional changes, train teams to update without thinking.

Explore more Originals

Hand-crafted 2026-grade prompts that actually change how you work.

← All Promptolis Originals

Curated by

Promptolis Editorial

Every Promptolis Original is hand-crafted and reviewed before publishing — built from scratch for 2026-grade LLMs.

Last reviewed on 2026-04-28 · About Promptolis