⚡ Promptolis Original · Coding & Development
🧪 Test Strategy Architect
Designs your test strategy: which tests at which layer, where to invest vs skip, the unit-vs-integration-vs-e2e mix that catches real bugs without paying the 100% coverage tax.
Why this is epic
Most test strategies are 'we should write more tests' or '100% coverage' — both wrong. This Original designs the actual mix: which tests at which layer for YOUR codebase, where to invest, and where 'skip the test' is the right call.
Outputs the complete strategy: testing pyramid for your stack, what to unit-test vs integration-test vs e2e, what NOT to test, the contract-test layer for service boundaries, the flaky-test policy, and the CI execution order that fails fast.
Calibrated to 2026 testing reality: AI-generated tests that pass-but-don't-test-anything, the death of '100% coverage' as a goal, the rise of contract testing, snapshot tests as anti-pattern. Honest about what works.
Includes the 'what tests to delete' section. Most codebases have 30-40% of tests that test implementation details + slow down CI without catching bugs. Pruning is as important as adding.
The prompt
Example: input → output
Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.
<codebase>Next.js 14 + TypeScript + tRPC + Postgres (Drizzle) + Stripe. ~80K LOC. Mono-app, no monorepo. B2B SaaS at $1M ARR.</codebase> <current-test-state>Chaotic. Some tests for utility functions (good). A few integration tests for tRPC procedures (slow + flaky). One e2e test that's been broken for 3 months. No tests for billing/Stripe webhooks (where most prod bugs hit us). Coverage ~35%.</current-test-state> <current-coverage>35% line coverage</current-coverage> <ci-time>~12 min, with 2-3 random flakies per week</ci-time> <team-size>4 engineers</team-size> <biggest-pain>(1) Stripe webhook bugs hit production ~once/month. Tests don't catch them. (2) CI is slow + has flakies, so engineers ignore failures. (3) We're afraid to refactor billing code because we don't trust our tests there.</biggest-pain> <deployment-frequency>Multiple times per day. We have CD on merge to main.</deployment-frequency> <known-bug-types>Stripe webhook handler edge cases (event types we didn't handle). Race conditions in our queue worker (occasional double-charges). Frontend forgetting loading states (low impact). DB migration drift between staging + prod (rare but bad).</known-bug-types> <test-frameworks-current>Vitest for unit + integration. Playwright for the broken e2e. No contract tests. No mocking strategy — engineers use vi.mock ad hoc.</test-frameworks-current>
Test Strategy: Next.js + tRPC + Stripe SaaS
Diagnosis
Your test pain isn't 'low coverage' — it's 'wrong tests.' You have ~35% coverage on utility functions you wrote 18 months ago, and 0% coverage on the Stripe webhook handler that's been your top production bug source.
Critical fixes (in order):
1. Build a contract test layer for Stripe webhooks. Replay real Stripe events through your handler. Catches 80% of the bugs that currently hit production.
2. Delete the broken Playwright e2e test + the flaky integration tests. They're costing CI time + trust without finding bugs.
3. Invert the pyramid: more unit + contract tests; fewer integration tests. CI drops from 12min → 4-5min.
4. Establish flaky-test policy: flakies are bugs to fix, not retries to add.
Estimated outcome: CI drops to ~4 min. Stripe webhook bugs to production drop ~80%. Engineers regain trust in CI signal.
The Pyramid for This Codebase
For your scale + stack, target:
┌─────┐
│ E2E │ 5-10 tests (5%)
├─────┤
│ │
│ INT │ 30-50 tests (15%)
│ │
├─────┤
│ CON │ 20-30 tests (15%) ← CONTRACT TESTS, the missing layer
├─────┤
│ │
│ │
│ UNIT│ 200+ tests (65%)
│ │
│ │
└─────┘
Why contract tests as a layer: Your top bug source (Stripe webhooks) lives at a service boundary. Contract tests are perfect for this — fast, focused, deterministic.
What to Unit-Test
Unit-test (Vitest, fast, no I/O):
- Pure functions in /lib/: pricing calculations, date helpers, validators, formatters.
- Business logic that doesn't touch DB/HTTP: subscription tier logic, feature-flag evaluation, user permission checks.
- Complex tRPC procedure logic that you can extract from the procedure into a pure function — then unit-test that function.
- React component logic isolated from API calls (use vi.mock for trpc client).
- Drizzle query builders that return SQL — test the SQL is correct, not the result of running it.
Example: subscription pricing. If you compute tiered pricing, this is the highest-leverage place to test. Cover all tier boundaries, prorations, edge cases.
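For instance, tiered seat pricing can be unit-tested at every boundary. The computeTieredPrice function and its tier table below are hypothetical stand-ins for your billing code, shown here as plain assertions (in Vitest you'd wrap each case in it()/expect()):

```typescript
// Hypothetical tiered-pricing logic; your real tiers + units will differ.
type Tier = { upTo: number | null; centsPerSeat: number };

const TIERS: Tier[] = [
  { upTo: 10, centsPerSeat: 1000 },  // first 10 seats: $10/seat
  { upTo: 100, centsPerSeat: 800 },  // seats 11-100: $8/seat
  { upTo: null, centsPerSeat: 500 }, // seats 101+: $5/seat
];

function computeTieredPrice(seats: number): number {
  let remaining = seats;
  let total = 0;
  let prevCap = 0;
  for (const tier of TIERS) {
    const cap = tier.upTo ?? Infinity;
    const inTier = Math.min(remaining, cap - prevCap); // seats billed at this tier
    total += inTier * tier.centsPerSeat;
    remaining -= inTier;
    prevCap = cap;
    if (remaining <= 0) break;
  }
  return total;
}

// The high-leverage cases: every tier boundary, plus zero.
console.assert(computeTieredPrice(10) === 10000);  // exactly fills tier 1
console.assert(computeTieredPrice(11) === 10800);  // first seat in tier 2
console.assert(computeTieredPrice(101) === 82500); // first seat in tier 3
console.assert(computeTieredPrice(0) === 0);
```

The boundary seats (10 vs 11, 100 vs 101) are where off-by-one pricing bugs live; testing mid-tier values adds little.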
Don't unit-test:
- Trivial getters/setters
- Direct DB ORM calls (those need integration tests)
- HTTP client wrappers (mock the HTTP, test the wrapper logic only)
- React components that just render UI without logic (visual regression covers these if needed)
What to Integration-Test
Integration tests (slower, real DB or close to it):
- tRPC procedures end-to-end that involve DB writes + reads (use a test DB or transaction rollback).
- Auth flows: login, token refresh, permission enforcement.
- Critical multi-step flows: create order → process payment → mark shipped.
- DB migration tests: verify migration up + down work without data loss on representative data.
For your codebase specifically (~30-50 integration tests):
- 15 for tRPC procedures (top user-facing flows: subscribe, cancel, upgrade, billing portal)
- 8-10 for auth (login flows, MFA if applicable, role checks)
- 5-8 for critical billing flows (create-subscription, change-tier, refund)
- 4-6 for queue workers (job runs end-to-end through DB)
Don't integration-test:
- Every CRUD endpoint (most are trivial; cover via unit tests on the validation layer)
- Frontend → backend → DB full chain (that's e2e territory; expensive)
- Third-party API calls live (use contract tests)
What to E2E-Test
E2E (Playwright, slowest, brittle):
Only the 5-10 critical user flows that MUST work:
1. Sign up → email confirm → first login
2. Subscribe to paid plan (Stripe Checkout end-to-end)
3. Cancel subscription via portal
4. Upgrade tier mid-cycle
5. Reset password flow
Don't e2e-test:
- Every page renders
- Every form submits
- Every error state
- Every UI interaction
E2E is for 'this critical path must not break in production.' Not for coverage.
What NOT to Test
1. Trivial code with no logic. Getters, setters, factory functions returning literals, type definitions.
2. Implementation details. 'Did this function call X then Y' tests lock in implementation. Test outputs, not call sequences.
3. Generated code. tRPC client types, Drizzle schema-generated types, OpenAPI clients.
4. Framework code. Don't test that Next.js routes work; test YOUR route handlers.
5. Third-party libraries. Don't test that Stripe SDK works; test that YOUR handler does the right thing when Stripe events arrive.
6. The README, the docs, the config files (occasionally a config validation test is useful — usually overkill).
Tests to Delete from Your Current Suite
Without seeing your code, I'd guess these patterns:
1. Snapshot tests of components. Brittle, lock implementation, train team to thoughtlessly update. Replace with explicit assertions on visible behavior.
2. Tests that mock everything. If a unit test has 8 vi.mock() calls, you're testing your mocks, not your code. Either extract logic to a pure function (and test that) or move to integration test.
3. The broken Playwright test (the one not running for 3 months). Delete it. If it was important, you'd have fixed it. The fact it's been broken means it wasn't catching anything.
4. Tests that have been retried/flagged as 'allow flaky.' If the team retries, the test is signaling something. Either fix or delete.
5. Tests that exercise the same code path 10 times with trivially different inputs. Pick 1-2 representative cases + edge cases.
Run a 'tests-not-failing-when-code-is-wrong' audit. Take 10 tests at random. Break the code each one covers (invert a condition, return early). Does the test still pass? If yes, the test isn't testing anything. Delete or fix.
Contract Test Layer (the missing piece)
This is your highest-leverage addition.
For Stripe webhooks
/tests/contracts/stripe/
  fixtures/
    invoice.payment_succeeded.json      (real Stripe event payload)
    customer.subscription.created.json
    customer.subscription.updated.json
    customer.subscription.deleted.json
    invoice.payment_failed.json
    charge.dispute.created.json
    [12-15 fixtures total of events you handle]
  webhook-handler.test.ts               (tests handler against each fixture)
Each test:
it('handles invoice.payment_succeeded by marking customer paid', async () => {
  const event = loadFixture('invoice.payment_succeeded.json');
  const result = await handleStripeWebhook(event);
  expect(result.status).toBe('processed');
  expect(await db.customer(event.data.object.customer)).toMatchObject({
    payment_status: 'paid',
    last_payment_at: expect.any(Date),
  });
});
How to capture fixtures: Stripe CLI can listen to test events + dump them. Or use Stripe Dashboard's event log to copy real production events (anonymize sensitive data).
Contract test for unknown events:
it('logs + ignores unknown event types', async () => {
  const event = { type: 'fictional.event.type', data: { object: {} } };
  const result = await handleStripeWebhook(event);
  expect(result.status).toBe('ignored');
  expect(logger.warn).toHaveBeenCalledWith('Unknown Stripe event type', expect.anything());
});
This catches the 'event type we didn't handle' bug class.
For other external services
Also add contract tests for:
- Auth provider webhooks (if using Auth0/Clerk/etc.)
- Email service (if using SendGrid/Postmark/etc.)
- Anything that posts events to your system
Flaky Test Policy
Rule: a flaky test is broken until proven otherwise.
When a test flakes:
1. Mark `.skip()` (don't delete yet)
2. Open a ticket: 'Fix flaky test X'
3. Investigate root cause within 1 sprint
4. Common causes: race conditions, time-dependent code (use clock-mocking), order-dependent tests, shared state, real network calls
5. Fix or delete. Never 'add retry to make it pass.'
Why this matters: Retried-flaky tests train engineers to ignore CI signal. Once that habit forms, real failures get ignored too.
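The most common flake source, time-dependent code, can be fixed by injecting a clock instead of calling Date.now() inline (Vitest's vi.useFakeTimers() is the framework-level version of the same idea). isTrialExpired below is a hypothetical example:

```typescript
// Inject "now" so tests are deterministic instead of racing real time.
type Clock = () => number; // ms since epoch

// Hypothetical time-dependent logic; production code defaults to Date.now.
function isTrialExpired(trialEndsAt: number, now: Clock = Date.now): boolean {
  return now() >= trialEndsAt;
}

// Deterministic test: freeze the clock instead of sleeping or retrying.
const frozenNow: Clock = () => new Date('2026-01-01T00:00:00Z').getTime();

console.assert(isTrialExpired(new Date('2025-12-31T00:00:00Z').getTime(), frozenNow) === true);
console.assert(isTrialExpired(new Date('2026-06-01T00:00:00Z').getTime(), frozenNow) === false);
```

The same injection pattern kills order-dependence and shared-state flakes: make the hidden dependency an explicit parameter.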
CI Execution Strategy
On save (local, instant):
→ Vitest watching changed unit-test files (~50ms feedback)
On commit (pre-push hook):
→ Lint + type-check + relevant unit tests (~30s)
On PR open:
→ Full unit suite (parallel, ~1.5 min)
→ Integration tests (parallel, ~2 min)
→ Contract tests (~30s)
→ Type-check (~30s)
→ Lint (~20s)
Total: ~3-4 min, parallel where possible
On merge to main:
→ Above PR pipeline
→ E2E tests against staging (~3-5 min)
→ Visual regression on key pages (~1 min)
Total: ~6-8 min before deploy
Drop CI from current 12 min → ~4 min on PR. Engineers will care about CI again.
Parallelization:
- Vitest natively parallelizes by file
- Integration tests parallelize per-test-DB-instance (use Docker for ephemeral DBs)
- E2E with Playwright parallelizes by worker
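The fast-vs-slow split can be encoded in vitest.config.ts. The option names follow recent Vitest versions and the globs are assumptions about your layout; verify both against the Vitest docs for your version:

```typescript
// Sketch of a vitest.config.ts that keeps the PR pipeline fast: unit +
// contract tests run here in parallel; integration/e2e run in a separate,
// merge-only CI job. Paths are illustrative, not your actual layout.
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    pool: 'threads', // parallelize test files across worker threads
    include: ['tests/unit/**/*.test.ts', 'tests/contracts/**/*.test.ts'],
    exclude: ['**/node_modules/**', 'tests/integration/**', 'tests/e2e/**'],
  },
});
```

A second config (or Vitest workspace entry) with the inverse include list gives the merge-only job its own suite without duplicating setup.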
AI-Generated Test Quality Gate
If engineers use Cursor/Claude Code/Copilot to write tests:
Mandatory verification: for any AI-generated test, run the 'mutation' check:
1. Comment out the production code's logic (e.g., return early)
2. Run the test
3. If the test still passes → the test isn't actually testing the logic. Reject.
Code review red flags:
- Test mocks the function under test (effectively, the test mocks itself)
- Test asserts on internal implementation details (expect(spy).toHaveBeenCalledWith(...))
- Test asserts trivial things (expect(result).toBeDefined())
- Test has 6+ mock setups (you're testing the mock contract, not the code)
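Here's the mutation check in miniature, with a hypothetical slugify function. A weak assertion survives a broken implementation; a real one does not:

```typescript
// Hypothetical function under test.
function slugify(title: string): string {
  return title.toLowerCase().trim().replace(/\s+/g, '-');
}

// "Mutant": simulates commenting out the logic (returns input unchanged).
function slugifyMutant(title: string): string {
  return title;
}

// Weak assertion: passes against BOTH versions, so it tests nothing. Reject.
const weakSurvivesMutant =
  slugify('Hello World') !== undefined && slugifyMutant('Hello World') !== undefined;

// Real assertion: passes against the implementation, fails against the mutant.
const realCatchesMutant =
  slugify('Hello World') === 'hello-world' && slugifyMutant('Hello World') !== 'hello-world';

console.assert(weakSurvivesMutant === true); // the red flag: mutant survives
console.assert(realCatchesMutant === true);  // the mutation check done right
```

Any AI-generated test whose assertions behave like the weak one fails the gate.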
Test Data Strategy
For your scale, use:
Factories (test-data builders):
// /tests/factories/user.ts
export const buildUser = (overrides = {}) => ({
  id: 'user_test_' + Math.random(),
  email: 'test@example.com',
  plan_tier: 'starter',
  ...overrides,
});
Use for unit + integration tests. Avoid hardcoded JSON fixtures except for contract tests.
Fixtures (real-world payloads):
Reserved for contract tests. Real Stripe events, real auth provider events.
Avoid snapshots. Especially for components. Use explicit assertions.
Test DB strategy:
- For integration tests: ephemeral Postgres via Docker (postgres:16 image, fresh per test suite)
- Use Drizzle's transaction rollback for per-test isolation
- Or: per-test-database (slower but bulletproof)
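The rollback pattern can be sketched with an in-memory stand-in. FakeDb below is purely illustrative; with Drizzle + Postgres the same shape applies: open a transaction in beforeEach, run the test inside it, roll back in afterEach so writes never leak between tests.

```typescript
// In-memory stand-in demonstrating per-test isolation via rollback.
type Row = { id: string; tier: string };

class FakeDb {
  rows: Row[] = [];

  // Run fn against the "database", then always roll its writes back.
  inRollbackTransaction(fn: (tx: FakeDb) => void): void {
    const snapshot = this.rows.map((r) => ({ ...r })); // save pre-test state
    try {
      fn(this);
    } finally {
      this.rows = snapshot; // rollback: restore pre-test state
    }
  }
}

const db = new FakeDb();

db.inRollbackTransaction((tx) => {
  tx.rows.push({ id: 'u1', tier: 'pro' }); // the test writes a row...
  console.assert(tx.rows.length === 1);    // ...and can read it back
});

console.assert(db.rows.length === 0); // after rollback, the DB is clean
```

Rollback isolation is why tests can share one ephemeral Postgres container without stepping on each other.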
Implementation Order
Week 1: Audit + delete
- Run the 'mutation' check on 30 random existing tests. Delete failures.
- Delete the broken Playwright test.
- Delete snapshot tests where present.
- Goal: existing test count drops 30-40%; CI time drops some.
Week 2: Contract tests for Stripe
- Capture 12-15 Stripe webhook event fixtures
- Write contract tests against your webhook handler
- This alone should catch your monthly Stripe bug
Week 3: Unit-test the billing logic
- Extract pricing/billing functions from tRPC procedures into pure functions
- Unit-test the pure functions (high coverage on the actual money paths)
- Aim: 30+ tests on billing logic, fast
Week 4: Critical integration tests
- Subscribe / Cancel / Upgrade tRPC procedures with real DB
- Auth flows
- Queue worker happy paths
- Aim: 10-15 strong integration tests
Week 5: E2E rebuild
- 5-7 critical user flows in Playwright (subscribe, cancel, upgrade, password reset, login)
- Run on merge to main only
- Aim: stable, fast, valued
Week 6: CI optimization
- Parallelize aggressively
- Move slow tests to merge-only
- Establish flaky-test policy
- Goal: PR CI under 4 min
Week 7+: Iterate
- Re-run mutation check quarterly
- Add tests for production bugs as they happen (regression prevention)
- Monitor flaky-test count (target: 0)
Coverage Target
Real target: 60-70% line coverage, with 90%+ on critical paths (billing, auth, webhooks).
Not 100%. Not 35%. The shape matters more than the number.
Track these instead of overall coverage:
- % coverage on /lib/billing/ → target 90%
- % coverage on Stripe webhook handler → target 100% (yes, here)
- % coverage on auth code → target 90%
- % coverage on UI components → don't track (waste of metric)
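These per-path targets can be enforced in config rather than tracked by hand. Option names follow recent Vitest versions and the globs are assumptions about your layout; check both against the Vitest coverage docs:

```typescript
// Sketch: encode the shaped coverage targets in vitest.config.ts instead of
// one global number. Paths are illustrative, not your actual tree.
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      thresholds: {
        lines: 60, // overall floor, deliberately not 100%
        'src/lib/billing/**': { lines: 90 },
        'src/server/webhooks/stripe*': { lines: 100 },
      },
    },
  },
});
```

CI then fails when billing or webhook coverage regresses, while UI components stay untracked by design.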
What This Strategy Won't Solve
- Won't replace product QA. Tests catch regressions; they don't catch 'this feature is wrong.'
- Won't prevent all production bugs. Test strategy reduces certain bug classes; bugs still happen. Plan for fast incident response too.
- Won't help if engineers don't run tests locally. Culture matters. CI catches eventually, but inner-loop testing is where bugs die fastest.
- Won't fix architectural bugs. If your billing logic is fundamentally wrong, tests pass + bugs ship. Architecture review > tests.
- Won't compensate for skipping types. TypeScript types catch a class of bugs tests can't easily reach. Use both.
Maintenance Cadence
Per-PR (engineer):
- Run unit tests locally before pushing
- New code includes tests at the right layer
Per-sprint (tech lead):
- Review flaky-test count (target: 0)
- Review CI time (target: <5 min for PR)
- Investigate any test that takes >2s
Quarterly (eng team):
- Mutation check on 50 random tests. Delete failures.
- Coverage audit on critical-path code
- Test strategy review: are we still hitting the right bug classes?
- Flaky-test root cause analysis
Annually:
- Full test architecture review
- Tooling evaluation (still Vitest? Time to consider alternatives?)
- Postmortem on production bugs that escaped tests
Key Takeaways
- Your problem is wrong tests, not low coverage. 35% coverage on the wrong code is worse than 60% on the right code.
- Contract tests for Stripe webhooks are your highest-leverage addition. Replay 15 real events; catch 80% of your monthly bugs.
- Delete the broken Playwright test + flaky integration tests. CI trust matters more than coverage numbers.
- Invert the pyramid: more unit + contract; fewer integration. CI drops from 12min to 4min.
- Flaky tests are bugs in the test, not retries to add. Fix root causes; don't normalize ignoring CI.
- AI-generated tests need the mutation check. Comment out the logic; if test passes, the test isn't testing anything. Reject.
Common use cases
- Engineer joining a codebase with chaotic test suite (some tests great, some useless)
- Tech lead establishing testing standards for a 5-15 person team
- Solo founder who shipped fast + now needs test discipline before scaling
- Team migrating from 'no tests' to 'real tests' — needs the staged plan
- Team with slow CI (>15 min) where most tests are integration/e2e — needs to invert pyramid
- Eng manager handling 'we have low coverage' criticism that misses the actual quality issue
Best AI model for this
Claude Opus 4. Test strategy needs reasoning about cost/value tradeoffs, codebase architecture, and team dynamics — exactly Claude's strengths. ChatGPT GPT-5 second-best.
Pro tips
- Test the seams, not the lines. 80% coverage of trivial getter/setter code is worse than 60% coverage of business logic.
- Snapshot tests are usually anti-patterns. They lock in implementation, fail on intentional changes, train teams to update without thinking.
- Contract tests at service boundaries beat e2e tests for catching integration bugs. Faster + more focused.
- Fast tests run on every save. Slow tests run on PR. End-to-end tests run on merge to main. Don't slow your inner loop.
- Flaky tests are bugs in the test, not the code. 'Just retry' policies hide real concurrency or environment issues.
- AI-generated tests that pass aren't necessarily testing anything. Verify each test actually fails when the code is wrong.
- Coverage is a leading indicator at best. 'Did we test the right things' is the real question — coverage doesn't answer it.
Customization tips
- Be honest about your current test pain. 'Slow CI' vs 'flaky tests' vs 'production bugs slipping through' need different fixes.
- List specific bug types that hit production. Test strategy calibrates against actual bug classes, not hypothetical ones.
- Specify your test frameworks + CI tooling. Recommendations differ per stack — Vitest patterns differ from pytest patterns.
- If you have an existing test suite, describe it concretely (test count, types, what's broken). The strategy delineates 'what to keep, what to delete, what to add.'
- Specify deployment frequency. Multiple-times/day deploys need fast-feedback CI more than weekly-deploy teams.
- Use the Migration Mode variant if your existing test suite is chaotic — it adds the cleanup-then-rebuild plan rather than incremental additions.
Variants
Backend API Mode
For REST/GraphQL backends — emphasizes contract tests, integration with DB, request/response validation.
Frontend Mode
For React/Vue/Svelte apps — emphasizes user-flow testing, component vs hook tests, visual regression.
Full-Stack Mode
For full-stack apps (Next.js, Rails, Django) — emphasizes the boundary between client + server testing.
Migration Mode
For codebases with chaotic existing tests — adds the cleanup-then-rebuild plan.
Frequently asked questions
How do I use the Test Strategy Architect prompt?
Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.
Which AI model works best with Test Strategy Architect?
Claude Opus 4. Test strategy needs reasoning about cost/value tradeoffs, codebase architecture, and team dynamics — exactly Claude's strengths. ChatGPT GPT-5 second-best.
Can I customize the Test Strategy Architect prompt for my use case?
Yes — every Promptolis Original is designed to be customized. Key levers: test the seams, not the lines (80% coverage of trivial getter/setter code is worse than 60% coverage of business logic); treat snapshot tests as anti-patterns (they lock in implementation, fail on intentional changes, and train teams to update without thinking).
Explore more Originals
Hand-crafted 2026-grade prompts that actually change how you work.
← All Promptolis Originals