⚡ Promptolis Original · Coding & Development
🚩 Feature Flag System Architect
Designs your feature-flag system: which flags vs configs, percentage rollout vs targeting, the cleanup process that prevents 200-stale-flag debt — without LaunchDarkly's $25K/yr bill if you don't need it.
Why this is epic
Most feature-flag systems become flag graveyards: 200 flags in production, half of which nobody can explain, and features that ship with the flag permanently on. This Original designs the system + the cleanup discipline.
Outputs the complete strategy: flag taxonomy (release flags vs experiment flags vs ops flags vs permission flags), naming conventions, percentage-rollout patterns, default values, cleanup process, and the right tool for your scale.
Calls out when you DON'T need a paid feature-flag service. LaunchDarkly is $25K/yr; for small teams, simple env-var configs or PostHog free tier is plenty. The Original tells you which fits.
Calibrated to 2026 reality: feature flags as the standard rollout pattern, the rise of per-user targeting via segments, the cleanup-debt crisis (most companies have 100+ stale flags), AI-flagged experiments.
The prompt
Example: input → output
Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.
<team-size>4 backend, 1 frontend</team-size> <flag-need>Mostly rollout safety (gradual rollout of new features, kill switches for risky changes). Occasional A/B test (3-4/year). Some per-customer flags (early-access features for specific accounts).</flag-need> <scale>~5K customers, ~80K users, ~200 req/sec average. Multiple deploys per day.</scale> <existing-state>Nothing systematic. We have ~12 env vars masquerading as flags (FEATURE_X=true). One ad-hoc 'admin can toggle' check via DB lookup.</existing-state> <budget-tolerance>Moderate. Not paying $25K/yr for LaunchDarkly. Open to ~$50/month tools.</budget-tolerance> <infrastructure>Next.js 14 + tRPC + Postgres + Redis. Vercel deploys.</infrastructure> <rollout-needs>Percentage rollout for risky features (start at 1%, ramp to 100%). Per-customer for early-access. Kill switch is critical.</rollout-needs> <deployment-pattern>Continuous deploy. ~5 deploys/day average.</deployment-pattern> <cleanup-track-record>Currently nobody cleans up. The 12 env-var 'flags' include 4 that nobody remembers what they do.</cleanup-track-record>
Feature Flag System: B2B SaaS Mid-Scale
Suitability Check
You need a feature-flag system. Env vars don't support percentage rollout, per-customer targeting, or kill switches at runtime.
Tool recommendations for your scale + budget:
1. PostHog (free tier covers your scale). PostHog's feature flag system is solid + free at your size. Built-in A/B testing. Percentage rollout. Per-user targeting. ~$0/month at 80K users + ~10K events/day.
2. Self-hosted Postgres-backed flags (DIY). ~$0/month, 1 week to build. Good if you want full control + no SaaS dependency.
3. GrowthBook (free open-source, paid hosted at $19/user). Strong for A/B testing. Solid for rollouts. ~$95/month for 5 users.
4. Statsig (free tier <1M events/month). Strong A/B testing focus. Alternative to PostHog.
LaunchDarkly is overkill. Their $25K/yr buys enterprise features (SSO, audit log, compliance attestations) you don't need at this stage.
Recommendation: PostHog. You may already use it for analytics; flags are bundled. Free tier sufficient. If not on PostHog already, GrowthBook self-hosted is the cheap second-best.
Flag Taxonomy
1. Release flags (gradual rollout of new features)
- Lifecycle: created → 1% → 25% → 50% → 100% → REMOVED (within 60 days of 100%)
- Owner: feature engineer
- Examples:
release.new-checkout-flow, release.improved-search-ranking
2. Experiment flags (A/B tests)
- Lifecycle: created → measured for 2-4 weeks → conclusion → REMOVED (within 2 weeks of decision)
- Owner: PM + engineer
- Examples:
experiment.pricing-page-redesign-q4, experiment.email-subject-line
3. Ops flags (kill switches)
- Lifecycle: created → live indefinitely (operational tool)
- Owner: tech lead / on-call
- Examples:
ops.kill-switch-payments, ops.disable-ai-summarization, ops.maintenance-mode
4. Permission flags (per-customer entitlement / early access)
- Lifecycle: per-customer; lasts as long as the entitlement does
- Owner: account exec / customer success
- Examples:
permission.early-access-customer-acme, permission.beta-program
Note: permanent tier differences (Free vs Pro features) are NOT flags. Use a subscription.tier check instead. Flags are for time-limited or customer-specific overrides.
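To make that distinction concrete, here's a minimal sketch. The `Tier` type and both function names are hypothetical, purely for illustration: a permanent product difference is a plain entitlement check, while a flag only layers a time-limited override on top.

```typescript
// Hypothetical user/subscription shapes — adapt to your actual models.
type Tier = 'free' | 'pro' | 'enterprise';

interface User {
  id: string;
  tier: Tier;
}

// Permanent product difference: a plain entitlement check. No flag involved.
function canExportCsv(user: User): boolean {
  return user.tier !== 'free';
}

// Time-limited early access: the flag grants a temporary override,
// layered on top of (not replacing) the entitlement check.
function canUseBetaSearch(user: User, earlyAccessFlag: boolean): boolean {
  return earlyAccessFlag || user.tier === 'enterprise';
}
```

When the beta ends, the flag and its branch get deleted; the tier check stays forever.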
Naming Conventions
<type>.<name>
release.new-checkout-flow
release.improved-search-ranking-2026q2
experiment.pricing-page-redesign
experiment.email-subject-line-test
ops.kill-switch-payments
ops.disable-ai-summarization
permission.early-access-customer-acme
permission.beta-program-q2-2026
Rules:
- Lowercase, hyphenated
- First segment is type prefix (lifecycle signal)
- Second segment describes purpose
- Optional date suffix for time-bounded experiments
Reject any flag name without a valid prefix; enforce this with a lint rule.
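The naming check itself is small. A sketch, assuming the four prefixes above (the function name is illustrative; wire it into whatever lint or CI hook you use):

```typescript
// Valid: <type>.<lowercase-hyphenated-name>, type one of the four prefixes.
const FLAG_NAME_PATTERN = /^(release|experiment|ops|permission)\.[a-z0-9]+(-[a-z0-9]+)*$/;

export function isValidFlagName(name: string): boolean {
  return FLAG_NAME_PATTERN.test(name);
}
```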
Implementation Architecture (PostHog-based)
// /lib/flags/index.ts
import { PostHog } from 'posthog-node';
import { logger } from '@/lib/logger'; // your app's structured logger

const posthog = new PostHog(process.env.POSTHOG_API_KEY!, {
  host: 'https://us.i.posthog.com',
});

export async function getFlag(
  flagKey: string,
  context: { userId: string; customerId?: string; userProperties?: Record<string, any> }
): Promise<boolean | string> {
  try {
    const value = await posthog.getFeatureFlag(flagKey, context.userId, {
      groups: context.customerId ? { customer: context.customerId } : undefined,
      personProperties: context.userProperties,
    });
    // The SDK resolves to undefined when the flag doesn't exist or can't be evaluated
    return value ?? getSafeDefault(flagKey);
  } catch (e) {
    // SAFE DEFAULT on failure
    logger.warn({ flagKey, error: e }, 'Feature flag fetch failed');
    return getSafeDefault(flagKey);
  }
}

function getSafeDefault(flagKey: string): boolean {
  // Convention: flags default to OFF unless explicitly registered
  const SAFE_ON_FLAGS = new Set<string>([
    // Flags that are already at 100% rollout — shouldn't fail-off
    // Track in a registry; review monthly
  ]);
  return SAFE_ON_FLAGS.has(flagKey);
}
Caching: the PostHog SDK caches flag values per-user with sensible defaults. For ultra-low-latency reads, add an in-process cache with a ~5-minute TTL.
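A sketch of what that in-process cache could look like, assuming a `fetchFlag` callback that wraps the real PostHog call (all names here are illustrative):

```typescript
// Per-key in-memory cache with a 5-minute TTL.
type FlagValue = boolean | string;

interface CacheEntry {
  value: FlagValue;
  expiresAt: number; // epoch ms
}

const TTL_MS = 5 * 60 * 1000;
const cache = new Map<string, CacheEntry>();

export async function getCachedFlag(
  cacheKey: string, // e.g. `${flagKey}:${userId}` — flag values are per-user
  fetchFlag: () => Promise<FlagValue>,
): Promise<FlagValue> {
  const hit = cache.get(cacheKey);
  if (hit && hit.expiresAt > Date.now()) return hit.value;

  const value = await fetchFlag();
  cache.set(cacheKey, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}
```

Note the trade-off: a 5-minute TTL means a kill switch can take up to 5 minutes to propagate, so consider a shorter TTL (or no cache) for ops flags.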
Per-Type Lifecycle
Release flags
- Created with explicit retirement date in metadata (default: 60 days from creation)
- Status progression: 1% → 25% → 50% → 100%
- After 30 days at 100%: automated reminder to remove
- After 60 days at 100%: PR mandatory to remove or extend
Experiment flags
- Created with experiment-end date
- After end date: 7-day grace to make decision
- After decision: 14-day grace to remove from code
- After 30 days post-decision: blocking PR to remove
Ops flags
- Live indefinitely
- Quarterly audit: still valid? Owner still here?
- Document purpose + how to flip
Permission flags
- Tied to customer/account record
- When customer's entitlement ends: flag value flips to false naturally
- Cleanup when no customer has the entitlement for 90+ days
Percentage Rollout Strategy
Hash on userId (stable). Same user always sees same variant.
// Pseudo-implementation (PostHog handles this internally)
import { createHash } from 'crypto';

function shouldRollout(flagKey: string, userId: string, percentage: number): boolean {
  const hash = createHash('sha256').update(`${userId}:${flagKey}`).digest('hex');
  const bucket = parseInt(hash.slice(0, 8), 16) % 100;
  return bucket < percentage;
}
Why hash on userId, not request: consistent experience per user. If they refresh, same flag state.
Standard ramp: 1% → 5% → 10% → 25% → 50% → 100%. Hold at each for at least 24h to verify metrics.
Rollout abort: if metrics regress at 5%, drop back to 1% or 0%. Don't proceed.
Targeting + Segmentation
For per-customer early access:
// PostHog: target by customer property
await getFlag('permission.early-access-feature-x', {
userId: user.id,
customerId: user.customerId,
userProperties: {
customer_plan_tier: user.plan,
is_early_access: user.customer.earlyAccess === true,
},
});
// In PostHog UI: target users where is_early_access = true
For segment-based experiments:
// Target only paying customers in a specific tier
// PostHog UI: 'plan_tier in [growth, scale]' AND 'signup_date_after 2026-01-01'
Cleanup Discipline
The hard part. Most teams skip cleanup; you must not.
Process
1. Every flag creation includes `expectedRetirement: <date>` in PR description
2. Weekly cron: query flags whose retirement date passed without action → Slack alert to owner
3. Monthly: zero-tolerance review of ops + permission flags (still valid?)
4. Quarterly: audit flag count. Target <20 active flags at any time.
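The weekly scan in step 2 can be sketched against the code-level registry (described below) by filtering on each flag's metadata. This assumes registry entries carry an `expectedRetirement` ISO date; the Slack alert itself is omitted:

```typescript
// Minimal registry-entry shape for the stale-flag scan.
interface FlagMeta {
  owner: string;
  expectedRetirement?: string; // ISO date; absent for ops flags
}

// Returns the keys of all flags whose retirement date has passed.
export function findStaleFlags(
  flags: Record<string, FlagMeta>,
  today: Date = new Date(),
): string[] {
  return Object.entries(flags)
    .filter(([, meta]) =>
      meta.expectedRetirement !== undefined &&
      new Date(meta.expectedRetirement) < today)
    .map(([key]) => key);
}
```

A weekly cron then posts `findStaleFlags(FLAGS)` to the owner's Slack channel.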
Code-level cleanup
For each flag retired:
// Before:
if (await getFlag('release.new-checkout-flow', context)) {
  return newCheckoutFlow();
} else {
  return oldCheckoutFlow();
}

// After (cleanup):
return newCheckoutFlow();
// Delete the oldCheckoutFlow function entirely
// Remove the flag from PostHog
Lint rule
// .eslint-flag-rules.ts (custom rule)
// Detect getFlag() calls; warn if flag's expectedRetirement is past
This can be a check that runs on PR — block merging if a flag is past its retirement date and the PR doesn't clean it up.
Implementation Skeleton
/lib/flags/
client.ts (PostHog client setup)
get-flag.ts (main getFlag function with safe defaults)
registry.ts (registered flags + their metadata)
test-helpers.ts (mockFlag for tests)
cleanup-monitor.ts (script that runs weekly, alerts on stale flags)
/lib/flags/registry.ts:
export const FLAGS = {
  'release.new-checkout-flow': {
    type: 'release',
    owner: '@sarah',
    description: 'Switches to redesigned checkout flow',
    createdAt: '2026-04-01',
    expectedRetirement: '2026-06-01',
  },
  'ops.kill-switch-payments': {
    type: 'ops',
    owner: '@oncall',
    description: 'Emergency kill switch for payments. Set to true to disable all payment processing.',
    createdAt: '2025-01-01',
    // No retirement (ops flag — lives indefinitely)
  },
  // ... all flags registered here
};
Why register flags in code: single source of truth. Remove from PostHog without removing from registry → CI fails. Forces cleanup discipline.
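The CI side of that enforcement can be sketched as a drift check: every flag key referenced in code must exist in the registry. This sketch assumes flag keys appear as string literals in `getFlag()` calls; a real check would walk the repo's source files rather than take a single string:

```typescript
// Returns flag keys used in code but missing from the registry.
export function findUnregisteredFlags(
  sourceText: string,
  registeredKeys: Set<string>,
): string[] {
  // Match the string literal in getFlag('...') / getFlag("...") calls.
  const used = [...sourceText.matchAll(/getFlag\(\s*['"]([^'"]+)['"]/g)]
    .map((m) => m[1]);
  return [...new Set(used)].filter((key) => !registeredKeys.has(key));
}
```

CI fails if the result is non-empty; the symmetric check (registered but never referenced) catches dead registry entries.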
Testing Strategy
For active experiments:
// /tests/checkout.test.ts
import { mockFlag } from '@/lib/flags/test-helpers';

describe('Checkout', () => {
  describe('with new-checkout-flow flag ON', () => {
    beforeEach(() => mockFlag('release.new-checkout-flow', true));
    it('uses new flow', async () => { /* ... */ });
  });

  describe('with new-checkout-flow flag OFF', () => {
    beforeEach(() => mockFlag('release.new-checkout-flow', false));
    it('uses old flow', async () => { /* ... */ });
  });
});
CI runs both states. Active flags get duplicate test runs. Once flag is retired, the OFF tests get deleted along with the OFF code path.
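One way `mockFlag` might work under the hood, as a sketch: a module-level override map that `getFlag` would consult before calling PostHog (the names besides `mockFlag` are illustrative):

```typescript
// /lib/flags/test-helpers.ts — sketch of the mock layer.
const overrides = new Map<string, boolean | string>();

export function mockFlag(flagKey: string, value: boolean | string): void {
  overrides.set(flagKey, value);
}

// Call in afterEach so mocks never leak between tests.
export function clearFlagMocks(): void {
  overrides.clear();
}

// getFlag checks this first; a defined value short-circuits the network call.
export function getMockedValue(flagKey: string): boolean | string | undefined {
  return overrides.get(flagKey);
}
```

Because the override layer lives in front of the SDK, tests run fully offline and deterministic.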
Migration from Existing
Week 1: Audit current 12 env-var flags
- For each: what does it do? Who owns? Is it a real flag or a config?
- The 4 that nobody remembers → kill them. If a feature is truly unused, removal is zero-risk; if it IS in use, the audit will reveal it.
Week 2: Set up PostHog flags
- Create PostHog account (or use existing)
- Migrate 'real' flags from env vars to PostHog
- Start with non-critical flags; verify behavior
Week 3: Migrate kill switches
- Convert env-var-based kill switches to PostHog flags
- Test the kill switches actually work (no regressions)
Week 4: Build registry + cleanup tooling
- Code-level registry (the constants file)
- Weekly cron alerting on stale flags
- Lint rule for unprefixed flag names
Week 5: Document + train team
- Wiki page on the flag system
- Show team the lifecycle expectations
- Code-review checklist
What This System Won't Solve
- Won't replace permission/entitlement system. Permanent feature differences should NOT be flags.
- Won't auto-enforce cleanup if humans skip it. The discipline matters; tooling helps but can't replace will.
- Won't help if owners leave the company. Establish ownership transfer when engineers leave.
- Won't compensate for poor experiment design. A/B tests need proper hypothesis + sample size + statistical analysis. Flags are the delivery mechanism.
- Won't scale to 1000s of customers needing per-customer flags. Per-customer flags work at your 5K-customer scale; at 50K customers, the registry approach is unwieldy.
Maintenance Cadence
Weekly:
- Cron alert: any flag past retirement date?
- Quick scan: any new flags added without registration?
Monthly:
- Active flag count audit (target: <20)
- Ops flag review (still valid?)
- Permission flag review (still in use?)
Quarterly:
- Full flag-system audit
- Tooling review (PostHog still right? Add features?)
- Team training refresh
Annually:
- Re-evaluate vs SaaS alternatives
- Cost/value review
Key Takeaways
- PostHog free tier or self-hosted PG-backed flags. LaunchDarkly is overkill at your scale.
- 4 flag types with prefix naming: release., experiment., ops., permission. The prefix tells you the lifecycle.
- Cleanup discipline from day 1. Without it, you accumulate 100+ stale flags within 2 years.
- Code-level registry as single source of truth. It forces cleanup, and tooling enforces it.
- Test BOTH states for active experiments. CI runs duplicate. Cleanup deletes the off-state when flag retires.
- Hash percentage rollout on userId. Same user, same variant, every time. Gradual ramp 1% → 100%.
Common use cases
- Engineer adding feature flags to a codebase that's never had them
- Tech lead establishing flag-cleanup discipline in a flag-graveyard codebase
- Founder evaluating LaunchDarkly vs open-source vs DIY flag system
- Backend engineer designing flag system that handles per-user targeting + percentage rollout
- DevOps consolidating multiple ad-hoc flag systems into unified infrastructure
- Engineer building safe-rollout pattern for risky changes (DB migration switches, new pricing logic)
Best AI model for this
Claude Opus 4. Feature-flag design needs reasoning about deploy strategy, ops, and operational cost vs benefit — exactly Claude's strengths. ChatGPT GPT-5 second-best.
Pro tips
- 4 flag types: release (gradual rollout), experiment (A/B test), ops (kill switch), permission (entitlement). Different lifecycles.
- Naming convention from day 1. `release.new-checkout-flow` vs `experiment.pricing-test-q4` vs `ops.kill-switch-payments`. Prefix tells you lifecycle.
- Default values matter. Flag-fetch-fail → use safe default (typically 'feature off'). Don't break.
- Cleanup is the discipline that distinguishes mature systems. Track flag age. Auto-alert at 90 days.
- Percentage rollouts hash on STABLE user attribute (user_id), not request. Same user always sees same variant.
- Don't use feature flags for permanent feature differences (free vs paid tier). That's a permission/entitlement check.
- Test both flag states. CI runs tests with flag on AND off for active experiments.
Customization tips
- Specify ALL flag-like patterns in your codebase (env vars used as flags, DB-toggle-this-feature checks). Migration is calibrated to current state.
- Be honest about cleanup track record. 'We never clean up' vs 'we clean up sometimes' shape the discipline design.
- List your specific flag needs precisely. Pure rollout-safety vs heavy A/B testing vs per-customer entitlement need different tools.
- Specify scale (users, deploys/day). Tooling differs at 1K users + weekly deploys vs 1M users + 10/day deploys.
- Mention budget honestly. Free tier vs $50/month vs $25K/yr LaunchDarkly are very different recommendations.
- Use the Cleanup Debt Mode variant if you have 100+ stale flags already — adds the systematic cleanup process before designing new flags.
Variants
Self-Hosted Mode
For DIY flag systems — emphasizes simple Postgres-backed implementation, no SaaS dependency.
LaunchDarkly Replacement Mode
For teams considering or migrating off LaunchDarkly — picks the right scale + alternative.
Cleanup Debt Mode
For codebases with 100+ stale flags — audits + builds the systematic cleanup process.
AI Experiment Mode
For teams running A/B tests on AI features — emphasizes proper experiment design + measurement.
Frequently asked questions
How do I use the Feature Flag System Architect prompt?
Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.
Which AI model works best with Feature Flag System Architect?
Claude Opus 4. Feature-flag design needs reasoning about deploy strategy, ops, and operational cost vs benefit — exactly Claude's strengths. ChatGPT GPT-5 second-best.
Can I customize the Feature Flag System Architect prompt for my use case?
Yes — every Promptolis Original is designed to be customized. Key levers: the 4 flag types (release for gradual rollout, experiment for A/B tests, ops for kill switches, permission for entitlements, each with its own lifecycle), and a naming convention from day 1 (`release.new-checkout-flow` vs `experiment.pricing-test-q4` vs `ops.kill-switch-payments`; the prefix tells you the lifecycle).
Explore more Originals
Hand-crafted 2026-grade prompts that actually change how you work.
← All Promptolis Originals