⚡ Promptolis Original · Coding & Development
🚩 Feature Flag System Architect
Designs your feature-flag system: which flags vs configs, percentage rollout vs targeting, the cleanup process that prevents 200-stale-flag debt — without LaunchDarkly's $25K/yr bill if you don't need it.
Why this is epic
Most feature-flag systems become flag graveyards: 200 flags in production, half of which nobody can explain, and features that ship with the flag permanently on. This Original designs the system + the cleanup discipline.
Outputs the complete strategy: flag taxonomy (release flags vs experiment flags vs ops flags vs permission flags), naming conventions, percentage-rollout patterns, default values, cleanup process, and the right tool for your scale.
Calls out when you DON'T need a paid feature-flag service. LaunchDarkly is $25K/yr; for small teams, simple env-var configs or PostHog free tier is plenty. The Original tells you which fits.
Calibrated to 2026 reality: feature flags as the standard rollout pattern, the rise of per-user targeting via segments, the cleanup-debt crisis (most companies have 100+ stale flags), AI-flagged experiments.
The prompt
Example: input → output
Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.
<team-size>4 backend, 1 frontend</team-size> <flag-need>Mostly rollout safety (gradual rollout of new features, kill switches for risky changes). Occasional A/B test (3-4/year). Some per-customer flags (early-access features for specific accounts).</flag-need> <scale>~5K customers, ~80K users, ~200 req/sec average. Multiple deploys per day.</scale> <existing-state>Nothing systematic. We have ~12 env vars masquerading as flags (FEATURE_X=true). One ad-hoc 'admin can toggle' check via DB lookup.</existing-state> <budget-tolerance>Moderate. Not paying $25K/yr for LaunchDarkly. Open to ~$50/month tools.</budget-tolerance> <infrastructure>Next.js 14 + tRPC + Postgres + Redis. Vercel deploys.</infrastructure> <rollout-needs>Percentage rollout for risky features (start at 1%, ramp to 100%). Per-customer for early-access. Kill switch is critical.</rollout-needs> <deployment-pattern>Continuous deploy. ~5 deploys/day average.</deployment-pattern> <cleanup-track-record>Currently nobody cleans up. The 12 env-var 'flags' include 4 that nobody remembers what they do.</cleanup-track-record>
Feature Flag System: B2B SaaS Mid-Scale
Suitability Check
You need a feature-flag system. Env vars don't support percentage rollout, per-customer targeting, or kill switches at runtime.
Tool recommendations for your scale + budget:
1. PostHog (free tier covers your scale). PostHog's feature flag system is solid + free at your size. Built-in A/B testing. Percentage rollout. Per-user targeting. ~$0/month at 80K users + ~10K events/day.
2. Self-hosted Postgres-backed flags (DIY). ~$0/month, 1 week to build. Good if you want full control + no SaaS dependency.
3. GrowthBook (free open-source, paid hosted at $19/user). Strong for A/B testing. Solid for rollouts. ~$95/month for 5 users.
4. Statsig (free tier <1M events/month). Strong A/B testing focus. Alternative to PostHog.
LaunchDarkly is overkill. Their $25K/yr buys enterprise features (SSO, audit log, compliance attestations) you don't need at this stage.
Recommendation: PostHog. You may already use it for analytics; flags are bundled. Free tier sufficient. If not on PostHog already, GrowthBook self-hosted is the cheap second-best.
Flag Taxonomy
1. Release flags (gradual rollout of new features)
- Lifecycle: created → 1% → 25% → 50% → 100% → REMOVED (within 60 days of 100%)
- Owner: feature engineer
- Examples:
release.new-checkout-flow, release.improved-search-ranking
2. Experiment flags (A/B tests)
- Lifecycle: created → measured for 2-4 weeks → conclusion → REMOVED (within 2 weeks of decision)
- Owner: PM + engineer
- Examples:
experiment.pricing-page-redesign-q4, experiment.email-subject-line
3. Ops flags (kill switches)
- Lifecycle: created → live indefinitely (operational tool)
- Owner: tech lead / on-call
- Examples:
ops.kill-switch-payments, ops.disable-ai-summarization, ops.maintenance-mode
4. Permission flags (per-customer entitlement / early access)
- Lifecycle: per-customer; lasts as long as the entitlement does
- Owner: account exec / customer success
- Examples:
permission.early-access-customer-acme, permission.beta-program
Note: permanent tier differences (Free vs Pro features) are NOT flags. Use a subscription.tier check instead. Flags are for time-limited or customer-specific overrides.
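To make that distinction concrete, here's a minimal sketch. The `Tier` type and both function names are hypothetical, purely for illustration: a permanent product difference is a plain entitlement check, while a flag only layers a time-limited override on top.

```typescript
// Hypothetical user/subscription shapes — adapt to your actual models.
type Tier = 'free' | 'pro' | 'enterprise';

interface User {
  id: string;
  tier: Tier;
}

// Permanent product difference: a plain entitlement check. No flag involved.
function canExportCsv(user: User): boolean {
  return user.tier !== 'free';
}

// Time-limited early access: the flag grants a temporary override,
// layered on top of (not replacing) the entitlement check.
function canUseBetaSearch(user: User, earlyAccessFlag: boolean): boolean {
  return earlyAccessFlag || user.tier === 'enterprise';
}
```

When the beta ends, the flag and its branch get deleted; the tier check stays forever.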
Naming Conventions
<type>.<name>
release.new-checkout-flow
release.improved-search-ranking-2026q2
experiment.pricing-page-redesign
experiment.email-subject-line-test
ops.kill-switch-payments
ops.disable-ai-summarization
permission.early-access-customer-acme
permission.beta-program-q2-2026
Rules:
- Lowercase, hyphenated
- First segment is type prefix (lifecycle signal)
- Second segment describes purpose
- Optional date suffix for time-bounded experiments
Reject any flag name without a valid prefix; enforce this with a lint rule.
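The naming check itself is small. A sketch, assuming the four prefixes above (the function name is illustrative; wire it into whatever lint or CI hook you use):

```typescript
// Valid: <type>.<lowercase-hyphenated-name>, type one of the four prefixes.
const FLAG_NAME_PATTERN = /^(release|experiment|ops|permission)\.[a-z0-9]+(-[a-z0-9]+)*$/;

export function isValidFlagName(name: string): boolean {
  return FLAG_NAME_PATTERN.test(name);
}
```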
Implementation Architecture (PostHog-based)
// /lib/flags/index.ts
import { PostHog } from 'posthog-node';
import { logger } from '@/lib/logger'; // your app's structured logger

const posthog = new PostHog(process.env.POSTHOG_API_KEY!, {
  host: 'https://us.i.posthog.com',
});

export async function getFlag(
  flagKey: string,
  context: { userId: string; customerId?: string; userProperties?: Record<string, any> }
): Promise<boolean | string> {
  try {
    const value = await posthog.getFeatureFlag(flagKey, context.userId, {
      groups: context.customerId ? { customer: context.customerId } : undefined,
      personProperties: context.userProperties,
    });
    // The SDK resolves to undefined when the flag doesn't exist or can't be evaluated
    return value ?? getSafeDefault(flagKey);
  } catch (e) {
    // SAFE DEFAULT on failure
    logger.warn({ flagKey, error: e }, 'Feature flag fetch failed');
    return getSafeDefault(flagKey);
  }
}

function getSafeDefault(flagKey: string): boolean {
  // Convention: flags default to OFF unless explicitly registered
  const SAFE_ON_FLAGS = new Set<string>([
    // Flags that are already at 100% rollout — shouldn't fail-off
    // Track in a registry; review monthly
  ]);
  return SAFE_ON_FLAGS.has(flagKey);
}
Caching: the PostHog SDK caches flag values per-user with sensible defaults. For ultra-low-latency reads, add an in-process cache with a ~5-minute TTL.
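A sketch of what that in-process cache could look like, assuming a `fetchFlag` callback that wraps the real PostHog call (all names here are illustrative):

```typescript
// Per-key in-memory cache with a 5-minute TTL.
type FlagValue = boolean | string;

interface CacheEntry {
  value: FlagValue;
  expiresAt: number; // epoch ms
}

const TTL_MS = 5 * 60 * 1000;
const cache = new Map<string, CacheEntry>();

export async function getCachedFlag(
  cacheKey: string, // e.g. `${flagKey}:${userId}` — flag values are per-user
  fetchFlag: () => Promise<FlagValue>,
): Promise<FlagValue> {
  const hit = cache.get(cacheKey);
  if (hit && hit.expiresAt > Date.now()) return hit.value;

  const value = await fetchFlag();
  cache.set(cacheKey, { value, expiresAt: Date.now() + TTL_MS });
  return value;
}
```

Note the trade-off: a 5-minute TTL means a kill switch can take up to 5 minutes to propagate, so consider a shorter TTL (or no cache) for ops flags.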
Per-Type Lifecycle
Release flags
- Created with explicit retirement date in metadata (default: 60 days from creation)
- Status progression: 1% → 25% → 50% → 100%
- After 30 days at 100%: automated reminder to remove
- After 60 days at 100%: PR mandatory to remove or extend
Experiment flags
- Created with experiment-end date
- After end date: 7-day grace to make decision
- After decision: 14-day grace to remove from code
- After 30 days post-decision: blocking PR to remove
Ops flags
- Live indefinitely
- Quarterly audit: still valid? Owner still here?
- Document purpose + how to flip
Permission flags
- Tied to customer/account record
- When customer's entitlement ends: flag value flips to false naturally
- Cleanup when no customer has the entitlement for 90+ days
Percentage Rollout Strategy
Hash on userId (stable). Same user always sees same variant.
// Pseudo-implementation (PostHog handles this internally)
import { createHash } from 'crypto';

function shouldRollout(flagKey: string, userId: string, percentage: number): boolean {
  const hash = createHash('sha256').update(`${userId}:${flagKey}`).digest('hex');
  const bucket = parseInt(hash.slice(0, 8), 16) % 100;
  return bucket < percentage;
}
Why hash on userId, not request: consistent experience per user. If they refresh, same flag state.
Standard ramp: 1% → 5% → 10% → 25% → 50% → 100%. Hold at each for at least 24h to verify metrics.
Rollout abort: if metrics regress at 5%, drop back to 1% or 0%. Don't proceed.
Targeting + Segmentation
For per-customer early access:
// PostHog: target by customer property
await getFlag('permission.early-access-feature-x', {
userId: user.id,
customerId: user.customerId,
userProperties: {
customer_plan_tier: user.plan,
is_early_access: user.customer.earlyAccess === true,
},
});
// In PostHog UI: target users where is_early_access = true
For segment-based experiments:
// Target only paying customers in a specific tier
// PostHog UI: 'plan_tier in [growth, scale]' AND 'signup_date_after 2026-01-01'
Cleanup Discipline
The hard part. Most teams skip cleanup; you must not.
Process
1. Every flag creation includes `expectedRetirement: <date>` in PR description
2. Weekly cron: query flags whose retirement date passed without action → Slack alert to owner
3. Monthly: zero-tolerance review of ops + permission flags (still valid?)
4. Quarterly: audit flag count. Target <20 active flags at any time.
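The weekly scan in step 2 can be sketched against the code-level registry (described below) by filtering on each flag's metadata. This assumes registry entries carry an `expectedRetirement` ISO date; the Slack alert itself is omitted:

```typescript
// Minimal registry-entry shape for the stale-flag scan.
interface FlagMeta {
  owner: string;
  expectedRetirement?: string; // ISO date; absent for ops flags
}

// Returns the keys of all flags whose retirement date has passed.
export function findStaleFlags(
  flags: Record<string, FlagMeta>,
  today: Date = new Date(),
): string[] {
  return Object.entries(flags)
    .filter(([, meta]) =>
      meta.expectedRetirement !== undefined &&
      new Date(meta.expectedRetirement) < today)
    .map(([key]) => key);
}
```

A weekly cron then posts `findStaleFlags(FLAGS)` to the owner's Slack channel.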
Code-level cleanup
For each flag retired:
// Before:
if (await getFlag('release.new-checkout-flow', context)) {
  return newCheckoutFlow();
} else {
  return oldCheckoutFlow();
}

// After (cleanup):
return newCheckoutFlow();
// Delete the oldCheckoutFlow function entirely
// Remove the flag from PostHog
Lint rule
// .eslint-flag-rules.ts (custom rule)
// Detect getFlag() calls; warn if flag's expectedRetirement is past
This can be a check that runs on PR — block merging if a flag is past its retirement date and the PR doesn't clean it up.
Implementation Skeleton
/lib/flags/
client.ts (PostHog client setup)
get-flag.ts (main getFlag function with safe defaults)
registry.ts (registered flags + their metadata)
test-helpers.ts (mockFlag for tests)
cleanup-monitor.ts (script that runs weekly, alerts on stale flags)
/lib/flags/registry.ts:
export const FLAGS = {
  'release.new-checkout-flow': {
    type: 'release',
    owner: '@sarah',
    description: 'Switches to redesigned checkout flow',
    createdAt: '2026-04-01',
    expectedRetirement: '2026-06-01',
  },
  'ops.kill-switch-payments': {
    type: 'ops',
    owner: '@oncall',
    description: 'Emergency kill switch for payments. Set to true to disable all payment processing.',
    createdAt: '2025-01-01',
    // No retirement (ops flag — lives indefinitely)
  },
  // ... all flags registered here
};
Why register flags in code: single source of truth. Remove from PostHog without removing from registry → CI fails. Forces cleanup discipline.
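The CI side of that enforcement can be sketched as a drift check: every flag key referenced in code must exist in the registry. This sketch assumes flag keys appear as string literals in `getFlag()` calls; a real check would walk the repo's source files rather than take a single string:

```typescript
// Returns flag keys used in code but missing from the registry.
export function findUnregisteredFlags(
  sourceText: string,
  registeredKeys: Set<string>,
): string[] {
  // Match the string literal in getFlag('...') / getFlag("...") calls.
  const used = [...sourceText.matchAll(/getFlag\(\s*['"]([^'"]+)['"]/g)]
    .map((m) => m[1]);
  return [...new Set(used)].filter((key) => !registeredKeys.has(key));
}
```

CI fails if the result is non-empty; the symmetric check (registered but never referenced) catches dead registry entries.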
Testing Strategy
For active experiments:
// /tests/checkout.test.ts
import { mockFlag } from '@/lib/flags/test-helpers';

describe('Checkout', () => {
  describe('with new-checkout-flow flag ON', () => {
    beforeEach(() => mockFlag('release.new-checkout-flow', true));
    it('uses new flow', async () => { /* ... */ });
  });

  describe('with new-checkout-flow flag OFF', () => {
    beforeEach(() => mockFlag('release.new-checkout-flow', false));
    it('uses old flow', async () => { /* ... */ });
  });
});
CI runs both states. Active flags get duplicate test runs. Once flag is retired, the OFF tests get deleted along with the OFF code path.
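One way `mockFlag` might work under the hood, as a sketch: a module-level override map that `getFlag` would consult before calling PostHog (the names besides `mockFlag` are illustrative):

```typescript
// /lib/flags/test-helpers.ts — sketch of the mock layer.
const overrides = new Map<string, boolean | string>();

export function mockFlag(flagKey: string, value: boolean | string): void {
  overrides.set(flagKey, value);
}

// Call in afterEach so mocks never leak between tests.
export function clearFlagMocks(): void {
  overrides.clear();
}

// getFlag checks this first; a defined value short-circuits the network call.
export function getMockedValue(flagKey: string): boolean | string | undefined {
  return overrides.get(flagKey);
}
```

Because the override layer lives in front of the SDK, tests run fully offline and deterministic.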
Migration from Existing
Week 1: Audit current 12 env-var flags
- For each: what does it do? Who owns? Is it a real flag or a config?
- The 4 that nobody remembers → kill them. If a feature is truly unused, removal is zero-risk; if it IS in use, the audit will reveal it.
Week 2: Set up PostHog flags
- Create PostHog account (or use existing)
- Migrate 'real' flags from env vars to PostHog
- Start with non-critical flags; verify behavior
Week 3: Migrate kill switches
- Convert env-var-based kill switches to PostHog flags
- Test the kill switches actually work (no regressions)
Week 4: Build registry + cleanup tooling
- Code-level registry (the constants file)
- Weekly cron alerting on stale flags
- Lint rule for unprefixed flag names
Week 5: Document + train team
- Wiki page on the flag system
- Show team the lifecycle expectations
- Code-review checklist
What This System Won't Solve
- Won't replace permission/entitlement system. Permanent feature differences should NOT be flags.
- Won't auto-enforce cleanup if humans skip it. The discipline matters; tooling helps but can't replace will.
- Won't help if owners leave the company. Establish ownership transfer when engineers leave.
- Won't compensate for poor experiment design. A/B tests need proper hypothesis + sample size + statistical analysis. Flags are the delivery mechanism.
- Won't scale to 1000s of customers needing per-customer flags. Per-customer flags work at your 5K-customer scale; at 50K customers, the registry approach is unwieldy.
Maintenance Cadence
Weekly:
- Cron alert: any flag past retirement date?
- Quick scan: any new flags added without registration?
Monthly:
- Active flag count audit (target: <20)
- Ops flag review (still valid?)
- Permission flag review (still in use?)
Quarterly:
- Full flag-system audit
- Tooling review (PostHog still right? Add features?)
- Team training refresh
Annually:
- Re-evaluate vs SaaS alternatives
- Cost/value review
Key Takeaways
- PostHog free tier or self-hosted PG-backed flags. LaunchDarkly is overkill at your scale.
- 4 flag types with prefix naming: release., experiment., ops., permission. The prefix tells you the lifecycle.
- Cleanup discipline from day 1. Without it, you accumulate 100+ stale flags within 2 years.
- Code-level registry as single source of truth. It forces cleanup, and tooling enforces it.
- Test BOTH states for active experiments. CI runs duplicate. Cleanup deletes the off-state when flag retires.
- Hash percentage rollout on userId. Same user, same variant, every time. Gradual ramp 1% → 100%.
Common use cases
- Engineer adding feature flags to a codebase that's never had them
- Tech lead establishing flag-cleanup discipline in a flag-graveyard codebase
- Founder evaluating LaunchDarkly vs open-source vs DIY flag system
- Backend engineer designing flag system that handles per-user targeting + percentage rollout
- DevOps consolidating multiple ad-hoc flag systems into unified infrastructure
- Engineer building safe-rollout pattern for risky changes (DB migration switches, new pricing logic)
Best AI model for this
Claude Opus 4. Feature-flag design needs reasoning about deploy strategy, ops, and operational cost vs benefit — exactly Claude's strengths. ChatGPT GPT-5 second-best.
Pro tips
- 4 flag types: release (gradual rollout), experiment (A/B test), ops (kill switch), permission (entitlement). Different lifecycles.
- Naming convention from day 1. `release.new-checkout-flow` vs `experiment.pricing-test-q4` vs `ops.kill-switch-payments`. Prefix tells you lifecycle.
- Default values matter. Flag-fetch-fail → use safe default (typically 'feature off'). Don't break.
- Cleanup is the discipline that distinguishes mature systems. Track flag age. Auto-alert at 90 days.
- Percentage rollouts hash on STABLE user attribute (user_id), not request. Same user always sees same variant.
- Don't use feature flags for permanent feature differences (free vs paid tier). That's a permission/entitlement check.
- Test both flag states. CI runs tests with flag on AND off for active experiments.
Customization tips
- Specify ALL flag-like patterns in your codebase (env vars used as flags, DB-toggle-this-feature checks). Migration is calibrated to current state.
- Be honest about cleanup track record. 'We never clean up' vs 'we clean up sometimes' shape the discipline design.
- List your specific flag needs precisely. Pure rollout-safety vs heavy A/B testing vs per-customer entitlement need different tools.
- Specify scale (users, deploys/day). Tooling differs at 1K users + weekly deploys vs 1M users + 10/day deploys.
- Mention budget honestly. Free tier vs $50/month vs $25K/yr LaunchDarkly are very different recommendations.
- Use the Cleanup Debt Mode variant if you have 100+ stale flags already — adds the systematic cleanup process before designing new flags.
Variants
Self-Hosted Mode
For DIY flag systems — emphasizes simple Postgres-backed implementation, no SaaS dependency.
LaunchDarkly Replacement Mode
For teams considering or migrating off LaunchDarkly — picks the right scale + alternative.
Cleanup Debt Mode
For codebases with 100+ stale flags — audits + builds the systematic cleanup process.
AI Experiment Mode
For teams running A/B tests on AI features — emphasizes proper experiment design + measurement.
Frequently asked questions
How do I use the Feature Flag System Architect prompt?
Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.
Which AI model works best with Feature Flag System Architect?
Claude Opus 4. Feature-flag design needs reasoning about deploy strategy, ops, and operational cost vs benefit — exactly Claude's strengths. ChatGPT GPT-5 second-best.
Can I customize the Feature Flag System Architect prompt for my use case?
Yes — every Promptolis Original is designed to be customized. Key levers: the 4 flag types (release for gradual rollout, experiment for A/B tests, ops for kill switches, permission for entitlements, each with its own lifecycle), and a naming convention from day 1 (`release.new-checkout-flow` vs `experiment.pricing-test-q4` vs `ops.kill-switch-payments`; the prefix tells you the lifecycle).
Explore more Originals
Hand-crafted 2026-grade prompts that actually change how you work.
← All Promptolis Originals