⚡ Promptolis Original · AI Agents & Automation

🧠 Agent Memory Architect — Persistent, Structured, Retrievable Agent Memory

The structured memory architecture for Claude agents that need to remember across sessions — covering the 4 memory types (episodic/semantic/procedural/working), vector-DB patterns, forgetting strategies, and the memory-retrieval discipline that prevents context explosion.

⏱️ 12 min to design + 3-7 days to implement 🤖 ~2 min in Claude 🗓️ Updated 2026-04-20

Why this is epic

Agents that don't remember are frustrating to users. Agents that remember too much hit context windows and cost explosions. This Original produces the structured memory architecture: 4 memory types with different retention policies, selective retrieval (not 'dump everything'), and forgetting strategies that keep memory useful over months of use.

Names the 4 memory types: EPISODIC (specific events/conversations), SEMANTIC (facts about the user/system), PROCEDURAL (learned workflows), WORKING (current-session context). Each has different storage + retrieval + forgetting characteristics. Most agent 'memory' systems conflate these and fail in specific predictable ways.

Produces the full stack: embedding model choice, vector DB selection (pgvector / Pinecone / Chroma / Weaviate trade-offs), indexing strategy, retrieval filters (recency + relevance + type), memory-writeback policy, and forgetting algorithm. Based on patterns from production agents with 6+ months of memory at Cognition, Mem, and other production agent companies.

The prompt

Promptolis Original · Copy-ready
<role>
You are an AI agent memory architect with deep experience building persistent memory systems for production agents. You've designed memory architectures handling millions of memories across tens of thousands of users for agent products (personal assistants, support bots, research tools). You know the trade-offs between storage patterns (Postgres / vector DB / hybrid), retrieval strategies (recency / semantic / filtered), and forgetting algorithms (time-decay / irrelevance-decay / LRU). You are direct. You will name when memory is over-engineered, when vector DB is premature, when forgetting is missing, and when retrieval strategies will fail at scale.
</role>

<principles>
1. 4 memory types: episodic, semantic, procedural, working. Each has different storage + retrieval + forgetting.
2. Start simple. Postgres JSON for <1K memories. Vector DB only when scale demands.
3. Selective retrieval (3-10 per turn) beats dump-everything.
4. Writeback matters as much as retrieval.
5. Forgetting is a feature. Time-decay + irrelevance-decay.
6. Embed at write, not read. Cache embeddings.
7. Summarize conversations before embedding. Don't embed raw.
8. User-visible memory builds trust. Let users inspect + delete.
</principles>

<input>
<agent-purpose>{what the agent does + needs memory for}</agent-purpose>
<memory-duration>{session-only / days / months / years}</memory-duration>
<user-scale>{single user / 100s / 10K+ / multi-tenant}</user-scale>
<memory-volume-estimate>{memories per user per month}</memory-volume-estimate>
<tech-stack>{language, existing infra}</tech-stack>
<privacy-requirements>{PII handling, user access, retention}</privacy-requirements>
<retrieval-patterns>{semantic search / recency / filtered / mix}</retrieval-patterns>
<budget>{open-source only / can pay for managed services}</budget>
</input>

<output-format>
# Memory Architecture: [Agent name]
## Memory Type Analysis
Which of the 4 types your agent needs + why.
## Storage Design
What lives where — Postgres / vector DB / cache.
## Episodic Memory Architecture
Conversation/event storage + retrieval.
## Semantic Memory Architecture
Facts about user/system + fast lookup.
## Procedural Memory Architecture
Learned workflows + activation.
## Working Memory Architecture
Current-session context management.
## Retrieval Strategy
How memories get into prompts.
## Writeback Policy
What gets stored, what gets updated, what gets discarded.
## Forgetting Algorithm
Time-decay + irrelevance-decay + user-requested.
## User-Facing Memory UI
What users can see + control.
## Cost + Latency Model
Projected storage, embedding, retrieval costs.
## Implementation Roadmap
Phased rollout.
## Key Takeaways
5 bullets.
</output-format>

<auto-intake>
If input incomplete: ask for agent purpose, memory duration, user scale, memory volume, tech stack, privacy, retrieval patterns, budget.
</auto-intake>

Now, design:

Example: input → output

Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.

📝 Input
<agent-purpose>Personal productivity + journaling AI assistant. Users talk to it daily about their work, goals, challenges, learnings. It should remember conversations from days/weeks/months ago so when a user says 'how am I tracking on Q3 goals?' or 'what was I thinking about leadership two weeks ago?' it can pull relevant context and respond meaningfully. Think: Notion meets therapist.</agent-purpose>
<memory-duration>Years. Users will use this for 2-5 years ideally. Need long-term memory + selective forgetting.</memory-duration>
<user-scale>Currently 800 beta users. Target 10K paying users in 12 months. Each user's memory is isolated from others (no cross-user sharing).</user-scale>
<memory-volume-estimate>Average user: ~1 conversation per day, ~500 words, ~3 'facts to remember' extracted per conversation. Over 1 year: ~365 conversations, ~1,100 facts per user. Heavy users: 3-5x this volume.</memory-volume-estimate>
<tech-stack>Python (FastAPI backend), Postgres, currently no vector DB. Hosted on Fly.io. Using Claude Sonnet 4.5 as primary model.</tech-stack>
<privacy-requirements>Deeply personal content (career struggles, relationships, health). GDPR. Users must be able to see + delete any memory. No selling user data. No cross-user data leakage. Encryption at rest required.</privacy-requirements>
<retrieval-patterns>Mix: sometimes 'what did I say about X' (semantic search), sometimes 'what happened this week' (recency), sometimes 'am I consistent on this goal' (filtered by topic over time).</retrieval-patterns>
<budget>Bootstrapped startup. $500/mo max on memory infrastructure now. Can scale to $3-5K/mo at 10K users.</budget>
🤖 Output

Memory Architecture: Personal Productivity + Journaling Agent

This design specifies the complete persistent-memory system: the 4 memory types (episodic/semantic/procedural/working), storage decisions, retrieval strategy, writeback policy, and forgetting algorithms. Based on production agent memory systems serving millions of memories, the critical design choice is not 'which vector DB' but 'which memory type needs what treatment.' Your use case (years-long personal memory, a 10K-user target, deeply personal content, GDPR) calls for a hybrid Postgres + pgvector approach: mature tech, single infrastructure, $0 additional spend initially, and enough headroom to reach your 10K target. The sections below give per-memory-type strategies, selective retrieval (3-10 memories per turn), time-decay plus irrelevance-decay forgetting, and a user-visible memory UI.

Memory Type Analysis

Your agent needs all 4 types:

EPISODIC (specific conversations, days/events):

  • 'Last Tuesday you said you were struggling with your direct report'
  • 'The conversation about your Q3 goals from 2 weeks ago'
  • Primary memory type for journaling — every conversation is an episode

SEMANTIC (facts about the user):

  • 'You work as a product manager at [company]'
  • 'Your partner's name is [name]'
  • 'You're training for a half-marathon'
  • 'Your Q3 goal is to ship feature X'
  • Built up over time from conversations

PROCEDURAL (learned interaction patterns):

  • 'User prefers structured frameworks over open exploration'
  • 'User finds it helpful when I reference past conversations explicitly'
  • 'User's journaling is most productive in the morning'
  • Meta-patterns that shape how the agent responds

WORKING (current conversation):

  • Today's conversation context
  • Recent topic thread
  • Fetched relevant memories for this turn

Most personal-agent implementations conflate episodic + semantic. Separating them is essential for your case because they have different retrieval + forgetting characteristics.
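The split above can be made concrete as a small policy table, one entry per memory type. A minimal sketch; the type names come from the analysis above, while the storage, retrieval, and forgetting fields are illustrative defaults for this journaling agent, not prescriptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryTypePolicy:
    """How one of the 4 memory types is stored, retrieved, and forgotten."""
    name: str
    storage: str      # where it lives
    retrieval: str    # how it gets into prompts
    forgetting: str   # decay rule

# Illustrative defaults; tune per product.
POLICIES = {
    "episodic": MemoryTypePolicy(
        "episodic", "conversations table (summary + embedding)",
        "recency / semantic / topic filters, top 3-5 per turn",
        "time-decay after 90 days"),
    "semantic": MemoryTypePolicy(
        "semantic", "user_facts table",
        "always load top ~20 facts at session start",
        "confidence decay if unconfirmed for 6 months"),
    "procedural": MemoryTypePolicy(
        "procedural", "user_patterns table",
        "load top 5 patterns at session start",
        "drop patterns no longer observed"),
    "working": MemoryTypePolicy(
        "working", "in-memory, per session",
        "full current-session context",
        "flushed to episodic storage at session end"),
}
```

Keeping the policies side by side like this makes the 'don't conflate them' rule auditable: every type must answer all three questions.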

Storage Design

Recommendation: Hybrid Postgres + pgvector (start). Upgrade path to dedicated vector DB only if needed.

Why hybrid Postgres:

  • Single infrastructure (you already have Postgres)
  • pgvector handles semantic search comfortably at this scale (queries are scoped to one user_id, so each search covers ~1K vectors, not millions)
  • Transactional (updates + retrieval in one system)
  • Mature, well-understood, GDPR-compliant
  • Cost: your existing Postgres + minor storage increase

Storage tables:

-- Episodic: conversations
CREATE TABLE conversations (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  started_at TIMESTAMP,
  ended_at TIMESTAMP,
  summary TEXT,  -- Claude-generated ~200-word summary
  summary_embedding VECTOR(1536),
  full_transcript TEXT,  -- encrypted
  topics TEXT[],  -- extracted topics for filtering
  sentiment TEXT
);
-- Postgres has no inline INDEX clause; indexes are created separately
CREATE INDEX ON conversations (user_id);
CREATE INDEX ON conversations USING ivfflat (summary_embedding vector_cosine_ops);

-- Semantic: facts about the user
CREATE TABLE user_facts (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  category TEXT,  -- 'work', 'relationships', 'goals', 'health', etc.
  fact TEXT,  -- 'Works as PM at Acme Corp'
  source_conversation_id UUID,
  confidence FLOAT,  -- 0-1, higher = more confident
  first_mentioned TIMESTAMP,
  last_confirmed TIMESTAMP,
  supersedes_fact_id UUID  -- chain for updated facts
);
CREATE INDEX ON user_facts (user_id, category);

-- Procedural: learned patterns
CREATE TABLE user_patterns (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  pattern_type TEXT,  -- 'communication_style', 'time_of_day', 'topic_preference'
  pattern_description TEXT,
  observed_count INT,
  last_observed TIMESTAMP
);
CREATE INDEX ON user_patterns (user_id);

-- Retrieval metadata
CREATE TABLE memory_access_log (
  memory_id UUID,
  user_id UUID,
  accessed_at TIMESTAMP,
  access_type TEXT  -- 'retrieved' or 'mentioned_in_response'
);
CREATE INDEX ON memory_access_log (memory_id, accessed_at);

Why not Pinecone/Chroma/Weaviate yet:

  • pgvector handles your scale: 10K users × ~1,100 memories = 11M vectors total, and since every query filters by user_id, each search scans only that user's ~1K vectors
  • Adding separate vector DB = more ops, more cost, more privacy surface
  • Migrate later if you genuinely hit pgvector limits (unlikely in next 18 months)

Episodic Memory Architecture

Storage: conversations table

Writeback (after each conversation):

1. Full transcript encrypted, stored

2. Claude generates ~200-word summary

3. Summary embedded (OpenAI text-embedding-3-small, 1536 dims, $0.02/1M tokens)

4. Topics extracted via Claude (structured output: `['career', 'leadership', 'Q3-goals']`)

5. Sentiment estimated (useful for 'how has my mood been?' queries)
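The five writeback steps above can be sketched as a single pipeline function. This is a sketch with the encryptor, summarizer, embedder, and extractors injected as plain callables; all of these names are stand-ins, not real APIs (in production they would wrap Claude, the embedding endpoint, and your encryption layer):

```python
from typing import Callable

def write_episode(transcript: str,
                  encrypt: Callable[[str], bytes],
                  summarize: Callable[[str], str],
                  embed: Callable[[str], list[float]],
                  extract_topics: Callable[[str], list[str]],
                  estimate_sentiment: Callable[[str], str]) -> dict:
    """Build the episodic record for the conversations table (steps 1-5)."""
    summary = summarize(transcript)                # step 2: ~200-word summary
    return {
        "full_transcript": encrypt(transcript),    # step 1: encrypted at rest
        "summary": summary,
        "summary_embedding": embed(summary),       # step 3: embed the summary, not the raw text
        "topics": extract_topics(summary),         # step 4: topics for filtered retrieval
        "sentiment": estimate_sentiment(summary),  # step 5: for mood-over-time queries
    }
```

With stub callables this returns a dict ready to INSERT; note that only the summary is embedded, which is what keeps storage and search costs small.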

Retrieval strategies:

Recency-based: 'What did I talk about this week?' → WHERE started_at > NOW() - INTERVAL '7 days'

Semantic search: 'What have I said about my direct report?' → embedding search on summary_embedding + filter by user_id

Topic-filtered: 'How have I been tracking on Q3 goals?' → WHERE 'Q3-goals' = ANY(topics) ORDER BY started_at

Combined: 'What was I thinking about leadership 2 weeks ago?' → time range + topic filter + semantic rank

Retrieval size: top 3-5 conversation summaries per turn. NEVER dump all history into context.
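A combined query of the kind described above might look like this against the conversations table. A sketch: `:user_id` and `:query_embedding` are bind parameters, and `<=>` assumes pgvector's cosine-distance operator is in use.

```sql
-- "What was I thinking about leadership 2 weeks ago?"
-- time range + topic filter + semantic rank, scoped to one user
SELECT id, summary, started_at
FROM conversations
WHERE user_id = :user_id
  AND started_at BETWEEN NOW() - INTERVAL '21 days'
                     AND NOW() - INTERVAL '7 days'
  AND 'leadership' = ANY(topics)
ORDER BY summary_embedding <=> :query_embedding
LIMIT 5;
```

The user_id filter comes first for both isolation and speed: the semantic ranking only ever runs over one user's memories.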

Semantic Memory Architecture

Storage: user_facts table

Writeback (fact extraction):

After each conversation, Claude extracts structured facts:

[
  {"category": "work", "fact": "Working on launching feature X by end of Q3", "confidence": 0.9},
  {"category": "relationships", "fact": "Partner's name is Sarah", "confidence": 0.95},
  {"category": "goals", "fact": "Training for a half-marathon in October", "confidence": 0.85}
]

Fact reconciliation:

New fact may update or supersede existing fact:

  • 'Works at Acme Corp' (old) → 'Works at Beta Inc starting Nov 1' (new, supersedes)
  • Implement: supersedes_fact_id chain + last_confirmed timestamp refresh when fact reaffirmed
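The reconciliation rules above reduce to three branches: reaffirm, supersede, or insert. A sketch of that logic; `Fact`, `same_fact`, and `conflicts` are hypothetical simplifications, since production code would judge fact identity and conflict with an LLM call or embedding similarity rather than injected predicates:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Fact:
    category: str
    text: str
    confidence: float
    last_confirmed: datetime
    supersedes: "Fact | None" = None

def reconcile(existing: list[Fact], new: Fact,
              same_fact, conflicts) -> tuple[str, list[Fact]]:
    """Apply the writeback rules: reaffirm, supersede, or insert.

    same_fact / conflicts are injected predicates (hypothetical;
    in production, an LLM judgment or embedding similarity)."""
    for old in existing:
        if same_fact(old, new):
            # Reaffirmed: refresh timestamp, nudge confidence up
            old.last_confirmed = new.last_confirmed
            old.confidence = min(1.0, old.confidence + 0.05)
            return "reaffirmed", existing
        if conflicts(old, new):
            # Conflicting: new fact supersedes the old one
            new.supersedes = old
            return "superseded", [f for f in existing if f is not old] + [new]
    return "inserted", existing + [new]
```

The supersession branch mirrors the supersedes_fact_id chain in the schema: the old fact is kept reachable through the new one rather than destroyed.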

Retrieval strategy:

  • Always-fetch recent semantic facts for working memory (top 20 by last_confirmed + confidence)
  • Ensures Claude knows basic user context without searching
  • Fetched at conversation start, cached in session

Privacy: user can see all facts via memory UI. Can edit, delete, correct inaccuracies.

Procedural Memory Architecture

Storage: user_patterns table

Writeback (weekly batch job):

After a user has 20+ conversations, run pattern-extraction:

  • Time-of-day patterns (when do they journal? when are they most open?)
  • Communication style (structured vs. exploratory, verbose vs. terse)
  • Topic cadence (do they always return to leadership on Mondays?)
  • Response preferences (what types of responses get engagement?)

Retrieval strategy:

  • Load top 5 patterns into context at start of each session
  • Used by Claude to adapt response style, not to execute logic

Example procedural memory influence:

  • Pattern: 'User responds better to 2-sentence responses than essays'
  • Effect: Claude's response-length tendency shifts for this user

Working Memory Architecture

Storage: in-memory (per-session), not persisted until conversation end

Contents:

  • Today's conversation (full transcript so far)
  • Retrieved memories for this turn
  • Active topic thread
  • Current sentiment

Management:

  • Context window: Claude Sonnet 4.5 = 200K tokens. Plenty of room.
  • Don't fill with history. Use retrieved-memories strategy (5-10 items).
  • Working memory closes at session end → episodic storage.

Retrieval Strategy

Per turn, retrieve 3-10 memories based on query classification:

async def retrieve_memories(user_id: str, query: str, session_context: dict):
    # Step 1: Always load semantic base (20 facts, cheap)
    base_facts = await load_user_facts(user_id, limit=20)
    
    # Step 2: Classify query type
    query_type = await classify_query(query)
    # 'recency' / 'semantic' / 'topic' / 'general'
    
    # Step 3: Retrieve episodic memories based on type
    if query_type == 'recency':
        conversations = await get_recent_conversations(user_id, days=7, limit=5)
    elif query_type == 'semantic':
        embedding = await embed(query)
        conversations = await semantic_search(user_id, embedding, limit=5)
    elif query_type == 'topic':
        topics = await extract_topics(query)
        conversations = await filter_by_topics(user_id, topics, limit=5)
    else:
        conversations = await get_recent_conversations(user_id, days=3, limit=3)
    
    # Step 4: Load procedural patterns
    patterns = await load_user_patterns(user_id, limit=5)
    
    return {
        'semantic_facts': base_facts,
        'episodic_memories': conversations,
        'procedural_patterns': patterns,
    }

Token budget per turn:

  • Base facts: ~1,500 tokens
  • Episodic memories: ~2,000 tokens (5 summaries × 400 tokens)
  • Patterns: ~500 tokens
  • Total memory context: ~4,000 tokens (2% of 200K context)
  • Leaves 196K tokens for conversation + response

Writeback Policy

After each conversation (async, via background job):

1. Store conversation: encrypted full transcript to `conversations` table

2. Generate summary: Claude summarizes to ~200 words

3. Embed summary: OpenAI text-embedding-3-small

4. Extract facts: Claude extracts structured facts with confidence scores

5. Reconcile facts:

- New fact matches existing → update last_confirmed, increment confidence

- New fact conflicts with existing → create supersession chain

- Novel fact → insert new

6. Extract patterns (weekly batch):

- Look at last 7 days of conversations

- Update user_patterns with observed patterns

Write-time cost:

  • Summary generation: ~$0.002 (500-word conversation → 200-word summary)
  • Embedding: ~$0.00002
  • Fact extraction: ~$0.003
  • Total per conversation: ~$0.005
  • At 10K users × 1 conversation/day: $50/day = $1,500/month in write costs
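The write-cost figures above are easy to sanity-check. The unit costs below are copied from the estimates in this section, so treat them as illustrative inputs, not measured prices:

```python
# Illustrative per-conversation write costs (USD), from the estimates above
summary_cost = 0.002
embedding_cost = 0.00002
fact_extraction_cost = 0.003

per_conversation = summary_cost + embedding_cost + fact_extraction_cost
daily = per_conversation * 10_000   # 10K users x 1 conversation/day
monthly = daily * 30

print(round(per_conversation, 5))   # ~$0.005 per conversation
print(round(daily))                 # ~$50/day
print(round(monthly))               # ~$1,500/month
```

Note that summary generation and fact extraction dominate; the embedding itself is three orders of magnitude cheaper than either LLM call.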

Forgetting Algorithm

Time-decay:

  • Conversations > 90 days old: retrieval priority × 0.5
  • Conversations > 365 days old: retrieval priority × 0.2 (but still accessible)
  • Facts > 6 months unconfirmed: confidence × 0.5

Irrelevance-decay:

  • Memories never retrieved in 180 days: move to cold storage (compressed, slow retrieve)
  • Facts with confidence < 0.3: archive
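The two decay rules combine naturally into a single retrieval-priority score. A sketch with the multipliers taken from the thresholds above; in practice `days_old` and `days_since_access` would come from started_at and the memory_access_log table:

```python
def retrieval_priority(base_score: float, days_old: int,
                       days_since_access: int) -> float:
    """Down-weight a memory by age and by how long since it was last retrieved."""
    # Time-decay: thresholds from the forgetting policy above
    if days_old > 365:
        base_score *= 0.2
    elif days_old > 90:
        base_score *= 0.5
    # Irrelevance-decay: long-unaccessed memories rank lower
    # (past 180 days they also become candidates for cold storage)
    if days_since_access > 180:
        base_score *= 0.5
    return base_score
```

Applied after the similarity ranking, this lets an old-but-on-topic memory still surface, just behind an equally relevant recent one.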

User-requested:

  • Users can delete any memory via UI
  • GDPR right-to-be-forgotten: full account deletion removes all user memories
  • Users can mark memories 'keep forever' (override decay)

Summarization compression:

  • Conversations > 1 year old: keep summary, compress full transcript
  • Save storage (full transcripts are 5-10x summary size)

User-Facing Memory UI

'Your Memory' page in the product:

1. Conversations list: chronological, searchable, deletable

2. Facts I know about you: categorized, editable, deletable

3. Patterns I've noticed: shown with 'helpful / wrong / fine' feedback

4. Settings:

- Memory retention period (default 3 years, min 30 days, max forever)

- Topics to never remember (e.g., 'health')

- Delete all memories (full reset)

- Export memories (GDPR)

Transparency matters. Users trust agents that let them inspect. 'Secretly-remembers-everything' feels creepy.

Cost + Latency Model

At 10K users × 1 conversation/day:

Storage:

  • Conversations: ~500 words × ~1,100 conversations/user (≈3 years at ~1/day) × 10K users = 5.5B words ≈ 35GB
  • Embeddings: 1536 dims × float32 × 11M embeddings = ~65GB
  • Facts + patterns: <5GB
  • Total: ~100GB Postgres storage. At Fly.io ~$75/month.
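A back-of-envelope check on the storage figures above, assuming ~6 bytes per word (including whitespace) and 4-byte float32 embedding dimensions; it lands in the same ~100 GB ballpark as the estimates:

```python
GB = 1e9

words = 500 * 1_100 * 10_000               # transcript words across all users
transcript_gb = words * 6 / GB             # ~6 bytes/word -> ~33 GB

embeddings = 1_100 * 10_000                # 11M vectors
embedding_gb = embeddings * 1536 * 4 / GB  # 1536 float32 dims per vector

total_gb = transcript_gb + embedding_gb + 5  # + <5 GB for facts/patterns
print(round(transcript_gb), round(embedding_gb), round(total_gb))
```

Embeddings, not transcripts, are the larger line item, which is another argument for embedding only summaries rather than raw conversations.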

Compute:

  • Write costs: ~$1,500/month (summaries + embeddings + fact extraction)
  • Retrieval costs: ~$0 (pgvector query on existing Postgres)
  • Total: ~$1,500/month compute.

Latency:

  • Retrieval per turn: ~50ms (pgvector query + fact lookup)
  • User experience: negligible added latency

Total: ~$1,575/month at 10K users. Well within your $3-5K budget.

Implementation Roadmap

Phase 1 (Week 1-2): Foundation

  • Add pgvector extension to Postgres
  • Create schema (conversations, user_facts, user_patterns, memory_access_log)
  • Implement conversation save + summary generation
  • Basic recency-based retrieval

Phase 2 (Week 3-4): Semantic

  • Embedding integration (OpenAI text-embedding-3-small)
  • Semantic search retrieval
  • Fact extraction pipeline
  • Query classification + retrieval routing

Phase 3 (Week 5-6): Patterns + UI

  • Weekly batch pattern-extraction
  • 'Your Memory' UI page (see + edit + delete)
  • Time-decay implementation

Phase 4 (Week 7-8): Polish

  • Irrelevance-decay
  • Cold-storage migration for ancient memories
  • User settings (retention period, forbidden topics)
  • GDPR compliance finalization

Post-launch:

  • Monitor retrieval quality (are fetched memories actually relevant?)
  • Eval pipeline for memory-quality assessment
  • Scale decisions (if pgvector becomes bottleneck, migrate to Pinecone/Chroma)

Key Takeaways

  • 4 memory types (episodic/semantic/procedural/working) with different storage + retrieval + forgetting. Don't conflate them.
  • Hybrid Postgres + pgvector is correct for your scale. Don't add a separate vector DB until genuinely needed (pgvector performance or operational limits, which are unlikely below tens of millions of vectors).
  • Selective retrieval (3-10 memories per turn), not dump-everything. Classify query type, route to appropriate retrieval strategy.
  • Forgetting is essential. Time-decay + irrelevance-decay + user-controlled retention. Without forgetting, memory grows linearly and becomes useless.
  • User-visible memory UI is a trust feature, not optional. 'Your Memory' page with inspect/edit/delete is table stakes for an agent handling deeply personal content.

Common use cases

  • Agent developers building personal AI assistants that remember user preferences
  • Customer-support agents that need to remember ticket history across contacts
  • Coaching/therapy-style agents that build long-term user models
  • Productivity agents that learn user workflows over time
  • Knowledge-work agents (research, writing, strategy) that remember project context
  • Multi-session technical support agents
  • Sales agents that track prospect interactions across weeks
  • Educational tutors that remember student progress
  • Platform teams designing memory infrastructure for multiple agents

Best AI model for this

Claude Opus 4 or Sonnet 4.5. Memory architecture requires reasoning about retrieval, embeddings, cost, and retention simultaneously. Top-tier reasoning matters.

Pro tips

  • DON'T start with a vector database. Start with structured storage (Postgres JSON columns) for the first 100 memories. Only add vector DB when you have >1,000 memories AND need semantic retrieval. Over-engineering memory is the #1 mistake.
  • The 4 memory types need different retrieval strategies. Episodic = retrieve by recency + similarity. Semantic = always-available fast lookup. Procedural = conditional activation. Working = full-context for current session. Don't use one-size-fits-all retrieval.
  • Selective retrieval beats dump-everything. Instead of 'give Claude all 500 memories and let it sort,' retrieve the 3-10 most relevant per turn. This is cheaper, faster, and produces better outputs.
  • Memory WRITEBACK matters as much as retrieval. After each interaction: what should be stored as new memory? What existing memory should be updated? What should be forgotten? This 'write' step is where most systems leak context.
  • Forgetting is a feature, not a bug. Humans forget — agents should too. Implement time-decay (older memories get lower retrieval priority) + irrelevance-decay (memories never retrieved for 30+ days get archived). Without forgetting, memory grows linearly and becomes useless.
  • Embed at write-time, not read-time. Embeddings are cheap (~$0.02 per 1M tokens for OpenAI's text-embedding-3-small) but latency matters. Batch-embed memories on write; fetch cached embeddings on read.
  • Don't embed entire conversations. Embed SUMMARIES of conversations (Claude can summarize to ~200 tokens). Search against summaries; fetch full conversations when needed. Dramatically cheaper storage + faster search.
  • User-visible memory UI is a trust feature. Let users see + delete their stored memories. Agents that secretly remember everything feel creepy. Memory users can inspect feels helpful.

Customization tips

  • Start with episodic memory only. Semantic + procedural can come in phase 2. Premature memory-type-splitting is over-engineering for early-stage products.
  • Test retrieval quality early. Sample 20 real user queries, run through retrieval, manually rate: is the fetched memory actually relevant? Iterate on classifier + search strategy until hit-rate >80%.
  • For deeply personal content (your use case), encryption-at-rest is non-negotiable. Use Postgres column-level encryption for `full_transcript` field. Summary can be plaintext for search.
  • Version your memory schema. Changes to how you store memories (e.g., adding embedding column, renaming fields) will happen. Migration strategies should be first-class from day 1.
  • If users push back on retention limits ('I want ALL my memories forever'), offer a 'forever tier' at higher price. Storage + embedding costs scale with retention — charge appropriately.

Variants

Personal Assistant Mode

For personal AI agents remembering user preferences, habits, tasks over years. Emphasizes semantic memory + procedural learning.

Support Agent Mode

For customer-support agents remembering ticket history + user-specific context. Emphasizes episodic memory + user-profile semantic.

Research/Knowledge Agent Mode

For research agents with long project arcs. Emphasizes semantic memory + procedural workflow memory.

Multi-User Platform Mode

For agents serving many users with isolated memory per user. Emphasizes tenant isolation + per-user quotas.

Frequently asked questions

How do I use the Agent Memory Architect — Persistent, Structured, Retrievable Agent Memory prompt?

Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.

Which AI model works best with Agent Memory Architect — Persistent, Structured, Retrievable Agent Memory?

Claude Opus 4 or Sonnet 4.5. Memory architecture requires reasoning about retrieval, embeddings, cost, and retention simultaneously. Top-tier reasoning matters.

Can I customize the Agent Memory Architect — Persistent, Structured, Retrievable Agent Memory prompt for my use case?

Yes, every Promptolis Original is designed to be customized. Key levers: don't start with a vector database (use Postgres JSON columns for the first ~100 memories, adding a vector DB only once you have >1,000 memories and need semantic retrieval), and give each of the 4 memory types its own retrieval strategy (episodic = recency + similarity, semantic = always-available fast lookup, procedural = conditional activation, working = full current-session context).

Explore more Originals

Hand-crafted 2026-grade prompts that actually change how you work.
