⚡ Promptolis Original · AI Agents & Automation
🧠 Agent Memory Architect — Persistent, Structured, Retrievable Agent Memory
The structured memory architecture for Claude agents that need to remember across sessions — covering the 4 memory types (episodic/semantic/procedural/working), vector-DB patterns, forgetting strategies, and the memory-retrieval discipline that prevents context explosion.
Why this is epic
Agents that don't remember are frustrating to users. Agents that remember too much hit context windows and cost explosions. This Original produces the structured memory architecture: 4 memory types with different retention policies, selective retrieval (not 'dump everything'), and forgetting strategies that keep memory useful over months of use.
Names the 4 memory types: EPISODIC (specific events/conversations), SEMANTIC (facts about the user/system), PROCEDURAL (learned workflows), WORKING (current-session context). Each has different storage + retrieval + forgetting characteristics. Most agent 'memory' systems conflate these and fail in specific predictable ways.
Produces the full stack: embedding model choice, vector DB selection (pgvector / Pinecone / Chroma / Weaviate trade-offs), indexing strategy, retrieval filters (recency + relevance + type), memory-writeback policy, and forgetting algorithm. Based on patterns from production agents with 6+ months of memory at Cognition, Mem, and other production agent companies.
The prompt
Example: input → output
Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.
<agent-purpose>Personal productivity + journaling AI assistant. Users talk to it daily about their work, goals, challenges, learnings. It should remember conversations from days/weeks/months ago so when a user says 'how am I tracking on Q3 goals?' or 'what was I thinking about leadership two weeks ago?' it can pull relevant context and respond meaningfully. Think: Notion meets therapist.</agent-purpose> <memory-duration>Years. Users will use this for 2-5 years ideally. Need long-term memory + selective forgetting.</memory-duration> <user-scale>Currently 800 beta users. Target 10K paying users in 12 months. Each user's memory is isolated from others (no cross-user sharing).</user-scale> <memory-volume-estimate>Average user: ~1 conversation per day, ~500 words, ~3 'facts to remember' extracted per conversation. Over 1 year: ~365 conversations, ~1,100 facts per user. Heavy users: 3-5x this volume.</memory-volume-estimate> <tech-stack>Python (FastAPI backend), Postgres, currently no vector DB. Hosted on Fly.io. Using Claude Sonnet 4.5 as primary model.</tech-stack> <privacy-requirements>Deeply personal content (career struggles, relationships, health). GDPR. Users must be able to see + delete any memory. No selling user data. No cross-user data leakage. Encryption at rest required.</privacy-requirements> <retrieval-patterns>Mix: sometimes 'what did I say about X' (semantic search), sometimes 'what happened this week' (recency), sometimes 'am I consistent on this goal' (filtered by topic over time).</retrieval-patterns> <budget>Bootstrapped startup. $500/mo max on memory infrastructure now. Can scale to $3-5K/mo at 10K users.</budget>
Memory Architecture: Personal Productivity + Journaling Agent
This memory architecture specifies the complete persistent-memory system: 4 memory types (episodic/semantic/procedural/working), storage decisions, retrieval strategy, writeback policy, and forgetting algorithms. Based on production agent memory systems serving millions of memories, the critical design choice is not 'which vector DB' but 'which memory type needs what treatment.' Your use case (years-long personal memory, 10K-user target, deeply personal content, GDPR) is best served by a hybrid Postgres + pgvector approach: mature tech, single infrastructure, $0 additional spend initially, and headroom to your 10K target. The design below covers per-memory-type strategies, selective retrieval (3-10 memories per turn), time- and irrelevance-decay forgetting, and a user-visible memory UI.
Memory Type Analysis
Your agent needs all 4 types:
EPISODIC (specific conversations, days/events):
- 'Last Tuesday you said you were struggling with your direct report'
- 'The conversation about your Q3 goals from 2 weeks ago'
- Primary memory type for journaling — every conversation is an episode
SEMANTIC (facts about the user):
- 'You work as a product manager at [company]'
- 'Your partner's name is [name]'
- 'You're training for a half-marathon'
- 'Your Q3 goal is to ship feature X'
- Built up over time from conversations
PROCEDURAL (learned interaction patterns):
- 'User prefers structured frameworks over open exploration'
- 'User finds it helpful when I reference past conversations explicitly'
- 'User's journaling is most productive in the morning'
- Meta-patterns that shape how the agent responds
WORKING (current conversation):
- Today's conversation context
- Recent topic thread
- Fetched relevant memories for this turn
Most personal-agent implementations conflate episodic + semantic. Separating them is essential for your case because they have different retrieval + forgetting characteristics.
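The taxonomy above can be made concrete as a small policy table. This is an illustrative sketch (the enum, dataclass, and policy strings are my naming assumptions, not a prescribed API) showing that each memory type carries its own persistence, retrieval, and forgetting characteristics:

```python
from dataclasses import dataclass
from enum import Enum

class MemoryType(Enum):
    EPISODIC = "episodic"      # specific conversations/events
    SEMANTIC = "semantic"      # facts about the user
    PROCEDURAL = "procedural"  # learned interaction patterns
    WORKING = "working"        # current-session context

@dataclass(frozen=True)
class MemoryPolicy:
    persisted: bool    # survives the session?
    retrieval: str     # primary retrieval strategy
    forgetting: str    # primary forgetting mechanism

# Illustrative policies matching the analysis above
POLICIES = {
    MemoryType.EPISODIC: MemoryPolicy(True, "recency + semantic search", "time decay + compression"),
    MemoryType.SEMANTIC: MemoryPolicy(True, "always-load top facts", "confidence decay"),
    MemoryType.PROCEDURAL: MemoryPolicy(True, "load top patterns per session", "irrelevance decay"),
    MemoryType.WORKING: MemoryPolicy(False, "full in-session context", "discarded at session end"),
}
```

Keeping the policies separate is the point: a system that stores all four types in one table with one retrieval path fails in the predictable ways described above.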
Storage Design
Recommendation: Hybrid Postgres + pgvector (start). Upgrade path to dedicated vector DB only if needed.
Why hybrid Postgres:
- Single infrastructure (you already have Postgres)
- pgvector handles semantic search for <1M vectors easily
- Transactional (updates + retrieval in one system)
- Mature, well-understood, GDPR-compliant
- Cost: your existing Postgres + minor storage increase
Storage tables:
-- Episodic: conversations
CREATE TABLE conversations (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  started_at TIMESTAMP,
  ended_at TIMESTAMP,
  summary TEXT,                    -- Claude-generated ~200-word summary
  summary_embedding VECTOR(1536),
  full_transcript TEXT,            -- encrypted
  topics TEXT[],                   -- extracted topics for filtering
  sentiment TEXT
);
CREATE INDEX ON conversations (user_id);
CREATE INDEX ON conversations USING ivfflat (summary_embedding vector_cosine_ops);
-- Semantic: facts about the user
CREATE TABLE user_facts (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  category TEXT,                   -- 'work', 'relationships', 'goals', 'health', etc.
  fact TEXT,                       -- 'Works as PM at Acme Corp'
  source_conversation_id UUID,
  confidence FLOAT,                -- 0-1, higher = more confident
  first_mentioned TIMESTAMP,
  last_confirmed TIMESTAMP,
  supersedes_fact_id UUID          -- chain for updated facts
);
CREATE INDEX ON user_facts (user_id, category);
-- Procedural: learned patterns
CREATE TABLE user_patterns (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  pattern_type TEXT,               -- 'communication_style', 'time_of_day', 'topic_preference'
  pattern_description TEXT,
  observed_count INT,
  last_observed TIMESTAMP
);
CREATE INDEX ON user_patterns (user_id);
-- Retrieval metadata
CREATE TABLE memory_access_log (
  memory_id UUID,
  user_id UUID,
  accessed_at TIMESTAMP,
  access_type TEXT                 -- 'retrieved' or 'mentioned_in_response'
);
CREATE INDEX ON memory_access_log (memory_id, accessed_at);
Why not Pinecone/Chroma/Weaviate yet:
- pgvector handles your scale (10K users × 1100 memories = 11M vectors — well within pgvector limits)
- Adding separate vector DB = more ops, more cost, more privacy surface
- Migrate later if you genuinely hit pgvector limits (unlikely in next 18 months)
Episodic Memory Architecture
Storage: conversations table
Writeback (after each conversation):
1. Full transcript encrypted, stored
2. Claude generates ~200-word summary
3. Summary embedded (OpenAI text-embedding-3-small, 1536 dims, $0.02/1M tokens)
4. Topics extracted via Claude (structured output: `['career', 'leadership', 'Q3-goals']`)
5. Sentiment estimated (useful for 'how has my mood been?' queries)
Retrieval strategies:
Recency-based: 'What did I talk about this week?' → WHERE started_at > NOW() - INTERVAL '7 days'
Semantic search: 'What have I said about my direct report?' → embedding search on summary_embedding + filter by user_id
Topic-filtered: 'How have I been tracking on Q3 goals?' → WHERE 'Q3-goals' = ANY(topics) ORDER BY started_at
Combined: 'What was I thinking about leadership 2 weeks ago?' → time range + topic filter + semantic rank
Retrieval size: top 3-5 conversation summaries per turn. NEVER dump all history into context.
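The combined strategy above can be expressed as a single query. This is a sketch assuming pgvector's cosine-distance operator (`<=>`) and the conversations schema in this document; the psycopg-style parameter names are placeholders:

```python
# Combined retrieval: time range + topic filter + semantic rank.
# Assumes the pgvector extension is installed; <=> is its
# cosine-distance operator, so ascending order = most similar first.
COMBINED_RETRIEVAL_SQL = """
SELECT id, summary, started_at
FROM conversations
WHERE user_id = %(user_id)s
  AND started_at >= %(window_start)s
  AND %(topic)s = ANY(topics)
ORDER BY summary_embedding <=> %(query_embedding)s
LIMIT 5;
"""
```

Note the LIMIT: even the richest query returns at most 5 summaries, which is what keeps per-turn context bounded.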
Semantic Memory Architecture
Storage: user_facts table
Writeback (fact extraction):
After each conversation, Claude extracts structured facts:
[
{"category": "work", "fact": "Working on launching feature X by end of Q3", "confidence": 0.9},
{"category": "relationships", "fact": "Partner's name is Sarah", "confidence": 0.95},
{"category": "goals", "fact": "Training for a half-marathon in October", "confidence": 0.85}
]
Fact reconciliation:
New fact may update or supersede existing fact:
- 'Works at Acme Corp' (old) → 'Works at Beta Inc starting Nov 1' (new, supersedes)
- Implement: supersedes_fact_id chain + last_confirmed timestamp refresh when a fact is reaffirmed
Retrieval strategy:
- Always fetch recent semantic facts for working memory (top 20 by last_confirmed + confidence)
- Ensures Claude knows basic user context without searching
- Fetched at conversation start, cached in session
Privacy: user can see all facts via memory UI. Can edit, delete, correct inaccuracies.
Procedural Memory Architecture
Storage: user_patterns table
Writeback (weekly batch job):
After a user has 20+ conversations, run pattern-extraction:
- Time-of-day patterns (when do they journal? when are they most open?)
- Communication style (structured vs. exploratory, verbose vs. terse)
- Topic cadence (they always return to leadership on Mondays?)
- Response preferences (what types of responses get engagement?)
Retrieval strategy:
- Load top 5 patterns into context at start of each session
- Used by Claude to adapt response style, not to execute logic
Example procedural memory influence:
- Pattern: 'User responds better to 2-sentence responses than essays'
- Effect: Claude's response-length tendency shifts for this user
Working Memory Architecture
Storage: in-memory (per-session), not persisted until conversation end
Contents:
- Today's conversation (full transcript so far)
- Retrieved memories for this turn
- Active topic thread
- Current sentiment
Management:
- Context window: Claude Sonnet 4.5 = 200K tokens. Plenty of room.
- Don't fill with history. Use retrieved-memories strategy (5-10 items).
- Working memory closes at session end → episodic storage.
Retrieval Strategy
Per turn, retrieve 3-10 memories based on query classification:
async def retrieve_memories(user_id: str, query: str, session_context: dict):
    # Step 1: Always load semantic base (20 facts, cheap)
    base_facts = await load_user_facts(user_id, limit=20)

    # Step 2: Classify query type
    query_type = await classify_query(query)
    # 'recency' / 'semantic' / 'topic' / 'general'

    # Step 3: Retrieve episodic memories based on type
    if query_type == 'recency':
        conversations = await get_recent_conversations(user_id, days=7, limit=5)
    elif query_type == 'semantic':
        embedding = await embed(query)
        conversations = await semantic_search(user_id, embedding, limit=5)
    elif query_type == 'topic':
        topics = await extract_topics(query)
        conversations = await filter_by_topics(user_id, topics, limit=5)
    else:
        conversations = await get_recent_conversations(user_id, days=3, limit=3)

    # Step 4: Load procedural patterns
    patterns = await load_user_patterns(user_id, limit=5)

    return {
        'semantic_facts': base_facts,
        'episodic_memories': conversations,
        'procedural_patterns': patterns,
    }
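The classify_query step above is the routing decision. In production it would likely be a cheap structured-output model call; as a self-contained sketch, a keyword heuristic shows the shape (the cue lists are illustrative assumptions, not a tuned classifier):

```python
RECENCY_CUES = ("this week", "today", "yesterday", "recently", "this month")
TOPIC_CUES = ("tracking on", "consistent", "progress on", "goal")

def classify_query(query: str) -> str:
    """Route a user query to a retrieval strategy:
    'recency' / 'topic' / 'semantic' / 'general'."""
    q = query.lower()
    if any(cue in q for cue in RECENCY_CUES):
        return "recency"
    if any(cue in q for cue in TOPIC_CUES):
        return "topic"
    if "what did i say about" in q or "what have i said about" in q:
        return "semantic"
    return "general"
```

A misrouted query still works (it falls back to recent conversations), which is why a cheap, imperfect classifier is acceptable here.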
Token budget per turn:
- Base facts: ~1,500 tokens
- Episodic memories: ~2,000 tokens (5 summaries × 400 tokens)
- Patterns: ~500 tokens
- Total memory context: ~4,000 tokens (2% of 200K context)
- Leaves 196K tokens for conversation + response
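The budget above is worth enforcing in code rather than trusting. A minimal guard (section names and the 4K cap mirror the figures above; both are assumptions you would tune):

```python
# Per-turn memory context budget, in tokens (figures from the plan above)
MEMORY_TOKEN_BUDGET = {
    "semantic_facts": 1_500,
    "episodic_memories": 2_000,   # 5 summaries x ~400 tokens
    "procedural_patterns": 500,
}

def within_budget(sections: dict[str, int], cap: int = 4_000) -> bool:
    """Guard against memory context creeping past its per-turn cap."""
    return sum(sections.values()) <= cap
```

Checking this on every turn catches the most common memory failure mode: retrieval that quietly grows until it crowds out the conversation itself.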
Writeback Policy
After each conversation (async, via background job):
1. Store conversation: encrypted full transcript to `conversations` table
2. Generate summary: Claude summarizes to ~200 words
3. Embed summary: OpenAI text-embedding-3-small
4. Extract facts: Claude extracts structured facts with confidence scores
5. Reconcile facts:
- New fact matches existing → update last_confirmed, increment confidence
- New fact conflicts with existing → create supersession chain
- Novel fact → insert new
6. Extract patterns (weekly batch):
- Look at last 7 days of conversations
- Update user_patterns with observed patterns
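Steps 1-5 above can be sketched as one async orchestration function. Every callable here is an injected assumption: summarize/extract_facts stand in for Claude calls, embed for an embedding API call, store/reconcile for database writes:

```python
import asyncio

async def writeback(transcript: str, *, encrypt, store, summarize, embed,
                    extract_facts, reconcile):
    """Post-conversation writeback job (run async, off the request path)."""
    conversation_id = await store("conversations", encrypt(transcript))  # 1. encrypted transcript
    summary = await summarize(transcript)                                # 2. ~200-word summary
    vector = await embed(summary)                                        # 3. embed the summary
    await store("summaries", (conversation_id, summary, vector))
    for fact in await extract_facts(transcript):                         # 4-5. extract + reconcile
        await reconcile(fact)
    return conversation_id
```

Running this as a background job (step order preserved, conversation stored first) means a failed summary or fact extraction never loses the transcript.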
Write-time cost:
- Summary generation: ~$0.002 (500-word conversation → 200-word summary)
- Embedding: ~$0.00002
- Fact extraction: ~$0.003
- Total per conversation: ~$0.005
- At 10K users × 1 conversation/day: $50/day = $1,500/month in write costs
Forgetting Algorithm
Time-decay:
- Conversations > 90 days old: retrieval priority × 0.5
- Conversations > 365 days old: retrieval priority × 0.2 (but still accessible)
- Facts > 6 months unconfirmed: confidence × 0.5
Irrelevance-decay:
- Memories never retrieved in 180 days: move to cold storage (compressed, slow retrieve)
- Facts with confidence < 0.3: archive
User-requested:
- Users can delete any memory via UI
- GDPR right-to-be-forgotten: full account deletion removes all user memories
- Users can mark memories 'keep forever' (override decay)
Summarization compression:
- Conversations > 1 year old: keep summary, compress full transcript
- Save storage (full transcripts are 5-10x summary size)
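The time-decay rules above reduce to a small scoring function. A sketch using the exact thresholds from this plan, with the user-facing 'keep forever' override:

```python
from datetime import datetime, timedelta

def retrieval_priority(base_score: float, created_at: datetime,
                       now: datetime, keep_forever: bool = False) -> float:
    """Time decay: x0.5 past 90 days, x0.2 past 365 days,
    no decay for user-pinned memories."""
    if keep_forever:
        return base_score
    age = now - created_at
    if age > timedelta(days=365):
        return base_score * 0.2
    if age > timedelta(days=90):
        return base_score * 0.5
    return base_score
```

Applied as a multiplier on the similarity score at ranking time, old memories stay retrievable but only surface when they are strongly relevant.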
User-Facing Memory UI
'Your Memory' page in the product:
1. Conversations list: chronological, searchable, deletable
2. Facts I know about you: categorized, editable, deletable
3. Patterns I've noticed: shown with 'helpful / wrong / fine' feedback
4. Settings:
- Memory retention period (default 3 years, min 30 days, max forever)
- Topics to never remember (e.g., 'health')
- Delete all memories (full reset)
- Export memories (GDPR)
Transparency matters. Users trust agents that let them inspect. 'Secretly-remembers-everything' feels creepy.
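The 'delete all memories' setting and GDPR erasure share one code path. A sketch assuming the four tables from the schema above and any DB-API-style connection (placeholder syntax varies by driver):

```python
# Tables from the storage design above; all key rows by user_id
USER_TABLES = ("conversations", "user_facts", "user_patterns", "memory_access_log")

def delete_all_memories(db, user_id: str) -> None:
    """Right-to-be-forgotten: remove every row stored for one user."""
    for table in USER_TABLES:
        db.execute(f"DELETE FROM {table} WHERE user_id = %s", (user_id,))
    db.commit()
```

Keeping every memory table keyed by user_id is what makes this a four-statement transaction instead of an archaeology project.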
Cost + Latency Model
At 10K users × 1 conversation/day:
Storage:
- Conversations: ~500 words × ~1,100 conversations/user (≈3 years of daily use) × 10K users = 5.5B words ≈ 35GB
- Embeddings: 1536 dims × float32 × 11M embeddings = ~65GB
- Facts + patterns: <5GB
- Total: ~100GB Postgres storage. At Fly.io ~$75/month.
Compute:
- Write costs: ~$1,500/month (summaries + embeddings + fact extraction)
- Retrieval costs: ~$0 (pgvector query on existing Postgres)
- Total: ~$1,500/month compute.
Latency:
- Retrieval per turn: ~50ms (pgvector query + fact lookup)
- User experience: negligible added latency
Total: ~$1,575/month at 10K users. Well within your $3-5K budget.
Implementation Roadmap
Phase 1 (Week 1-2): Foundation
- Add pgvector extension to Postgres
- Create schema (conversations, user_facts, user_patterns, memory_access_log)
- Implement conversation save + summary generation
- Basic recency-based retrieval
Phase 2 (Week 3-4): Semantic
- Embedding integration (OpenAI text-embedding-3-small)
- Semantic search retrieval
- Fact extraction pipeline
- Query classification + retrieval routing
Phase 3 (Week 5-6): Patterns + UI
- Weekly batch pattern-extraction
- 'Your Memory' UI page (see + edit + delete)
- Time-decay implementation
Phase 4 (Week 7-8): Polish
- Irrelevance-decay
- Cold-storage migration for ancient memories
- User settings (retention period, forbidden topics)
- GDPR compliance finalization
Post-launch:
- Monitor retrieval quality (are fetched memories actually relevant?)
- Eval pipeline for memory-quality assessment
- Scale decisions (if pgvector becomes bottleneck, migrate to Pinecone/Chroma)
Key Takeaways
- 4 memory types (episodic/semantic/procedural/working) with different storage + retrieval + forgetting. Don't conflate them.
- Hybrid Postgres + pgvector is correct for your scale. Don't add a separate vector DB until genuinely needed (tens of millions of vectors overall, or measured performance problems).
- Selective retrieval (3-10 memories per turn), not dump-everything. Classify query type, route to appropriate retrieval strategy.
- Forgetting is essential. Time-decay + irrelevance-decay + user-controlled retention. Without forgetting, memory grows linearly and becomes useless.
- User-visible memory UI is a trust feature, not optional. 'Your Memory' page with inspect/edit/delete is table stakes for an agent handling deeply personal content.
Common use cases
- Agent developers building personal AI assistants that remember user preferences
- Customer-support agents that need to remember ticket history across contacts
- Coaching/therapy-style agents that build long-term user models
- Productivity agents that learn user workflows over time
- Knowledge-work agents (research, writing, strategy) that remember project context
- Multi-session technical support agents
- Sales agents that track prospect interactions across weeks
- Educational tutors that remember student progress
- Platform teams designing memory infrastructure for multiple agents
Best AI model for this
Claude Opus 4 or Sonnet 4.5. Memory architecture requires reasoning about retrieval, embeddings, cost, and retention simultaneously. Top-tier reasoning matters.
Pro tips
- DON'T start with a vector database. Start with structured storage (Postgres JSON columns) for the first 100 memories. Only add vector DB when you have >1,000 memories AND need semantic retrieval. Over-engineering memory is the #1 mistake.
- The 4 memory types need different retrieval strategies. Episodic = retrieve by recency + similarity. Semantic = always-available fast lookup. Procedural = conditional activation. Working = full-context for current session. Don't use one-size-fits-all retrieval.
- Selective retrieval beats dump-everything. Instead of 'give Claude all 500 memories and let it sort,' retrieve the 3-10 most relevant per turn. This is cheaper, faster, and produces better outputs.
- Memory WRITEBACK matters as much as retrieval. After each interaction: what should be stored as new memory? What existing memory should be updated? What should be forgotten? This 'write' step is where most systems leak context.
- Forgetting is a feature, not a bug. Humans forget — agents should too. Implement time-decay (older memories get lower retrieval priority) + irrelevance-decay (memories never retrieved for 30+ days get archived). Without forgetting, memory grows linearly and becomes useless.
- Embed at write-time, not read-time. Embeddings are cheap (~$0.02/1M tokens for OpenAI text-embedding-3-small) but latency matters. Batch-embed memories on write; fetch cached embeddings on read.
- Don't embed entire conversations. Embed SUMMARIES of conversations (Claude can summarize to ~200 tokens). Search against summaries; fetch full conversations when needed. Dramatically cheaper storage + faster search.
- User-visible memory UI is a trust feature. Let users see + delete their stored memories. Agents that secretly remember everything feel creepy. Memory users can inspect feels helpful.
Customization tips
- Start with episodic memory only. Semantic + procedural can come in phase 2. Premature memory-type-splitting is over-engineering for early-stage products.
- Test retrieval quality early. Sample 20 real user queries, run through retrieval, manually rate: is the fetched memory actually relevant? Iterate on classifier + search strategy until hit-rate >80%.
- For deeply personal content (your use case), encryption-at-rest is non-negotiable. Use Postgres column-level encryption for `full_transcript` field. Summary can be plaintext for search.
- Version your memory schema. Changes to how you store memories (e.g., adding embedding column, renaming fields) will happen. Migration strategies should be first-class from day 1.
- If users push back on retention limits ('I want ALL my memories forever'), offer a 'forever tier' at higher price. Storage + embedding costs scale with retention — charge appropriately.
Variants
Personal Assistant Mode
For personal AI agents remembering user preferences, habits, tasks over years. Emphasizes semantic memory + procedural learning.
Support Agent Mode
For customer-support agents remembering ticket history + user-specific context. Emphasizes episodic memory + user-profile semantic.
Research/Knowledge Agent Mode
For research agents with long project arcs. Emphasizes semantic memory + procedural workflow memory.
Multi-User Platform Mode
For agents serving many users with isolated memory per user. Emphasizes tenant isolation + per-user quotas.
Frequently asked questions
How do I use the Agent Memory Architect — Persistent, Structured, Retrievable Agent Memory prompt?
Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.
Which AI model works best with Agent Memory Architect — Persistent, Structured, Retrievable Agent Memory?
Claude Opus 4 or Sonnet 4.5. Memory architecture requires reasoning about retrieval, embeddings, cost, and retention simultaneously. Top-tier reasoning matters.
Can I customize the Agent Memory Architect — Persistent, Structured, Retrievable Agent Memory prompt for my use case?
Yes — every Promptolis Original is designed to be customized. Key levers: DON'T start with a vector database. Start with structured storage (Postgres JSON columns) for the first 100 memories. Only add vector DB when you have >1,000 memories AND need semantic retrieval. Over-engineering memory is the #1 mistake.; The 4 memory types need different retrieval strategies. Episodic = retrieve by recency + similarity. Semantic = always-available fast lookup. Procedural = conditional activation. Working = full-context for current session. Don't use one-size-fits-all retrieval.
Explore more Originals
Hand-crafted 2026-grade prompts that actually change how you work.
← All Promptolis Originals