4.3 — Memory Architecture at Scale
Phase 3 introduced RAG, embeddings, and vector databases as tools for giving agents persistent memory. At scale, you need a deliberate architecture — not just one tool, but a layered system with clear rules for what lives where.
The 4-Layer Memory Stack
┌────────────────────────────────┐
│ WORKING MEMORY                 │ ← Current conversation context
│ (context window)               │   Fast, limited, temporary
├────────────────────────────────┤
│ SESSION MEMORY                 │ ← Within-session persistence
│ (files, CLAUDE.md)             │   Medium, survives compaction
├────────────────────────────────┤
│ PERSISTENT MEMORY              │ ← Cross-session knowledge
│ (memory files, database)       │   Slow to retrieve, permanent
├────────────────────────────────┤
│ KNOWLEDGE BASE                 │ ← Organizational knowledge
│ (RAG, vector DB, Brain)        │   Searchable, scalable, shared
└────────────────────────────────┘

Working memory is the context window: everything the agent can see right now. Fast and immediate, but it disappears when the conversation ends and it has a hard size limit.
Session memory lives in files and CLAUDE.md. It survives context compaction and can be re-read during a long session, but it’s still bounded by what you can load into context.
Persistent memory crosses session boundaries. Memory files and databases store facts permanently and are retrieved as needed — the agent doesn’t carry them at all times.
The knowledge base is the organizational layer. RAG systems, vector databases, and tools like Brain live here. It is searchable, scalable, and shareable across multiple agents.
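One way to picture the four layers is as an ordered lookup, fastest and smallest first. The sketch below is only an illustration under stated assumptions: `MemoryStack` and `recall` are hypothetical names, and each layer is modeled as a plain dictionary rather than a real context window, file, database, or vector store.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStack:
    """Hypothetical four-layer store; each layer is a plain dict here."""
    working: dict = field(default_factory=dict)     # context window
    session: dict = field(default_factory=dict)     # files, CLAUDE.md
    persistent: dict = field(default_factory=dict)  # memory files, database
    knowledge: dict = field(default_factory=dict)   # RAG, vector DB

    def layers(self):
        # Ordered from fastest/smallest to slowest/largest.
        return [
            ("working", self.working),
            ("session", self.session),
            ("persistent", self.persistent),
            ("knowledge", self.knowledge),
        ]

    def recall(self, key):
        """Return (layer_name, value) from the first layer holding key, else None."""
        for name, store in self.layers():
            if key in store:
                return name, store[key]
        return None


stack = MemoryStack()
stack.session["style"] = "use tabs"
stack.knowledge["style"] = "org default: spaces"
stack.knowledge["oncall"] = "see runbook"

print(stack.recall("style"))    # found in the session layer first
print(stack.recall("oncall"))   # falls through to the knowledge base
print(stack.recall("missing"))  # no layer holds this key
```

In this sketch a narrower layer shadows a wider one, so a session-level note overrides an organizational default. Whether that precedence is right for a given system is a design choice, not something the layering itself dictates.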
Design Decision Questions
Before building a memory system, answer these four questions:
- What should the agent remember always vs. recall on demand? Always-on knowledge belongs in CLAUDE.md or working memory. Large corpora belong in RAG.
- How do you prevent memory from becoming stale? Outdated facts in persistent memory can be worse than no memory. Build update and expiry mechanisms from the start.
- How do you prevent memory from becoming bloated? Noise drowns out signal. Design for curation, not just accumulation.
- How do multiple agents share knowledge without conflicts? A shared knowledge base solves this — but requires write-access rules to prevent agents from overwriting each other’s updates.
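Two of these questions, staleness and conflicting writes, can be sketched in a few lines. The example below is a minimal illustration, not a prescribed design: `SharedMemory`, `Entry`, `ttl_seconds`, and `expected_version` are hypothetical names, expiry is modeled as a per-entry time-to-live, and the write rule is a simple optimistic version check. Curation against bloat is not covered here.

```python
import time
from dataclasses import dataclass


@dataclass
class Entry:
    value: str
    written_at: float
    ttl_seconds: float
    version: int = 1


class SharedMemory:
    """Hypothetical shared store with expiry and optimistic write checks."""

    def __init__(self, clock=time.time):
        self._entries = {}
        self._clock = clock  # injectable so expiry can be exercised in tests

    def read(self, key):
        """Return the entry, or None if it is missing or past its TTL."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        if self._clock() - entry.written_at > entry.ttl_seconds:
            del self._entries[key]  # stale facts are dropped, not served
            return None
        return entry

    def write(self, key, value, ttl_seconds, expected_version=0):
        """Write only if the caller saw the latest version (0 means new key)."""
        current = self.read(key)
        current_version = current.version if current else 0
        if expected_version != current_version:
            return False  # another agent updated the key first
        self._entries[key] = Entry(
            value, self._clock(), ttl_seconds, current_version + 1
        )
        return True


# Simulated clock so the example does not need to sleep.
now = [1000.0]
mem = SharedMemory(clock=lambda: now[0])

created = mem.write("deploy_target", "staging", ttl_seconds=60)
agent_a = mem.write("deploy_target", "prod", ttl_seconds=60, expected_version=1)
agent_b = mem.write("deploy_target", "dev", ttl_seconds=60, expected_version=1)
value_before_expiry = mem.read("deploy_target").value

now[0] += 61  # advance past the TTL
value_after_expiry = mem.read("deploy_target")
```

Here agent B's write is rejected because it was based on a version that agent A had already replaced, and the fact disappears once its TTL passes. One simplification to be aware of: an expired key restarts at version 0, which a real system might not want.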