4.3 — Memory Architecture at Scale

Phase 3 introduced RAG, embeddings, and vector databases as tools for giving agents persistent memory. At scale, you need a deliberate architecture — not just one tool, but a layered system with clear rules for what lives where.

The 4-Layer Memory Stack

┌────────────────────────────────┐
│  WORKING MEMORY                │  ← Current conversation context
│  (context window)              │     Fast, limited, temporary
├────────────────────────────────┤
│  SESSION MEMORY                │  ← Within-session persistence
│  (files, CLAUDE.md)            │     Medium speed, survives compaction
├────────────────────────────────┤
│  PERSISTENT MEMORY             │  ← Cross-session knowledge
│  (memory files, database)      │     Slow to retrieve, permanent
├────────────────────────────────┤
│  KNOWLEDGE BASE                │  ← Organizational knowledge
│  (RAG, vector DB, Brain)       │     Searchable, scalable, shared
└────────────────────────────────┘

Working memory is the context window — everything the agent can see right now. Fast and immediate, but it disappears when the conversation ends and it has a hard size limit.

Session memory lives in files and CLAUDE.md. It survives context compaction and can be re-read during a long session, but it’s still bounded by what you can load into context.

Persistent memory crosses session boundaries. Memory files and databases store facts permanently and are retrieved as needed — the agent doesn’t carry them at all times.

The knowledge base is the organizational layer. RAG systems, vector databases, and tools like Brain live here. It is searchable, scalable, and shareable across multiple agents.
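The lookup order implied by the stack can be sketched in a few lines of Python. This is only an illustration, not a prescribed implementation: MemoryLayer, MemoryStack, and the dictionaries standing in for each backing store are hypothetical names, and a real system would replace the dictionaries with the context window, files, a database, and a vector search.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class MemoryLayer:
    """One layer of the stack; `lookup` returns a value, or None on a miss."""
    name: str
    lookup: Callable[[str], Optional[str]]


@dataclass
class MemoryStack:
    """Consults layers in order, from fastest and smallest to slowest and largest."""
    layers: list[MemoryLayer] = field(default_factory=list)

    def recall(self, key: str) -> Optional[tuple[str, str]]:
        # Return (layer name, value) from the first layer that knows the key.
        for layer in self.layers:
            value = layer.lookup(key)
            if value is not None:
                return layer.name, value
        return None


# Hypothetical stand-ins: plain dicts play the role of each backing store.
working = {"current_task": "refactor auth module"}
session = {"build_command": "make test"}
persistent = {"user_timezone": "UTC"}
knowledge_base = {"deploy_policy": "two approvals required"}

stack = MemoryStack([
    MemoryLayer("working", working.get),
    MemoryLayer("session", session.get),
    MemoryLayer("persistent", persistent.get),
    MemoryLayer("knowledge_base", knowledge_base.get),
])

print(stack.recall("build_command"))  # ('session', 'make test')
print(stack.recall("deploy_policy"))  # ('knowledge_base', 'two approvals required')
print(stack.recall("unknown"))        # None
```

Checking the cheap layers first means the slower, larger layers are only queried when the agent does not already have the answer in hand.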

Design Decision Questions

Before building a memory system, answer these four questions:

1. What should the agent remember always vs. recall on demand? Always-on knowledge belongs in CLAUDE.md or working memory. Large corpora belong in RAG.
2. How do you prevent memory from becoming stale? Outdated facts in persistent memory can be worse than no memory. Build update and expiry mechanisms from the start.
3. How do you prevent memory from becoming bloated? Noise drowns out signal. Design for curation, not just accumulation.
4. How do multiple agents share knowledge without conflicts? A shared knowledge base solves this, but it requires write-access rules to prevent agents from overwriting each other’s updates.
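The staleness, bloat, and write-conflict questions can be made concrete with a toy store. The sketch below assumes three specific policies that the questions leave open: a per-fact time-to-live for expiry, a size cap that keeps the newest facts, and an owner-only overwrite rule. SharedMemory, Fact, and every parameter name are hypothetical.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    value: str
    owner: str          # agent that wrote the fact
    written_at: float   # seconds since the epoch
    ttl_seconds: float  # how long the fact is trusted


class SharedMemory:
    """Toy shared store with expiry, a size cap, and owner-only overwrites."""

    def __init__(self, max_facts: int = 100) -> None:
        self._facts: dict[str, Fact] = {}
        self._max_facts = max_facts

    def write(self, key, value, agent, ttl_seconds, now=None) -> bool:
        now = time.time() if now is None else now
        existing = self._facts.get(key)
        # Write-access rule: only the owner may overwrite a fact that is still live.
        if existing and existing.owner != agent and not self._expired(existing, now):
            return False
        self._facts[key] = Fact(value, agent, now, ttl_seconds)
        self._curate(now)
        return True

    def read(self, key, now=None) -> Optional[str]:
        now = time.time() if now is None else now
        fact = self._facts.get(key)
        # Staleness rule: an expired fact is treated as missing.
        if fact is None or self._expired(fact, now):
            return None
        return fact.value

    @staticmethod
    def _expired(fact, now) -> bool:
        return now - fact.written_at > fact.ttl_seconds

    def _curate(self, now) -> None:
        # Bloat rule: drop expired facts, then keep only the newest ones up to the cap.
        live = {k: f for k, f in self._facts.items() if not self._expired(f, now)}
        newest_first = sorted(live.items(), key=lambda kv: kv[1].written_at, reverse=True)
        self._facts = dict(newest_first[: self._max_facts])


mem = SharedMemory(max_facts=2)
print(mem.write("api_version", "v2", agent="planner", ttl_seconds=60, now=0))   # True
print(mem.write("api_version", "v3", agent="coder", ttl_seconds=60, now=10))    # False
print(mem.read("api_version", now=30))                                          # v2
print(mem.read("api_version", now=100))                                         # None
print(mem.write("api_version", "v3", agent="coder", ttl_seconds=60, now=100))   # True
```

The explicit now argument exists only to make the example deterministic. Owner-only overwrites are one possible write-access rule among several; versioned writes or a single designated writer agent would address the same problem.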
