
3.5 — Memory & RAG

AI has no memory between conversations. Every new session starts blank. The model that helped you build your project yesterday has no idea your project exists today.

For a one-off question, that’s fine. For ongoing work — a project with conventions, context, and history — it’s fatal. Every session you’d have to re-explain everything.

There are two solutions, and serious agent architects use both.

Solution 1: File-based memory

Store important information in files the AI reads at the start of each session. CLAUDE.md is one form of this; the memory directory at ~/.claude/projects/[project]/memory/ is another.

How it works: You write facts into files. The agent reads those files. The facts are in context.
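That loop can be sketched in a few lines. This is a hypothetical illustration, not the actual loader Claude uses: `load_memory` and its size budget are assumptions, and the real context-window limit is measured in tokens, not characters.

```python
from pathlib import Path

def load_memory(memory_dir: str, max_chars: int = 8000) -> str:
    """Concatenate memory files into one context block, stopping at a size budget.

    The character budget is a crude stand-in for the context-window limit:
    file-based memory only works while everything still fits.
    """
    parts = []
    used = 0
    for f in sorted(Path(memory_dir).glob("*.md")):
        text = f.read_text(encoding="utf-8")
        if used + len(text) > max_chars:
            break  # out of budget -- remaining files are silently dropped
        parts.append(f"## {f.name}\n{text}")
        used += len(text)
    return "\n\n".join(parts)
```

The assembled block would be prepended to the session's first prompt, which is all "memory" means here: the facts travel inside the context window.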

Strengths:

  • Simple — it’s just files
  • Human-readable and editable
  • No infrastructure required

Limits:

  • Bounded by the context window — you can’t load unlimited files
  • Manual to maintain — you write and update the files

Good for: project conventions, user preferences, architectural decisions

Solution 2: RAG (Retrieval-Augmented Generation)


Instead of loading everything into context at once, store knowledge in a searchable system. When a question comes up, retrieve only the relevant pieces and inject them into context.

RAG breaks down as:

  • Retrieval — searching a knowledge base for information relevant to the current question
  • Augmented — adding that retrieved information to the AI’s context
  • Generation — the AI generates its response with the added context
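The retrieve-and-augment steps above can be sketched in Python. This is a toy: a real system would rank documents by embedding similarity in a vector database, while here word overlap stands in for semantic search, and the function names are made up for illustration.

```python
def retrieve(query: str, docs: dict[str, str], k: int = 3) -> list[str]:
    """Rank documents by word overlap with the query -- a toy stand-in
    for the embedding-based similarity a real RAG system would use."""
    q = set(query.lower().split())
    scored = sorted(
        docs.items(),
        key=lambda kv: len(q & set(kv[1].lower().split())),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]

def augment(query: str, docs: dict[str, str], k: int = 3) -> str:
    """Inject the retrieved passages into the prompt before the question."""
    context = "\n\n".join(docs[name] for name in retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The model never sees the whole knowledge base — only the augmented prompt, which carries just the top-k retrieved pieces. Generation is then an ordinary model call on that prompt.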

Analogy:

  • Without RAG: Studying for an exam by memorizing the entire textbook. You’re limited by how much you can hold in your head (context window limit).
  • With RAG: Taking an open-book exam. You don’t memorize everything — you know where to find the right information and pull it when you need it.

RAG scales where file-based memory can’t. A knowledge base can hold millions of documents; RAG finds the handful that are relevant and puts only those in context.

Key terms:

  • Embedding — A way to represent text as numbers so a computer can measure similarity between pieces of text. How RAG systems find “relevant” documents.
  • Vector database — A database optimized for storing and searching embeddings. The filing system behind RAG.
  • Semantic search — Searching by meaning, not exact keywords. “How do I fix login?” finds results about “authentication errors” because the meaning is similar.
  • Knowledge base — An organized collection of information an AI system can draw from.
  • Partition — A section within a knowledge base, like folders within a filing cabinet. Keeps different topics from interfering with each other.
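To make "searching by meaning" concrete, here is cosine similarity over tiny hand-made vectors. Real embeddings come from an embedding model and have hundreds or thousands of dimensions; these 3-dimensional numbers are invented purely to show the mechanic.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction (same meaning, roughly)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" for illustration only:
vecs = {
    "fix login":             [0.9, 0.1, 0.2],
    "authentication errors": [0.8, 0.2, 0.1],
    "pasta recipes":         [0.1, 0.9, 0.7],
}
```

Because “fix login” and “authentication errors” point in nearly the same direction, their similarity is high even though they share no keywords — which is exactly why the login query retrieves the authentication document.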

Use file-based memory when:

  • The knowledge is small and changes infrequently
  • You want humans to be able to read and edit it directly
  • You’re working on a single focused project

Use RAG when:

  • The knowledge base is large (hundreds of documents or more)
  • You need to search by meaning, not just load everything
  • Multiple agents or sessions need to share the same knowledge

In Phase 4, you’ll design full memory architectures that layer both approaches — working memory, session memory, persistent memory, and a shared knowledge base.


Next: 3.6 — Agent Patterns | Phase overview: Phase 3