
3.5 — Memory & RAG

AI has no memory between conversations. Every new session starts blank. The model that helped you build your project yesterday has no idea your project exists today.

For a one-off question, that’s fine. For ongoing work — a project with conventions, context, and history — it’s fatal. Every session you’d have to re-explain everything.

There are two solutions, and serious agent architects use both.

Solution 1: File-based memory

Store important information in files the AI reads at the start of each session. CLAUDE.md is one form of this; the memory directory at ~/.claude/projects/[project]/memory/ is another.

How it works: You write facts into files. The agent reads those files. The facts are in context.
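That loop can be sketched in a few lines. This is a hypothetical illustration, not the actual loader Claude uses: `load_memory` and its size budget are assumptions, and the real context-window limit is measured in tokens, not characters.

```python
from pathlib import Path

def load_memory(memory_dir: str, max_chars: int = 8000) -> str:
    """Concatenate memory files into one context block, stopping at a size budget.

    The character budget is a crude stand-in for the context-window limit:
    file-based memory only works while everything still fits.
    """
    parts = []
    used = 0
    for f in sorted(Path(memory_dir).glob("*.md")):
        text = f.read_text(encoding="utf-8")
        if used + len(text) > max_chars:
            break  # out of budget -- remaining files are silently dropped
        parts.append(f"## {f.name}\n{text}")
        used += len(text)
    return "\n\n".join(parts)
```

The assembled block would be prepended to the session's first prompt, which is all "memory" means here: the facts travel inside the context window.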

Strengths:

  • Simple — it’s just files
  • Human-readable and editable
  • No infrastructure required

Limits:

  • Bounded by the context window — you can’t load unlimited files
  • Manual to maintain — you write and update the files

Good for: project conventions, user preferences, architectural decisions

Solution 2: RAG (Retrieval-Augmented Generation)


Instead of loading everything into context at once, store knowledge in a searchable system. When a question comes up, retrieve only the relevant pieces and inject them into context.

RAG breaks down as:

  • Retrieval — searching a knowledge base for information relevant to the current question
  • Augmented — adding that retrieved information to the AI’s context
  • Generation — the AI generates its response with the added context
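The retrieve-and-augment steps above can be sketched in Python. This is a toy: a real system would rank documents by embedding similarity in a vector database, while here word overlap stands in for semantic search, and the function names are made up for illustration.

```python
def retrieve(query: str, docs: dict[str, str], k: int = 3) -> list[str]:
    """Rank documents by word overlap with the query -- a toy stand-in
    for the embedding-based similarity a real RAG system would use."""
    q = set(query.lower().split())
    scored = sorted(
        docs.items(),
        key=lambda kv: len(q & set(kv[1].lower().split())),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]

def augment(query: str, docs: dict[str, str], k: int = 3) -> str:
    """Inject the retrieved passages into the prompt before the question."""
    context = "\n\n".join(docs[name] for name in retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The model never sees the whole knowledge base — only the augmented prompt, which carries just the top-k retrieved pieces. Generation is then an ordinary model call on that prompt.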

Analogy:

  • Without RAG: Studying for an exam by memorizing the entire textbook. You’re limited by how much you can hold in your head (context window limit).
  • With RAG: Taking an open-book exam. You don’t memorize everything — you know where to find the right information and pull it when you need it.

RAG scales where file-based memory can’t. A knowledge base can hold millions of documents; RAG finds the handful that are relevant and puts only those in context.

Key terms:

  • Embedding — A way to represent text as numbers so a computer can measure similarity between pieces of text. How RAG systems find “relevant” documents.
  • Vector database — A database optimized for storing and searching embeddings. The filing system behind RAG.
  • Semantic search — Searching by meaning, not exact keywords. “How do I fix login?” finds results about “authentication errors” because the meaning is similar.
  • Knowledge base — An organized collection of information an AI system can draw from.
  • Partition — A section within a knowledge base, like folders within a filing cabinet. Keeps different topics from interfering with each other.
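To make "searching by meaning" concrete, here is cosine similarity over tiny hand-made vectors. Real embeddings come from an embedding model and have hundreds or thousands of dimensions; these 3-dimensional numbers are invented purely to show the mechanic.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction (same meaning, roughly)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings" for illustration only:
vecs = {
    "fix login":             [0.9, 0.1, 0.2],
    "authentication errors": [0.8, 0.2, 0.1],
    "pasta recipes":         [0.1, 0.9, 0.7],
}
```

Because “fix login” and “authentication errors” point in nearly the same direction, their similarity is high even though they share no keywords — which is exactly why the login query retrieves the authentication document.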

Use file-based memory when:

  • The knowledge is small and changes infrequently
  • You want humans to be able to read and edit it directly
  • You’re working on a single focused project

Use RAG when:

  • The knowledge base is large (hundreds of documents or more)
  • You need to search by meaning, not just load everything
  • Multiple agents or sessions need to share the same knowledge

In Phase 4, you’ll design full memory architectures that layer both approaches — working memory, session memory, persistent memory, and a shared knowledge base.


Next: 3.6 — Agent Patterns | Phase overview: Phase 3