Claude Code Memory Systems: Chapter 8 Analysis
Date: 2026-04-25
Sources: Ken Huang Substack
Raw: raw/gmail/2026-04-25-starred.md
TL;DR
Chapter 8 of Ken Huang's Claude Code vs. Hermes Agent comparative series dissects how each system handles memory continuity across sessions. Claude Code uses file-backed transcripts with eager flush (write before the API call, not after), an LRU cache for file reads, and path-set deduplication for CLAUDE.md injection. Hermes Agent uses SQLite with FTS5 (full-text search) for episodic recall. The two approaches are complementary: Claude Code optimizes for crash safety and session portability; Hermes optimizes for searchable history across many sessions.
Claude Code Memory Architecture
The QueryEngine State Container
```typescript
export class QueryEngine {
  private mutableMessages: Message[] // Full conversation, grows each turn
  private loadedNestedMemoryPaths = new Set<string>() // Dedup for CLAUDE.md injection
  private discoveredSkillNames = new Set<string>() // Telemetry
}
```
mutableMessages is the source of truth. Every user message, assistant response, tool call, and tool result is appended. The session lives in RAM; the transcript on disk is the durable backup.
Eager Flush — The Key Design Decision
Most systems write the transcript after the API responds. Claude Code writes it before sending the request:
```typescript
// Pre-emptive persistence BEFORE the API call
if (persistSession && messagesFromUserInput.length > 0) {
  const transcriptPromise = recordTranscript(messages)
  if (isBareMode()) {
    void transcriptPromise // Fire-and-forget in SDK mode
  } else {
    await transcriptPromise // Block for durability in interactive mode
  }
}
```
Mode-aware: interactive sessions block on the write (crash safety first), while SDK sessions fire-and-forget (speed first). In interactive mode, this means that if the process is killed mid-inference, the user's message is already on disk and the session can resume; in SDK mode the write may still be in flight when the process dies. The Claude Code postmortem (04-24) revealed that a caching bug corrupted this mechanism and caused the perceived "shrinkflation": context was being cleared before it should have been.
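A minimal sketch of the write-before-request ordering, assuming a hypothetical TranscriptStore interface, an in-memory store, and a caller-supplied callModel function (none of these are Claude Code's API). The blocking flag stands in for the interactive vs. bare-mode switch: when true, the user turn is stored before inference starts.

```typescript
// Hypothetical message and store types for illustration; not Claude Code's definitions.
type Message = { role: "user" | "assistant"; content: string };

interface TranscriptStore {
  // Persist a snapshot of the conversation (a real store might append JSONL to disk).
  record(messages: readonly Message[]): Promise<void>;
}

// In-memory store so the sketch runs without touching the filesystem.
class MemoryTranscript implements TranscriptStore {
  snapshots: Message[][] = [];
  async record(messages: readonly Message[]): Promise<void> {
    this.snapshots.push([...messages]);
  }
}

// Hypothetical turn handler showing write-before-request ordering.
async function submitTurn(
  messages: Message[],
  userText: string,
  store: TranscriptStore,
  callModel: (msgs: readonly Message[]) => Promise<string>,
  blocking: boolean, // true ~ interactive mode, false ~ bare/SDK mode
): Promise<string> {
  messages.push({ role: "user", content: userText });

  // Eager flush: start persisting the user turn BEFORE the API call.
  const write = store.record([...messages]);
  if (blocking) {
    await write; // durability first: the user turn is stored before inference starts
  } else {
    void write.catch(() => {}); // speed first: don't wait (write errors ignored in this sketch)
  }

  const reply = await callModel(messages);
  messages.push({ role: "assistant", content: reply });
  await store.record([...messages]); // persist the completed turn
  return reply;
}
```

With blocking set to true, a crash anywhere inside callModel still leaves the first snapshot (the user's turn) in the store, which is the property session resume depends on.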
LRU Cache with Clone-on-Spawn
File reads are cached across turns. When a nested agent spawns, it gets a clone of the parent's cache. Changes propagate back on completion:
```typescript
const engine = new QueryEngine({
  readFileCache: cloneFileStateCache(getReadFileCache()), // Child gets own copy
})
try {
  // yield* re-yields everything the child produces, so this code lives inside a generator function
  yield* engine.submitMessage(prompt, { uuid: promptUuid })
} finally {
  setReadFileCache(engine.getReadFileState()) // Propagate child's reads back to parent
}
```
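This pattern avoids double file reads in nested agents (the child starts with everything the parent has already cached) while keeping parent and child state isolated during execution. Because the hand-back sits in a finally block, the parent adopts the child's cache even if the child throws, so reads made before a failure are not lost.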
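A sketch of clone-on-spawn with propagate-back, assuming a hypothetical FileCache class that uses Map insertion order for LRU eviction, plus hypothetical readFile and runChild helpers. Claude Code's own cloneFileStateCache and cache type are not shown in the source, so this only illustrates the pattern.

```typescript
// Hypothetical LRU cache of file contents keyed by path (not Claude Code's actual type).
// A Map iterates in insertion order, so its first key is the least recently used.
class FileCache {
  private entries = new Map<string, string>();
  constructor(private readonly capacity: number) {}

  get(path: string): string | undefined {
    const value = this.entries.get(path);
    if (value !== undefined) {
      this.entries.delete(path); // refresh recency: move entry to the end
      this.entries.set(path, value);
    }
    return value;
  }

  set(path: string, content: string): void {
    this.entries.delete(path);
    this.entries.set(path, content);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest); // evict the least recently used entry
    }
  }

  has(path: string): boolean {
    return this.entries.has(path);
  }

  clone(): FileCache {
    const copy = new FileCache(this.capacity);
    copy.entries = new Map(this.entries); // shallow copy keeps LRU order
    return copy;
  }
}

// Parent agent's cache (module-level here for brevity).
let parentCache = new FileCache(100);

// Hypothetical cached reader: `disk` stands in for the filesystem, `diskReads` logs cache misses.
function readFile(cache: FileCache, path: string, disk: Map<string, string>, diskReads: string[]): string {
  const hit = cache.get(path);
  if (hit !== undefined) return hit;
  diskReads.push(path);
  const content = disk.get(path) ?? "";
  cache.set(path, content);
  return content;
}

// Clone on spawn; propagate back on completion, even if the child throws.
function runChild(task: (cache: FileCache) => void): void {
  const childCache = parentCache.clone();
  try {
    task(childCache);
  } finally {
    parentCache = childCache;
  }
}
```

Replacing the parent's cache wholesale works here because the child's cache starts as a copy of the parent's and only grows from there (subject to eviction); a design where the parent also reads files while a child runs would need a merge instead.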
CLAUDE.md Auto-Injection with Deduplication
CLAUDE.md files are prefetched in parallel before the API call, then injected as tool results. The loadedNestedMemoryPaths set prevents re-injection across turns even if the LRU cache evicts the file entry:
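```typescript
// `using` (TypeScript 5.2 explicit resource management) disposes the prefetch handle when the scope exits
using pendingMemoryPrefetch = startRelevantMemoryPrefetch(state.messages, state.toolUseContext)
// Later:
const memoryAttachments = filterDuplicateMemoryAttachments(
  await pendingMemoryPrefetch.promise,
  toolUseContext.readFileState,
)
```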
Two dedup mechanisms: cache-level (LRU hits) and session-level (path set). The session-level set is the safety net for long sessions where cache eviction is likely.
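A sketch of why the path set is needed on top of the cache check, using a hypothetical filterMemoryFiles helper (not Claude Code's filterDuplicateMemoryAttachments) and a minimal ReadCache interface. The session-level Set records every path it has seen, so a CLAUDE.md whose cache entry was later evicted is still not injected a second time.

```typescript
// Minimal cache interface: only membership matters for this check.
interface ReadCache {
  has(path: string): boolean;
}

// Hypothetical helper: returns the CLAUDE.md paths that still need injecting this turn.
function filterMemoryFiles(
  candidates: readonly string[],
  cache: ReadCache,
  loadedPaths: Set<string>, // session-level: survives cache eviction
): string[] {
  const fresh: string[] = [];
  for (const path of candidates) {
    if (loadedPaths.has(path)) continue; // seen earlier this session, even if since evicted
    loadedPaths.add(path);
    if (cache.has(path)) continue; // cache-level: already in context via a normal read
    fresh.push(path);
  }
  return fresh;
}
```

In a two-turn run, the first call injects only the file that is not already cached; after the cache is cleared to simulate eviction, a second call with the same candidates returns nothing, because both paths are already in the set.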
Context Compaction and Preserved Tail
When context compacts, a "preserved tail" is written to the transcript so --resume can reconstruct post-compaction state:
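```typescript
if (message.type === 'system' && message.subtype === 'compact_boundary') {
  const tailUuid = message.compactMetadata?.preservedSegment?.tailUuid
  if (tailUuid) {
    // findLastIndex returns -1 if the UUID is absent, which would make the slice below empty
    const tailIdx = this.mutableMessages.findLastIndex(m => m.uuid === tailUuid)
    // Persist every message up to and including the preserved tail
    await recordTranscript(this.mutableMessages.slice(0, tailIdx + 1))
  }
}
```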
This is the mechanism that cere-bro's own session resumption relies on.
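The tail-selection step can be isolated as a small helper. This is a sketch with a simplified, hypothetical message shape and a hypothetical sliceThroughTail function. Because findLastIndex returns -1 for a missing UUID, the unguarded slice(0, tailIdx + 1) silently becomes slice(0, 0); the helper returns undefined instead so a caller can tell "no tail found" apart from "empty history".

```typescript
// Simplified, hypothetical message shape for illustration.
interface Msg {
  uuid: string;
  type: "user" | "assistant" | "system";
  subtype?: string;
}

// Return the messages up to and including the preserved tail,
// or undefined if the tail UUID is not present.
function sliceThroughTail(messages: readonly Msg[], tailUuid: string): Msg[] | undefined {
  // Search from the end, like findLastIndex (written as a loop to avoid needing the ES2023 lib).
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].uuid === tailUuid) return messages.slice(0, i + 1);
  }
  return undefined;
}
```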
Hermes Agent Memory Architecture
Hermes takes a database-first approach: all history in SQLite with WAL mode for concurrent access, plus FTS5 for full-text search:
```sql
-- External-content FTS5 table indexing messages.content
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(
  content,
  content=messages,  -- backed by messages table
  content_rowid=id
);

-- Triggers keep the FTS index in sync (insert trigger shown; delete/update triggers not shown here)
CREATE TRIGGER IF NOT EXISTS messages_fts_insert AFTER INSERT ON messages BEGIN
  INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
```
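FTS5 external-content table: FTS5 stores only the index and reads the text from the messages table, and the triggers are what keep the two in sync. FTS5 does not track changes to the source table on its own, so a complete setup also needs delete and update triggers. A term lookup is a b-tree search over the index (roughly logarithmic in its size) plus a read of the matching rows, so keyword search across all stored sessions avoids full-table scans.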
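FTS5 itself is C code inside SQLite, but the core idea that lets it skip full-table scans is an inverted index: a map from each token to the rows that contain it. The toy below is a swapped-in illustration, not how Hermes or FTS5 is implemented, and its tokenizer (lowercase, split on non-alphanumerics) is an assumption. It shows that a keyword query only touches the posting sets for its own tokens.

```typescript
// Toy inverted index: token -> ids of messages containing it.
// Illustrative only; FTS5 keeps its posting lists in b-tree segments inside SQLite.
class InvertedIndex {
  private postings = new Map<string, Set<number>>();

  // Naive tokenizer (an assumption): lowercase, split on non-alphanumerics.
  private tokenize(text: string): string[] {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter(t => t.length > 0);
  }

  // Plays the role of the AFTER INSERT trigger: index each message as it is written.
  add(id: number, content: string): void {
    for (const token of this.tokenize(content)) {
      let ids = this.postings.get(token);
      if (ids === undefined) {
        ids = new Set<number>();
        this.postings.set(token, ids);
      }
      ids.add(id);
    }
  }

  // AND query: only the posting sets for the query tokens are touched, never every message.
  search(query: string): number[] {
    const sets = this.tokenize(query).map(t => this.postings.get(t) ?? new Set<number>());
    if (sets.length === 0) return [];
    sets.sort((a, b) => a.size - b.size); // intersect starting from the rarest token
    const [smallest, ...rest] = sets;
    return Array.from(smallest)
      .filter(id => rest.every(s => s.has(id)))
      .sort((a, b) => a - b);
  }
}
```

Starting the intersection from the smallest posting set keeps multi-term queries proportional to the rarest term's matches, which is the same intuition behind why indexed search scales better than scanning every stored session.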
Three memory layers:
- Working memory: live conversation (equivalent to mutableMessages)
- Episodic memory: SQLite FTS5 across all prior sessions, searchable by any keyword
- Procedural memory: markdown playbooks for reusable skills (equivalent to CLAUDE.md)
Comparison
| Concern | Claude Code | Hermes Agent |
|---|---|---|
| Session persistence | File-backed transcripts | SQLite WAL |
| History search | No cross-session search (per-session transcript files) | FTS5 across all sessions |
| Crash safety | Eager flush (pre-API write) | WAL (atomic commits) |
| Memory injection | CLAUDE.md auto-prefetch | Markdown playbook retrieval |
| Nested agents | Clone-on-spawn + propagate-back | Separate process, explicit handoff |
The gap: Claude Code has no cross-session search. Hermes's WAL keeps committed writes atomic, but it has no eager-flush step, so a turn that was never committed before a crash is lost. A production agent needs both.
Prior Context
Directly extends Claude Code Architecture (04-17, 04-19): Prior pages covered the permission model and tool execution loop. This chapter fills in the memory layer — how state persists across turns, crashes, and compaction boundaries.
Explains the Claude Code postmortem (04-24): The caching bug that caused "shrinkflation" corrupted the LRU cache and flush pipeline described here. Understanding the eager flush mechanism makes clear why the bug had outsized impact: it broke the durability guarantee that the entire session resume system relies on.
Connects to Claude Code vs Hermes Permissions (04-23): Chapter 7 covered the permission model. Chapter 8 covers memory. Together they give a fuller picture of where Claude Code's agent architecture is robust (permissions, crash safety) and where it's incomplete (no episodic search).