Memory Pipeline
From raw conversations to curated knowledge
Memory is what transforms a language model into a truly useful assistant. Our complete pipeline processes raw conversation sessions through 13 stages -- from fact extraction to real-time context compaction -- and makes them searchable through 4 parallel search layers. EasyClaw v2 enriches this pipeline with automatic memory extraction, long-term consolidation, semantic selection at recall time, and multi-level compaction during the session.
13
stages in the memory pipeline
4
search sources queried in parallel
4
new EasyClaw v2 modules integrated
Smart
semantic deduplication
Listwise
reranking via gpt-4.1-nano (RankGPT approach, coherent global ordering)
Optimized
context budget per query
3
compaction levels (full, detailed, condensed)
Deployed
on both instances (Max and Eva)
13 stages from raw data to fused and contextual knowledge
Extract
Hourly cron parses JSONL sessions into structured summaries (.daily-raw/, interactions.md). Zero LLM tokens.
Extract-Memories (NEW)
At session end, automatically analyzes content and extracts significant facts (preferences, decisions, context). Structured, deduplicated, stored in memory/extracts/.
Reflections
2x/day, the agent writes autonomous introspective reflections about its behavior and decisions into memory/reflect/.
Distill
2x/day, Claude compresses the recent interactions and extractions into curated memory. Deduplication and contradiction resolution.
Auto-Dream (NEW)
During idle periods, automatically consolidates memory: merges similar memories, resolves contradictions, archives obsolete information.
Shepherd
Every 3h, verifies MEMORY.md integrity against a human-approved baseline. Detects drift, restores if needed.
Vector SQLite
Local embeddings (text-embedding-3-small) in SQLite with sqlite-vec. Semantic fallback search layer.
QMD
Hybrid BM25 + vector + reranking on indexed markdown files. 93% recall vs 55% vector-only.
Mem0 + Qdrant
Auto-extracts facts via Mem0 v1.0.5, stored in Qdrant server v1.17.0. Auto-recall injects top 5 memories. Fed by the Extract-Memories plugin.
Graphiti (NEW)
Temporal knowledge graph (Graphiti by Zep, FalkorDB backend). Typed entities and relations, with validity windows — old facts are invalidated, not deleted. Queryable triplets (entity -> relation -> entity).
Memory Selection (NEW)
On every message, semantic analysis identifies the most relevant memories. Multi-criteria scoring: relevance (40%), recency (20%), frequency (20%), importance (20%).
Fusion
Merges results from QMD + Mem0/Qdrant + Graphiti in parallel. Listwise reranking via gpt-4.1-nano: cross-result comparison (RankGPT approach) for coherent global ordering, rather than independent per-item scores. Cross-source deduplication, position-based sorting, context budget enforced.
Compaction (NEW)
Multi-level context management during the session. Distant past summarized, recent past detailed, immediate context fully preserved. Critical elements protected.
Each layer has different strengths. All run simultaneously, preceded by semantic selection and followed by compaction.
Multi-criteria scoring to identify relevant memories
QMD
BM25 + Vector + Rerank
Mem0
Qdrant + auto-facts
Graphiti
Temporal knowledge graph
SQLite
Vector fallback
Progressive context summarization, critical anchor protection
QMD (Hybrid Search)
Hybrid (keywords + semantic)
Native OpenClaw memory backend. Combines BM25 (exact keyword matching) + vector search (semantic similarity) + LLM reranking using local quantized models.
Local quantized models (embeddings, reranker, query expansion) run entirely over the loopback interface. 93% recall vs 55% with vector-only search.
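The score blending behind hybrid search can be sketched as follows. This is an illustrative assumption, not QMD's actual internals: the `Passage` shape, the min-max normalization, and the `alpha` weight are all hypothetical, and the LLM reranking stage that would follow is omitted.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    bm25: float    # exact keyword-match score
    cosine: float  # semantic similarity score

def hybrid_rank(passages, alpha=0.5, top_k=3):
    """Blend min-max-normalized BM25 and vector scores; keep top_k for the reranker."""
    def norm(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]
    bm25 = norm([p.bm25 for p in passages])
    cos = norm([p.cosine for p in passages])
    blended = [alpha * b + (1 - alpha) * c for b, c in zip(bm25, cos)]
    ranked = sorted(zip(passages, blended), key=lambda t: t[1], reverse=True)
    return [p.text for p, _ in ranked[:top_k]]
```

A passage that scores well on both axes beats one that dominates only keywords or only semantics, which is the point of combining the two signals.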
Mem0 + Qdrant (Auto-Facts)
Auto-extracted facts and preferences
Mem0 auto-extracts facts after each conversation with intelligent deduplication. Auto-recall injects the 5 most relevant memories before each response. Fed by the Extract-Memories plugin which structures facts upstream.
Backend: Qdrant running as a standalone binary, loopback only. Multi-dimensional embeddings with cosine distance. Lightweight memory footprint. Extraction driven by Claude.
Graphiti (Temporal Knowledge Graph)
Temporal relational triplets with validity windows
Extracts entities and relationships from memory episodes using gpt-4.1-nano. Builds a queryable temporal graph (e.g., julien → corrected_by → bug gateway) where each fact has validity windows.
Graphiti by Zep, FalkorDB backend. Incremental updates, no global recomputation. Exposed via a local FastAPI wrapper (port 8002). Deployed on Max, Neo, with DD in progress.
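The "invalidated, not deleted" semantics can be sketched with a minimal temporal triplet store. This is a toy model of the idea, not Graphiti's API: the class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means currently valid

class TemporalGraph:
    def __init__(self):
        self.facts: list[Fact] = []

    def assert_fact(self, subject, relation, obj):
        now = datetime.now(timezone.utc)
        # Old facts get a validity end date; they are never deleted
        for f in self.facts:
            if f.subject == subject and f.relation == relation and f.valid_to is None:
                f.valid_to = now
        self.facts.append(Fact(subject, relation, obj, valid_from=now))

    def current(self, subject, relation):
        return [f.obj for f in self.facts
                if f.subject == subject and f.relation == relation
                and f.valid_to is None]
```

Querying past states remains possible because superseded facts keep their full validity window.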
Vector SQLite (Fallback)
Pure semantic (cosine similarity)
The original vector index, still active as a safety net. OpenAI embeddings, stored in local SQLite.
Hybrid search: 70% vector + 30% full-text. Covers complete history.
A third recall dimension that complements textual similarity and curated memories.
Textual similarity and semantic memories answer one question: "what resembles my query?". The graph answers a different one: "what is connected to what I mention?". The two approaches complement each other, they do not replace each other.
Principle
When the user talks about a project, similarity search returns passages that describe it. The graph returns the people working on it, the related technologies, the deadlines, the clients — even if no single passage co-mentions all of them. Relations live in the structure of the graph, not in the resemblance of words.
Three Complementary Layers
QMD
Textual similarity
Hybrid BM25 + vector search. Returns passages that resemble the query.
Mem0
Semantic memories
Extracted facts and curated experiences. Returns what the agent has remembered.
Graph
Structural neighborhood
Typed entities and relations. Returns what is connected, not what resembles.
Approach
- Lightweight detection of named entities in the user query.
- Targeted lookup of each entity's immediate neighbors in the graph.
- Compact, structured injection into the agent context: a handful of relations, not a raw dump.
Benefits
Compactness
A few lines of relations are enough to enrich context without blowing the token budget.
Determinism
Relations either exist or they don't. No fuzzy scoring, no threshold to tune.
Non-obvious relations
The graph surfaces connections that textual similarity alone would never have seen.
Performance
Real-time lookup. No LLM in the relation-recall loop.
| Layer | Max | Eva | Status |
|---|---|---|---|
| Vector SQLite | OK | OK | Production |
| QMD (hybrid BM25+vec+rerank) | OK | OK | Production |
| Mem0 + Qdrant (auto-facts) | OK | OK | Production |
| Graphiti (temporal knowledge graph) (NEW) | OK | OK | Production (migrated from Cognee, 2026-04-16) |
| Memory Fusion (dedup + rerank) | OK | OK | Production |
| Extract-Memories (fact extraction) (NEW) | OK | OK | Production |
| Auto-Dream (long-term consolidation) (NEW) | OK | -- | Production |
| Memory Selection (semantic recall) (NEW) | OK | OK | Production |
| Multi-Level Compaction (NEW) | OK | OK | Production |
| Daemon | Frequency | Role |
|---|---|---|
| Memory Extract daemon | Hourly | Parses sessions, produces structured summaries, and triggers reindexing |
| Extract-Memories plugin (NEW) | Session end | Extracts significant facts from conversations, structures and stores them in long-term memory |
| Memory Distill daemon | 2x/day | LLM compresses recent interactions and extractions into curated memory, then triggers reindexing |
| Auto-Dream plugin (NEW) | Daily (3am) | Consolidates long-term memory: merges duplicates, resolves contradictions, archives obsolete facts |
| Memory Shepherd daemon | Every 3h | Protects the memory baseline and archives scratch notes |
| Agent-Reflect daemon | 2x/day | Agent writes autonomous reflections into its introspective journal |
| Graphiti + FalkorDB (Docker stack) (NEW) | Permanent | Temporal knowledge-graph stack (FalkorDB + graphiti-core via FastAPI wrapper on port 8002) |
How each component works and why it exists.
Memory Extract daemon
Hourly cron (launchd). Transforms raw conversation sessions into structured, searchable memory. As the first stage of the pipeline, it produces the raw data that all other systems consume.
HOW IT WORKS
- Scans all session files from the last 24 hours
- Extracts structured summaries (bounded retention) and maintains a rolling interactions log
- Triggers QMD reindexing after each run to keep vector embeddings up to date
- Zero LLM tokens — pure parsing, no API calls
WHY IT MATTERS
Without extraction, the agent would only have raw JSONL logs. This script creates the structured layer that all other memory systems build on.
extract-memories (plugin) (NEW)
Session end (automatic). Analyzes every completed session and extracts significant facts: user preferences, technical decisions, project constraints, contacts. Facts are structured, categorized, and deduplicated before being stored in persistent memory.
HOW IT WORKS
- Triggers automatically when a conversation ends or reaches a length threshold
- A specialized model identifies significant facts, preferences, decisions, and context elements
- Each fact is categorized (preference, decision, constraint, contact) and associated with metadata (date, source, confidence)
- Already-known facts are merged or updated rather than duplicated
- Validated memories join the long-term store, available for Mem0 and distillation
WHY IT MATTERS
Before Extract-Memories, valuable information from conversations was lost at session end. This plugin ensures preferences, decisions, and project context are automatically retained without manual intervention.
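The merge-rather-than-duplicate step can be sketched as an upsert keyed on a fact's category and subject. The `ExtractedFact` shape and the confidence rule are illustrative assumptions, not the plugin's actual schema.

```python
from dataclasses import dataclass

@dataclass
class ExtractedFact:
    category: str    # preference | decision | constraint | contact
    key: str         # stable identifier used for deduplication
    value: str
    confidence: float

def upsert(store: dict, fact: ExtractedFact) -> dict:
    """Merge an extracted fact into the long-term store: known facts
    are updated rather than duplicated; lower-confidence repeats are ignored."""
    existing = store.get((fact.category, fact.key))
    if existing is None or fact.confidence >= existing.confidence:
        store[(fact.category, fact.key)] = fact
    return store
```

Re-extracting the same preference across many sessions then refines one entry instead of accumulating near-duplicates.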
Memory Distill daemon
2x/day (launchd). Compresses daily interactions and fact extractions into a curated, long-term memory.
HOW IT WORKS
- Reads recent summaries, the rolling interactions log, and extracts from the Extract-Memories plugin
- Uses Claude to summarize, merge, and deduplicate entries
- Rewrites the curated memory with the condensed result, preserving important context and dropping noise
- Triggers QMD reindexing after each distill cycle
WHY IT MATTERS
Raw interactions accumulate fast. Without distillation, the curated memory would grow indefinitely and lose signal. The LLM acts as a curator, keeping only what matters.
auto-dream (plugin) (NEW)
Daily at 3am (idle). Consolidates long-term memory during idle periods, inspired by how human memory consolidates during sleep. Complements and reinforces distill-memory's work.
HOW IT WORKS
- Detects periods without interaction (night, weekends, configurable inactivity threshold)
- Scans the entire memory to identify clusters of semantically close memories
- Merges redundant or complementary memories into a single, more complete and better-formulated fact
- When two memories contradict each other, keeps the most recent and archives the older one with an audit trail
- Archives obsolete memories (superseded by new information, too old without recall)
WHY IT MATTERS
Over time, memory accumulates hundreds of facts. Without consolidation, it becomes noisy: duplicates, outdated information, incomplete fragments. Auto-Dream maintains a compact, coherent memory that improves over time.
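The keep-the-most-recent rule with an audit trail can be sketched as follows. The `cluster` field stands in for a semantic cluster id that real clustering would produce; the names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    cluster: str    # stand-in for a semantic cluster id
    text: str
    timestamp: int

def consolidate(memories):
    """Per cluster, keep the most recent memory and archive the rest
    (contradictions resolved in favor of recency, with an audit trail)."""
    kept, archived = {}, []
    for m in sorted(memories, key=lambda m: m.timestamp):
        if m.cluster in kept:
            archived.append(kept[m.cluster])  # older duplicate or contradiction
        kept[m.cluster] = m
    return list(kept.values()), archived
```

Nothing is silently destroyed: the archived list is what makes later auditing of a consolidation pass possible.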
memory-shepherd
Every 3 hours (launchd). Protects the curated memory against slow identity drift caused by repeated LLM rewriting (distill + auto-dream).
HOW IT WORKS
- Maintains a human-verified baseline copy of the curated memory
- Every 3h, compares the current curated memory against the baseline using diff analysis
- If drift exceeds threshold: archives the corrupted version, restores the baseline, and logs the incident
- Also archives scratch notes (temporary memory) to prevent buildup
WHY IT MATTERS
Distill and Auto-Dream use an LLM to rewrite the curated memory. Over hundreds of cycles, small deviations compound — the agent's personality, instructions, or knowledge could silently shift. The shepherd anchors everything to a human-approved reference.
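The drift check can be sketched with a plain sequence-similarity diff. The 15% threshold is an illustrative assumption, not the shepherd's actual configuration.

```python
import difflib

def check_drift(baseline: str, current: str, threshold: float = 0.15):
    """Compare the curated memory against the human-approved baseline.
    Returns (drift_ratio, restore_needed)."""
    similarity = difflib.SequenceMatcher(None, baseline, current).ratio()
    drift = 1.0 - similarity
    return drift, drift > threshold
```

An unchanged memory yields zero drift; when `restore_needed` is true, the real daemon would archive the current version and restore the baseline.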
Agent-Reflect daemon
2x/day (launchd). Gives the agent autonomous introspective capabilities.
HOW IT WORKS
- The agent writes free-form reflections about its own behavior, decisions, and interactions into its introspective journal
- Reflections are indexed by QMD and available for future context retrieval
- No human review required -- this is the agent thinking about itself
WHY IT MATTERS
Self-reflection enables behavioral improvement over time. The agent can recognize patterns in its own mistakes and adjust without explicit human correction.
memory-selection (core) (NEW)
On every message (real-time). Identifies and injects the most relevant memories into the agent's context before it formulates a response. The agent doesn't remember everything -- it remembers what's relevant.
HOW IT WORKS
- The incoming message is transformed into a semantic vector and analyzed to extract key themes and entities
- Semantically close memories are identified via a search in the vector database (Qdrant, pgvector)
- Each candidate memory receives a combined score: semantic relevance (40%), recency (20%), recall frequency (20%), importance (20%)
- The top N scored memories are selected (N is configurable, typically 5-15)
- Selected memories are injected into the agent's context, in a dedicated section, before the agent formulates its response
WHY IT MATTERS
An agent can have thousands of memories. Injecting all of them would be impossible and counterproductive. Semantic selection gives the agent exactly the right memories at the right time, like a human colleague who knows the project well.
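The multi-criteria scoring described above reduces to a weighted sum. A minimal sketch, assuming each signal is already normalized to [0, 1]; the candidate format is hypothetical.

```python
def score_memory(relevance, recency, frequency, importance,
                 weights=(0.40, 0.20, 0.20, 0.20)):
    """Combined score: relevance 40%, recency 20%, frequency 20%, importance 20%."""
    w = weights
    return w[0] * relevance + w[1] * recency + w[2] * frequency + w[3] * importance

def select_top(candidates, n=5):
    """candidates: list of (memory_text, {signal: value}); returns the n best texts."""
    ranked = sorted(candidates, key=lambda c: score_memory(**c[1]), reverse=True)
    return [text for text, _ in ranked[:n]]
```

The weighting means a highly relevant but rarely recalled memory can still outrank a frequently recalled but off-topic one.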
memory-fusion
On every query (plugin). Deduplicates and reranks results from all 3 memory sources (QMD + Mem0/Qdrant + Graphiti) into one clean block. Runs after semantic selection to merge results from the different layers.
HOW IT WORKS
- Queries QMD, Mem0/Qdrant, and Graphiti in parallel with fault isolation: a failing source does not affect the others
- Deduplicates results by semantic similarity (an overlap threshold flags duplicates)
- Listwise reranking via gpt-4.1-nano: the model compares all results against each other and returns a complete ordering (RankGPT approach), rather than scoring each item independently — same model used by Mem0 and Graphiti for stack coherence
- Injects a single fused memory block respecting a defined context budget into the agent prompt
WHY IT MATTERS
Without fusion, the 3 sources would each inject their own results, creating duplicates and noise. Fusion gives the agent one clean, ranked memory context instead of three overlapping ones.
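The parallel fan-out with fault isolation can be sketched as follows. The string-based deduplication is a crude stand-in for the semantic similarity check, and the listwise LLM reranking step is omitted; source names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def fuse(sources, query, budget=3):
    """Query every source in parallel with fault isolation, deduplicate,
    and enforce a context budget. `sources` maps name -> callable(query)."""
    results = []
    with ThreadPoolExecutor(max_workers=max(len(sources), 1)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, fut in futures.items():
            try:
                results.extend(fut.result(timeout=5))
            except Exception:
                pass  # a failing source must not break the others
    seen, fused = set(), []
    for text in results:
        key = text.strip().lower()  # crude stand-in for semantic dedup
        if key not in seen:
            seen.add(key)
            fused.append(text)
    return fused[:budget]  # the listwise rerank would run before this cut
```

Even with one layer down, the agent still gets a single deduplicated block from the surviving sources.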
multi-level compaction (core) (NEW)
During session (continuous). Intelligently manages context for long conversations to prevent information loss when the model's context window is reached. Inspired by operating-system memory management.
HOW IT WORKS
- Continuously monitors context size relative to the model's limit
- At 70% capacity, triggers the first compaction level
- Progressively summarizes older exchanges while preserving key decisions, important facts, and the narrative thread
- Organizes context into 3 levels: full detail (last 20 messages), detailed summary (previous 50), condensed summary (rest)
- Elements marked as critical (decisions, validated code, explicit instructions) are protected from compaction via anchors
WHY IT MATTERS
Without compaction, long conversations lose their beginning (where the design was discussed) when context overflows. Multi-level compaction enables 3h+ sessions without losing coherence -- the agent keeps the overall plan, decisions along the way, and full detail of the most recent exchanges.
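The three-tier split can be sketched as a pure partitioning function. The placeholder `summarize` stands in for the LLM summarization step, and the anchor protection for critical elements is omitted; tier sizes match the 20/50 split described above.

```python
def compact(messages, full=20, detailed=50,
            summarize=lambda msgs: f"[summary of {len(msgs)} messages]"):
    """Organize context in three levels: condensed summary of the distant
    past, detailed summary of the recent past, newest messages verbatim."""
    recent = messages[-full:]
    middle = messages[-(full + detailed):-full] if len(messages) > full else []
    oldest = messages[:-(full + detailed)] if len(messages) > full + detailed else []
    context = []
    if oldest:
        context.append(summarize(oldest))  # condensed: one line for the distant past
    if middle:
        context.append(summarize(middle))  # detailed summary tier
    context.extend(recent)                 # immediate context fully preserved
    return context
```

A 100-message session collapses to two summary lines plus the last 20 messages, while a short session passes through untouched.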