Memory Pipeline
From raw conversations to curated knowledge
Memory is what transforms a language model into a truly useful assistant. Our system processes raw conversation sessions through a 9-stage pipeline -- from extraction to knowledge graph -- and makes them searchable through 4 parallel search layers. A dedicated plugin (memory-fusion) merges three of them (QMD, Mem0/Qdrant, Cognee), deduplicates, reranks, and injects a single clean context block; the fourth, a local vector index in SQLite, serves as a semantic fallback.
9
stages in the memory pipeline
4
search sources queried in parallel
Jaccard
deduplication (60% similarity threshold)
Reranking
by Claude Haiku (0-10 scoring per result)
6,000
characters injected max per query
Deployed
on both instances (Max and Eva)
9 stages from raw data to fused knowledge
Extract
Hourly cron parses JSONL sessions into structured summaries (.daily-raw/, interactions.md). Zero LLM tokens.
Distill
2x/day, Claude Sonnet compresses daily-raw + interactions into MEMORY.md (25K chars). Dedup and contradiction resolution.
Shepherd
Every 3h, verifies MEMORY.md integrity against a human-approved baseline. Detects drift, restores if needed.
Reflections
2x/day, the agent writes autonomous introspective reflections about its behavior and decisions.
Vector SQLite
Local embeddings (text-embedding-3-small) in SQLite with sqlite-vec. Semantic fallback search layer.
QMD
Hybrid BM25 + vector + reranking on indexed markdown files. 93% recall vs 55% vector-only.
Mem0 + Qdrant
Auto-extracts facts via Mem0 v1.0.5, stored in Qdrant server v1.17.0 (standalone binary, openclaw_mem0 collection, 1536-dim embeddings, Cosine distance). Auto-recall injects top 5 memories.
Cognee
Knowledge graph with typed entities and relations. Queryable triplets (entity -> relation -> entity).
Fusion
Merges results from QMD + Mem0/Qdrant + Cognee in parallel. Jaccard dedup, Haiku reranking (0-10), 6000 chars budget.
Each layer has different strengths. All run simultaneously.
QMD
BM25 + Vector + Rerank
Mem0
Qdrant + auto-facts
Cognee
Knowledge graph
SQLite
Vector fallback
QMD (Hybrid Search)
Hybrid (keywords + semantic)
Native OpenClaw memory backend. Combines BM25 (exact keyword matching) + vector search (semantic similarity) + LLM reranking using local GGUF models.
Models: embeddinggemma-300M, Qwen3-Reranker-0.6B, qmd-query-expansion-1.7B. Mode: query. 93% recall vs 55% with vector-only search.
Mem0 + Qdrant (Auto-Facts)
Auto-extracted facts and preferences
Mem0 v1.0.5 auto-extracts facts after each conversation with intelligent deduplication. Auto-recall injects the 5 most relevant memories before each response.
Backend: Qdrant server v1.17.0 (standalone binary, not Docker). Collection openclaw_mem0, 1536-dim embeddings (text-embedding-3-small), Cosine distance. Port 6334, loopback only. ~80-180 MB RAM. LLM: gpt-4o-mini.
Cognee (Knowledge Graph)
Relational triplets (entity -> relation -> entity)
Extracts entities and relationships from memory files. Builds a queryable graph (e.g., julien -> corrected_by -> bug gateway).
FastAPI server on localhost:8000. Search mode: INSIGHTS (entity-relation-entity triplets). 19 files indexed. Deployed on Max and Eva.
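The triplet shape that INSIGHTS queries return can be illustrated with a minimal sketch. This is not Cognee's API -- the `Triplet` class and `insights` helper are hypothetical, showing only what an entity-relation-entity result looks like:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triplet:
    """One edge of the knowledge graph: entity -> relation -> entity."""
    subject: str
    relation: str
    obj: str

def insights(graph: list[Triplet], entity: str) -> list[Triplet]:
    """Return every triplet touching an entity -- the shape of an
    INSIGHTS-style query result. Illustrative only; Cognee's own
    query interface differs."""
    return [t for t in graph if entity in (t.subject, t.obj)]
```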
Vector SQLite (Fallback)
Pure semantic (cosine similarity)
The original vector index, still active as a safety net. Embeddings via OpenAI text-embedding-3-small, stored in local SQLite.
Hybrid search: 70% vector + 30% full-text. Covers complete history.
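The 70/30 blend described above can be sketched as a simple weighted sum. This assumes both scores are already normalized to [0, 1]; the real index's scoring details are not shown here:

```python
def hybrid_score(vector_score: float, fulltext_score: float) -> float:
    """Blend the two retrieval signals with the 70% vector / 30%
    full-text weighting described above. Inputs are assumed to be
    normalized to [0, 1]."""
    return 0.7 * vector_score + 0.3 * fulltext_score

def rank_hybrid(results: dict[str, tuple[float, float]]) -> list[str]:
    """results maps doc id -> (vector_score, fulltext_score); returns
    doc ids best-first. The dict shape is illustrative, not the real
    index schema."""
    return sorted(results, key=lambda d: hybrid_score(*results[d]), reverse=True)
```

A document with strong semantic similarity can outrank one with a perfect keyword match, which is the point of weighting the vector signal higher.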
| Layer | Max | Eva | Status |
|---|---|---|---|
| Vector SQLite | OK | OK | Production |
| QMD (hybrid BM25+vec+rerank) | OK | OK | Production |
| Mem0 + Qdrant (auto-facts) | OK | OK | Production |
| Cognee (knowledge graph) | OK | OK | Production |
| Memory Fusion (dedup + rerank) | OK | OK | Production |
| Daemon | Frequency | Role |
|---|---|---|
| com.openclaw.memory-extract | Hourly | Parse sessions, produce interactions.md + .daily-raw/, reindex |
| com.openclaw.memory-distill | 2x/day | LLM compresses daily-raw + interactions into MEMORY.md, reindex |
| ai.openclaw.memory-shepherd | Every 3h | Protect baseline MEMORY.md, archive scratch notes |
| com.openclaw.agent-reflect | 2x/day | Agent writes autonomous reflection into memory/reflect/ |
| ai.openclaw.cognee-server | Permanent | FastAPI Cognee server (port 8000) |
How each component works and why it exists.
extract-and-reindex.py
Hourly cron (launchd)
Transforms raw conversation sessions into structured, searchable memory.
HOW IT WORKS
- Scans all JSONL session files from the last 24h
- Extracts structured summaries into .daily-raw/ (4-day retention) and memory/interactions.md (40K chars max, rolling window)
- Triggers QMD reindexing after each run to keep vector embeddings up to date
- Zero LLM tokens -- pure Python parsing, no API calls
WHY IT MATTERS
Without extraction, the agent would only have raw JSONL logs. This script creates the structured layer that all other memory systems build on.
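The token-free parsing step can be sketched as plain Python over a JSONL session. The `role`/`content` field names are assumptions -- the real session schema used by extract-and-reindex.py may differ:

```python
import json
from pathlib import Path

def summarize_session(jsonl_path: Path) -> str:
    """Parse one JSONL session file into a short structured summary
    line, using pure Python -- no API calls, zero LLM tokens.
    Field names ('role', 'content') are illustrative assumptions."""
    user_turns: list[str] = []
    for line in jsonl_path.read_text().splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("role") == "user":
            user_turns.append(event.get("content", ""))
    # Keep the first user messages as cheap topic anchors.
    header = f"session: {jsonl_path.stem} ({len(user_turns)} user turns)"
    topics = " | ".join(user_turns[:2])[:200]
    return f"{header}\n  topics: {topics}"
```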
distill-memory.py
2x/day (launchd)
Compresses daily interactions into a curated, long-term MEMORY.md.
HOW IT WORKS
- Reads .daily-raw/ summaries and memory/interactions.md
- Uses Claude Sonnet (~5K tokens per run) to summarize, merge, and deduplicate entries
- Rewrites MEMORY.md with the condensed result, preserving important context and dropping noise
- Triggers QMD reindexing after each distill cycle
WHY IT MATTERS
Raw interactions accumulate fast. Without distillation, MEMORY.md would grow indefinitely and lose signal. The LLM acts as a curator, keeping only what matters.
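The shape of a distill run -- gather the extract stage's outputs, then hand them to the LLM with a curation instruction -- can be sketched as prompt assembly. The instruction wording and file layout below are illustrative; the real distill-memory.py prompt is not reproduced here:

```python
from pathlib import Path

# Assumed instruction text, not the actual distill-memory.py prompt.
DISTILL_INSTRUCTIONS = (
    "Merge the notes below into a single curated memory file. "
    "Deduplicate, resolve contradictions, keep it under 25,000 characters."
)

def build_distill_prompt(daily_raw_dir: Path, interactions: Path) -> str:
    """Assemble the distillation prompt from .daily-raw/ summaries and
    interactions.md. The returned string would be sent to Claude Sonnet
    (~5K tokens per run); the API call itself is omitted here."""
    notes = "\n\n".join(p.read_text() for p in sorted(daily_raw_dir.glob("*.md")))
    return (
        f"{DISTILL_INSTRUCTIONS}\n\n"
        f"# Daily raw\n{notes}\n\n"
        f"# Interactions\n{interactions.read_text()}"
    )
```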
memory-shepherd
Every 3 hours (launchd)
Protects MEMORY.md against slow identity drift caused by repeated LLM rewriting.
HOW IT WORKS
- Maintains a human-verified baseline copy of MEMORY.md
- Every 3h, compares the current MEMORY.md against the baseline using diff analysis
- If drift exceeds threshold: archives the corrupted version, restores the baseline, and logs the incident
- Also archives scratch notes (temporary memory) to prevent buildup
WHY IT MATTERS
The distill script uses an LLM to rewrite MEMORY.md twice a day. Over hundreds of cycles, small deviations compound -- the agent's personality, instructions, or knowledge could silently shift. The shepherd anchors everything to a human-approved reference.
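The shepherd's core loop -- diff against the baseline, archive, restore -- can be sketched with the standard library. The 0.15 drift threshold is an assumed value; the real shepherd's threshold and diff method may differ:

```python
import difflib
import shutil
import time
from pathlib import Path

DRIFT_THRESHOLD = 0.15  # assumed value, not the shepherd's actual setting

def check_and_restore(current: Path, baseline: Path, archive_dir: Path) -> bool:
    """Compare MEMORY.md against the human-approved baseline and restore
    it if drift exceeds the threshold. Returns True if a restore
    happened. A sketch of the core loop, not the real implementation."""
    similarity = difflib.SequenceMatcher(
        None, baseline.read_text(), current.read_text()
    ).ratio()
    if 1.0 - similarity <= DRIFT_THRESHOLD:
        return False
    # Archive the drifted copy before restoring, so nothing is lost.
    archive_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(current, archive_dir / f"MEMORY-{int(time.time())}.md")
    shutil.copy2(baseline, current)
    return True
```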
agent-reflect
2x/day (launchd)
Gives the agent autonomous introspective capabilities.
HOW IT WORKS
- The agent writes free-form reflections about its own behavior, decisions, and interactions into memory/reflect/
- Reflections are indexed by QMD and available for future context retrieval
- No human review required -- this is the agent thinking about itself
WHY IT MATTERS
Self-reflection enables behavioral improvement over time. The agent can recognize patterns in its own mistakes and adjust without explicit human correction.
memory-fusion
On every query (plugin)
Deduplicates and reranks results from all 3 memory sources (QMD + Mem0/Qdrant + Cognee) into one clean block.
HOW IT WORKS
- Queries QMD, Mem0/Qdrant, and Cognee in parallel via Promise.allSettled (tolerates individual source failures)
- Deduplicates results using Jaccard similarity on word sets (threshold: 0.6 -- 60% overlap = duplicate)
- Reranks the deduplicated results using Claude Haiku with per-result relevance scoring (0-10 scale)
- Injects a single fused memory block respecting a 6,000-character budget (max 8 results) into the agent context
WHY IT MATTERS
Without fusion, the 3 sources would each inject their own results, creating duplicates and noise. Fusion gives the agent one clean, ranked memory context instead of three overlapping ones.
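The dedup and budget steps above can be sketched in Python (the plugin itself runs in the agent's JavaScript runtime; this is an illustrative re-implementation, with the Haiku reranking call omitted):

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity on word sets: |A intersect B| / |A union B|."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def dedupe(results: list[str], threshold: float = 0.6) -> list[str]:
    """Drop any result whose word overlap with an earlier-kept result
    meets the 60% threshold described above."""
    kept: list[str] = []
    for r in results:
        if all(jaccard(r, k) < threshold for k in kept):
            kept.append(r)
    return kept

def trim_to_budget(results: list[str], budget: int = 6000,
                   max_results: int = 8) -> list[str]:
    """Apply the 6,000-character / 8-result budget before injection."""
    out: list[str] = []
    used = 0
    for r in results[:max_results]:
        if used + len(r) > budget:
            break
        out.append(r)
        used += len(r)
    return out
```

Two near-identical results from different sources (e.g. the same fact surfaced by both QMD and Mem0) collapse into one, and the final block never exceeds the character budget.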
