9 stages in the memory pipeline
4 search sources queried in parallel
Jaccard deduplication (60% similarity threshold)
Reranking by Claude Haiku (0-10 scoring per result)
6,000 characters injected max per query
Deployed on both instances (Max and Eva)

9 stages from raw data to fused knowledge

Ingestion

1. Extract -- Hourly cron parses JSONL sessions into structured summaries (.daily-raw/, interactions.md). Zero LLM tokens.

2. Distill -- 2x/day, Claude Sonnet compresses daily-raw + interactions into MEMORY.md (25K chars). Dedup and contradiction resolution.

3. Shepherd -- Every 3h, verifies MEMORY.md integrity against a human-approved baseline. Detects drift, restores if needed.

Indexation

4. Reflections -- 2x/day, the agent writes autonomous introspective reflections about its behavior and decisions.

5. Vector SQLite -- Local embeddings (text-embedding-3-small) in SQLite with sqlite-vec. Semantic fallback search layer.

6. QMD -- Hybrid BM25 + vector + reranking on indexed markdown files. 93% recall vs 55% vector-only.

Search & Fusion

7. Mem0 + Qdrant -- Auto-extracts facts via Mem0 v1.0.5, stored in Qdrant server v1.17.0 (standalone binary, openclaw_mem0 collection, 1536-dim embeddings, Cosine distance). Auto-recall injects the top 5 memories.

8. Cognee -- Knowledge graph with typed entities and relations. Queryable triplets (entity -> relation -> entity).

9. Fusion -- Merges results from QMD + Mem0/Qdrant + Cognee in parallel. Jaccard dedup, Haiku reranking (0-10), 6,000-character budget.

Each layer has different strengths. All run simultaneously.

Agent Query
    | (parallel)
    +--> QMD    (BM25 + vector + rerank)
    +--> Mem0   (Qdrant + auto-facts)
    +--> Cognee (knowledge graph)
    +--> SQLite (vector fallback)
    |
    v
Fusion: Jaccard dedup + Haiku rerank
    |
    v
Enriched Response

QMD (Hybrid Search)

Type: hybrid (keywords + semantic) -- Role: primary

Native OpenClaw memory backend. Combines BM25 (exact keyword matching), vector search (semantic similarity), and LLM reranking using local GGUF models.

Models: embeddinggemma-300M, Qwen3-Reranker-0.6B, qmd-query-expansion-1.7B. Mode: query. 93% recall vs 55% with vector-only search.
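The BM25 + vector + rerank cascade can be sketched in pure Python. This is a toy illustration of the hybrid idea, not QMD's actual code: both scoring functions are simplified stand-ins, and the final rerank step is reduced to a sort.

```python
import math

def bm25_lite(query, doc):
    # Toy keyword score: fraction of query terms present in the document.
    terms = query.lower().split()
    hits = sum(t in doc.lower() for t in terms)
    return hits / len(terms)

def cosine(a, b):
    # Semantic score between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def hybrid_search(query, q_vec, docs, top_k=3):
    # Stages 1+2: score every doc with both signals, then blend equally.
    scored = []
    for text, vec in docs:
        score = 0.5 * bm25_lite(query, text) + 0.5 * cosine(q_vec, vec)
        scored.append((score, text))
    # Stage 3: a real reranker (Qwen3-Reranker here) would rescore the
    # top candidates; this sketch just sorts by the blended score.
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]
```

The point of the cascade is that keyword and semantic signals fail differently: BM25 misses paraphrases, vectors miss exact identifiers, and the reranker arbitrates between them.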

Mem0 + Qdrant (Auto-Facts)

Type: auto-extracted facts and preferences -- Role: facts

Mem0 v1.0.5 auto-extracts facts after each conversation, with intelligent deduplication. Auto-recall injects the 5 most relevant memories before each response.

Backend: Qdrant server v1.17.0 (standalone binary, not Docker). Collection openclaw_mem0, 1536-dim embeddings (text-embedding-3-small), Cosine distance. Port 6334, loopback only. ~80-180 MB RAM. LLM: gpt-4o-mini.

Cognee (Knowledge Graph)

Type: relational triplets (entity -> relation -> entity) -- Role: graph

Extracts entities and relationships from memory files and builds a queryable graph (e.g., julien -> corrected_by -> bug gateway).

FastAPI server on localhost:8000. Search mode: INSIGHTS (entity-relation-entity triplets). 19 files indexed. Deployed on Max and Eva.
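The triplet model behind INSIGHTS mode can be illustrated with a tiny in-memory graph. This is illustrative only; Cognee's real storage, extraction, and query layers are far more involved, and the sample triplets beyond the one from the article are made up.

```python
# Triplets as (entity, relation, entity) tuples, as in INSIGHTS mode.
# Only the first triplet comes from the article; the rest are invented.
TRIPLETS = [
    ("julien", "corrected_by", "bug gateway"),
    ("julien", "maintains", "memory pipeline"),
    ("memory pipeline", "uses", "qdrant"),
]

def insights(entity):
    """Return every triplet that mentions the entity, in either position."""
    return [t for t in TRIPLETS if entity in (t[0], t[2])]
```

A graph query like `insights("julien")` surfaces relations that keyword or vector search would miss, because the answer lives in the edge, not in any single document.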

Vector SQLite (Fallback)

Type: pure semantic (cosine similarity) -- Role: fallback

The original vector index, still active as a safety net. Embeddings via OpenAI text-embedding-3-small, stored in local SQLite.

Hybrid search: 70% vector + 30% full-text. Covers the complete history.
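The 70/30 blend reduces to a one-line scoring rule. A sketch under the assumption that both signals are already normalized to [0, 1]; the function name is hypothetical.

```python
def hybrid_score(vector_sim, fulltext_score):
    """Blend per the documented weights: 70% vector similarity,
    30% full-text match. Both inputs assumed normalized to [0, 1]."""
    return 0.7 * vector_sim + 0.3 * fulltext_score
```

Weighting the vector side heavily keeps recall on paraphrased queries while the full-text term still rewards exact identifier matches.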

Layer                            Max  Eva  Status
Vector SQLite                    OK   OK   Production
QMD (hybrid BM25+vec+rerank)     OK   OK   Production
Mem0 + Qdrant (auto-facts)       OK   OK   Production
Cognee (knowledge graph)         OK   OK   Production
Memory Fusion (dedup + rerank)   OK   OK   Production
Daemon                        Frequency   Role
com.openclaw.memory-extract   Hourly      Parse sessions, produce interactions.md + .daily-raw/, reindex
com.openclaw.memory-distill   2x/day      LLM compresses daily-raw + interactions into MEMORY.md, reindex
ai.openclaw.memory-shepherd   Every 3h    Protect baseline MEMORY.md, archive scratch notes
com.openclaw.agent-reflect    2x/day      Agent writes autonomous reflection into memory/reflect/
ai.openclaw.cognee-server     Permanent   FastAPI Cognee server (port 8000)

How each component works and why it exists.

extract-and-reindex.py

Hourly cron (launchd)

Transforms raw conversation sessions into structured, searchable memory.

HOW IT WORKS

  1. Scans all JSONL session files from the last 24h
  2. Extracts structured summaries into .daily-raw/ (4-day retention) and memory/interactions.md (40K chars max, rolling window)
  3. Triggers QMD reindexing after each run to keep vector embeddings up to date
  4. Zero LLM tokens -- pure Python parsing, no API calls
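The steps above can be sketched with nothing but the standard library, which is the point of the zero-token design. A minimal sketch, not the real extract-and-reindex.py: the JSONL field names (`role`, `text`) are assumptions, and the reindex trigger is omitted.

```python
import json
import time
from pathlib import Path

def extract_recent_sessions(session_dir, max_age_hours=24):
    """Parse JSONL session logs modified in the last 24h into plain-text
    summaries. Pure Python parsing -- no LLM calls, zero tokens."""
    cutoff = time.time() - max_age_hours * 3600
    summaries = []
    for path in sorted(Path(session_dir).glob("*.jsonl")):
        if path.stat().st_mtime < cutoff:
            continue  # Outside the rolling 24h window.
        lines = []
        for raw in path.read_text().splitlines():
            event = json.loads(raw)
            # 'role' and 'text' are assumed field names for illustration.
            if event.get("role") == "user":
                lines.append(f"- {event.get('text', '')}")
        if lines:
            summaries.append(f"## {path.stem}\n" + "\n".join(lines))
    return summaries
```

In the real pipeline these summaries land in .daily-raw/ and memory/interactions.md, and a QMD reindex is kicked off afterwards.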

WHY IT MATTERS

Without extraction, the agent would only have raw JSONL logs. This script creates the structured layer that all other memory systems build on.

distill-memory.py

2x/day (launchd)

Compresses daily interactions into a curated, long-term MEMORY.md.

HOW IT WORKS

  1. Reads .daily-raw/ summaries and memory/interactions.md
  2. Uses Claude Sonnet (~5K tokens per run) to summarize, merge, and deduplicate entries
  3. Rewrites MEMORY.md with the condensed result, preserving important context and dropping noise
  4. Triggers QMD reindexing after each distill cycle
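The character budgets in the steps above can be sketched as simple prompt-assembly helpers. The Claude Sonnet call itself is omitted; the limits (40K-char interactions window, 25K-char MEMORY.md) follow the article, but the helper names and joining format are hypothetical.

```python
MEMORY_BUDGET = 25_000  # Target size of MEMORY.md, per the pipeline spec.

def assemble_distill_input(daily_raw_chunks, interactions_text, cap=40_000):
    """Concatenate the distill inputs, trimming the rolling interactions
    window to its 40K-char cap before handing everything to the LLM."""
    interactions = interactions_text[-cap:]  # Keep the newest end.
    return "\n\n".join(list(daily_raw_chunks) + [interactions])

def enforce_budget(distilled_text, budget=MEMORY_BUDGET):
    """Hard cap on the rewritten MEMORY.md. A real run would ask the LLM
    to compress further rather than truncate blindly."""
    return distilled_text[:budget]
```

Keeping the budget enforcement outside the LLM call means a misbehaving model run can never balloon MEMORY.md past its cap.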

WHY IT MATTERS

Raw interactions accumulate fast. Without distillation, MEMORY.md would grow indefinitely and lose signal. The LLM acts as a curator, keeping only what matters.

memory-shepherd

Every 3 hours (launchd)

Protects MEMORY.md against slow identity drift caused by repeated LLM rewriting.

HOW IT WORKS

  1. Maintains a human-verified baseline copy of MEMORY.md
  2. Every 3h, compares the current MEMORY.md against the baseline using diff analysis
  3. If drift exceeds threshold: archives the corrupted version, restores the baseline, and logs the incident
  4. Also archives scratch notes (temporary memory) to prevent buildup
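The diff-analysis step can be sketched with difflib from the standard library. A sketch only: the 15% threshold is illustrative, not the production setting, and the real shepherd also handles scratch-note archival.

```python
import difflib
import shutil

DRIFT_THRESHOLD = 0.15  # Illustrative: flag when >15% of the text changed.

def drift_ratio(baseline_text, current_text):
    """Fraction of the document that differs from the baseline."""
    sim = difflib.SequenceMatcher(None, baseline_text, current_text).ratio()
    return 1.0 - sim

def shepherd_check(baseline_path, current_path, archive_path):
    """Restore the baseline if the live MEMORY.md drifted too far."""
    baseline = open(baseline_path).read()
    current = open(current_path).read()
    if drift_ratio(baseline, current) > DRIFT_THRESHOLD:
        shutil.copy(current_path, archive_path)   # Keep the drifted copy for audit.
        shutil.copy(baseline_path, current_path)  # Restore the human-approved reference.
        return "restored"
    return "ok"
```

Archiving before restoring matters: the drifted version is evidence for diagnosing why the distill cycle deviated.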

WHY IT MATTERS

The distill script uses an LLM to rewrite MEMORY.md twice a day. Over hundreds of cycles, small deviations compound -- the agent's personality, instructions, or knowledge could silently shift. The shepherd anchors everything to a human-approved reference.

agent-reflect

2x/day (launchd)

Gives the agent autonomous introspective capabilities.

HOW IT WORKS

  1. The agent writes free-form reflections about its own behavior, decisions, and interactions into memory/reflect/
  2. Reflections are indexed by QMD and available for future context retrieval
  3. No human review required -- this is the agent thinking about itself
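The write side of this loop is simple; a sketch of it, assuming the directory from the article but a hypothetical file-naming scheme:

```python
from datetime import datetime, timezone
from pathlib import Path

def write_reflection(text, reflect_dir="memory/reflect"):
    """Write a timestamped, free-form reflection file that QMD will
    pick up on its next reindex pass."""
    out = Path(reflect_dir)
    out.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = out / f"reflection-{stamp}.md"
    path.write_text(text)
    return path
```

One file per reflection keeps each entry independently indexable and retrievable, rather than appending to a single growing log.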

WHY IT MATTERS

Self-reflection enables behavioral improvement over time. The agent can recognize patterns in its own mistakes and adjust without explicit human correction.

memory-fusion

On every query (plugin)

Deduplicates and reranks results from all 3 memory sources (QMD + Mem0/Qdrant + Cognee) into one clean block.

HOW IT WORKS

  1. Queries QMD, Mem0/Qdrant, and Cognee in parallel via Promise.allSettled (tolerates individual source failures)
  2. Deduplicates results using Jaccard similarity on word sets (threshold: 0.6 -- 60% overlap = duplicate)
  3. Reranks the deduplicated results using Claude Haiku with per-result relevance scoring (0-10 scale)
  4. Injects a single fused memory block respecting a 6,000-character budget (max 8 results) into the agent context
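The dedup-and-budget core of those steps fits in a few lines of Python. A simplification: the production plugin is JavaScript/TypeScript (hence Promise.allSettled), and the Haiku rerank is replaced here by pre-assigned scores on each result.

```python
def jaccard(a, b):
    """Word-set Jaccard similarity between two result snippets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def fuse(results, threshold=0.6, char_budget=6000, max_results=8):
    """Merge pre-scored results from all sources: drop near-duplicates
    (>= 60% word overlap), keep the best-scored first, and respect the
    6,000-character / 8-result budget."""
    results = sorted(results, key=lambda r: r["score"], reverse=True)
    kept, used = [], 0
    for r in results:
        if any(jaccard(r["text"], k["text"]) >= threshold for k in kept):
            continue  # Near-duplicate of an already-kept result.
        if used + len(r["text"]) > char_budget or len(kept) >= max_results:
            break  # Budget exhausted.
        kept.append(r)
        used += len(r["text"])
    return kept
```

Sorting before deduplicating ensures that when two sources return the same fact, the higher-scored phrasing survives.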

WHY IT MATTERS

Without fusion, the 3 sources would each inject their own results, creating duplicates and noise. Fusion gives the agent one clean, ranked memory context instead of three overlapping ones.

Memory Pipeline -- Architecture Deep Dive | OpenClaw × Easylab