13

stages in the memory pipeline

4

search sources queried in parallel

4

new EasyClaw v2 modules integrated

Smart

semantic deduplication

Listwise

reranking via gpt-4.1-nano (RankGPT approach, coherent global ordering)

Optimized

context budget per query

3

compaction levels (full, detailed, condensed)

Deployed

on both instances (Max and Eva)

13 stages from raw data to fused and contextual knowledge

Ingestion
1

Extract

Hourly cron parses JSONL sessions into structured summaries (.daily-raw/, interactions.md). Zero LLM tokens.

2

Extract-Memories (new)

At session end, automatically analyzes content and extracts significant facts (preferences, decisions, context). Structured, deduplicated, stored in memory/extracts/.

3

Reflections

2x/day, the agent writes autonomous introspective reflections about its behavior and decisions into memory/reflect/.

Consolidation
4

Distill

2x/day, Claude compresses the recent interactions and extractions into curated memory. Deduplication and contradiction resolution.

5

Auto-Dream (new)

During idle periods, automatically consolidates memory: merges similar memories, resolves contradictions, archives obsolete information.

6

Shepherd

Every 3h, verifies MEMORY.md integrity against a human-approved baseline. Detects drift, restores if needed.

Indexing
7

Vector SQLite

Local embeddings (text-embedding-3-small) in SQLite with sqlite-vec. Semantic fallback search layer.

8

QMD

Hybrid BM25 + vector + reranking on indexed markdown files. 93% recall vs 55% vector-only.

9

Mem0 + Qdrant

Auto-extracts facts via Mem0 v1.0.5, stored in Qdrant server v1.17.0. Auto-recall injects top 5 memories. Fed by the Extract-Memories plugin.

10

Graphiti (new)

Temporal knowledge graph (Graphiti by Zep, FalkorDB backend). Typed entities and relations, with validity windows — old facts are invalidated, not deleted. Queryable triplets (entity -> relation -> entity).

Recall & Context
11

Memory Selection (new)

On every message, semantic analysis identifies the most relevant memories. Multi-criteria scoring: relevance (40%), recency (20%), frequency (20%), importance (20%).

12

Fusion

Merges results from QMD + Mem0/Qdrant + Graphiti in parallel. Listwise reranking via gpt-4.1-nano: cross-result comparison (RankGPT approach) for coherent global ordering, rather than independent per-item scores. Cross-source deduplication, position-based sorting, context budget enforced.

13

Compaction (new)

Multi-level context management during the session. Distant past summarized, recent past detailed, immediate context fully preserved. Critical elements protected.

Each layer has different strengths. All run simultaneously, preceded by semantic selection and followed by compaction.

Agent Query
Semantic Selection (new)

Multi-criteria scoring to identify relevant memories

parallel

QMD

BM25 + Vector + Rerank

Mem0

Qdrant + auto-facts

Graphiti

Temporal knowledge graph

SQLite

Vector fallback

Fusion: cross-source dedup + listwise rerank gpt-4.1-nano
Multi-Level Compaction (new)

Progressive context summarization, critical anchor protection

Enriched Response

QMD (Hybrid Search)

Hybrid (keywords + semantic)

Primary

Native OpenClaw memory backend. Combines BM25 (exact keyword matching) + vector search (semantic similarity) + LLM reranking using local quantized models.

Local quantized models (embeddings, reranker, query expansion) run on loopback only. 93% recall vs 55% with vector-only search.

Mem0 + Qdrant (Auto-Facts)

Auto-extracted facts and preferences

Facts

Mem0 auto-extracts facts after each conversation with intelligent deduplication. Auto-recall injects the 5 most relevant memories before each response. Fed by the Extract-Memories plugin which structures facts upstream.

Backend: Qdrant running as a standalone binary, loopback only. Multi-dimensional embeddings with cosine distance. Lightweight memory footprint. Extraction driven by Claude.

Graphiti (Temporal Knowledge Graph)

Temporal relational triplets with validity windows

Graph

Extracts entities and relationships from memory episodes using gpt-4.1-nano. Builds a queryable temporal graph (e.g., julien → corrected_by → bug gateway) where each fact has validity windows.

Graphiti by Zep, FalkorDB backend. Incremental updates, no global recomputation. Exposed via a local FastAPI wrapper (port 8002). Deployed on Max and Neo, with DD in progress.

Vector SQLite (Fallback)

Pure semantic (cosine similarity)

Fallback

The original vector index, still active as a safety net. OpenAI embeddings, stored in local SQLite.

Hybrid search: 70% vector + 30% full-text. Covers complete history.
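The 70/30 blend amounts to a weighted sum of the two scores. A minimal sketch, assuming both scores are already normalized to [0, 1] (the weights come from the text; everything else is illustrative):

```python
def hybrid_score(vector_sim, fulltext_score, w_vec=0.7, w_text=0.3):
    """Blend the vector similarity and full-text score into one ranking key.
    Both inputs are assumed normalized to [0, 1]."""
    return w_vec * vector_sim + w_text * fulltext_score
```

A document with strong semantic similarity but weak keyword overlap (0.8, 0.5) scores 0.71, ahead of a pure keyword hit (0.4, 0.9) at 0.55.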

A third recall dimension that complements textual similarity and curated memories.

Textual similarity and semantic memories answer one question: "what resembles my query?" The graph answers a different one: "what is connected to what I mention?" The two approaches complement each other; they do not replace each other.

[Diagram: entity neighborhood. A user query ("Tell me about project X") fans out to QMD (textual similarity), Mem0 (semantic memories), and the graph, where X connects to Y, Z, and W via relations such as collaborates, uses, and deadline. All results are fused and injected into context.]

Principle

When the user talks about a project, similarity search returns passages that describe it. The graph returns the people working on it, the related technologies, the deadlines, the clients — even if no single passage co-mentions all of them. Relations live in the structure of the graph, not in the resemblance of words.

Three Complementary Layers

QMD

Textual similarity

Hybrid BM25 + vector search. Returns passages that resemble the query.

Mem0

Semantic memories

Extracted facts and curated experiences. Returns what the agent has remembered.

Graph

Structural neighborhood

Typed entities and relations. Returns what is connected, not what resembles.

Approach

  1. Lightweight detection of named entities in the user query.
  2. Targeted lookup of each entity's immediate neighbors in the graph.
  3. Compact, structured injection into the agent context: a handful of relations, not a raw dump.
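At its core, the three steps above reduce to an adjacency lookup over triplets. A minimal sketch with illustrative data and function names -- not Graphiti's actual API:

```python
# Toy triplet store standing in for the knowledge graph (illustrative data).
TRIPLETS = [
    ("project-x", "staffed_by", "julien"),
    ("project-x", "uses", "qdrant"),
    ("project-x", "deadline", "2026-06-01"),
    ("julien", "works_at", "easylab"),
]

def neighbors(entity, triplets=TRIPLETS):
    """Return the immediate (relation, other-entity) pairs of an entity."""
    out = []
    for head, rel, tail in triplets:
        if head == entity:
            out.append((rel, tail))
        elif tail == entity:
            out.append((rel, head))
    return out

def inject_block(query_entities, max_relations=5):
    """Build the compact context block: a handful of relations, not a raw dump."""
    lines = []
    for entity in query_entities:
        for rel, other in neighbors(entity)[:max_relations]:
            lines.append(f"{entity} -> {rel} -> {other}")
    return "\n".join(lines)
```

Because the lookup is a plain scan (or an indexed query in a real graph store), no LLM sits in the relation-recall loop, which is what keeps it real-time and deterministic.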

Benefits

Compactness

A few lines of relations are enough to enrich context without blowing the token budget.

Determinism

Relations either exist or they don't. No fuzzy scoring, no threshold to tune.

Non-obvious relations

The graph surfaces connections that textual similarity alone would never have seen.

Performance

Real-time lookup. No LLM in the relation-recall loop.

| Layer | Max | Eva | Status |
|---|---|---|---|
| Vector SQLite | OK | OK | Production |
| QMD (hybrid BM25+vec+rerank) | OK | OK | Production |
| Mem0 + Qdrant (auto-facts) | OK | OK | Production |
| Graphiti (temporal knowledge graph, new) | OK | OK | Production (migrated from Cognee, 2026-04-16) |
| Memory Fusion (dedup + rerank) | OK | OK | Production |
| Extract-Memories (fact extraction, new) | OK | OK | Production |
| Auto-Dream (long-term consolidation, new) | OK | -- | Production |
| Memory Selection (semantic recall, new) | OK | OK | Production |
| Multi-Level Compaction (new) | OK | OK | Production |
| Daemon | Frequency | Role |
|---|---|---|
| Memory Extract daemon | Hourly | Parses sessions, produces structured summaries, and triggers reindexing |
| Extract-Memories plugin (new) | Session end | Extracts significant facts from conversations, structures and stores them in long-term memory |
| Memory Distill daemon | 2x/day | LLM compresses recent interactions and extractions into curated memory, then triggers reindexing |
| Auto-Dream plugin (new) | Daily (3am) | Consolidates long-term memory: merges duplicates, resolves contradictions, archives obsolete entries |
| Memory Shepherd daemon | Every 3h | Protects the memory baseline and archives scratch notes |
| Agent-Reflect daemon | 2x/day | Writes autonomous reflections into the agent's introspective journal |
| Graphiti + FalkorDB (Docker stack, new) | Permanent | Temporal knowledge-graph stack (FalkorDB + graphiti-core via FastAPI wrapper on port 8002) |

How each component works and why it exists.

Memory Extract daemon

Hourly cron (launchd)

Transforms raw conversation sessions into structured, searchable memory. First stage of the pipeline, it produces the raw data that all other systems consume.

HOW IT WORKS

  1. Scans all session files from the last 24 hours
  2. Extracts structured summaries (bounded retention) and maintains a rolling interactions log
  3. Triggers QMD reindexing after each run to keep vector embeddings up to date
  4. Zero LLM tokens — pure parsing, no API calls
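The zero-LLM parsing core can be sketched as follows; the summary schema and field names are illustrative, not the daemon's actual output format:

```python
import json

def summarize_session(jsonl_text):
    """Pure parsing of one JSONL session: count user turns and collect the
    opening words of each. No API calls -- zero LLM tokens, as in step 4."""
    turns, openers = 0, []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        msg = json.loads(line)
        if msg.get("role") == "user":
            turns += 1
            openers.append(msg.get("content", "")[:60])
    return {"user_turns": turns, "openers": openers[:5]}
```

A real run would loop this over every session file from the last 24 hours and append the results to the rolling interactions log before triggering reindexing.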

WHY IT MATTERS

Without extraction, the agent would only have raw JSONL logs. This script creates the structured layer that all other memory systems build on.

extract-memories (plugin)

New -- Session end (automatic)

Analyzes every completed session and extracts significant facts: user preferences, technical decisions, project constraints, contacts. Facts are structured, categorized, and deduplicated before being stored in persistent memory.

HOW IT WORKS

  1. Triggers automatically when a conversation ends or reaches a length threshold
  2. A specialized model identifies significant facts, preferences, decisions, and context elements
  3. Each fact is categorized (preference, decision, constraint, contact) and associated with metadata (date, source, confidence)
  4. Already-known facts are merged or updated rather than duplicated
  5. Validated memories join the long-term store, available for Mem0 and distillation
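The merge-rather-than-duplicate behavior of step 4 can be sketched with a simple similarity check; the 0.8 threshold and the fact schema are illustrative, not the plugin's actual values:

```python
import difflib

def merge_fact(store, new_fact):
    """Merge a new fact into the store: a near-duplicate in the same category
    is updated in place instead of being appended a second time."""
    for i, old in enumerate(store):
        sim = difflib.SequenceMatcher(None, old["text"], new_fact["text"]).ratio()
        if old["category"] == new_fact["category"] and sim > 0.8:
            store[i] = new_fact  # newer wording and metadata win
            return "updated"
    store.append(new_fact)
    return "added"
```

A production version would compare embeddings rather than raw strings, but the control flow -- update on overlap, append otherwise -- is the same.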

WHY IT MATTERS

Before Extract-Memories, valuable information from conversations was lost at session end. This plugin ensures preferences, decisions, and project context are automatically retained without manual intervention.

Memory Distill daemon

2x/day (launchd)

Compresses daily interactions and fact extractions into a curated, long-term memory.

HOW IT WORKS

  1. Reads recent summaries, the rolling interactions log, and extracts from the Extract-Memories plugin
  2. Uses Claude to summarize, merge, and deduplicate entries
  3. Rewrites the curated memory with the condensed result, preserving important context and dropping noise
  4. Triggers QMD reindexing after each distill cycle

WHY IT MATTERS

Raw interactions accumulate fast. Without distillation, the curated memory would grow indefinitely and lose signal. The LLM acts as a curator, keeping only what matters.

auto-dream (plugin)

New -- Daily at 3am (idle)

Consolidates long-term memory during idle periods, inspired by how human memory works during sleep. Complements and reinforces distill-memory's work.

HOW IT WORKS

  1. Detects periods without interaction (night, weekends, configurable inactivity threshold)
  2. Scans the entire memory to identify clusters of semantically close memories
  3. Merges redundant or complementary memories into a single, more complete and better-formulated fact
  4. When two memories contradict each other, keeps the most recent and archives the older one with an audit trail
  5. Archives obsolete memories (superseded by new information, too old without recall)
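The contradiction rule in step 4 can be sketched as follows; the memory schema and audit fields are illustrative:

```python
def resolve_contradiction(mem_a, mem_b, archive):
    """Keep the most recent of two contradicting memories and archive the
    older one with an audit trail (dates as ISO-8601 strings)."""
    newer, older = sorted((mem_a, mem_b), key=lambda m: m["date"], reverse=True)
    archive.append({**older,
                    "superseded_by": newer["text"],
                    "reason": "contradiction"})
    return newer
```

The archived entry is never deleted, so the older fact remains auditable even after it stops being injected into context.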

WHY IT MATTERS

Over time, memory accumulates hundreds of facts. Without consolidation, it becomes noisy: duplicates, outdated information, incomplete fragments. Auto-Dream maintains a compact, coherent memory that improves over time.

memory-shepherd

Every 3 hours (launchd)

Protects the curated memory against slow identity drift caused by repeated LLM rewriting (distill + auto-dream).

HOW IT WORKS

  1. Maintains a human-verified baseline copy of the curated memory
  2. Every 3h, compares the current curated memory against the baseline using diff analysis
  3. If drift exceeds threshold: archives the corrupted version, restores the baseline, and logs the incident
  4. Also archives scratch notes (temporary memory) to prevent buildup
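The drift check in steps 2-3 can be sketched with a line-level diff ratio; the 15% threshold is illustrative, not the daemon's actual setting:

```python
import difflib

def check_drift(baseline, current, threshold=0.15):
    """Compare curated memory against the human-approved baseline.
    Returns (drift_ratio, text_to_keep): if the fraction of changed lines
    exceeds the threshold, fall back to the baseline."""
    matcher = difflib.SequenceMatcher(None,
                                      baseline.splitlines(),
                                      current.splitlines())
    drift = 1.0 - matcher.ratio()
    if drift > threshold:
        return drift, baseline  # restore the human-approved reference
    return drift, current
```

In the real daemon, a restore would also archive the drifted version and log the incident rather than silently discarding it.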

WHY IT MATTERS

Distill and Auto-Dream use an LLM to rewrite the curated memory. Over hundreds of cycles, small deviations compound — the agent's personality, instructions, or knowledge could silently shift. The shepherd anchors everything to a human-approved reference.

Agent-Reflect daemon

2x/day (launchd)

Gives the agent autonomous introspective capabilities.

HOW IT WORKS

  1. The agent writes free-form reflections about its own behavior, decisions, and interactions into its introspective journal
  2. Reflections are indexed by QMD and available for future context retrieval
  3. No human review required -- this is the agent thinking about itself

WHY IT MATTERS

Self-reflection enables behavioral improvement over time. The agent can recognize patterns in its own mistakes and adjust without explicit human correction.

memory-selection (core)

New -- On every message (real-time)

Identifies and injects the most relevant memories into the agent's context before it formulates a response. The agent doesn't remember everything -- it remembers what's relevant.

HOW IT WORKS

  1. The incoming message is transformed into a semantic vector and analyzed to extract key themes and entities
  2. Semantically close memories are identified via a search in the vector database (Qdrant, pgvector)
  3. Each candidate memory receives a combined score: semantic relevance (40%), recency (20%), recall frequency (20%), importance (20%)
  4. The top N scored memories are selected (N is configurable, typically 5-15)
  5. Selected memories are injected into the agent's context, in a dedicated section, before the agent formulates its response
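The scoring in step 3 is a plain weighted sum. A minimal sketch using the weights from the text; the normalization of each criterion to [0, 1] is assumed to happen upstream:

```python
# Weights from the four-criteria scoring described above.
WEIGHTS = {"relevance": 0.4, "recency": 0.2, "frequency": 0.2, "importance": 0.2}

def score(memory, weights=WEIGHTS):
    """Combined score over the four normalized criteria (each in [0, 1])."""
    return sum(weights[k] * memory[k] for k in weights)

def select_top(memories, n=5):
    """Pick the N best-scoring memories for context injection."""
    return sorted(memories, key=score, reverse=True)[:n]
```

The relevance weight dominates, so a highly relevant but rarely recalled memory still beats a frequently recalled but off-topic one.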

WHY IT MATTERS

An agent can have thousands of memories. Injecting all of them would be impossible and counterproductive. Semantic selection gives the agent exactly the right memories at the right time, like a human colleague who knows the project well.

memory-fusion

On every query (plugin)

Deduplicates and reranks results from all 3 memory sources (QMD + Mem0/Qdrant + Graphiti) into one clean block. Runs after semantic selection to merge results from the different layers.

HOW IT WORKS

  1. Queries QMD, Mem0/Qdrant, and Graphiti in parallel with fault isolation: a failing source does not affect the others
  2. Deduplicates results by semantic similarity (an overlap threshold flags duplicates)
  3. Listwise reranking via gpt-4.1-nano: the model compares all results against each other and returns a complete ordering (RankGPT approach), rather than scoring each item independently — same model used by Mem0 and Graphiti for stack coherence
  4. Injects a single fused memory block respecting a defined context budget into the agent prompt

WHY IT MATTERS

Without fusion, the 3 sources would each inject their own results, creating duplicates and noise. Fusion gives the agent one clean, ranked memory context instead of three overlapping ones.

multi-level compaction (core)

New -- During session (continuous)

Intelligently manages context for long conversations to prevent information loss when the model's context window is reached. Inspired by operating system memory management.

HOW IT WORKS

  1. Continuously monitors context size relative to the model's limit
  2. At 70% capacity, triggers the first compaction level
  3. Progressively summarizes older exchanges while preserving key decisions, important facts, and the narrative thread
  4. Organizes context into 3 levels: full detail (last 20 messages), detailed summary (previous 50), condensed summary (rest)
  5. Elements marked as critical (decisions, validated code, explicit instructions) are protected from compaction via anchors
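The three-level split of step 4 can be sketched as follows; the summarizer placeholders stand in for the real LLM summarization calls, and anchor protection is omitted:

```python
def compact(messages, full=20, detailed=50):
    """Split history into the three levels: condensed summary of the distant
    past, detailed summary of the middle, last `full` messages verbatim."""
    def condensed(msgs):
        return f"[condensed summary of {len(msgs)} messages]"  # LLM call in reality

    def summary(msgs):
        return f"[detailed summary of {len(msgs)} messages]"   # LLM call in reality

    older = messages[:-(full + detailed)] if len(messages) > full + detailed else []
    middle = messages[-(full + detailed):-full] if len(messages) > full else []
    recent = messages[-full:]
    parts = []
    if older:
        parts.append(condensed(older))
    if middle:
        parts.append(summary(middle))
    parts.extend(recent)
    return parts
```

A 100-message session thus collapses to one condensed line, one detailed summary, and the last 20 messages verbatim, which is what keeps multi-hour sessions inside the context window.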

WHY IT MATTERS

Without compaction, long conversations lose their beginning (where the design was discussed) when context overflows. Multi-level compaction enables 3h+ sessions without losing coherence -- the agent keeps the overall plan, decisions along the way, and full detail of the most recent exchanges.

Memory Pipeline -- Architecture Deep Dive | OpenClaw × Easylab