Memory Pipeline
From raw conversations to curated knowledge
Memory is what transforms a language model into a truly useful assistant. Our complete pipeline processes raw conversation sessions through 13 stages -- from fact extraction to real-time context compaction -- and makes them searchable through 4 parallel search layers. EasyClaw v2 enriches this pipeline with automatic memory extraction, long-term consolidation, semantic selection at recall time, and multi-level compaction during the session.
13
stages in the memory pipeline
4
search sources queried in parallel
4
new EasyClaw v2 modules integrated
Smart
semantic deduplication
Listwise
reranking via gpt-4.1-nano (RankGPT approach, coherent global ordering)
Optimized
context budget per query
3
compaction levels (full, detailed, condensed)
Deployed
on both instances (Max and Eva)
13 stages from raw data to fused and contextual knowledge
Extract
Hourly cron parses JSONL sessions into structured summaries (.daily-raw/, interactions.md). Zero LLM tokens.
Extract-Memories (NEW)
At session end, automatically analyzes content and extracts significant facts (preferences, decisions, context). Structured, deduplicated, stored in memory/extracts/.
Reflections
2x/day, the agent writes autonomous introspective reflections about its behavior and decisions into memory/reflect/.
Distill
2x/day, Claude compresses the recent interactions and extractions into curated memory. Deduplication and contradiction resolution.
Auto-Dream (NEW)
During idle periods, automatically consolidates memory: merges similar memories, resolves contradictions, archives obsolete information.
Shepherd
Every 3h, verifies MEMORY.md integrity against a human-approved baseline. Detects drift, restores if needed.
Vector SQLite
Local embeddings (text-embedding-3-small) in SQLite with sqlite-vec. Semantic fallback search layer.
QMD
Hybrid BM25 + vector + reranking on indexed markdown files. 93% recall vs 55% vector-only.
Mem0 + Qdrant
Auto-extracts facts via Mem0 v1.0.5, stored in Qdrant server v1.17.0. Auto-recall injects top 5 memories. Fed by the Extract-Memories plugin.
Graphiti (NEW)
Temporal knowledge graph (Graphiti by Zep, FalkorDB backend). Typed entities and relations, with validity windows — old facts are invalidated, not deleted. Queryable triplets (entity -> relation -> entity).
Memory Selection (NEW)
On every message, semantic analysis identifies the most relevant memories. Multi-criteria scoring: relevance (40%), recency (20%), frequency (20%), importance (20%).
Fusion
Merges results from QMD + Mem0/Qdrant + Graphiti in parallel. Listwise reranking via gpt-4.1-nano: cross-result comparison (RankGPT approach) for coherent global ordering, rather than independent per-item scores. Cross-source deduplication, position-based sorting, context budget enforced.
Compaction (NEW)
Multi-level context management during the session. Distant past summarized, recent past detailed, immediate context fully preserved. Critical elements protected.
Each layer has different strengths. All run simultaneously, preceded by semantic selection and followed by compaction.
Multi-criteria scoring to identify relevant memories
QMD
BM25 + Vector + Rerank
Mem0
Qdrant + auto-facts
Graphiti
Temporal knowledge graph
SQLite
Vector fallback
Progressive context summarization, critical anchor protection
QMD (Hybrid Search)
Hybrid (keywords + semantic)
Native OpenClaw memory backend. Combines BM25 (exact keyword matching) + vector search (semantic similarity) + LLM reranking using local quantized models.
Local quantized models (embeddings, reranker, query expansion) run entirely over the loopback interface. 93% recall vs 55% with vector-only search.
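The score blending behind hybrid search can be sketched as follows. This is an illustrative assumption, not QMD's actual internals: the `Passage` shape, the min-max normalization, and the `alpha` weight are all hypothetical, and the LLM reranking stage that would follow is omitted.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    text: str
    bm25: float    # exact keyword-match score
    cosine: float  # semantic similarity score

def hybrid_rank(passages, alpha=0.5, top_k=3):
    """Blend min-max-normalized BM25 and vector scores; keep top_k for the reranker."""
    def norm(values):
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]
    bm25 = norm([p.bm25 for p in passages])
    cos = norm([p.cosine for p in passages])
    blended = [alpha * b + (1 - alpha) * c for b, c in zip(bm25, cos)]
    ranked = sorted(zip(passages, blended), key=lambda t: t[1], reverse=True)
    return [p.text for p, _ in ranked[:top_k]]
```

A passage that scores well on both axes beats one that dominates only keywords or only semantics, which is the point of combining the two signals.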
Mem0 + Qdrant (Auto-Facts)
Auto-extracted facts and preferences
Mem0 auto-extracts facts after each conversation with intelligent deduplication. Auto-recall injects the 5 most relevant memories before each response. Fed by the Extract-Memories plugin which structures facts upstream.
Backend: Qdrant running as a standalone binary, loopback only. Multi-dimensional embeddings with cosine distance. Lightweight memory footprint. Extraction driven by Claude.
Graphiti (Temporal Knowledge Graph)
Temporal relational triplets with validity windows
Extracts entities and relationships from memory episodes using gpt-4.1-nano. Builds a queryable temporal graph (e.g., julien → corrected_by → bug gateway) where each fact has validity windows.
Graphiti by Zep, FalkorDB backend. Incremental updates, no global recomputation. Exposed via a local FastAPI wrapper (port 8002). Deployed on Max, Neo, with DD in progress.
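The "invalidated, not deleted" semantics can be sketched with a minimal temporal triplet store. This is a toy model of the idea, not Graphiti's API: the class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means currently valid

class TemporalGraph:
    def __init__(self):
        self.facts: list[Fact] = []

    def assert_fact(self, subject, relation, obj):
        now = datetime.now(timezone.utc)
        # Old facts get a validity end date; they are never deleted
        for f in self.facts:
            if f.subject == subject and f.relation == relation and f.valid_to is None:
                f.valid_to = now
        self.facts.append(Fact(subject, relation, obj, valid_from=now))

    def current(self, subject, relation):
        return [f.obj for f in self.facts
                if f.subject == subject and f.relation == relation
                and f.valid_to is None]
```

Querying past states remains possible because superseded facts keep their full validity window.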
Vector SQLite (Fallback)
Pure semantic (cosine similarity)
The original vector index, still active as a safety net. OpenAI embeddings, stored in local SQLite.
Hybrid search: 70% vector + 30% full-text. Covers complete history.
A third recall dimension that complements textual similarity and curated memories.
Textual similarity and semantic memories answer one question: "what resembles my query?". The graph answers a different one: "what is connected to what I mention?". The two approaches complement each other, they do not replace each other.
Principle
When the user talks about a project, similarity search returns passages that describe it. The graph returns the people working on it, the related technologies, the deadlines, the clients — even if no single passage co-mentions all of them. Relations live in the structure of the graph, not in the resemblance of words.
Three Complementary Layers
QMD
Textual similarity
Hybrid BM25 + vector search. Returns passages that resemble the query.
Mem0
Semantic memories
Extracted facts and curated experiences. Returns what the agent has remembered.
Graph
Structural neighborhood
Typed entities and relations. Returns what is connected, not what resembles.
Approach
- Lightweight detection of named entities in the user query.
- Targeted lookup of each entity's immediate neighbors in the graph.
- Compact, structured injection into the agent context: a handful of relations, not a raw dump.
Benefits
Compactness
A few lines of relations are enough to enrich context without blowing the token budget.
Determinism
Relations either exist or they don't. No fuzzy scoring, no threshold to tune.
Non-obvious relations
The graph surfaces connections that textual similarity alone would never have seen.
Performance
Real-time lookup. No LLM in the relation-recall loop.
| Layer | Max | Eva | Status |
|---|---|---|---|
| Vector SQLite | OK | OK | Production |
| QMD (hybrid BM25+vec+rerank) | OK | OK | Production |
| Mem0 + Qdrant (auto-facts) | OK | OK | Production |
| Graphiti (temporal knowledge graph) (NEW) | OK | OK | Production (migrated from Cognee, 2026-04-16) |
| Memory Fusion (dedup + rerank) | OK | OK | Production |
| Extract-Memories (fact extraction) (NEW) | OK | OK | Production |
| Auto-Dream (long-term consolidation) (NEW) | OK | -- | Production |
| Memory Selection (semantic recall) (NEW) | OK | OK | Production |
| Multi-Level Compaction (NEW) | OK | OK | Production |
| Daemon | Frequency | Role |
|---|---|---|
| Memory Extract daemon | Hourly | Parses sessions, produces structured summaries, and triggers reindexing |
| Extract-Memories plugin (NEW) | Session end | Extracts significant facts from conversations, structures and stores them in long-term memory |
| Memory Distill daemon | 2x/day | LLM compresses recent interactions and extractions into curated memory, then triggers reindexing |
| Auto-Dream plugin (NEW) | Daily (3am) | Consolidates long-term memory: merges duplicates, resolves contradictions, archives obsolete facts |
| Memory Shepherd daemon | Every 3h | Protects the memory baseline and archives scratch notes |
| Agent-Reflect daemon | 2x/day | Agent writes autonomous reflections into its introspective journal |
| Graphiti + FalkorDB (Docker stack) (NEW) | Permanent | Temporal knowledge-graph stack (FalkorDB + graphiti-core via FastAPI wrapper on port 8002) |
How each component works and why it exists.
Memory Extract daemon
Hourly cron (launchd). Transforms raw conversation sessions into structured, searchable memory. As the first stage of the pipeline, it produces the raw data that all other systems consume.
HOW IT WORKS
- Scans all session files from the last 24 hours
- Extracts structured summaries (bounded retention) and maintains a rolling interactions log
- Triggers QMD reindexing after each run to keep vector embeddings up to date
- Zero LLM tokens — pure parsing, no API calls
WHY IT MATTERS
Without extraction, the agent would only have raw JSONL logs. This script creates the structured layer that all other memory systems build on.
extract-memories (plugin) (NEW)
Session end (automatic). Analyzes every completed session and extracts significant facts: user preferences, technical decisions, project constraints, contacts. Facts are structured, categorized, and deduplicated before being stored in persistent memory.
HOW IT WORKS
- Triggers automatically when a conversation ends or reaches a length threshold
- A specialized model identifies significant facts, preferences, decisions, and context elements
- Each fact is categorized (preference, decision, constraint, contact) and associated with metadata (date, source, confidence)
- Already-known facts are merged or updated rather than duplicated
- Validated memories join the long-term store, available for Mem0 and distillation
WHY IT MATTERS
Before Extract-Memories, valuable information from conversations was lost at session end. This plugin ensures preferences, decisions, and project context are automatically retained without manual intervention.
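The merge-rather-than-duplicate step can be sketched as an upsert keyed on a fact's category and subject. The `ExtractedFact` shape and the confidence rule are illustrative assumptions, not the plugin's actual schema.

```python
from dataclasses import dataclass

@dataclass
class ExtractedFact:
    category: str    # preference | decision | constraint | contact
    key: str         # stable identifier used for deduplication
    value: str
    confidence: float

def upsert(store: dict, fact: ExtractedFact) -> dict:
    """Merge an extracted fact into the long-term store: known facts
    are updated rather than duplicated; lower-confidence repeats are ignored."""
    existing = store.get((fact.category, fact.key))
    if existing is None or fact.confidence >= existing.confidence:
        store[(fact.category, fact.key)] = fact
    return store
```

Re-extracting the same preference across many sessions then refines one entry instead of accumulating near-duplicates.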
Memory Distill daemon
2x/day (launchd). Compresses daily interactions and fact extractions into a curated, long-term memory.
HOW IT WORKS
- Reads recent summaries, the rolling interactions log, and extracts from the Extract-Memories plugin
- Uses Claude to summarize, merge, and deduplicate entries
- Rewrites the curated memory with the condensed result, preserving important context and dropping noise
- Triggers QMD reindexing after each distill cycle
WHY IT MATTERS
Raw interactions accumulate fast. Without distillation, the curated memory would grow indefinitely and lose signal. The LLM acts as a curator, keeping only what matters.
auto-dream (plugin) (NEW)
Daily at 3am (idle). Consolidates long-term memory during idle periods, inspired by how human memory consolidates during sleep. Complements and reinforces distill-memory's work.
HOW IT WORKS
- Detects periods without interaction (night, weekends, configurable inactivity threshold)
- Scans the entire memory to identify clusters of semantically close memories
- Merges redundant or complementary memories into a single, more complete and better-formulated fact
- When two memories contradict each other, keeps the most recent and archives the older one with an audit trail
- Archives obsolete memories (superseded by new information, too old without recall)
WHY IT MATTERS
Over time, memory accumulates hundreds of facts. Without consolidation, it becomes noisy: duplicates, outdated information, incomplete fragments. Auto-Dream maintains a compact, coherent memory that improves over time.
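The keep-the-most-recent rule with an audit trail can be sketched as follows. The `cluster` field stands in for a semantic cluster id that real clustering would produce; the names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    cluster: str    # stand-in for a semantic cluster id
    text: str
    timestamp: int

def consolidate(memories):
    """Per cluster, keep the most recent memory and archive the rest
    (contradictions resolved in favor of recency, with an audit trail)."""
    kept, archived = {}, []
    for m in sorted(memories, key=lambda m: m.timestamp):
        if m.cluster in kept:
            archived.append(kept[m.cluster])  # older duplicate or contradiction
        kept[m.cluster] = m
    return list(kept.values()), archived
```

Nothing is silently destroyed: the archived list is what makes later auditing of a consolidation pass possible.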
memory-shepherd
Every 3 hours (launchd). Protects the curated memory against slow identity drift caused by repeated LLM rewriting (distill + auto-dream).
HOW IT WORKS
- Maintains a human-verified baseline copy of the curated memory
- Every 3h, compares the current curated memory against the baseline using diff analysis
- If drift exceeds threshold: archives the corrupted version, restores the baseline, and logs the incident
- Also archives scratch notes (temporary memory) to prevent buildup
WHY IT MATTERS
Distill and Auto-Dream use an LLM to rewrite the curated memory. Over hundreds of cycles, small deviations compound — the agent's personality, instructions, or knowledge could silently shift. The shepherd anchors everything to a human-approved reference.
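The drift check can be sketched with a plain sequence-similarity diff. The 15% threshold is an illustrative assumption, not the shepherd's actual configuration.

```python
import difflib

def check_drift(baseline: str, current: str, threshold: float = 0.15):
    """Compare the curated memory against the human-approved baseline.
    Returns (drift_ratio, restore_needed)."""
    similarity = difflib.SequenceMatcher(None, baseline, current).ratio()
    drift = 1.0 - similarity
    return drift, drift > threshold
```

An unchanged memory yields zero drift; when `restore_needed` is true, the real daemon would archive the current version and restore the baseline.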
Agent-Reflect daemon
2x/day (launchd). Gives the agent autonomous introspective capabilities.
HOW IT WORKS
- The agent writes free-form reflections about its own behavior, decisions, and interactions into its introspective journal
- Reflections are indexed by QMD and available for future context retrieval
- No human review required -- this is the agent thinking about itself
WHY IT MATTERS
Self-reflection enables behavioral improvement over time. The agent can recognize patterns in its own mistakes and adjust without explicit human correction.
memory-selection (core) (NEW)
On every message (real-time). Identifies and injects the most relevant memories into the agent's context before it formulates a response. The agent doesn't remember everything -- it remembers what's relevant.
HOW IT WORKS
- The incoming message is transformed into a semantic vector and analyzed to extract key themes and entities
- Semantically close memories are identified via a search in the vector database (Qdrant, pgvector)
- Each candidate memory receives a combined score: semantic relevance (40%), recency (20%), recall frequency (20%), importance (20%)
- The top N scored memories are selected (N is configurable, typically 5-15)
- Selected memories are injected into the agent's context, in a dedicated section, before the agent formulates its response
WHY IT MATTERS
An agent can have thousands of memories. Injecting all of them would be impossible and counterproductive. Semantic selection gives the agent exactly the right memories at the right time, like a human colleague who knows the project well.
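The multi-criteria scoring described above reduces to a weighted sum. A minimal sketch, assuming each signal is already normalized to [0, 1]; the candidate format is hypothetical.

```python
def score_memory(relevance, recency, frequency, importance,
                 weights=(0.40, 0.20, 0.20, 0.20)):
    """Combined score: relevance 40%, recency 20%, frequency 20%, importance 20%."""
    w = weights
    return w[0] * relevance + w[1] * recency + w[2] * frequency + w[3] * importance

def select_top(candidates, n=5):
    """candidates: list of (memory_text, {signal: value}); returns the n best texts."""
    ranked = sorted(candidates, key=lambda c: score_memory(**c[1]), reverse=True)
    return [text for text, _ in ranked[:n]]
```

The weighting means a highly relevant but rarely recalled memory can still outrank a frequently recalled but off-topic one.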
memory-fusion
On every query (plugin). Deduplicates and reranks results from all 3 memory sources (QMD + Mem0/Qdrant + Graphiti) into one clean block. Runs after semantic selection to merge results from the different layers.
HOW IT WORKS
- Queries QMD, Mem0/Qdrant, and Graphiti in parallel with fault isolation: a failing source does not affect the others
- Deduplicates results by semantic similarity (an overlap threshold flags duplicates)
- Listwise reranking via gpt-4.1-nano: the model compares all results against each other and returns a complete ordering (RankGPT approach), rather than scoring each item independently — same model used by Mem0 and Graphiti for stack coherence
- Injects a single fused memory block respecting a defined context budget into the agent prompt
WHY IT MATTERS
Without fusion, the 3 sources would each inject their own results, creating duplicates and noise. Fusion gives the agent one clean, ranked memory context instead of three overlapping ones.
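The parallel fan-out with fault isolation can be sketched as follows. The string-based deduplication is a crude stand-in for the semantic similarity check, and the listwise LLM reranking step is omitted; source names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def fuse(sources, query, budget=3):
    """Query every source in parallel with fault isolation, deduplicate,
    and enforce a context budget. `sources` maps name -> callable(query)."""
    results = []
    with ThreadPoolExecutor(max_workers=max(len(sources), 1)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        for name, fut in futures.items():
            try:
                results.extend(fut.result(timeout=5))
            except Exception:
                pass  # a failing source must not break the others
    seen, fused = set(), []
    for text in results:
        key = text.strip().lower()  # crude stand-in for semantic dedup
        if key not in seen:
            seen.add(key)
            fused.append(text)
    return fused[:budget]  # the listwise rerank would run before this cut
```

Even with one layer down, the agent still gets a single deduplicated block from the surviving sources.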
multi-level compaction (core) (NEW)
During session (continuous). Intelligently manages context for long conversations to prevent information loss when the model's context window is reached. Inspired by operating-system memory management.
HOW IT WORKS
- Continuously monitors context size relative to the model's limit
- At 70% capacity, triggers the first compaction level
- Progressively summarizes older exchanges while preserving key decisions, important facts, and the narrative thread
- Organizes context into 3 levels: full detail (last 20 messages), detailed summary (previous 50), condensed summary (rest)
- Elements marked as critical (decisions, validated code, explicit instructions) are protected from compaction via anchors
WHY IT MATTERS
Without compaction, long conversations lose their beginning (where the design was discussed) when context overflows. Multi-level compaction enables 3h+ sessions without losing coherence -- the agent keeps the overall plan, decisions along the way, and full detail of the most recent exchanges.
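The three-tier split can be sketched as a pure partitioning function. The placeholder `summarize` stands in for the LLM summarization step, and the anchor protection for critical elements is omitted; tier sizes match the 20/50 split described above.

```python
def compact(messages, full=20, detailed=50,
            summarize=lambda msgs: f"[summary of {len(msgs)} messages]"):
    """Organize context in three levels: condensed summary of the distant
    past, detailed summary of the recent past, newest messages verbatim."""
    recent = messages[-full:]
    middle = messages[-(full + detailed):-full] if len(messages) > full else []
    oldest = messages[:-(full + detailed)] if len(messages) > full + detailed else []
    context = []
    if oldest:
        context.append(summarize(oldest))  # condensed: one line for the distant past
    if middle:
        context.append(summarize(middle))  # detailed summary tier
    context.extend(recent)                 # immediate context fully preserved
    return context
```

A 100-message session collapses to two summary lines plus the last 20 messages, while a short session passes through untouched.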