Setup & Deployment

OpenClaw Memory Search: Complete Setup, Tuning & Troubleshooting Guide (2026)

20 min read · Updated 2026-04-14

By DoneClaw Team · We run managed OpenClaw deployments and write from hands-on production experience.

If you want OpenClaw memory search to actually work, you need more than a MEMORY.md file and crossed fingers. You need the right embedding provider, the right backend, sensible indexing rules, and a clear idea of when to use builtin memory search versus QMD versus the newer Active Memory layer. That is the real difference between an AI agent that merely stores notes and one that reliably recalls them when it matters. This guide walks through how OpenClaw memory search works, how to configure it, how to improve retrieval quality, and how to fix the most common reasons recall goes to hell. If you already read our guide on how the OpenClaw memory system and persistent context works, think of this as the hands-on version. If you have not, read that after this.

Why OpenClaw Memory Search Matters

OpenClaw does not rely on the model remembering everything in its context window forever. Instead, it stores durable notes and uses retrieval to pull the relevant parts back in later. That matters for three reasons.

  • Lower token waste. You do not want to resend your entire history on every turn.
  • Better personalization. Preferences, routines, project context, and operating rules can resurface when relevant.
  • More reliable automation. IDs, config keys, hostnames, and past decisions can be found precisely instead of guessed.

According to the OpenClaw docs, memory search combines vector search and BM25 keyword search in parallel, then merges the results. In plain English, that means it can match both semantic meaning and exact strings. That combination is why a query like "gateway host" can still find a note that says "the machine running OpenClaw," while literal strings like API keys, error messages, and config names still rank properly.

OpenClaw Memory Search vs Active Memory vs the Memory System

People keep mixing these up. They are related, but they are not the same thing.

  • The memory system stores durable notes like MEMORY.md and memory/*.md for long-term context in your workspace files.
  • Memory search retrieves relevant notes on demand using embeddings and keyword search for precise recall. Configured via agents.defaults.memorySearch.
  • Active Memory runs a dedicated memory sub-agent before the main reply for more natural recall in chat. Configured via plugins.entries.active-memory.
  • The QMD backend is a local-first search sidecar with reranking and extra collections for better retrieval quality at scale. Configured via memory.backend: "qmd".

If you want the deeper conceptual background, DoneClaw has good companion reads including our guides on how the OpenClaw memory system works, how to build an AI assistant that remembers everything, OpenClaw best practices, OpenClaw cron jobs and heartbeats, and how to use MCP servers with OpenClaw.

How OpenClaw Memory Search Works

OpenClaw's documented retrieval pipeline has two parallel paths: vector search for semantic similarity and BM25 / full-text search for exact keywords. Then it merges and ranks the results.

That hybrid setup is the sweet spot. Pure vector search is great for fuzzy recall but can miss exact identifiers. Pure lexical search is great for literals but dumb about meaning. Hybrid retrieval gives you both.

OpenClaw also supports:

  • Temporal decay, so stale notes stop dominating fresh ones.
  • MMR diversity, so you do not get five nearly identical snippets.
  • Multimodal indexing with Gemini Embedding 2, for image and audio files in extraPaths.
  • Session transcript indexing, when you opt into QMD session memory.

Step 1: Start with the Builtin Engine

For most users, the builtin engine is the right starting point. It is simpler, has fewer moving parts, and works automatically if an embedding provider is configured.

A minimal configuration only requires specifying the provider. You can use openai, gemini, voyage, mistral, bedrock, ollama, or local.

Supported providers include:

  • OpenAI: fast, easy default. Requires an API key.
  • Gemini: supports multimodal indexing. Requires an API key.
  • Voyage: good embedding quality. Requires an API key.
  • Mistral: supported and auto-detected. Requires an API key.
  • Bedrock: uses the AWS credential chain. No direct API key.
  • Ollama: local; must be set explicitly. No API key.
  • Local: uses a GGUF embedding model. No API key.

OpenClaw's docs note that the default local embedding model is embeddinggemma-300m-qat-Q8_0.gguf, about 0.6 GB, while QMD may download around 2 GB of GGUF models on first use for reranking and query expansion. That is a real consideration if you are running on a small VPS or Raspberry Pi.

{
  agents: {
    defaults: {
      memorySearch: {
        provider: "openai"
      }
    }
  }
}

Step 2: Pick the Right Embedding Strategy

This is where people overcomplicate things. Use OpenAI or Gemini if you want the smoothest setup. Use QMD if you want higher-quality search and fully local retrieval. Use local embeddings only if cost or privacy matters more than convenience.

For OpenAI, specify provider: "openai" and model: "text-embedding-3-small" in your memorySearch config. For Gemini, use provider: "gemini" and model: "gemini-embedding-001". For fully local operation, just set provider: "local".

If you are already running OpenClaw with local models, also check our guides on running OpenClaw with Ollama and running OpenClaw with LM Studio.

{
  agents: {
    defaults: {
      memorySearch: {
        provider: "openai",
        model: "text-embedding-3-small"
      }
    }
  }
}
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "gemini",
        model: "gemini-embedding-001"
      }
    }
  }
}
{
  agents: {
    defaults: {
      memorySearch: {
        provider: "local"
      }
    }
  }
}

Step 3: Enable Better Ranking with Hybrid Search, MMR, and Temporal Decay

OpenClaw supports three upgrades that dramatically improve real-world recall quality.

Hybrid search is on by default, and that is correct. Leave it on unless you have a very specific reason not to.

If your search results keep surfacing redundant chunks from several daily notes, enable MMR (Maximal Marginal Relevance) for diversity.

If a note from six weeks ago keeps outranking something you wrote yesterday, enable temporal decay. OpenClaw's default half-life is 30 days, and evergreen files such as MEMORY.md are not decayed.

The following example shows a sensible production setup for anyone with a growing note history.

{
  agents: {
    defaults: {
      memorySearch: {
        query: {
          hybrid: {
            vectorWeight: 0.7,
            textWeight: 0.3,
            mmr: { enabled: true, lambda: 0.7 },
            temporalDecay: { enabled: true, halfLifeDays: 30 }
          }
        }
      }
    }
  }
}

Skip 60 minutes of setup — deploy in 60 seconds

DoneClaw handles Docker, servers, security, and updates. Your OpenClaw agent is ready to chat in under a minute.

Deploy Now

Step 4: Know When to Switch to QMD

QMD is where OpenClaw memory search gets serious.

According to the docs, QMD adds reranking, query expansion, extra path indexing, session transcript indexing, local-first retrieval, and automatic fallback to builtin search if QMD fails. That makes it ideal if you want OpenClaw to search beyond MEMORY.md and memory/, especially across project docs, external Markdown collections, or historical chat transcripts.

A minimal QMD setup only requires setting memory.backend to "qmd". You can add extra directories to index by specifying paths with a name, path, and glob pattern. You can also enable transcript indexing for session history recall.

OpenClaw creates the QMD home under ~/.openclaw/agents/<agentId>/qmd/, refreshes collections in the background, and by default updates roughly every 5 minutes. That is pretty good for a self-hosted stack.

{
  memory: {
    backend: "qmd"
  }
}
{
  memory: {
    backend: "qmd",
    qmd: {
      paths: [
        { name: "docs", path: "~/notes", pattern: "**/*.md" }
      ]
    }
  }
}
{
  memory: {
    backend: "qmd",
    qmd: {
      sessions: { enabled: true }
    }
  }
}

Step 5: Layer Active Memory on Top

The new Active Memory plugin is one of the most interesting additions in OpenClaw's April 2026 release. The changelog describes it as an optional blocking memory sub-agent that runs before the main reply for eligible conversational sessions. In practice, that means OpenClaw gets one bounded chance to search memory before the main model answers.

That solves a real problem: standard memory search is reactive. Either the agent has to decide to search, or the user has to say "remember this" or "search memory." Active Memory makes recall more natural.

Key configuration details:

  • Recommended timeout: 3,000 to 5,000 ms for message mode; the docs' safe default is 15,000 ms.
  • Default summary budget: 220 characters.
  • Default allowed chat type: direct.
  • Example debug latency shown in the docs: 842 ms.

Start with direct messages only, leave transcript persistence off, and tune from there. Hidden personalization in every group chat is a fast way to create confusing behavior.

{
  plugins: {
    entries: {
      "active-memory": {
        enabled: true,
        config: {
          enabled: true,
          agents: ["main"],
          allowedChatTypes: ["direct"],
          modelFallback: "google/gemini-3-flash",
          queryMode: "recent",
          promptStyle: "balanced",
          timeoutMs: 15000,
          maxSummaryChars: 220,
          persistTranscripts: false,
          logging: true
        }
      }
    }
  }
}

Real-World Setup Patterns

Pattern 1: Simple personal assistant. Use builtin memory search with OpenAI or Gemini. Add temporal decay. Skip QMD unless recall quality becomes a bottleneck.

Pattern 2: Research-heavy operator. Use QMD, index extra paths, enable MMR, and keep session transcript indexing on. This is ideal for someone who wants their agent to remember old project notes, technical docs, and prior investigations.

Pattern 3: Relationship-heavy chat assistant. Use memory search plus Active Memory, limited to direct chats. This is the best fit when preferences, habits, routines, and long-term personal context matter more than deterministic automation.

Troubleshooting OpenClaw Memory Search

This is the section most people actually need.

Problem 1: Memory search returns no results. The likely cause is that the index is empty or embeddings are not configured. Run openclaw memory status, force a reindex with openclaw memory index --force, and check that your embedding provider is actually configured. If using local embeddings, make sure the model downloaded correctly.
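Taken together, the first-line checks look like this (command names are the ones the OpenClaw docs give; no extra flags are assumed):

openclaw memory status
openclaw memory index --force

If status still reports an empty index after a forced reindex, the embedding provider configuration is the next thing to check.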

Problem 2: Results are only exact keyword matches. The likely cause is that vector embeddings are unavailable, so semantic retrieval is not doing its job. Run openclaw memory status --deep, confirm the embedding provider resolves, check env vars such as OPENAI_API_KEY or GEMINI_API_KEY, and reindex after fixing provider config.
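A quick way to confirm the provider can resolve at all is to verify the key is exported in the environment OpenClaw runs under, then re-run the deep status check. The openclaw commands are from the docs; the env checks are plain POSIX shell:

test -n "$OPENAI_API_KEY" && echo "OPENAI_API_KEY is set"
test -n "$GEMINI_API_KEY" && echo "GEMINI_API_KEY is set"
openclaw memory status --deep
openclaw memory index --force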

Problem 3: QMD times out on slower hardware. OpenClaw's docs say the default QMD timeout is 4000 ms. Increase it by setting memory.qmd.limits.timeoutMs to 120000 in your config.
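Written out in the same config style as the earlier examples, that timeout bump looks like this (key path and value as stated above):

{
  memory: {
    qmd: {
      limits: {
        timeoutMs: 120000
      }
    }
  }
}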

Problem 4: Empty results in group chats. The likely cause is QMD scope rules. By default, QMD search is surfaced in direct and channel sessions, not groups. Review memory.qmd.scope and explicitly allow the chat types you want.
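The docs name memory.qmd.scope but do not spell out its schema here, so treat the following shape as an illustrative guess: the key below is an assumption, modeled on the allowedChatTypes option Active Memory uses, and should be checked against your OpenClaw version's reference before use.

{
  memory: {
    qmd: {
      // ASSUMED shape: verify the real scope schema in the OpenClaw docs.
      scope: {
        allowedChatTypes: ["direct", "channel", "group"]
      }
    }
  }
}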

Problem 5: Too many duplicate snippets. The likely cause is no MMR and no recency tuning. Enable MMR, enable temporal decay, and reduce redundant daily note sprawl.

Problem 6: Active Memory feels slow. The likely cause is you enabled it with a heavy model, broad context, and generous timeout. Switch to queryMode: "message" or "recent", keep thinking: "off", use a lightweight fallback model, and start with direct sessions only.

Performance and Cost Tradeoffs

Here is the honest breakdown of each setup option.

  • Builtin with OpenAI embeddings: high quality, low to moderate cost, low complexity. Best for most users.
  • Builtin with Gemini embeddings: similar quality and cost, and also supports multimodal memory.
  • Builtin with local embeddings: medium to high quality, very low recurring cost, medium complexity. Best for privacy-focused users.
  • QMD backend: highest local quality, very low recurring cost after setup, medium to high complexity. Ideal for power users, researchers, and large note collections.
  • Active Memory layered on top: improves conversational recall but adds inference latency, medium complexity. Best for direct-chat assistants with rich personal context.

If your setup is still early, do not jump straight into the most elaborate stack. Start with builtin memory search, prove that your files are structured well, then graduate to QMD or Active Memory when you can explain exactly what problem you are solving.

Best Practices That Actually Move the Needle

These seven practices consistently make the biggest difference in memory search quality.

  • Write better notes, not more notes. Retrieval quality is downstream of note quality.
  • Keep MEMORY.md evergreen. Save stable facts there. Use daily files for transient stuff.
  • Turn on temporal decay once history grows. Otherwise old junk crowds new context.
  • Use QMD for breadth, not vanity. If you do not need extra paths or transcript search, builtin is fine.
  • Limit Active Memory to human-facing chat. It is a conversational enrichment layer, not a universal inference feature.
  • Inspect before tuning blindly. OpenClaw supports /verbose on and /trace on for Active Memory debugging.
  • Reindex after major config changes. Especially after swapping embedding models or providers.

Conclusion

If you want OpenClaw memory search to be useful, the winning formula is simple: start with the builtin engine, use a real embedding provider, enable temporal decay and MMR when your history grows, then adopt QMD or Active Memory only when you can justify the added complexity. That is the difference between an agent that occasionally remembers something cool and an agent that feels consistently sharp. And yes, that difference is worth the setup.

Skip the setup? DoneClaw deploys OpenClaw for you — $29/mo, cancel anytime, zero configuration.

Frequently asked questions

What is OpenClaw memory search?

It is the retrieval layer that searches your memory files using semantic embeddings, keyword search, or both, then returns the most relevant snippets.

Does OpenClaw memory search work without an API key?

Yes. You can use provider: "local", provider: "ollama", or the QMD backend for local-first retrieval, though setup is a bit more involved.

What is the difference between memory search and Active Memory?

Memory search is the retrieval engine. Active Memory is a plugin-owned sub-agent that runs before the main reply to proactively surface relevant memories in eligible chat sessions.

When should I use QMD instead of the builtin engine?

Use QMD when you want better reranking, extra indexed directories, transcript recall, and a more capable local-first retrieval stack.

Why is OpenClaw memory search not finding obvious notes?

Usually because embeddings are not configured, the index is stale, QMD scope rules block the search, or your note structure is messy enough to sabotage ranking.