Setup & Deployment

How to Run OpenClaw with a Local LLM: Complete Ollama Setup Guide (2026)

14 min read · Updated 2026-04-09

By DoneClaw Team · We run managed OpenClaw deployments and write from hands-on production experience.

Running OpenClaw with a local LLM is the right move if you care about privacy, predictable costs, or low-latency local workflows. The good news: OpenClaw already supports this well. The bad news: a lot of guides still get the details wrong — especially around Ollama URLs, model selection, Docker networking, and tool calling. This guide fixes that. You’ll learn the exact setup that works in 2026, how to choose a model that won’t crawl, how to avoid the most common mistakes, and when local inference is genuinely better than cloud APIs. If you’re brand new to the platform, start with the OpenClaw beginner’s guide. If you already have OpenClaw running, keep going.

Why run OpenClaw with a local LLM?

A local model changes the economics and privacy profile of your agent.

Instead of sending every prompt to OpenAI, Anthropic, or Gemini, you run inference on your own machine through a local runtime such as Ollama. OpenClaw then talks to that runtime over HTTP.

That gives you a few immediate wins:

  • Better privacy: prompts stay on your machine or local network
  • Lower marginal cost: after hardware and power, token cost is effectively $0
  • Offline resilience: your agent can still work during internet hiccups if your channel and tools allow it
  • No vendor rate limits: useful for heavy internal automation
  • Flexible routing: you can still keep cloud fallbacks for harder tasks

It also comes with tradeoffs:

  • Local models are usually slower at first token than premium cloud models unless you have serious hardware
  • Tool use quality varies a lot by model
  • Long-context prompts can feel sluggish on weak CPUs or small GPUs
  • You become the ops team

That tradeoff is worth it for codebases, private notes, local documents, internal automation, and hobby setups where recurring API spend feels stupid.

What OpenClaw officially supports

OpenClaw’s current docs make three things clear:

1. Ollama is the easiest local path.

2. OpenClaw supports Ollama’s native API.

3. You should not point OpenClaw at Ollama’s `/v1` OpenAI-compatible endpoint if you care about reliable tool calling.

That last point matters. The official provider docs explicitly warn that using `http://host:11434/v1` can break tool use and cause models to print raw tool JSON instead of calling tools properly. The correct base URL is the native Ollama endpoint, like this:

{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://127.0.0.1:11434",
        apiKey: "ollama-local",
        api: "ollama"
      }
    }
  }
}

If you only remember one thing from this article, remember that.

For broader setup context, these are also worth reading alongside this guide:

  • Best AI Model for OpenClaw in 2026
  • How to Run OpenClaw for Free
  • How to Run OpenClaw in Docker
  • OpenClaw Best Practices
  • How Much Does OpenClaw Actually Cost?

Local vs cloud: when does local actually make sense?

Here’s the blunt version: local is fantastic for privacy and cost control, but not every setup is powerful enough to make it pleasant.

Scenario                                            Good fit?    Why
Private code, internal notes, personal docs         Yes          Data stays local
Heavy daily usage with recurring API bills          Yes          Local can cut ongoing cost hard
Weak laptop with 8 GB RAM                           Maybe not    Performance will annoy you
Multi-step coding agent on a strong desktop or Mac  Yes          Great balance of speed and privacy
Best possible reasoning quality today               Usually no   Top cloud models still win
Always-on low-power gateway                         Mixed        Good for gateway logic, not for huge models

A useful pattern is hybrid routing:

  • local for cheap, private, routine work
  • cloud for high-stakes reasoning, long-form writing, or complex coding

That tends to beat the all-local and all-cloud extremes.
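That split can be written straight into config. A sketch only: `fallbacks` is a hypothetical key and the cloud model ID is a placeholder, so verify the exact field names against your OpenClaw version's model-routing schema before relying on it.

```
{
  agents: {
    defaults: {
      model: {
        // routine work stays local and effectively free
        primary: "ollama/gemma4",
        // hypothetical fallback list; check the key name in your version's schema
        fallbacks: ["your-cloud-provider/your-cloud-model"]
      }
    }
  }
}
```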

Hardware reality check

This is where most “just run it locally” advice turns into nonsense.

OpenClaw itself is lightweight compared with the model runtime. The real question is whether your hardware can serve the model fast enough for an agent workflow.

A practical hardware guide

Hardware                                 Realistic model class   Good for
CPU-only mini PC                         3B–8B                   light chat, simple automations
16 GB unified memory Mac                 7B–14B quantized        personal assistant tasks
24 GB GPU or 64 GB unified memory        14B–32B quantized       strong day-to-day OpenClaw use
48 GB+ GPU / 96–128 GB unified memory    32B+                    serious coding and research workflows

You do not need a monster box to start. But you do need realistic expectations.

If you want your agent to feel crisp with tools, file access, and long prompts, 14B to 32B quantized models are usually the sweet spot. Small 7B-class models can work, but they often get weird around tool calls, structured outputs, and longer context.
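To sanity-check where your hardware lands, a back-of-envelope memory estimate helps. This is a rule of thumb, not an official formula: quantized weights take roughly parameters × bits / 8 bytes, plus about 20% for KV cache and runtime overhead.

```shell
# Rough check: will a quantized model fit in memory?
# Assumption: weights ≈ params * bits/8 bytes, plus ~20% overhead.
model_mem_gb() {
  # $1 = parameters in billions, $2 = quantization bits (e.g. 4 for Q4)
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f", p * b / 8 * 1.2 }'
}

model_mem_gb 14 4   # a Q4 14B model: roughly 8–9 GB
model_mem_gb 32 4   # a Q4 32B model: roughly 19–20 GB
```

By that estimate, a Q4 14B model is comfortable on a 16 GB Mac and a Q4 32B model wants a 24 GB GPU or larger unified memory, which matches the tiers above.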

Best local model options for OpenClaw right now

OpenClaw’s Ollama docs currently suggest `gemma4` as a local default, while also supporting stronger options you pull yourself. In practice, here’s the decision tree most people should use.

Model tier              Example models                     Best for                                  Caution
Small                   Gemma 4, smaller Qwen variants     low-cost chat, testing                    weaker tool reliability
Mid-range               14B–24B models                     best balance for most users               needs decent memory
Strong local            32B-class models                   coding, agent workflows, longer prompts   slower on weak hardware
Huge MoE / enterprise   giant local or semi-local stacks   specialized setups                        overkill for most people

My recommendation

If your goal is “I want OpenClaw to feel useful, not merely functional,” choose one of these paths:

1. Budget setup: a solid 7B–14B model

2. Best overall: a 14B–32B quantized model

3. Serious coding: a 32B-class model if your hardware can handle it

If you don’t know what to pick, start with the model OpenClaw discovers automatically in Ollama, test speed and tool behavior, then upgrade once you feel the bottleneck.

Install Ollama on the same machine as OpenClaw or on another machine on your LAN:

curl -fsSL https://ollama.com/install.sh | sh

Then verify it works:

ollama --version
ollama serve

In another terminal:

curl http://127.0.0.1:11434/api/tags

If that returns JSON, the runtime is alive.
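If you start Ollama from a boot script or systemd unit, it's worth waiting for the API to come up before launching OpenClaw. A minimal sketch (the helper name is mine, not part of either CLI):

```shell
# Poll the Ollama API until it responds, then return 0; give up after N attempts.
wait_for_ollama() {
  # $1 = base URL (default local), $2 = attempts, 1 second apart (default 30)
  url="${1:-http://127.0.0.1:11434}"
  for _ in $(seq "${2:-30}"); do
    curl -fsS --max-time 2 "$url/api/tags" >/dev/null 2>&1 && return 0
    sleep 1
  done
  return 1
}

# usage in a startup script:
# wait_for_ollama && openclaw gateway start
```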

2) Pull a model

Start simple.

Or choose a stronger model you already know your hardware can support.

Check what’s installed:

ollama pull gemma4
ollama list

3) Run OpenClaw onboarding

The cleanest setup path is still onboarding.

Choose Ollama when prompted.

OpenClaw will:

For headless or scripted installs, use non-interactive onboarding:

  • ask for the base URL, defaulting to `http://127.0.0.1:11434`
  • discover available models
  • suggest defaults
  • let you choose local-only or cloud-plus-local modes
openclaw onboard
openclaw onboard --non-interactive \
  --auth-choice ollama \
  --custom-base-url "http://127.0.0.1:11434" \
  --custom-model-id "gemma4" \
  --accept-risk

4) Verify model discovery

OpenClaw can auto-discover Ollama models when `OLLAMA_API_KEY` is set and you haven’t defined a manual provider block.

You should see your Ollama models with IDs like:

To set the default model:

export OLLAMA_API_KEY="ollama-local"
openclaw models list
ollama/gemma4
ollama/qwen2.5-coder:32b
openclaw models set ollama/gemma4

5) Optional: configure it manually

If you want a remote Ollama host or a tightly controlled config, define the provider explicitly.

That manual route is worth it when:

  • Ollama runs on another host
  • you want explicit context window limits
  • you want a curated model list instead of auto-discovery
{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://ollama-host:11434",
        apiKey: "ollama-local",
        api: "ollama",
        models: [
          {
            id: "gpt-oss:20b",
            name: "GPT-OSS 20B",
            reasoning: false,
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
            contextWindow: 8192,
            maxTokens: 81920
          }
        ]
      }
    }
  },
  agents: {
    defaults: {
      model: {
        primary: "ollama/gpt-oss:20b"
      }
    }
  }
}

Skip 60 minutes of setup — deploy in 60 seconds

DoneClaw handles Docker, servers, security, and updates. Your OpenClaw agent is ready to chat in under a minute.

Deploy Now

6) Test a real workflow

Don’t stop at `models list`. Test the thing you actually care about.

Examples:

Then send a real request through your preferred channel:

That tells you more than synthetic benchmarks ever will.

  • summarize a local file
  • draft a response
  • use a safe tool
  • compare latency against your old cloud setup
openclaw status
openclaw doctor
openclaw logs --follow

Docker and remote-host gotchas

A lot of failed setups are just networking mistakes.

If OpenClaw runs in Docker and Ollama runs on the host, `localhost` inside the container is the container itself, not your host machine.

Common working patterns

OpenClaw location      Ollama location         Base URL to try
Same host, no Docker   Same host               `http://127.0.0.1:11434`
Docker container       Host machine            `http://host.docker.internal:11434` on Docker Desktop
Docker on Linux        Host machine            host-gateway mapping or host LAN IP
One server             Another server on LAN   `http://LAN-IP:11434`

If the model is on another machine, make sure that machine allows the connection and that Ollama is listening on the right interface (by default it binds to loopback only).
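The patterns above collapse into a small lookup you can reuse in setup scripts. The Linux gateway and LAN addresses below are illustrative placeholders (the default `docker0` gateway is often `172.17.0.1`, but check yours). Also note that for remote access you'd set `OLLAMA_HOST=0.0.0.0` on the inference machine before `ollama serve`, since Ollama binds to loopback by default.

```shell
# Map a deployment shape to the Ollama base URL OpenClaw should use.
# IPs are placeholders; substitute your own.
ollama_base_url() {
  case "$1" in
    local)          echo "http://127.0.0.1:11434" ;;             # same host, no Docker
    docker-desktop) echo "http://host.docker.internal:11434" ;;  # container -> host on Docker Desktop
    docker-linux)   echo "http://172.17.0.1:11434" ;;            # default docker0 gateway (verify yours)
    lan)            echo "http://192.168.1.50:11434" ;;          # example LAN IP of the inference box
    *)              echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}

ollama_base_url docker-desktop
```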

Performance tuning that actually matters

If your local setup feels slow, these are the levers that matter most.

1) Use a smaller model first

This sounds obvious, but people still pull a huge model, hate the latency, then blame OpenClaw.

Start with a model that gives you acceptable time-to-first-token. Then scale up only if quality is the actual bottleneck.

2) Keep context under control

Agents are context-hungry. Large system prompts, memories, tool schemas, and long threads add up fast.

That’s why a model that feels fine in a raw chat window can feel sluggish in an agent runtime.
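A rough way to see how fast context grows is the common ~4 characters per token heuristic. It's an approximation that varies by tokenizer, but it's good enough to spot a problem before you stuff a file into context:

```shell
# Estimate tokens for a file using the ~4 chars/token rule of thumb.
estimate_tokens() {
  chars=$(wc -c < "$1")
  echo $(( chars / 4 ))
}

# e.g. a 40 KB project README is already ~10,000 tokens of context
```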

If performance sucks:

  • shorten giant system files
  • reduce unnecessary tool exposure
  • keep threads focused
  • prefer selective memory retrieval over stuffing everything into context

The persistent memory guide and the best practices article both help here.

3) Don’t run huge models on low-power ARM and expect magic

OpenClaw’s own Raspberry Pi docs are sane about this: the Pi is a fine gateway host because the heavy model work can happen elsewhere. The docs recommend a Raspberry Pi 4 or 5 with 2 GB minimum and 4 GB recommended for the gateway itself.

That is not the same thing as “run a giant local coding model on a Pi and enjoy it.” Different problem.

4) Use storage and memory sensibly

On lower-power systems, I’d rather run a smaller model well than a bigger model badly. Fast storage, enough RAM, and stable thermals beat bragging rights.

Real-world issues people are hitting in 2026

This is where the docs and GitHub issues are useful, because they show the difference between theory and the actual mess people run into.

ARM and Raspberry Pi compatibility issues still exist

A January 2026 GitHub issue documented an ARM startup failure tied to a missing native clipboard binary on Raspberry Pi. Another March 2026 issue showed a more serious CLI handshake problem on Raspberry Pi 5 arm64 systems: with a default 3-second handshake timeout, some CLI commands failed 100% of the time until the timeout was patched upward. In that report, the affected command succeeded when the timeout was raised to 15–30 seconds, and the measured handshake took roughly 7–8 seconds on the Pi 5.

That doesn’t mean “don’t use ARM.” It means:

  • ARM works, but it’s not always the smoothest path
  • lower-power systems magnify startup and initialization costs
  • for local inference, the model runtime matters even more than the gateway host

If you want local inference plus a small always-on box, a smart architecture is often:

  • OpenClaw gateway on a Pi, mini PC, or VPS
  • Ollama on a stronger desktop, Mac, or workstation

That split works surprisingly well.

Problem: OpenClaw can’t see Ollama

Symptoms

  • `openclaw models list` doesn’t show local models
  • provider appears unavailable

Fixes

1. Confirm Ollama is running:

curl http://127.0.0.1:11434/api/tags

2. Set the environment variable OpenClaw expects:

export OLLAMA_API_KEY="ollama-local"

3. If using auto-discovery, remove any conflicting manual `models.providers.ollama` block.

Problem: tool calling is flaky or broken

Symptoms

  • the model prints JSON instead of calling tools
  • tool actions fail inconsistently

Fix

Do not use Ollama’s `/v1` endpoint. Use the native base URL without a suffix:

http://127.0.0.1:11434

Not:

http://127.0.0.1:11434/v1

This is the single most common bad config.
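If base URLs arrive from user input or environment variables, it's cheap to guard against the `/v1` mistake before the value ever reaches your config. A tiny sketch (hypothetical helper, not an OpenClaw feature):

```shell
# Strip an accidental OpenAI-style /v1 suffix from an Ollama base URL.
normalize_base_url() {
  printf '%s\n' "$1" | sed -E 's#/v1/?$##'
}

normalize_base_url "http://127.0.0.1:11434/v1"   # → http://127.0.0.1:11434
```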

Problem: Docker container can’t reach the local model

Symptoms

  • connection refused
  • timeouts when using `localhost`

Fixes

1. Try `host.docker.internal` on Docker Desktop.

2. On Linux, use your host LAN IP or a host-gateway mapping.

3. Confirm firewall rules are not blocking port `11434`.

Problem: responses are painfully slow

Symptoms

  • long wait before first token
  • every action feels heavier than expected

Fixes

1. Switch to a smaller model.

2. Trim context-heavy prompts and memory stuffing.

3. Reduce simultaneous workloads.

4. Move Ollama to stronger hardware.

Problem: local model works in chat but feels dumb in agent mode

Why it happens

Agent mode is harder than plain chat. There’s more instruction following, more tool schemas, more context, and more ways to fail.

Fixes

  • use a stronger model class
  • reduce unnecessary tools
  • test a narrower workflow first
  • keep cloud fallback for the hardest tasks

Best deployment patterns

If you want the shortest path to something sane, pick one of these.

Pattern 1: All-in-one workstation

Best for solo users with decent hardware.

  • OpenClaw on your main machine
  • Ollama on the same machine
  • base URL `http://127.0.0.1:11434`

Pattern 2: Lightweight gateway + strong inference box

Best for always-on home setups.

  • OpenClaw gateway on a Pi, mini PC, or cheap server
  • Ollama on a stronger Mac, desktop, or GPU box
  • base URL points to the inference machine over LAN

Pattern 3: Hybrid local + cloud routing

Best overall for serious users.

  • local model for routine tasks
  • cloud fallback for difficult reasoning
  • lower cost without trashing quality

That hybrid setup is the one I’d recommend to most people.

Conclusion

If you want OpenClaw with a local LLM, use Ollama first, use the native API instead of `/v1`, keep your model size realistic, and don’t confuse “the gateway runs on small hardware” with “the model should too.” That combination gets you the upside people actually want: private inference, near-zero marginal cost, and a setup that doesn’t fall apart the second you ask the agent to do real work. If you want the next step after this article, read OpenClaw for Coding, OpenClaw Docker Complete Guide, and OpenClaw as a Service. They pair well with a local-first setup.

Skip the setup? DoneClaw deploys OpenClaw for you — $29/mo, cancel anytime, zero configuration.


Frequently asked questions

Is OpenClaw with a local LLM worth it?

Yes, if privacy, recurring cost, or local control matter to you. No, if your hardware is weak and you expect cloud-model quality and speed from a tiny box.

What is the easiest local LLM setup for OpenClaw?

Ollama. It has the cleanest official OpenClaw support, automatic model discovery, and a straightforward onboarding flow.

Can I use Ollama’s OpenAI-compatible `/v1` endpoint?

You can, but you probably shouldn’t. OpenClaw’s docs explicitly warn that `/v1` can break tool calling. Use the native Ollama API instead.

What hardware do I need?

For comfortable day-to-day use, a machine that can handle a 14B–32B quantized model is the sweet spot. Smaller systems work, but quality and latency drop fast.

Can I run OpenClaw on Raspberry Pi with a local model?

You can run the gateway on a Raspberry Pi just fine. Running a serious local model on the same Pi is a different question, and usually not the best idea.

Does OpenClaw support remote Ollama hosts?

Yes. You can point OpenClaw at a remote Ollama host by setting the correct baseUrl and, if needed, defining the provider manually.

Can I mix local and cloud models in OpenClaw?

Yes, and that’s often the smartest setup. Use local for cheap private work and keep cloud fallbacks for hard tasks.