Setup & Deployment

OpenClaw with Ollama: Running Local LLMs for Complete Privacy and Control

17 min read · Updated 2026-03-07

By DoneClaw Team · We run managed OpenClaw deployments and write from hands-on production experience.

Imagine having a powerful AI assistant that runs entirely on your own hardware—no API calls to external servers, no data leaving your network, and no usage limits. That's exactly what combining OpenClaw with Ollama delivers. Whether you're a developer wanting full control over your AI stack, a privacy-conscious user, or someone looking to avoid ongoing API costs, this setup gives you a self-contained AI agent that works with your messaging apps while keeping everything local. In this guide, you'll learn how to set up OpenClaw with Ollama from scratch, configure it correctly (avoiding the common pitfalls that trip up most users), choose the right model for your hardware, and troubleshoot the issues that commonly arise when running local language models with OpenClaw.

Why Run OpenClaw with Local LLMs?

The combination of OpenClaw and Ollama represents a fundamentally different approach to AI assistants. Instead of relying on cloud APIs like OpenAI, Anthropic, or Google, you run the language model directly on your machine or server. This approach offers several compelling advantages:

**Complete Data Privacy**: Your conversations, files, and any data processed by your AI assistant never leave your local network. This is particularly important for developers working with sensitive code, businesses handling confidential information, or anyone who simply values privacy.

**No API Costs**: While cloud AI APIs charge per token, local models have no marginal cost. Once you've invested in the hardware, running a local LLM is effectively free. For heavy users, this can translate to significant savings compared to paying for API access.

**No Rate Limits or Availability Issues**: Local models run on your hardware, meaning you're not subject to API rate limits, service outages, or changes in provider pricing. Your AI assistant is available 24/7 regardless of external factors.

**Customization Control**: With local models, you have full control over the model configuration, context length, and system prompts. You can optimize for your specific use case without relying on third-party configurations.

Prerequisites

Before setting up OpenClaw with Ollama, ensure you have the following:

  • **A computer or server** running Linux, macOS, or Windows (via WSL2)
  • **NVIDIA GPU with CUDA** (recommended for acceptable performance) or Apple Silicon (M1/M2/M3)
  • **At least 16GB RAM** for smaller models; 32GB+ recommended for larger models
  • **GPU with 6GB+ VRAM** for local inference (though CPU-only inference is possible)
  • **Docker and Docker Compose** (if running OpenClaw in containers)
  • **Basic command-line familiarity**

Step 1: Install Ollama

Ollama is the backbone of your local LLM setup. It provides a simple way to run open-source language models locally with an OpenAI-compatible API.

**macOS and Linux Installation**

The simplest way to install Ollama is via the official installer:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```

This script installs Ollama as a system service and sets up the necessary dependencies.

**Windows Installation**

On Windows, Ollama runs inside WSL2 (Windows Subsystem for Linux). First, enable WSL2:

```shell
wsl --install
```

Then, inside your WSL2 terminal, run the same installation command:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```

**Verify Ollama is Running**

After installation, verify Ollama is working:

```shell
ollama list
```

This should return an empty list initially. You'll download your first model in the next step.

Step 2: Choose and Install Your Model

One of the most critical decisions is selecting the right model. The model you choose impacts response quality, speed, and the hardware requirements.

**Recommended Models for OpenClaw**

According to Ollama's official documentation and community testing, these models work best with OpenClaw:

  • **Llama 3.2 3B** (~2GB, 6GB VRAM): budget setups and basic tasks
  • **Phi-4** (~7GB, 8GB VRAM): good reasoning with lower resources
  • **Qwen 2.5 3B** (~2GB, 6GB VRAM): code generation and efficiency
  • **Llama 3.1 8B** (~5GB, 8GB VRAM): balanced performance
  • **Mistral 7B** (~4GB, 8GB VRAM): general purpose, fast inference
  • **Qwen 2.5 Coder 14B** (~9GB, 12GB VRAM): code-heavy tasks
  • **DeepSeek-R1 14B** (~9GB, 12GB VRAM): advanced reasoning
  • **GLM-4.7 Flash** (~25GB, 25GB VRAM): best local performance

**Important**: OpenClaw requires a context window of at least 64k tokens when using local models. This means you need significantly more memory than just the model weights. The hardware recommendations above account for reasonable context sizes.
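To see why the 64k-token requirement matters, a back-of-envelope sketch of KV-cache memory helps. The figures below assume a Llama-3.1-8B-style architecture (32 layers, 8 KV heads via grouped-query attention, head dimension 128) and an fp16 cache; real usage varies with quantization and runtime:

```python
def kv_cache_gib(layers: int, context: int, kv_heads: int, head_dim: int,
                 bytes_per_value: int = 2) -> float:
    """Approximate KV-cache size in GiB for a transformer at a given context length.
    The factor of 2 covers the separate K and V caches."""
    total_bytes = 2 * layers * context * kv_heads * head_dim * bytes_per_value
    return total_bytes / (1024 ** 3)

# An 8B-class model at a 64k context needs ~8 GiB of cache on top of its weights
print(round(kv_cache_gib(layers=32, context=65536, kv_heads=8, head_dim=128), 1))  # → 8.0
```

This is why a model whose weights fit comfortably in VRAM can still run out of memory once OpenClaw opens a large context.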

**Installing a Model**

To install a model, use the `ollama pull` command:

```shell
# For a coding-focused setup
ollama pull qwen2.5-coder:14b

# For a general-purpose assistant
ollama pull llama3.1:8b

# For best local reasoning (requires more VRAM)
ollama pull deepseek-r1:14b
```

The download size varies by model—expect anywhere from 2GB to 25GB depending on your choice.

**Verify Model Installation**

```shell
ollama list
```

You should see your installed model(s) listed with their sizes.

Step 3: Configure OpenClaw to Use Ollama

This is where many users run into problems. The configuration must be exact—small mistakes lead to the frustrating "0/200k tokens" issue where OpenClaw appears to connect but never receives responses.

**Basic Configuration**

Edit your OpenClaw configuration file (typically located at `~/.openclaw/openclaw.json`):

```json
{
  "models": {
    "providers": {
      "ollama": {
        "api": "openai-responses",
        "baseUrl": "http://127.0.0.1:11434/v1",
        "apiKey": "ollama-is-awesome",
        "id": "qwen2.5-coder:14b"
      }
    }
  }
}
```

**Critical Configuration Points**:

1. **baseUrl MUST end with `/v1`**: This is the most common mistake. Use `http://127.0.0.1:11434/v1` not `http://127.0.0.1:11434`.

2. **api must be set correctly**: Use `"openai-responses"` or `"openai-completions"`—not the default format that works with cloud APIs.

3. **Model ID must match exactly**: The `id` field must match exactly what `ollama list` shows, including the colon and version tag (e.g., `qwen2.5-coder:14b`).

4. **apiKey can be any string**: Ollama doesn't validate the API key, but OpenClaw requires the field to be present.

**Using Model Auto-Discovery**

Alternatively, you can let OpenClaw auto-discover your Ollama models by NOT specifying an explicit `models.providers.ollama` block. Simply ensure Ollama is running and accessible.
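As a sanity check before restarting the gateway, the four points above can be verified mechanically. This is an illustrative sketch, not an OpenClaw tool; `check_ollama_provider` is a hypothetical helper that inspects one provider block from `openclaw.json`:

```python
from typing import List

def check_ollama_provider(cfg: dict) -> List[str]:
    """Return a list of problems with an Ollama provider block (empty list = looks OK)."""
    problems = []
    if not cfg.get("baseUrl", "").rstrip("/").endswith("/v1"):
        problems.append("baseUrl must end with /v1 (e.g. http://127.0.0.1:11434/v1)")
    if cfg.get("api") not in ("openai-responses", "openai-completions"):
        problems.append('api must be "openai-responses" or "openai-completions"')
    if not cfg.get("apiKey"):
        problems.append("apiKey must be present (any non-empty string works for Ollama)")
    if ":" not in cfg.get("id", ""):
        problems.append("id should match `ollama list` exactly, including the version tag")
    return problems

good = {"api": "openai-responses", "baseUrl": "http://127.0.0.1:11434/v1",
        "apiKey": "ollama-is-awesome", "id": "qwen2.5-coder:14b"}
bad = dict(good, baseUrl="http://127.0.0.1:11434")  # the classic missing-/v1 mistake

print(check_ollama_provider(good))       # → []
print(len(check_ollama_provider(bad)))   # → 1
```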

Step 4: Docker Configuration (If Using Containers)

If you're running OpenClaw in Docker (the recommended approach for most users), networking works differently. Inside a container, `127.0.0.1` refers to the container itself, not your host machine.

**Option A: Use host.docker.internal (Docker Desktop)**

With Docker Desktop, the special hostname `host.docker.internal` resolves to your host machine from inside the container:

```json
{
  "models": {
    "providers": {
      "ollama": {
        "api": "openai-responses",
        "baseUrl": "http://host.docker.internal:11434/v1",
        "apiKey": "ollama-is-awesome",
        "id": "qwen2.5-coder:14b"
      }
    }
  }
}
```

**Option B: Use --network=host (Linux)**

If you're on Linux without Docker Desktop, add this to your docker-compose.yml:

```yaml
services:
  openclaw:
    network_mode: host
    # ... rest of config
```

Then use `http://127.0.0.1:11434/v1` in your configuration.

**Option C: Use WSL2 with Ollama**

WSL2 has its own IP address, so you can't use localhost directly. The recommended approach is:

1. Start Ollama with explicit host binding:

```shell
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

2. Find your WSL2 IP: `ip addr show eth0`

3. Use that IP in your OpenClaw config:

```json
{
  "models": {
    "providers": {
      "ollama": {
        "api": "openai-responses",
        "baseUrl": "http://<your-wsl2-ip>:11434/v1",
        "apiKey": "ollama-is-awesome",
        "id": "qwen2.5-coder:14b"
      }
    }
  }
}
```
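The three options reduce to picking one baseUrl. That decision can be sketched as a small helper (`ollama_base_url` is our hypothetical name, not part of any tool):

```python
from typing import Optional

def ollama_base_url(in_docker: bool, docker_desktop: bool = False,
                    wsl2_ip: Optional[str] = None) -> str:
    """Pick the baseUrl for OpenClaw's Ollama provider based on where OpenClaw runs."""
    if wsl2_ip:                        # Option C: Ollama bound inside WSL2
        return f"http://{wsl2_ip}:11434/v1"
    if in_docker and docker_desktop:   # Option A: container reaches the host by name
        return "http://host.docker.internal:11434/v1"
    # Option B (--network=host) and bare-metal installs share the host loopback
    return "http://127.0.0.1:11434/v1"

print(ollama_base_url(in_docker=True, docker_desktop=True))
# → http://host.docker.internal:11434/v1
```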

Skip 60 minutes of setup — deploy in 60 seconds

DoneClaw handles Docker, servers, security, and updates. Your OpenClaw agent is ready to chat in under a minute.

Deploy Now

Step 5: Connect Messaging Channels

Once your local LLM is working, connect it to your messaging apps:

```shell
openclaw configure --section channels
```

This opens the interactive configuration for connecting Telegram, Discord, WhatsApp, Slack, or iMessage. Follow the prompts to set up your preferred channels.

Step 6: Test Your Setup

After configuration, test that everything works:

```shell
# Restart the gateway to pick up new configuration
openclaw gateway restart

# Check the status
openclaw doctor

# Test with a simple message through your connected channel
```

If you see tokens counting up from 0/200k, your setup is working. If it stays at 0, check the troubleshooting section below.
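If you'd rather script the check, Ollama's `/api/tags` endpoint lists installed models as JSON. A minimal sketch (the live request is commented out so the parsing logic stands on its own):

```python
import json
from urllib.request import urlopen  # used by the live check below

def installed_models(tags_json: str) -> list:
    """Extract model names from the body of Ollama's /api/tags response."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

# Shape of a real /api/tags response (abridged)
sample = '{"models": [{"name": "qwen2.5-coder:14b", "size": 8988124069}]}'
print(installed_models(sample))  # → ['qwen2.5-coder:14b']

# Live check (requires Ollama running locally):
#   with urlopen("http://127.0.0.1:11434/api/tags", timeout=5) as resp:
#       print(installed_models(resp.read().decode()))
```

An empty list from the live check means Ollama is up but no model is pulled yet; a connection error points at the networking issues covered in Step 4.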

Hardware Recommendations by Use Case

Your hardware determines what you can realistically run. Here's a practical guide:

**Budget Setup (~$500-800)**

  • Hardware: NVIDIA RTX 3060 (12GB VRAM) or RTX 4060
  • RAM: 16GB system RAM
  • Recommended Model: Llama 3.2 3B or Qwen 2.5 3B
  • Experience: Decent for simple conversations and basic coding tasks; response times of 2-5 seconds

**Mid-Range Setup (~$1500-2500)**

  • Hardware: NVIDIA RTX 4070 Super (12GB VRAM) or RTX 4080 (16GB)
  • RAM: 32GB system RAM
  • Recommended Model: Llama 3.1 8B or Qwen 2.5 Coder 14B
  • Experience: Good for most tasks; 1-3 second response times; can handle complex coding and reasoning

**Enthusiast/Production Setup (~$4000+)**

  • Hardware: NVIDIA RTX 4090 (24GB) or multiple 3090s/4090s
  • RAM: 64GB+ system RAM
  • Recommended Model: DeepSeek-R1 14B or GLM-4.7 Flash
  • Experience: Near-cloud-model quality; sub-second response times; handles large context windows well

**Apple Silicon (Mac)**

M1/M2/M3 with 18GB+ unified memory can run 7B models at decent speeds. M3 Pro/Max with 36GB+ can run 14B models. Note: Performance is good but not as fast as comparable NVIDIA hardware for local inference.

Troubleshooting Common Issues

Even with correct configuration, local LLM setups can be finicky. Here are solutions to the most common problems:

**Issue 1: "0/200k Tokens" - No Response**

Symptoms: The OpenClaw TUI shows tokens at 0/200k and never progresses. The model appears connected but produces no output.

Causes: Incorrect API format in config, wrong baseUrl (missing /v1), Docker networking issues, or model not compatible with tool calling.

Solutions:

1. Double-check your baseUrl ends with `/v1`.
2. Verify `api` is set to `"openai-responses"` or `"openai-completions"`.
3. For Docker: use `host.docker.internal` or `--network=host`.
4. Try a different model—some models don't handle tool calls well.

**Issue 2: Empty Responses**

Symptoms: The model responds but with blank or truncated output.

Causes: Model too large for available memory, context window exhausted, or model not designed for instruction following.

Solutions:

1. Reduce context length in settings.
2. Use a smaller model.
3. Add more system RAM or VRAM.
4. Try a different model known for instruction following (Llama 3.1, Qwen).

**Issue 3: Slow Response Times**

Symptoms: Responses take several minutes.

Causes: Model too large for GPU, using CPU instead of GPU, or insufficient VRAM causing swap usage.

Solutions:

1. Use a smaller model.
2. Ensure CUDA is properly installed: `nvidia-smi`.
3. Check Ollama isn't using CPU: `ollama ps`.
4. Close other GPU applications.

**Issue 4: Ollama Service Not Found**

Symptoms: Connection refused errors.

Causes: Ollama service not running, wrong IP address (especially in WSL2/Docker), or firewall blocking port 11434.

Solutions:

1. Start Ollama: `ollama serve`.
2. Verify it's running: `curl http://127.0.0.1:11434/api/tags`.
3. Check firewall rules.
4. For WSL2, use the WSL2 IP, not localhost.

**Issue 5: Model Works with Curl but Not OpenClaw**

Symptoms: You can query Ollama directly with curl, but OpenClaw fails.

Causes: Mismatched API format, wrong Content-Type headers, or model ID doesn't match exactly.

Solutions:

1. Test with the exact format OpenClaw uses.
2. Verify model ID exactly matches `ollama list` output.
3. Check OpenClaw logs: `openclaw gateway logs`.
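The mismatch usually comes down to request shape: a quick curl test often hits Ollama's native `/api/generate` endpoint, while OpenClaw speaks the OpenAI chat-completions format against `/v1/chat/completions`. A sketch of the OpenAI-format body, so you can replay what OpenClaw sends:

```python
import json

def openai_chat_payload(model: str, prompt: str) -> str:
    """Build an OpenAI-format /v1/chat/completions body -- the shape OpenClaw sends.
    Note: a plain test against Ollama's native /api/generate uses a different
    format, so it can succeed while OpenClaw's requests fail."""
    return json.dumps({
        "model": model,  # must match `ollama list` exactly, version tag included
        "messages": [{"role": "user", "content": prompt}],
    })

body = openai_chat_payload("qwen2.5-coder:14b", "Say hello")
print(json.loads(body)["model"])  # → qwen2.5-coder:14b
```

You can POST this body to `http://127.0.0.1:11434/v1/chat/completions` with `Content-Type: application/json`; if that fails where `/api/generate` succeeds, the problem is the model's OpenAI-compatibility, not your network.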

Advanced: Optimizing Your Local Setup

Once you have the basics working, here are some optimizations:

**Increase Context Window**

To handle longer conversations, increase the context window. Add to your config:

```json
{
  "models": {
    "defaults": {
      "maxTokens": 128000
    }
  }
}
```

Note: This significantly increases memory usage.

**Use Model Quantization**

Ollama supports quantized models that use less VRAM with minimal quality loss. Look for `-q4` or `-q5` variants:

```shell
ollama pull qwen2.5-coder:14b-q4_K_M
```

**Run Multiple Models**

You can configure OpenClaw to use different models for different tasks:

```json
{
  "models": {
    "providers": {
      "ollama": {
        "api": "openai-responses",
        "baseUrl": "http://127.0.0.1:11434/v1",
        "apiKey": "ollama-is-awesome",
        "id": "qwen2.5-coder:14b"
      },
      "cloud": {
        "provider": "openai",
        "id": "gpt-4o"
      }
    }
  }
}
```

This lets you fall back to cloud models when local models struggle.

Performance Benchmarks

If you're wondering what to expect in terms of speed, here's a realistic breakdown based on community testing with various hardware configurations:

  • RTX 3060 12GB + Llama 3.2 3B: ~45 tokens/second (~2.2 seconds per 100-word response)
  • RTX 4070 Super + Llama 3.1 8B: ~35 tokens/second (~2.9 seconds)
  • RTX 4070 Super + Qwen 2.5 Coder 14B: ~18 tokens/second (~5.5 seconds)
  • RTX 4090 + Qwen 2.5 Coder 14B: ~42 tokens/second (~2.4 seconds)
  • RTX 4090 + DeepSeek-R1 14B: ~38 tokens/second (~2.6 seconds)
  • M3 Max Mac + Llama 3.1 8B: ~25 tokens/second (~4.0 seconds)

These numbers represent typical inference speeds with medium-quality settings. Actual performance varies based on system load, temperature, and specific model configurations.
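The per-response times in these figures are simple division: a ~100-word reply is roughly 100 tokens, so seconds ≈ tokens ÷ tokens-per-second (prompt processing adds a little on top):

```python
def response_seconds(tokens_per_second: float, response_tokens: int = 100) -> float:
    """Seconds to generate a response, ignoring prompt-processing time."""
    return round(response_tokens / tokens_per_second, 1)

print(response_seconds(45))  # → 2.2  (RTX 3060 + Llama 3.2 3B)
print(response_seconds(25))  # → 4.0  (M3 Max + Llama 3.1 8B)
```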

Security Considerations

When running OpenClaw with Ollama locally, keep these security practices in mind:

**Network Exposure**: Never expose your Ollama instance to the public internet. The default configuration binds to localhost only, which is secure. If you need remote access, use a VPN or secure tunnel.

**File Permissions**: Ensure your OpenClaw configuration files have appropriate permissions (600 or 400) to prevent unauthorized access to any stored credentials.

**Model Sources**: Only download models from trusted sources like the official Ollama library. Malicious models could contain harmful code or backdoors.

**Memory Security**: When running sensitive workloads, consider enabling full disk encryption. Local models process data in system RAM, which could potentially be accessed if physical security is compromised.

Conclusion

Running OpenClaw with Ollama transforms your AI assistant from a cloud-dependent service into a truly personal tool. You get complete privacy, no ongoing API costs, and 24/7 availability independent of external services. The setup requires some upfront effort—choosing hardware, installing software, configuring correctly—but the payoff is a self-contained AI system you fully own. Start with a smaller model if your hardware is limited, and upgrade as you get comfortable. The local LLM ecosystem is evolving rapidly, with new models and optimizations arriving monthly. Your local AI assistant today will be significantly more capable a year from now, all without changing your infrastructure.

Skip the setup? DoneClaw deploys OpenClaw for you — $29/mo with 7-day free trial, zero configuration.


Frequently asked questions

Can I use OpenClaw with Ollama without a GPU?

Yes, but it's not recommended. CPU-only inference on modern models is extremely slow—expect response times of several minutes per message. A GPU is what makes local LLMs practical. If you must run without a GPU, use smaller models (3B parameters or less).

How much does it cost to run OpenClaw with Ollama?

The marginal cost is zero after initial hardware investment. A typical mid-range setup (RTX 4070, 32GB RAM) costs around $1500-2000. Electricity costs are minimal—around $5-15/month depending on usage.
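The electricity estimate is straightforward wattage arithmetic. The figures below are assumptions for illustration (card wattage, daily usage hours, $0.15/kWh), not measurements:

```python
def monthly_power_cost(avg_watts: float, hours_per_day: float,
                       usd_per_kwh: float = 0.15) -> float:
    """Approximate monthly electricity cost for the inference machine."""
    kwh = avg_watts / 1000 * hours_per_day * 30  # kWh over a 30-day month
    return round(kwh * usd_per_kwh, 2)

# RTX 4070-class card drawing ~220 W during ~3 hours of active inference per day
print(monthly_power_cost(220, 3))   # → 2.97
# Heavy use: ~300 W for 8 hours per day
print(monthly_power_cost(300, 8))   # → 10.8
```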

Is local LLM quality comparable to cloud models like GPT-4?

For many tasks, modern local models (14B+) come remarkably close. For general conversation and coding assistance, models like Qwen 2.5 Coder 14B or DeepSeek-R1 14B perform impressively. For the most advanced reasoning, cloud models still have an edge, but the gap is closing rapidly.

Can I access my local OpenClaw from outside my home network?

Yes, but it requires port forwarding or a VPN. For security, it's best to use a tunnel service like ngrok or Tailscale if you need remote access. Never expose OpenClaw directly to the internet without authentication.

What's the difference between Ollama and running models directly?

Ollama provides an OpenAI-compatible API, handles model management, and optimizes runtime. It makes running local models nearly as easy as using cloud APIs. Without Ollama, you'd need to manage model files, dependencies, and API servers manually.

Can I switch between local and cloud models?

Yes! You can configure multiple providers and switch between them in your config, or use local models as the primary with cloud models as fallbacks.

How do I update my local models?

Simply pull the latest version: `ollama pull <model-name>`. This downloads updates while preserving your configuration.

What's the recommended model for coding tasks?

Qwen 2.5 Coder 14B is currently recommended for coding-heavy workflows. It was specifically trained on code and performs exceptionally well on programming tasks. For general-purpose coding and conversation, Llama 3.1 8B is an excellent choice.