Setup & Deployment
OpenClaw + LM Studio: Complete Local AI Agent Setup Guide (2026)
26 min read · Updated 2026-03-13
By DoneClaw Team · We run managed OpenClaw deployments and write from hands-on production experience.
Running OpenClaw with LM Studio gives you a fully local AI agent that never sends a byte of data to the cloud. No API bills. No rate limits. No privacy compromises. Your conversations, your files, your automations — all processed on hardware you control. This guide walks you through the complete OpenClaw LM Studio setup, from choosing the right GPU and model to configuring hybrid fallback chains that keep your agent running when local inference hiccups. Whether you're privacy-conscious, cost-optimizing, or just tired of paying $20+/month for cloud AI subscriptions, this is the definitive guide for 2026.
Why LM Studio Over Ollama for OpenClaw?
If you've looked into running OpenClaw with local models, you've probably seen the Ollama setup guides. Ollama is excellent — but LM Studio serves a different audience and brings distinct advantages.
LM Studio vs Ollama: Head-to-Head Comparison
The critical difference for OpenClaw: LM Studio supports the OpenAI Responses API, which keeps reasoning tokens separate from final output. This matters for WhatsApp and Telegram channels where you don't want the model's internal "thinking" showing up in messages.
OpenClaw's official documentation recommends LM Studio as the preferred backend for its "best current local stack."
When Ollama Is Still the Better Choice
Ollama wins for headless server deployments, Docker containers, and automated pipelines. If you're running OpenClaw on a Raspberry Pi or a $5/month VPS, Ollama's lighter footprint makes more sense. For desktop and workstation setups, LM Studio is the stronger pick.
- Interface: LM Studio has a full GUI + built-in chat, Ollama is CLI-only
- Model Discovery: LM Studio offers visual search & download, Ollama uses pull commands
- API Compatibility: LM Studio supports OpenAI Responses API, Ollama uses Chat Completions
- GPU Management: LM Studio auto-detects and optimizes, Ollama requires manual configuration
- Multi-Model Loading: LM Studio loads multiple models simultaneously, Ollama runs one at a time by default
- Built-in RAG: LM Studio has native document chat, Ollama requires external tools
- Platform Support: LM Studio runs on Windows, macOS, Linux; Ollama on macOS, Linux (Windows via WSL)
- Best For: LM Studio suits desktop users and visual workflows; Ollama suits server deployments and scripting
Hardware Requirements: What You Actually Need
Let's be honest about hardware. Local LLMs are more accessible than ever, but "accessible" doesn't mean "runs on anything." Here's what you need based on your goals.
Hardware Tiers for OpenClaw + LM Studio
The honest truth about the "Entry" tier: OpenClaw expects large context windows and strong prompt injection resistance. Running a heavily quantized 7B model works for basic tasks, but the OpenClaw docs explicitly warn that "small cards truncate context and leak safety." If you're serious about using OpenClaw as your daily agent, aim for the Sweet Spot tier or higher.
NVIDIA GPU Recommendations (March 2026)
NVIDIA recently published their own guide for running OpenClaw on RTX GPUs, confirming this is a supported workflow. Their Tensor Cores and CUDA acceleration provide the best inference performance for local LLMs.
Apple Silicon note: M2/M3/M4 Macs with 32GB+ unified memory are surprisingly capable. Metal acceleration in LM Studio handles 13B models well, though you'll see lower tokens/second compared to NVIDIA CUDA.
- Entry tier: 8 GB VRAM (RTX 3060), 16 GB RAM — runs 7B quantized (Q4) at ~25-40 tok/s
- Sweet Spot tier: 16 GB VRAM (RTX 4060 Ti), 32 GB RAM — runs 13B quantized or 7B full at ~40-60 tok/s
- Enthusiast tier: 24 GB VRAM (RTX 4090), 64 GB RAM — runs 30B quantized or 13B full at ~80-135 tok/s
- Workstation tier: 48 GB+ VRAM (dual GPU / Mac Studio), 128 GB RAM — runs 70B+ models at ~30-50 tok/s
- Apple Silicon: Unified memory (M2 Pro 16 GB), shared RAM — runs 7-13B models at ~15-30 tok/s
- Best value: RTX 4060 Ti 16GB (~$400) — runs 13B models comfortably with 32K context
- Newer option: RTX 5060 Ti 16GB (~$450) — slightly faster with GDDR7
- Luxury option: RTX 4090 24GB (~$1,600) — runs 30B+ models at 135+ tokens/second
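As a rough rule of thumb, quantized weights occupy about params (in billions) × quant bits / 8 gigabytes, plus runtime overhead. A back-of-the-envelope sketch (the 20% overhead factor is an assumption, and KV cache for long contexts comes on top of this):

```shell
# Rough VRAM estimate for quantized weights:
# params (B) * bits / 8, plus ~20% runtime overhead (rule-of-thumb assumption).
estimate_vram_gb() {
  awk -v p="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", p * bits / 8 * 1.2 }'
}

estimate_vram_gb 7 4    # 7B at Q4  -> ~4.2 GB, fits an 8 GB entry-tier card
estimate_vram_gb 13 4   # 13B at Q4 -> ~7.8 GB, comfortable on a 16 GB card
```

This lines up with the tiers above: 7B quantized fits the Entry tier with room for context, while 13B quantized wants the Sweet Spot tier once you add a 32K context window.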
Choosing the Right Model for OpenClaw
Not all local models work well with OpenClaw. The agent framework demands specific capabilities: strong instruction following, tool use, large context windows, and resistance to prompt injection. Here's what actually works in March 2026.
OpenClaw's Official Recommendation: MiniMax M2.5
The OpenClaw docs explicitly recommend MiniMax M2.5 as the best local model. It features a 196K context window (most local models max out at 8-32K), Claude Opus-class performance on software engineering tasks, industry-leading BrowseComp and Wide Search benchmarks for autonomous tool use, best multilingual performance, and is designed for agentic workflows — not just chat, but actual task execution.
The trade-off: M2.5 is large. The full model needs significant VRAM (48GB+ recommended for full precision, 24GB+ for quantized). If your hardware can handle it, this is the model to run.
Model Comparison
Practical recommendation: If you have 24GB VRAM, go with MiniMax M2.5 (quantized). If you have 16GB, Qwen3-Next or Llama 4 Scout are excellent alternatives. For 8GB cards, Mistral Large 3 is your best bet, using aggressive quantization plus GPU offloading for the layers that don't fit, but expect limitations on complex agentic tasks.
- MiniMax M2.5: MoE (~400B total, ~45B active), 24 GB min VRAM (Q4), 196K context, excellent agent quality, ~45 tok/s on RTX 4090
- Qwen3-Next: 32B params, 16 GB min VRAM (Q4), 128K context, very good agent quality, ~65 tok/s on RTX 4090
- Llama 4 Scout: 17B active (109B total MoE), 16 GB min VRAM (Q4), 512K context, very good agent quality, ~55 tok/s on RTX 4090
- DeepSeek V3.2: MoE (~37B active), 24 GB min VRAM (Q4), 128K context, very good agent quality, ~40 tok/s on RTX 4090
- Gemma 3 27B: 27B params, 16 GB min VRAM (Q4), 128K context, good agent quality, ~50 tok/s on RTX 4090
- Mistral Large 3: 24B params, 12 GB min VRAM (Q4), 128K context, good agent quality, ~70 tok/s on RTX 4090
- Models to avoid: tiny models (1-3B), pure chat models without instruction tuning, models with <16K context windows, and heavily quantized large models (Q2/Q3)
Step-by-Step Setup: OpenClaw + LM Studio
Step 1: Install LM Studio. Download LM Studio from lmstudio.ai. On Windows, download the .exe installer and run it. On macOS, download the .dmg, drag to Applications, and grant permissions when prompted. On Linux, use the install script. Launch LM Studio after installation and skip any welcome prompts.
Step 2: Download Your Model. Click "Discover" in the left sidebar, search for your chosen model (e.g., "MiniMax M2.5" or "Qwen3"), and select the largest quantization your hardware can handle. For 24GB VRAM, pick Q5_K_M or Q6_K. For 16GB, use Q4_K_M. Click Download and wait. Pro tip: while the model downloads, proceed with OpenClaw installation.
Step 3: Start the LM Studio Server. Go to "Local Server" in the left sidebar, select your downloaded model from the dropdown, set the context window to 32,768 tokens minimum (higher if your VRAM allows), and click "Start Server." Verify it's running by testing with curl.
Step 4: Install OpenClaw. If you don't have OpenClaw installed yet, use the install script or install via npm (requires Node.js 22+).
Step 5: Configure OpenClaw for LM Studio. This is where the magic happens. You need to tell OpenClaw about your local LM Studio server. You can use the onboarding wizard (openclaw onboard) or configure manually for more control.
Key configuration details: "mode": "merge" merges your local provider with existing cloud providers instead of replacing them. "api": "openai-responses" uses the Responses API which keeps reasoning tokens separate from output. "apiKey": "lmstudio" is a placeholder since LM Studio accepts any string. "cost" set to all zeros since inference is local.
Step 6: Set Up the Hybrid Fallback Chain. Running 100% local is great, but what happens when your GPU is busy or LM Studio crashes? Set up a fallback chain with your local model as primary, Claude Sonnet 4.5 as first fallback, and Claude Opus as second fallback. You get privacy-by-default with reliability as a safety net.
Step 7: Connect a Messaging Channel. Your local agent needs a way to reach you. The most common setup is Telegram — run openclaw onboard and select Telegram when prompted.
Step 8: Start Everything. Make sure LM Studio server is running, start the OpenClaw gateway with openclaw gateway start, then verify with openclaw test. Send a test message through your Telegram or Discord bot.
# Step 1 (Linux): install LM Studio via the official script
curl -fsSL https://lmstudio.ai/install.sh | bash
# Step 3: verify the LM Studio server is responding
curl http://127.0.0.1:1234/v1/models
# Step 4: install OpenClaw (script, or npm with Node.js 22+)
curl -fsSL https://openclaw.ai/install.sh | bash
npm install -g openclaw@latest
{
"models": {
"mode": "merge",
"providers": {
"lmstudio": {
"baseUrl": "http://127.0.0.1:1234/v1",
"apiKey": "lmstudio",
"api": "openai-responses",
"models": [
{
"id": "minimax-m2.5-gs32",
"name": "MiniMax M2.5 GS32",
"reasoning": false,
"input": ["text"],
"cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
"contextWindow": 196608,
"maxTokens": 8192
}
]
}
}
},
"agents": {
"defaults": {
"model": { "primary": "lmstudio/minimax-m2.5-gs32" },
"models": {
"lmstudio/minimax-m2.5-gs32": { "alias": "MiniMax Local" }
}
}
}
}
{
"agents": {
"defaults": {
"model": {
"primary": "lmstudio/minimax-m2.5-gs32",
"fallbacks": [
"anthropic/claude-sonnet-4-5",
"anthropic/claude-opus-4-6"
]
},
"models": {
"lmstudio/minimax-m2.5-gs32": { "alias": "MiniMax Local" },
"anthropic/claude-sonnet-4-5": { "alias": "Sonnet" },
"anthropic/claude-opus-4-6": { "alias": "Opus" }
}
}
}
}
# Start OpenClaw gateway
openclaw gateway start
# Verify the connection
openclaw test
Skip 60 minutes of setup — deploy in 60 seconds
DoneClaw handles Docker, servers, security, and updates. Your OpenClaw agent is ready to chat in under a minute.
Deploy Now
Advanced Configuration
Running Multiple Local Models: LM Studio can load multiple models simultaneously (RAM permitting). This is useful for using different models for different tasks. You can switch between them with /model commands in your chat, or configure different agents to use different models.
GPU Memory Optimization: If you're running close to your VRAM limit, reduce the context window (halving the context roughly halves KV-cache memory), enable Flash Attention if available (it avoids materializing the full attention matrix, substantially cutting attention memory at long contexts), use GPU offloading for models that don't fully fit in VRAM, and stick to Q4_K_M quantization as the sweet spot for quality vs. size.
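The context-window advice follows from standard transformer KV-cache arithmetic. A sketch using an illustrative 13B-class shape (40 layers, 8 KV heads via grouped-query attention, head dim 128, fp16 cache; these numbers are assumptions, not any specific model's):

```shell
# KV cache bytes = 2 (K and V) * layers * context * kv_heads * head_dim * bytes/elem.
# Divide by 2^30 to get GiB. The shape below is an illustrative 13B-class
# configuration with grouped-query attention, not any specific model's.
kv_cache_gb() {
  awk -v l="$1" -v ctx="$2" -v h="$3" -v d="$4" -v b="$5" \
    'BEGIN { printf "%.2f\n", 2 * l * ctx * h * d * b / 1073741824 }'
}

kv_cache_gb 40 32768 8 128 2   # 32K context -> 5.00 GiB of KV cache
kv_cache_gb 40 16384 8 128 2   # halve the context -> 2.50 GiB
```

Context length enters the formula linearly, which is exactly why halving the window halves the KV-cache overhead.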
Remote LM Studio: You don't have to run LM Studio on the same machine as OpenClaw. If you have a powerful GPU desktop but want OpenClaw on a server, start LM Studio on the GPU machine, change the bind address to 0.0.0.0, and update the baseUrl in OpenClaw config. Security warning: if doing this over the internet, use a Tailscale tunnel or SSH port forwarding since LM Studio's server has no authentication by default.
Auto-Starting LM Studio on Boot: For a truly always-on setup, you want LM Studio to start automatically. On Linux, use a systemd service. On macOS, add LM Studio to Login Items. On Windows, add a shortcut to shell:startup.
{
"models": {
"mode": "merge",
"providers": {
"lmstudio": {
"baseUrl": "http://127.0.0.1:1234/v1",
"apiKey": "lmstudio",
"api": "openai-responses",
"models": [
{
"id": "minimax-m2.5-gs32",
"name": "MiniMax M2.5 (Main Agent)",
"reasoning": false,
"input": ["text"],
"cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
"contextWindow": 196608,
"maxTokens": 8192
},
{
"id": "mistral-large-3-q4",
"name": "Mistral Large 3 (Fast Tasks)",
"reasoning": false,
"input": ["text"],
"cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
"contextWindow": 131072,
"maxTokens": 4096
}
]
}
}
}
}
# SSH tunnel approach (secure)
ssh -L 1234:127.0.0.1:1234 user@gpu-machine-ip
# Then use localhost in OpenClaw config
# "baseUrl": "http://127.0.0.1:1234/v1"
# Create a systemd service file
sudo tee /etc/systemd/system/lmstudio.service << 'EOF'
[Unit]
Description=LM Studio Server
After=network.target
[Service]
Type=simple
User=your-username
ExecStart=/usr/bin/lms server start
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl enable lmstudio
sudo systemctl start lmstudio
Cost Analysis: Local vs. Cloud
One of the biggest reasons to run OpenClaw locally is cost. Let's put real numbers on it.
Monthly Cost Comparison:
- Light usage (50 queries/day): $8-15/mo with Claude Sonnet or $10-20/mo with GPT-4o; $0/mo locally
- Medium usage (200 queries/day): $30-60/mo or $40-80/mo cloud; $0 locally
- Heavy usage (500+ queries/day): $80-150/mo or $100-200/mo cloud; $0 locally
- Agent mode (always-on with heartbeats and cron): $50-120/mo or $60-150/mo cloud; $0 locally
Electricity for running a GPU 24/7 is approximately $5-15/month depending on your card and local rates.
Break-Even Analysis:
- RTX 4060 Ti 16GB (~$400), replacing $30-60/mo of cloud spend: breaks even in 7-13 months
- RTX 4090 24GB (~$1,600), replacing $80-150/mo: breaks even in 11-20 months
- Mac Studio M3 Ultra (~$4,000), replacing $100-200/mo: breaks even in 20-40 months
For medium to heavy users, a mid-range GPU pays for itself in under a year. If you're already paying for ChatGPT Plus ($20/mo) or Claude Pro ($20/mo) and can replace it with a local setup, an RTX 4060 Ti pays for itself in about 20 months, and faster if you replace both subscriptions or also offset API usage.
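These break-even figures are just hardware cost divided by the monthly cloud spend you replace:

```shell
# Payback period in months = hardware cost / monthly cloud spend replaced.
payback_months() {
  awk -v cost="$1" -v monthly="$2" 'BEGIN { printf "%.0f\n", cost / monthly }'
}

payback_months 400 40    # RTX 4060 Ti vs ~$40/mo cloud spend -> 10 months
payback_months 1600 120  # RTX 4090 vs ~$120/mo cloud spend -> 13 months
```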
Troubleshooting Common Issues
"Connection Refused" When OpenClaw Tries to Reach LM Studio: If OpenClaw logs show ECONNREFUSED 127.0.0.1:1234, confirm LM Studio server is actually running (check the server tab in the GUI), verify the model is loaded, test directly with curl http://127.0.0.1:1234/v1/models, and check if another application is using port 1234.
Slow Response Times / Hanging Requests: If messages take 30+ seconds or time out, the first request after loading a model is always slow (10-30s cold load). Also check that your context window setting matches what your model variant actually supports. If VRAM is maxed (check with nvidia-smi), use a smaller quantization or reduce context window. GPU thermal throttling starts above 85°C.
Model Produces Garbage / Ignores Instructions: If responses are incoherent or tool calls are malformed, make sure you're using "api": "openai-responses" not "api": "openai". Models under 7B parameters reliably fail at OpenClaw's complex system prompts. Q2 and Q3 quantizations can degrade instruction-following quality — stick to Q4_K_M or higher. If conversation history exceeds the model's context window, enable compaction in OpenClaw.
LM Studio Crashes Under Load: Most commonly caused by out of memory. Reduce context window, use smaller quantization, or upgrade GPU. Update NVIDIA drivers to the latest version. Delete and re-download corrupt models. On systems with limited RAM, ensure adequate swap space (2x RAM recommended).
Privacy and Security Considerations
Running OpenClaw with LM Studio is the strongest privacy option available. Here's exactly what stays local and what doesn't.
What Stays On Your Machine: all model inference, conversation history and memory files, SOUL.md and agent configuration, tool execution, and document analysis.
What Might Still Leave Your Machine: web search queries go to search engines, channel messages route through Telegram/Discord/WhatsApp servers, fallback models send prompts to cloud providers if local fails, and some skills make external API calls.
Hardening for Maximum Privacy: For the most private setup, remove cloud fallbacks, disable web search, use local-only skills, monitor outbound connections with tools like Little Snitch (Mac) or ufw (Linux), and for extreme privacy, run on an air-gapped machine with no internet access.
- All model inference stays local (prompt processing, response generation)
- Conversation history and memory files stay local
- SOUL.md, USER.md, and all agent configuration stay local
- Tool execution (file access, command running) stays local
- Web search queries still go to search engines
- Channel messages route through their respective servers (Telegram, Discord, WhatsApp)
- Fallback models send prompts to cloud providers if local fails
- Some skills (weather, email, calendar) make external API calls
Real-World Performance: What to Expect
After running OpenClaw with LM Studio daily for several weeks, here's what practical performance looks like.
Task Performance by Model Size: 7B models handle simple Q&A well but are limited or unreliable for code generation, multi-step automation, tool calling, long conversations, and complex reasoning. 13B models are good for most tasks — email drafting, code generation, and tool calling usually work, though multi-step automation and long conversations may show some degradation. 30B+ models are excellent across the board with reliable tool calling, strong code generation, and maintained context in long conversations.
Bottom line: For OpenClaw to function as a genuine AI agent (not just a chatbot), you need at minimum a 13B parameter model. The 7B models work for simple chat but break down on the agentic features — tool calling, multi-step tasks, memory management — that make OpenClaw special.
Latency Comparison: Local inference on an RTX 4060 Ti wins on first-token latency (0.5-2s vs 0.8-1.5s for cloud) due to no network round trip, but can be slower on long responses. For typical conversational exchanges (50-200 tokens), local and cloud feel roughly equivalent.
Conclusion
Running OpenClaw with LM Studio is the best way to get a fully private, zero-cost AI agent in 2026. The setup takes 30-60 minutes, the hardware requirements are reasonable for anyone with a decent GPU, and the hybrid fallback configuration means you never sacrifice reliability for privacy. Start with the Sweet Spot tier (16GB GPU, 32GB RAM), load MiniMax M2.5 or Qwen3-Next, and configure cloud fallbacks as a safety net. You'll handle 90%+ of your daily AI agent interactions locally — for free — and only fall back to cloud providers for edge cases. The local AI ecosystem has matured dramatically. A year ago, running a capable AI agent locally was a research project. Today, it's a 30-minute setup with production-grade reliability.
Skip the setup? DoneClaw deploys OpenClaw for you — $29/mo with 7-day free trial, zero configuration.
Frequently asked questions
Can I run OpenClaw + LM Studio on a laptop?
Yes, if your laptop has a dedicated GPU with 8GB+ VRAM (like an RTX 3060 Mobile or RTX 4060 Mobile). Expect higher temperatures and reduced battery life. Apple Silicon MacBooks (M2 Pro/Max and newer) work well thanks to efficient Metal acceleration. Gaming laptops with RTX 40-series cards are surprisingly capable, but you'll want to plug in — local inference drains battery fast.
How much electricity does running a local LLM 24/7 cost?
An RTX 4060 Ti draws ~30W at idle and ~160W under load. Assuming 2 hours of active inference per day and 22 hours at idle, that's approximately 29 kWh/month. At the US average of $0.16/kWh, that's roughly $4.70/month. Even at European rates (~$0.30/kWh), it's under $9/month, still far cheaper than cloud API costs for active users.
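The estimate follows directly from the duty-cycle assumptions. A quick recomputation using only the wattages and hours stated above (and a 30-day month):

```shell
# Monthly energy and cost from the duty-cycle assumptions above:
# 30 W idle for 22 h/day, 160 W under load for 2 h/day, 30-day month.
awk 'BEGIN {
  kwh = (30 * 22 + 160 * 2) * 30 / 1000
  printf "%.1f kWh -> $%.2f at $0.16/kWh\n", kwh, kwh * 0.16
}'
```

Plug in your own card's wattages and your local rate to get a number that matches your setup.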
Can I use LM Studio and Ollama simultaneously with OpenClaw?
Yes. OpenClaw supports multiple providers in the models.providers section. You can configure LM Studio on port 1234 and Ollama on port 11434, then use different models from each provider. This lets you mix and match models from both backends.
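A merged two-provider config might look like the sketch below. It follows the shape of the configs earlier in this guide; the Ollama model id (qwen3:32b) and its limits are illustrative placeholders, and "api": "openai" for Ollama's Chat Completions endpoint follows the Responses-vs-Chat distinction noted in the troubleshooting section.

```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "lmstudio": {
        "baseUrl": "http://127.0.0.1:1234/v1",
        "apiKey": "lmstudio",
        "api": "openai-responses",
        "models": [
          { "id": "minimax-m2.5-gs32", "name": "MiniMax M2.5", "reasoning": false,
            "input": ["text"], "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 196608, "maxTokens": 8192 }
        ]
      },
      "ollama": {
        "baseUrl": "http://127.0.0.1:11434/v1",
        "apiKey": "ollama",
        "api": "openai",
        "models": [
          { "id": "qwen3:32b", "name": "Qwen3 32B", "reasoning": false,
            "input": ["text"], "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
            "contextWindow": 131072, "maxTokens": 4096 }
        ]
      }
    }
  }
}
```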
What's the minimum model size for reliable OpenClaw agent behavior?
Based on extensive testing, 13B parameters (or equivalent MoE active parameters) is the practical minimum for reliable agent behavior. This means models like Qwen3-Next 32B, Llama 4 Scout (17B active), or Mistral Large 3 24B. The 7B models can chat but fail frequently on tool calling and multi-step task execution.
Is LM Studio really free? What's the catch?
Since July 2025, LM Studio is 100% free for all uses, including commercial. No account required, no telemetry, no data collection. The company makes money through enterprise support contracts and partnerships. There's no catch for individual users.