A single 3.2 MB binary that runs large language models on your GPU, or speaks to any cloud API. 59 native tools, 84 commands, multi-agent collaboration, RAG, Team Mode and voice I/O — all inside one C++17 process.
Every keystroke, CloseCrab is calling tools in parallel, reasoning, writing files. Below is a real session fragment.
Each tool is compiled C++ in the binary — no Python bridge, no npm, no runtime.
llama.cpp + CUDA 12 runs GGUF models directly on your GPU. Qwen, Llama, DeepSeek, Mistral — switch with one config line.
Main agent dispatches sub-agents in parallel. Each carries its own context window, results converge for continued reasoning.
File, Shell, Grep, Edit, Web, Git, REPL, Notebook, MCP, Memory, RAG, Hooks… all C++ native.
Full Opus 4.7 window. Auto-compact, memory system, and RAG retrieval stacked — never lose context.
Local vector store. Tree-sitter parses code → embeds → FAISS index. Relevant code retrieved per query.
Multi-client parallel inference, shared knowledge base, leaderboard, achievements. One GPU serves the whole team.
Whisper.cpp ASR + built-in TTS. Talk to the AI, the AI talks back. Zero-latency loop.
Every tool call goes through allow/ask/deny rules. Path-scoped, command-scoped, network-scoped — your choice.
Model Context Protocol clients and servers built in. Plug any MCP tool into the agent loop.
Coordinator splits complex tasks into parallel sub-agents, each with its own context window and tool subset.
provider: local
model_path: models/qwen-7b.Q4_K_M.gguf
gpu_layers: 35
provider: anthropic
model: claude-opus-4-7
api_key: $ANTHROPIC_KEY
provider: openai
model: gpt-4o
api_key: $OPENAI_KEY
provider: ollama
base_url: http://localhost:11434
model: llama3.1
provider: lmstudio
base_url: http://localhost:1234/v1
model: deepseek-coder
provider: openai
base_url: https://your-proxy/v1
model: any-model
A 3.2 MB binary. No installer needed. Drop into PATH, edit one config line, run.