v1.0 · Local-First · C++17 · 3.2 MB

AI in your terminal, code that thinks.

A single 3.2 MB binary that runs large language models on your GPU or talks to cloud APIs such as Anthropic and OpenAI. 59 native tools, 84 commands, multi-agent collaboration, RAG, Team Mode and voice I/O, all inside one C++17 process.

● Windows ● Linux ● macOS ● MIT licensed ● No telemetry
closecrab — session claude-opus-4-7 · 1M ctx · 84 cmd · 59 tools
refactor the websocket layer to use coroutines
llama.cpp + CUDA · 59 native tools · 84 commands · multi-agent · RAG · FAISS · Team Mode · Voice I/O · MCP protocol · 1M context · 11 skills
59 Tools · native C++17
84 Commands · slash + chat
3.2 MB Binary · single executable
1M Tokens · Opus 4.7 context
▌ Real-time terminal

Watch agents think, plan, ship code.

Watch CloseCrab call tools in parallel, reason about the results, and write files as it works. Below is a real session fragment.

closecrab — refactor session 3 sub-agents · parallel
audit auth module and add rate limiting
— planning · 3 sub-agents spawned
[a1] Grep "rate.?limit" src/auth/
→ 0 matches · adding middleware
[a2] Read src/auth/middleware.cpp · 412 lines
[a3] Read CMakeLists.txt · 88 lines
all three agents converged. proposing TokenBucket(10rps, burst=20) at handler entry.
→ Edit src/auth/middleware.cpp +47 -3
→ Edit src/auth/CMakeLists.txt +1 -0
→ Write tests/auth/rate_limit_test.cpp · 124 lines
✓ build · 0 warnings · 12.4s
✓ tests · 47 passed · 0 failed
$ git diff --stat
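The session above proposes a TokenBucket(10rps, burst=20) at the handler entry. As a rough illustration of that technique, not the code CloseCrab actually wrote into middleware.cpp, a token bucket can be sketched in C++17 as follows. The class name and the try_acquire method are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Token-bucket rate limiter (illustrative names, not CloseCrab's actual code).
// Tokens refill continuously at rate_per_sec and are capped at burst;
// each request spends one token.
class TokenBucket {
public:
    using Clock = std::chrono::steady_clock;

    TokenBucket(double rate_per_sec, double burst)
        : rate_(rate_per_sec), burst_(burst), tokens_(burst), last_(Clock::now()) {
        assert(rate_per_sec > 0 && burst >= 1);
    }

    // Returns true if a request arriving at `now` may proceed.
    bool try_acquire(Clock::time_point now = Clock::now()) {
        refill(now);
        if (tokens_ >= 1.0) {
            tokens_ -= 1.0;
            return true;
        }
        return false;  // a handler would typically reply with HTTP 429 here
    }

private:
    // Add rate * elapsed_seconds tokens, never exceeding the burst capacity.
    void refill(Clock::time_point now) {
        std::chrono::duration<double> elapsed = now - last_;
        if (elapsed.count() > 0) {
            tokens_ = std::min(burst_, tokens_ + elapsed.count() * rate_);
            last_ = now;
        }
    }

    double rate_;
    double burst_;
    double tokens_;
    Clock::time_point last_;
};
```

With rate 10 and burst 20, a client can send 20 requests at once and is then limited to roughly one request every 100 ms.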
▌ Core features

One binary. Every capability you want — and more.

Each tool is compiled C++ in the binary — no Python bridge, no npm, no runtime.

Local-first LLM

llama.cpp + CUDA 12 runs GGUF models directly on your GPU. Qwen, Llama, DeepSeek, Mistral — switch with one config line.

CUDA · GGUF · llama.cpp

Multi-agent coordinator

The main agent dispatches sub-agents in parallel. Each carries its own context window, and their results converge back to the main agent for continued reasoning.

parallel · cache-share

59 native tools

File, Shell, Grep, Edit, Web, Git, REPL, Notebook, MCP, Memory, RAG, Hooks… all C++ native.

file · shell · git · web

1M token context

The full Opus 4.7 window. Auto-compaction, the memory system, and RAG retrieval work together so that long sessions keep their context.

1M ctx · auto-compact
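The tech specs list auto-compaction at 800K tokens of a 1,000,000-token window. The sketch below shows one way threshold-triggered compaction can work; it is an assumption, not CloseCrab's algorithm. The ContextWindow class, the target size, the keep_recent tail, and the summarize placeholder are all hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message record; real token counts come from the model's tokenizer.
struct Message {
    std::string text;
    std::size_t tokens;
};

// When the total passes `trigger`, the oldest messages are folded into one
// summary until the remaining messages fit under `target`. The newest
// `keep_recent` messages are never folded.
class ContextWindow {
public:
    ContextWindow(std::size_t trigger, std::size_t target, std::size_t keep_recent)
        : trigger_(trigger), target_(target), keep_recent_(keep_recent) {}

    void append(Message m) {
        msgs_.push_back(std::move(m));
        if (total() > trigger_) compact();
    }

    std::size_t total() const {
        return std::accumulate(msgs_.begin(), msgs_.end(), std::size_t{0},
                               [](std::size_t sum, const Message& m) { return sum + m.tokens; });
    }

    const std::vector<Message>& messages() const { return msgs_; }

private:
    // Placeholder: a real system would ask the model to summarize `old`.
    // The fixed 50-token cost of the summary is an arbitrary stand-in.
    static Message summarize(const std::vector<Message>& old) {
        return {"[summary of " + std::to_string(old.size()) + " earlier messages]", 50};
    }

    void compact() {
        std::size_t remaining = total();
        std::size_t cut = 0;  // how many of the oldest messages to fold
        while (remaining > target_ && msgs_.size() - cut > keep_recent_) {
            remaining -= msgs_[cut].tokens;
            ++cut;
        }
        if (cut == 0) return;
        std::vector<Message> old(msgs_.begin(), msgs_.begin() + cut);
        std::vector<Message> next;
        next.push_back(summarize(old));  // the summary adds its own tokens back
        next.insert(next.end(), msgs_.begin() + cut, msgs_.end());
        msgs_ = std::move(next);
    }

    std::size_t trigger_, target_, keep_recent_;
    std::vector<Message> msgs_;
};
```

With the spec's numbers the trigger would be 800000; the target and the size of the protected recent tail are tuning choices this sketch leaves open.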

RAG · FAISS

Local vector store. Tree-sitter splits source code into chunks, each chunk is embedded, and the vectors go into a FAISS index; the most relevant chunks are retrieved for each query.

FAISS · tree-sitter
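The retrieval step can be illustrated without the FAISS dependency. The sketch below swaps in an exhaustive inner-product search over L2-normalized vectors, which is the exact scoring that a flat inner-product index such as FAISS's IndexFlatIP performs. The Chunk struct, the chunk labels, and the embeddings themselves are assumptions; producing the embeddings is out of scope here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One indexed chunk, e.g. a function that tree-sitter split out of a file.
struct Chunk {
    std::string id;          // hypothetical label such as "src/auth/middleware.cpp:42"
    std::vector<float> vec;  // embedding, e.g. 384 floats, assumed already normalized
};

// Normalize in place so that the inner product equals cosine similarity.
void l2_normalize(std::vector<float>& v) {
    double sum = 0;
    for (float x : v) sum += double(x) * x;
    if (sum == 0) return;
    float inv = float(1.0 / std::sqrt(sum));
    for (float& x : v) x *= inv;
}

// Exhaustive top-k search by inner product: no approximation, O(n * dim).
std::vector<std::pair<std::string, float>> top_k(const std::vector<Chunk>& index,
                                                 const std::vector<float>& query,
                                                 std::size_t k) {
    std::vector<std::pair<std::string, float>> scored;
    scored.reserve(index.size());
    for (const Chunk& c : index) {
        float dot = 0;
        for (std::size_t i = 0; i < query.size() && i < c.vec.size(); ++i)
            dot += c.vec[i] * query[i];
        scored.emplace_back(c.id, dot);
    }
    k = std::min(k, scored.size());
    // Only the best k need to be ordered.
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
                      [](const auto& a, const auto& b) { return a.second > b.second; });
    scored.resize(k);
    return scored;
}
```

A flat scan is exact but linear in the number of chunks; FAISS also offers approximate index types for when that cost matters.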

Team Mode

Multi-client parallel inference, shared knowledge base, leaderboard, achievements. One GPU serves the whole team.

multi-client · shared

Voice I/O

Speech recognition via whisper.cpp plus built-in TTS. Talk to the AI and it talks back, in a low-latency voice loop.

Whisper · TTS

Permission sandbox

Every tool call is checked against allow/ask/deny rules, which you can scope by path, by command, or by network target.

sandbox · policy
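A minimal sketch of how allow/ask/deny rules might be evaluated, assuming rules match on a tool name and an argument prefix, that any matching deny wins over ask and ask wins over allow, and that unmatched calls fall back to ask. The Rule struct, the precedence, and the default are assumptions, not CloseCrab's documented policy; the tool names come from the tool list.

```cpp
#include <string>
#include <vector>

enum class Decision { Allow, Ask, Deny };

// A rule matches a tool name and an argument prefix (a path for file tools,
// a command line for shell tools). Empty fields match anything.
struct Rule {
    std::string tool;
    std::string prefix;
    Decision decision;
};

static bool starts_with(const std::string& s, const std::string& p) {
    return s.compare(0, p.size(), p) == 0;
}

// Assumed precedence: Deny > Ask > Allow; no match means Ask.
// A real sandbox would canonicalize paths first so "src/../etc" cannot slip through.
Decision evaluate(const std::vector<Rule>& rules, const std::string& tool,
                  const std::string& arg) {
    bool ask = false, allow = false;
    for (const Rule& r : rules) {
        bool tool_ok = r.tool.empty() || r.tool == tool;
        bool arg_ok = r.prefix.empty() || starts_with(arg, r.prefix);
        if (!tool_ok || !arg_ok) continue;
        switch (r.decision) {
            case Decision::Deny: return Decision::Deny;  // deny short-circuits
            case Decision::Ask: ask = true; break;
            case Decision::Allow: allow = true; break;
        }
    }
    if (ask) return Decision::Ask;
    if (allow) return Decision::Allow;
    return Decision::Ask;  // default for calls no rule covers
}
```

Making deny short-circuit means a broad allow rule can never override a narrower deny, which is the usual safe default for a sandbox.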

MCP protocol

Model Context Protocol clients and servers built in. Plug any MCP tool into the agent loop.

MCP · extensible
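MCP messages are JSON-RPC 2.0, and a client invokes a server tool with the tools/call method, passing the tool name and its arguments. The sketch below builds such a request by hand to show the wire shape; the helper names and the example tool "search_docs" are hypothetical, and a real client would use a proper JSON library.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Escape the characters JSON requires inside a string literal.
// Other control characters are not handled in this sketch.
std::string json_escape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\t': out += "\\t"; break;
            default:   out += c;
        }
    }
    return out;
}

// Build an MCP "tools/call" request. Argument values are limited to strings
// to keep the sketch short; real tools may take numbers, objects, and arrays.
std::string mcp_tools_call(int id, const std::string& tool,
                           const std::vector<std::pair<std::string, std::string>>& args) {
    std::string json = "{\"jsonrpc\":\"2.0\",\"id\":" + std::to_string(id) +
                       ",\"method\":\"tools/call\",\"params\":{\"name\":\"" +
                       json_escape(tool) + "\",\"arguments\":{";
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i > 0) json += ",";
        json += "\"" + json_escape(args[i].first) + "\":\"" + json_escape(args[i].second) + "\"";
    }
    json += "}}}";  // closes arguments, params, and the request object
    return json;
}
```

A client would normally call tools/list first to learn which tools a server exposes and what arguments each one expects.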
▌ 59 native tools

A whole galaxy of tools, orbiting one process.

Filesystem · Shell

  • read_file
  • write_file
  • edit_file
  • multi_edit
  • glob
  • grep
  • list_dir
  • shell_exec
  • background_exec
  • kill_proc

Code · Git

  • git_status
  • git_diff
  • git_commit
  • git_log
  • git_branch
  • tree_sitter
  • repl_python
  • repl_node
  • notebook_edit
  • format

Web · Network

  • web_fetch
  • web_search
  • web_render
  • http_get
  • http_post
  • mcp_call
  • mcp_list
  • browse
  • download
  • upload

AI · Memory

  • rag_index
  • rag_query
  • memory_store
  • memory_search
  • summarize
  • todo_write
  • think
  • agent_spawn
  • skill_invoke
  • schedule

Voice · I/O

  • tts_speak
  • asr_listen
  • screenshot
  • clipboard
  • notify
  • open_url
  • file_dialog
  • read_stdin
  • tail
  • watch

System · Misc

  • env_get
  • env_set
  • cwd
  • sysinfo
  • process_list
  • file_stat
  • checksum
  • compress
  • decompress
  • hooks_emit
▌ Multi-agent

One brain. Many hands.

The coordinator splits complex tasks into parallel sub-agents, each with its own context window and tool subset.

Coordinator (claude-opus-4-7 · 1M ctx): dispatches · merges · plans

  • Researcher: grep · read · web
  • Coder: edit · write · build
  • Tester: shell · run · diff
  • Reviewer: analyze · suggest
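The fan-out/fan-in pattern behind the coordinator can be sketched with std::async: every sub-agent runs on its own thread and the coordinator collects the results in dispatch order before it merges them. This is an illustration under assumptions, not CloseCrab's coordinator. SubAgent, Runner, and dispatch are hypothetical names, and run_agent stands in for a full model-plus-tools loop.

```cpp
#include <functional>
#include <future>
#include <string>
#include <vector>

// A sub-agent here is just a role, its tool subset, and a task.
struct SubAgent {
    std::string role;                // e.g. "Researcher"
    std::vector<std::string> tools;  // tool subset; not enforced in this sketch
    std::string task;
};

// Stand-in for one agent's full reasoning and tool-calling loop.
using Runner = std::function<std::string(const SubAgent&)>;

// Fan out: launch every sub-agent on its own thread.
// Fan in: wait for each result, keeping dispatch order, so the coordinator
// can merge them and continue planning.
std::vector<std::string> dispatch(const std::vector<SubAgent>& agents, const Runner& run_agent) {
    std::vector<std::future<std::string>> pending;
    pending.reserve(agents.size());
    for (const SubAgent& a : agents)
        pending.push_back(std::async(std::launch::async, [&run_agent, &a] { return run_agent(a); }));
    std::vector<std::string> results;
    results.reserve(pending.size());
    for (auto& f : pending) results.push_back(f.get());  // blocks until that agent finishes
    return results;
}
```

Capturing by reference is safe here because dispatch waits on every future before it returns, so the agents and the runner outlive all worker threads.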
▌ Any model · anywhere

Local GPU, Claude, OpenAI, Ollama, LM Studio, or any OpenAI-compatible endpoint: switch by changing the provider setting.

Local · llama.cpp
provider: local
model_path: models/qwen-7b.Q4_K_M.gguf
gpu_layers: 35

Anthropic
provider: anthropic
model: claude-opus-4-7
api_key: $ANTHROPIC_KEY

OpenAI
provider: openai
model: gpt-4o
api_key: $OPENAI_KEY

Ollama
provider: ollama
base_url: http://localhost:11434
model: llama3.1

LM Studio
provider: lmstudio
base_url: http://localhost:1234/v1
model: deepseek-coder

OpenAI-compatible
provider: openai
base_url: https://your-proxy/v1
model: any-model
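The OpenAI-compatible entry works by overriding base_url while keeping provider: openai. A minimal sketch of that resolution logic follows; resolve_base_url is a hypothetical helper, and the default endpoints are the providers' usual public or local addresses, which CloseCrab may or may not use verbatim.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// An explicit base_url always wins; otherwise fall back to a per-provider default.
// "local" runs llama.cpp in-process, so it has no URL and is rejected here.
std::string resolve_base_url(const std::string& provider, const std::string& base_url) {
    if (!base_url.empty()) return base_url;
    static const std::map<std::string, std::string> defaults = {
        {"anthropic", "https://api.anthropic.com"},
        {"openai", "https://api.openai.com/v1"},
        {"ollama", "http://localhost:11434"},
        {"lmstudio", "http://localhost:1234/v1"},
    };
    auto it = defaults.find(provider);
    if (it == defaults.end())
        throw std::invalid_argument("no default endpoint for provider: " + provider);
    return it->second;
}
```

Keeping the override separate from the provider name is what lets one client implementation talk to OpenAI itself, a proxy, or any compatible local server.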
▌ Tech specs

Engineering, byte-precise.

Language: C++17 · CUDA 12 · CMake · ~170 source files
Binary size: ~3.2 MB · single executable · stripped
Inference: llama.cpp · GGUF · CUDA / Metal / CPU · Q4_K_M · Q8_0
Context window: 1,000,000 tokens · auto-compact at 800K · Claude Opus 4.7
RAG: FAISS · Tree-sitter · 384-dim embeddings · local
Tools: 59 native + MCP clients · extensible
Commands: 84 slash + chat commands
Skills: 11 built-in · directory-loaded plugins
Voice: whisper.cpp ASR · Windows SAPI / system TTS
Protocols: MCP · OpenAI-compatible · Anthropic Messages · JSON-RPC
Storage: SQLite · YAML config · no database server
Platforms: Windows 10/11 · Linux · macOS
License: MIT · open source · no telemetry
Get CloseCrab v1.0

Download once. Own it forever.

A 3.2 MB binary. No installer needed. Drop into PATH, edit one config line, run.

# or build from source
git clone https://github.com/Blitzball996/CloseCrab-Unified.git
cd CloseCrab-Unified && cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build --config Release