v1.0 · Local-First · C++17 · 3.2 MB

AI in your terminal, code that thinks.

A single 3.2 MB binary that runs large language models on your GPU, or speaks to any cloud API. 59 native tools, 84 commands, multi-agent collaboration, RAG, Team Mode and voice I/O — all inside one C++17 process.

⬇ Download v1.0 ▶ Watch it run

● Windows ● Linux ● macOS ● MIT licensed ● No telemetry

closecrab — session claude-opus-4-7 · 1M ctx · 84 cmd · 59 tools

›refactor the websocket layer to use coroutines▍

▌ Real-time terminal

Watch agents think, plan, ship code.

With every keystroke, CloseCrab calls tools in parallel, reasons, and writes files. Below is a real session fragment.

closecrab — refactor session 3 sub-agents · parallel

›audit auth module and add rate limiting
— planning · 3 sub-agents spawned
[a1] Grep "rate.?limit" src/auth/
  → 0 matches · adding middleware
[a2] Read src/auth/middleware.cpp · 412 lines
[a3] Read CMakeLists.txt · 88 lines
all three agents converged. proposing TokenBucket(10rps, burst=20) at handler entry.
  → Edit src/auth/middleware.cpp +47 -3
  → Edit src/auth/CMakeLists.txt +1 -0
  → Write tests/auth/rate_limit_test.cpp · 124 lines
✓ build · 0 warnings · 12.4s
✓ tests · 47 passed · 0 failed
$ git diff --stat
›▍
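
The session above proposes `TokenBucket(10rps, burst=20)`. The code it writes to `middleware.cpp` is not shown, so what follows is only a minimal sketch of what such a limiter could look like. The class shape, method names, and the choice to pass the clock in explicitly are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical token bucket; the interface is illustrative, not CloseCrab's.
class TokenBucket {
public:
    // rate_per_sec: tokens refilled per second; burst: bucket capacity.
    TokenBucket(double rate_per_sec, double burst)
        : rate_(rate_per_sec), capacity_(burst), tokens_(burst), last_(0.0) {
        assert(rate_per_sec > 0.0 && burst >= 1.0);
    }

    // Returns true if a request arriving at `now_sec` (monotonic seconds) may proceed.
    bool allow(double now_sec) {
        const double elapsed = std::max(0.0, now_sec - last_);
        tokens_ = std::min(capacity_, tokens_ + elapsed * rate_);
        last_ = std::max(last_, now_sec);
        if (tokens_ < 1.0) return false;   // bucket empty: reject
        tokens_ -= 1.0;                    // spend one token per request
        return true;
    }

private:
    double rate_, capacity_, tokens_, last_;
};
```

With `TokenBucket tb(10.0, 20.0)`, a burst of 20 requests at the same instant passes, the 21st is rejected, and capacity returns at 10 tokens per second.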

▌ Core features

One binary. Every capability you want — and more.

Each tool is compiled C++ in the binary — no Python bridge, no npm, no runtime.

◈

Local-first LLM

llama.cpp + CUDA 12 runs GGUF models directly on your GPU. Qwen, Llama, DeepSeek, Mistral — switch with one config line.

CUDA · GGUF · llama.cpp

⬡

Multi-agent coordinator

The main agent dispatches sub-agents in parallel. Each carries its own context window, and results converge for continued reasoning.

parallel · cache-share

⌬

59 native tools

File, Shell, Grep, Edit, Web, Git, REPL, Notebook, MCP, Memory, RAG, Hooks… all C++ native.

file · shell · git · web

∞

1M token context

Full Opus 4.7 window. Auto-compact, memory system, and RAG retrieval stacked — never lose context.

1M ctx · auto-compact
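
How auto-compact works internally is not documented here. As a rough sketch of the idea, assuming the 800K threshold listed under Tech specs, the oldest turns could be folded into a single summary stub once the running total crosses the limit. `Message`, `compact`, and the fixed-size summary are hypothetical; a real implementation would presumably ask the model to write the summary.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical history entry: text plus its token count.
struct Message {
    std::string text;
    std::size_t tokens;
};

// If the history exceeds `threshold` tokens, drop the oldest messages and
// put one summary stub in their place. Returns the token total afterwards.
std::size_t compact(std::deque<Message>& history, std::size_t threshold,
                    std::size_t summary_tokens) {
    assert(summary_tokens < threshold);
    std::size_t total = 0;
    for (const auto& m : history) total += m.tokens;
    if (total <= threshold) return total;          // nothing to do

    std::size_t folded = 0;                        // tokens removed so far
    while (history.size() > 1 &&
           total - folded + summary_tokens > threshold) {
        folded += history.front().tokens;
        history.pop_front();
    }
    history.push_front({"[summary of earlier turns]", summary_tokens});
    return total - folded + summary_tokens;
}
```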

◎

RAG · FAISS

Local vector store. Tree-sitter parses code → embeds → FAISS index. Relevant code retrieved per query.

FAISS · tree-sitter
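
To illustrate the retrieval step only: the sketch below ranks embedded chunks against a query by cosine similarity using a brute-force scan. It stands in for the FAISS index and does not call FAISS; `Embedding`, `cosine`, and `top_k` are hypothetical names, and the embeddings are assumed to be computed elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Embedding = std::vector<float>;

// Cosine similarity between two vectors of equal dimension.
float cosine(const Embedding& a, const Embedding& b) {
    assert(a.size() == b.size() && !a.empty());
    float dot = 0.f, na = 0.f, nb = 0.f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    const float denom = std::sqrt(na) * std::sqrt(nb);
    return denom > 0.f ? dot / denom : 0.f;
}

// Indices of the k chunks most similar to the query, best match first.
std::vector<std::size_t> top_k(const std::vector<Embedding>& chunks,
                               const Embedding& query, std::size_t k) {
    std::vector<std::pair<float, std::size_t>> scored;
    for (std::size_t i = 0; i < chunks.size(); ++i)
        scored.emplace_back(cosine(chunks[i], query), i);
    k = std::min(k, scored.size());
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
        [](const auto& x, const auto& y) { return x.first > y.first; });
    std::vector<std::size_t> out;
    for (std::size_t i = 0; i < k; ++i) out.push_back(scored[i].second);
    return out;
}
```

A linear scan like this is fine for a few thousand chunks; an approximate index is what makes the same query cheap on a large repository.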

◬

Team Mode

Multi-client parallel inference, shared knowledge base, leaderboard, achievements. One GPU serves the whole team.

multi-client · shared

◯

Voice I/O

Whisper.cpp ASR + built-in TTS. Talk to the AI, the AI talks back. Low-latency loop.

Whisper · TTS

▣

Permission sandbox

Every tool call goes through allow/ask/deny rules. Path-scoped, command-scoped, network-scoped — your choice.

sandbox · policy
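
A minimal sketch of how allow/ask/deny rules might be evaluated, assuming first-match-wins ordering, prefix matching on the tool argument, and "ask" as the fallback. None of these choices are confirmed by the page; `Rule` and `evaluate` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Decision { Allow, Ask, Deny };

// Hypothetical rule: applies to one tool when the argument starts with `prefix`.
struct Rule {
    std::string tool;
    std::string prefix;
    Decision decision;
};

// First matching rule wins; anything unmatched falls back to asking the user.
Decision evaluate(const std::vector<Rule>& rules,
                  const std::string& tool, const std::string& arg) {
    assert(!tool.empty());
    for (const auto& r : rules) {
        const bool prefix_matches =
            arg.compare(0, r.prefix.size(), r.prefix) == 0;
        if (r.tool == tool && prefix_matches) return r.decision;
    }
    return Decision::Ask;
}
```

With a deny rule for `write_file` under `/etc/` and an allow rule under `src/`, a write to `/etc/passwd` is denied, a write to `src/auth/middleware.cpp` is allowed, and an unlisted `shell_exec` call prompts the user.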

⌥

MCP protocol

Model Context Protocol clients and servers built in. Plug any MCP tool into the agent loop.

MCP · extensible
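
MCP messages are JSON-RPC 2.0, and a tool is invoked with the `tools/call` method. The sketch below builds such a request as a string. How CloseCrab assembles its requests is not shown on this page, so `json_escape` and `make_tool_call` are hypothetical helpers, and the arguments are assumed to arrive as an already valid JSON object.

```cpp
#include <cassert>
#include <string>

// Escape the characters JSON requires inside a string literal.
std::string json_escape(const std::string& s) {
    static const char* hex = "0123456789abcdef";
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {            // other control characters: \u00XX
                    out += "\\u00";
                    out += hex[c >> 4];
                    out += hex[c & 0x0F];
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Build a JSON-RPC 2.0 "tools/call" request.
// args_json must already be a valid JSON object, e.g. {"path":"README.md"}.
std::string make_tool_call(int id, const std::string& tool,
                           const std::string& args_json) {
    assert(!tool.empty());
    return "{\"jsonrpc\":\"2.0\",\"id\":" + std::to_string(id) +
           ",\"method\":\"tools/call\",\"params\":{\"name\":\"" +
           json_escape(tool) + "\",\"arguments\":" + args_json + "}}";
}
```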

▌ 59 native tools

A whole galaxy of tools, orbiting one process.

Filesystem · Shell

read_file
write_file
edit_file
multi_edit
glob
grep
list_dir
shell_exec
background_exec
kill_proc

Code · Git

git_status
git_diff
git_commit
git_log
git_branch
tree_sitter
repl_python
repl_node
notebook_edit
format

AI · Memory

rag_index
rag_query
memory_store
memory_search
summarize
todo_write
think
agent_spawn
skill_invoke
schedule

Voice · I/O

tts_speak
asr_listen
screenshot
clipboard
notify
open_url
file_dialog
read_stdin
tail
watch

System · Misc

env_get
env_set
cwd
sysinfo
process_list
file_stat
checksum
compress
decompress
hooks_emit

▌ Multi-agent

One brain. Many hands.

Coordinator splits complex tasks into parallel sub-agents, each with its own context window and tool subset.

◉

Coordinator

claude-opus-4-7 · 1M ctx

dispatches · merges · plans

◐

Researcher

grep · read · web

◑

Coder

edit · write · build

◒

Tester

shell · run · diff

◓

Reviewer

analyze · suggest
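
The fan-out and merge pattern described above can be sketched with `std::async`. This is an illustration under stated assumptions, not the coordinator's actual code: `SubAgent` and `dispatch` are hypothetical, and each sub-agent is reduced to a function from a task string to a result string.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical sub-agent: a role name plus the work it performs on a task.
struct SubAgent {
    std::string role;
    std::function<std::string(const std::string&)> run;
};

// Launch every sub-agent on its own thread, then merge results in dispatch order.
std::vector<std::string> dispatch(const std::vector<SubAgent>& agents,
                                  const std::string& task) {
    assert(!agents.empty());
    std::vector<std::future<std::string>> pending;
    pending.reserve(agents.size());
    for (const auto& agent : agents) {
        // References stay valid: every future is joined before this function returns.
        pending.push_back(std::async(std::launch::async,
            [&agent, &task] { return agent.role + ": " + agent.run(task); }));
    }
    std::vector<std::string> merged;
    merged.reserve(pending.size());
    for (auto& f : pending) merged.push_back(f.get());   // blocks until done
    return merged;
}
```

Results come back in the order the agents were dispatched, regardless of which one finishes first. On some toolchains the program must be linked with thread support (for example `-pthread`).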

▌ Any model · anywhere

Local GPU, Claude, OpenAI, Ollama, LM Studio … one config switch.

◆ Local · llama.cpp

provider: local
model_path: models/qwen-7b.Q4_K_M.gguf
gpu_layers: 35

◆ Anthropic

provider: anthropic
model: claude-opus-4-7
api_key: $ANTHROPIC_KEY

◆ OpenAI

provider: openai
model: gpt-4o
api_key: $OPENAI_KEY

◆ Ollama

provider: ollama
base_url: http://localhost:11434
model: llama3.1

◆ LM Studio

provider: lmstudio
base_url: http://localhost:1234/v1
model: deepseek-coder

◆ OpenAI-compatible

provider: openai
base_url: https://your-proxy/v1
model: any-model
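
The provider cards above use flat `key: value` lines, with secrets written as `$NAME`. A minimal sketch of reading one such line and expanding an environment reference follows. It assumes values of the form `$NAME` are taken from the environment; the config loader itself is not shown on this page, so `split_line` and `expand_env` are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <utility>

// Split a "key: value" line at the first ": " so URLs in the value stay intact.
// Precondition: the line contains ": ".
std::pair<std::string, std::string> split_line(const std::string& line) {
    const auto pos = line.find(": ");
    assert(pos != std::string::npos);
    return {line.substr(0, pos), line.substr(pos + 2)};
}

// If the value looks like $NAME, return that environment variable
// (empty string if unset); otherwise return the value unchanged.
std::string expand_env(const std::string& value) {
    if (value.size() < 2 || value[0] != '$') return value;
    const char* env = std::getenv(value.c_str() + 1);
    return env ? std::string(env) : std::string();
}
```

Splitting on the first `": "` rather than the first `:` keeps `base_url: http://localhost:11434` in one piece.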

▌ Tech specs

Engineering, byte-precise.

Language

C++17 · CUDA 12 · CMake · ~170 source files

Binary size

~3.2 MB · single executable · stripped

Inference

llama.cpp · GGUF · CUDA / Metal / CPU · Q4_K_M · Q8_0

Context window

1,000,000 tokens · auto-compact at 800K · Claude Opus 4.7

RAG

FAISS · Tree-sitter · 384-dim embeddings · local

Tools

59 native + MCP clients · extensible

Commands

84 slash + chat commands

Skills

11 built-in · directory-loaded plugins

Voice

Whisper.cpp ASR · Windows SAPI / system TTS

Protocols

MCP · OpenAI-compat · Anthropic Messages · JSON-RPC

Storage

SQLite · YAML config · zero database server

Platforms

Windows 10/11 · Linux · macOS

License

MIT · open source · no telemetry

▌ Mobile remote

Leave the desk. Keep the session.

Drive a running CloseCrab from your phone — full touch terminal, live thinking bar, team leaderboard — over Tailscale, ZeroTier or a Cloudflare tunnel. Code stays on your machine.

📟 Real PTY terminal, finger-friendly — swipe, quick-keys, mobile input
🌊 One calm thinking bar instead of stacked "waiting…" lines
🏆 Team Mode leaderboard — local board + cross-host global board
🎮 9 mini-games load while CloseCrab boots
🔒 License-bound + token-gated — fails closed, never anonymous

Explore Mobile Remote →

CloseCrab ● live

› add rate limiting to auth

— 3 sub-agents planning

[a1] Grep rate.?limit → 0

→ Edit middleware.cpp +47 -3

✓ build · 47 tests pass

Thinking…

^C · Tab · Esc · Yes

Get CloseCrab v1.0

Download once. Own it forever.

A 3.2 MB binary. No installer needed. Drop into PATH, edit one config line, run.

⬇ Windows · .exe ⬇ Linux · .tar.gz ⬇ macOS · .dmg

# or build from source
git clone https://github.com/Blitzball996/CloseCrab-Unified.git
cd CloseCrab-Unified && cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build --config Release