Local agents,
the easy way.
locca sets up llama.cpp, helps you pick GGUF models that fit your hardware, benchmarks them, and wires the pi coding agent (or any OpenAI-compatible client) to whatever you're running. One CLI, no flag spelunking.
Why locca
Defaults that respect your hardware.
Most local-LLM tools either bury llama.cpp under abstraction or dump every flag in your lap. locca picks defaults that work. When you want to override them, the flags are right there.
llama.cpp, handled
locca install-llama drops a prebuilt binary
into ~/.locca/bin — auto-detects Vulkan, CUDA, HIP,
Metal, or CPU. No compiler, no sudo. Then it runs with tuned
defaults: flash attention, quantized KV cache, batch sizing.
Catalog with fit hints
The first-run wizard and locca switch show every curated model with a fit hint — "5.6 GB dl, 14.3 GB RAM, 256k ctx" — based on your detected hardware. No more 30 GB downloads that won't run.
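Under the hood, a fit hint is just a comparison of a model's footprint against the machine's detected resources. A minimal sketch of the idea, where the function name, threshold, and hint format are illustrative assumptions rather than locca's actual logic:

```python
def fit_hint(download_gb: float, ram_needed_gb: float, ctx: int,
             ram_total_gb: float) -> str:
    """Render a catalog fit hint like '5.6 GB dl, 14.3 GB RAM, 256k ctx'."""
    label = f"{download_gb:.1f} GB dl, {ram_needed_gb:.1f} GB RAM, {ctx // 1024}k ctx"
    if ram_needed_gb > ram_total_gb:
        return f"won't fit — {label}"
    return f"fits — {label}"

print(fit_hint(5.6, 14.3, 262144, 32.0))
# fits — 5.6 GB dl, 14.3 GB RAM, 256k ctx
```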
Search & download
locca search qwen fuzzy-searches HuggingFace,
locca download pulls a GGUF straight into your
models dir, locca delete reclaims the disk.
Vision adapters (mmproj*.gguf) auto-attach to their parent.
Bench in one command
locca bench wraps llama-bench
with a friendlier summary — live tok/s and ctx during the
run, results table at the end. Compare quants and ctx sizes
without touching a flag.
Doctor & optimise
locca doctor sweeps hardware, server state,
and the last 64 KiB of log for known issues — outdated chat
templates, OOMs, ctx truncation. locca optimise
hands the same data to pi and asks for concrete tweaks.
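Scanning only the last 64 KiB of a log is a cheap way to surface recent failures without parsing gigabytes of history. A sketch of that technique; the patterns and advice strings are illustrative, not locca's real checks:

```python
import re

KNOWN_ISSUES = {
    "out of memory": "OOM — try a smaller quant or a shorter context",
    "context shift": "ctx truncation — raise the context size",
}

def scan_log_tail(path: str, tail_bytes: int = 64 * 1024) -> list[str]:
    """Read at most the last tail_bytes of a log and flag known issues."""
    with open(path, "rb") as f:
        f.seek(0, 2)                       # jump to end of file
        f.seek(max(0, f.tell() - tail_bytes))
        tail = f.read().decode("utf-8", errors="replace")
    return [advice for needle, advice in KNOWN_ISSUES.items()
            if re.search(needle, tail, re.IGNORECASE)]
```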
OpenAI-compatible
Point Cursor, Claude Code, or any OpenAI client at the local
server. locca api prints every reachable LAN and
Tailscale URL — probed live. Already running llama-server?
locca detects it on /health, marks it
attached, and uses it instead of spawning a duplicate.
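Any OpenAI-compatible client only needs the server's base URL. A stdlib-only sketch of the request shape, assuming llama-server's default http://localhost:8080/v1 (the output of locca api is authoritative for your setup):

```python
import json
from urllib import request

def chat_request(base_url: str, prompt: str) -> request.Request:
    """Build an OpenAI-style chat completions request for a local server."""
    body = json.dumps({
        "model": "local",  # llama-server serves one model; the name is loose
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("http://localhost:8080/v1", "hello")
# send with: urllib.request.urlopen(req)
```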
Get going
Install.
One command, then a wizard. locca picks a models folder with you, fetches llama.cpp if it’s missing, downloads a starter GGUF, and installs the pi coding agent.
Surface
A small set of commands.
Run locca
with no args for the menu, or jump straight to what you need.
locca pi — Launch the pi coding agent against your local server.
locca serve — Start llama-server with a picked model, detached.
locca switch — Catalog-aware picker — installed models + curated catalog with fit hints.
locca bench — Run llama-bench with a friendlier summary.
locca doctor — Health check — hardware, server, log warnings, config sanity.
locca optimise — Have pi review the deployment and rank concrete tweaks.
locca api — Print OpenAI-compatible connection info + LAN URLs.
locca logs — Tail the server log (locca-spawned servers only).
locca download — Pull a GGUF from HuggingFace into your models dir.
locca search — Fuzzy-search HuggingFace for GGUF models.
locca delete — Remove a model directory you no longer need.
locca stop — Stop the running server.
locca install-llama — Download / update a prebuilt llama.cpp binary into ~/.locca/bin. Auto-detects backend.
locca config — View / edit settings — get, set, reset, list, path.
locca setup — Re-run the setup wizard.
Bonus
One command into the pi coding agent.
locca pi qwen fuzzy-matches the first
*qwen*.gguf in your models dir, brings up the
server if it isn’t already running, and registers itself as a
custom OpenAI-compatible provider in
~/.pi/agent/models.json. Switch model, switch
brain — locca switch gpt-oss-20b.
Where pi keeps its stuff.
Once locca pi drops you into the agent,
these are the paths worth knowing. Global config lives under
~/.pi/agent/; per-project overrides go in
.pi/ at your repo root.
~/.pi/agent/settings.json — Model, theme, thinking level, retries, telemetry. Project overrides at .pi/settings.json.
~/.pi/agent/models.json — Custom OpenAI-compatible providers. locca owns the locca entry and rewrites it on every locca pi — leave the rest alone.
~/.pi/agent/skills/ — Drop in pi-skills packages — each one a folder with a SKILL.md. Project skills also load from .pi/skills/ and ancestors up to the git root.
~/.pi/agent/AGENTS.md — Global instructions loaded at startup. Per-project AGENTS.md files in cwd or any ancestor merge in too.
~/.pi/agent/SYSTEM.md — Replaces the default system prompt entirely. Use APPEND_SYSTEM.md if you only want to tack things on.
~/.pi/agent/prompts/ — Reusable prompt templates. Drop a foo.md in here and run it mid-session with /foo.
~/.pi/agent/extensions/ — TypeScript modules registering custom tools, slash commands, and UI panels.
~/.pi/agent/keybindings.json — Override key bindings if the defaults clash with your terminal.
~/.pi/agent/sessions/ — JSONL session logs grouped by working directory — handy for resuming or grepping a past chat.
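The "ancestors up to the git root" rule for project skills is an ordinary upward directory walk. A sketch of that search under assumed semantics (walk from the starting directory toward the filesystem root, stopping at the first directory that contains .git); pi's actual discovery may differ:

```python
from pathlib import Path

def skill_dirs(start: Path) -> list[Path]:
    """Collect .pi/skills/ dirs from start up to the git root, nearest first."""
    found = []
    for directory in [start, *start.parents]:
        skills = directory / ".pi" / "skills"
        if skills.is_dir():
            found.append(skills)
        if (directory / ".git").exists():
            break  # stop at the repository root
    return found
```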
locca pi, no restart needed.