Bring your own AI

Cabinet doesn't host inference. You connect the providers you already pay for, and Cabinet routes your agents' calls to them. There's no Cabinet middleman, no inference markup, no opaque quota. You pay your own bill, you use your own quota.

Supported providers today

Cabinet runs through a provider adapter layer. As of v0.4.0, eight CLI providers ship in the default runtime — all local, so your prompts go through a tool you've already installed and authenticated:

Adapter                | Provider                                | Install
claude_local           | Anthropic Claude (via Claude Code CLI)  | npm install -g @anthropic-ai/claude-code
codex_local            | OpenAI GPT (via Codex CLI)              | npm install -g @openai/codex or brew install --cask codex
gemini_local           | Google Gemini (via Gemini CLI)          | See provider docs
opencode_local         | OpenCode                                | See provider docs
pi_local               | Pi (Inflection)                         | See provider docs
+ 3 more CLI providers |                                         | See provider settings in app

You need at least one CLI installed. Cabinet boots with claude_local and codex_local enabled by default; every other adapter is enabled per agent. All adapters share the same runtime controls: model picker, effort sliders, dynamic listModels(), and brand icons.
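
As a rough, illustrative sketch, enabling a non-default adapter for a single agent might look like the front matter below. The provider field and the Research Analyst persona are hypothetical placeholders for this page (the examples further down only show a model field); check the provider settings in the app for the real field names.

---
name: Research Analyst
provider: gemini_local   # hypothetical field: turn on a non-default adapter for this agent
model: gemini-2.5-pro    # illustrative model ID; the adapter's listModels() shows the real options
---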

Coming next

Direct API access (no CLI required) and more open-weights models are on the way:

Provider               | How it'll connect         | Status
Hermes (Nous)          | Local via Ollama          | Planned
Llama / Mistral / Qwen | Local via Ollama          | Planned
LM Studio              | Local, no key             | Planned
Anthropic API (direct) | API key, no CLI required  | Planned
OpenAI API (direct)    | API key, no CLI required  | Planned
xAI Grok (direct)      | API key                   | Planned
Image generation       | FLUX, DALL·E, local SD    | Planned

Track progress on the Roadmap.

Why a mix of models

Most AI products lock you to one model behind one API. Cabinet's bet is the opposite: your team should be a mix of models, picked per job. The strategy lead runs on Opus because it writes carefully. The triage agent runs on Haiku because it's cheap and fast. The local agent runs on a local model because the data is sensitive. Same cabinet, different brains.
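
Concretely, that mix might look like three persona.md files, each pinning a different model. The agent names here are illustrative; the model IDs are the same ones used in the examples below.

# Strategy lead: careful, long-form writing
---
name: Strategy Lead
model: claude-opus-4-7
---

# Triage agent: cheap and fast
---
name: Inbox Triage
model: claude-haiku-4-5
---

# Sensitive data never leaves the machine (local model ID from the local-only example below)
---
name: Records Clerk
model: ollama/llama-3.3-70b
---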

What each provider is good for

Provider                   | Strengths                                            | When to pick
Anthropic Claude           | Long careful writing, strong reasoning, big context  | Defaults for leads, editors, anything where tone matters
OpenAI GPT                 | Fast, cheap, strong tool use                         | Triage, dispatchers, high-volume routine work
Google Gemini              | Long context, low cost per token, vision             | Research synthesis, document QA, long-form summaries
xAI Grok                   | Fresh news access, lighter touch                     | Trend analysts, social listening
Local (Ollama / LM Studio) | Offline, private, free                               | Sensitive data, air-gapped work, hobby experiments

Setting an agent's model

In persona.md:

---
name: GTM Lead
model: claude-opus-4-7      # the default for this agent
---

Or override per task at the composer:

┌───────────────────────────────────┐
│  → GTM Lead   ▾  gpt-4.1   ▾     │   ← override the model here
│  ◯ low  ● medium  ◯ high          │
└───────────────────────────────────┘

Or per heartbeat:

heartbeats:
  - cron: "0 9 * * 1-5"
    model: claude-haiku-4-5    # cheap for the daily pulse
    prompt: "Inbox triage."
  - cron: "0 17 * * 5"
    model: claude-opus-4-7     # opus for the weekly synthesis
    prompt: "Friday wrap."

Routing rules

In .cabinet you can set global routing:

providers:
  defaults:
    lead: claude-opus-4-7
    specialist: claude-sonnet-4-6
    triage: gpt-4.1
  fallbacks:
    claude-opus-4-7:
      - claude-sonnet-4-6
      - gpt-4.1
  budgets:
    daily:
      maxCostUsd: 25
    perTask:
      maxCostUsd: 5

If a request hits a budget cap or a 429, Cabinet falls back to the next model in the list and writes a note in the run transcript. With the config above, a capped claude-opus-4-7 call retries on claude-sonnet-4-6, then on gpt-4.1.

What Cabinet never does

  • Forwards your prompts to a Cabinet-hosted backend.
  • Caches your prompts for "model improvement."
  • Charges you for inference.
  • Locks you to a provider after install.

If you delete your API key, the provider disappears. Your prompts and outputs stay in your cabinet folder, on your disk.

Local-only mode

Want every call to stay on your machine? Set providers.localOnly: true in .cabinet:

providers:
  localOnly: true
  defaults:
    lead: ollama/llama-3.3-70b
    specialist: ollama/qwen-2.5-coder

Cabinet refuses any non-local provider. If an agent's persona points at claude-opus-4-7, the run errors with a clear "local-only" message instead of silently routing the call to a remote provider.

Read on