cllama: The Governance Proxy
When a reasoning model tries to govern itself, the guardrails are part of the same cognitive process they are trying to constrain. This is the fundamental problem with prompt-level safety: the judge and the defendant share the same brain.
cllama is a separate process sitting between the runner and the LLM provider. The runner thinks it is talking directly to the model. It never sees the proxy. This is principle number eight: think twice, act once.
How It Works
The proxy sits on the network path between every agent in the pod and the LLM providers. When an agent makes an API call to what it believes is OpenAI or Anthropic, the request goes to cllama instead. The proxy evaluates, routes, and logs the request, then forwards it to the real provider.
Agent → (bearer token) → cllama proxy → (real API key) → LLM Provider
↓
audit log + dashboardA single proxy instance serves the entire pod. Bearer tokens resolve which agent is calling, so the proxy can apply per-agent policy, budgets, and logging.
Credential Starvation
Isolation is achieved by strictly separating secrets:
- The proxy holds the real API keys. Provider credentials (OpenRouter, Anthropic, OpenAI, Gemini/Google, Vercel AI Gateway, xAI) are configured in the pod-level
cllama-defaults.envblock and never enter agent containers. - Agents get unique bearer tokens. Each agent (and each ordinal of a scaled agent) receives a unique token generated during
claw up. - No credentials, no bypass. Because agents lack the credentials to call providers directly, all successful inference must pass through the proxy -- even if a malicious prompt tricks the agent into ignoring its configured base URL.
Keys Never Enter Agent Containers
Provider API keys belong in x-claw.cllama-defaults.env at the pod level. They are injected into the cllama proxy container only. Agent containers receive bearer tokens, not API keys.
Identity Resolution
The proxy uses bearer tokens to resolve caller identity. Each token maps to a specific agent (or agent ordinal), which means the proxy can:
- Apply per-agent policy and cost budgets
- Track per-agent token usage and spend
- Log which agent made which request
- Enforce different model access per agent
The token format is <agent-id>:<secret>, generated fresh on every claw up. The proxy loads a principals file mapping tokens to agent identities and their compiled contract context.
When a request arrives, the proxy:
- Extracts the
<agent-id>from the bearer token. - Loads the agent's context from
CLAW_CONTEXT_ROOT/<agent-id>/. - Validates the
<secure-secret>againstmetadata.jsonprincipals. - Checks the requested model against the agent's allowed models.
Token validation is fail-closed: unknown or missing tokens are denied before any provider call is made.
Transport Model
The proxy exposes a canonical ingress surface matrix — a small set of runner-facing HTTP surfaces that together form the cllama transport contract. See ADR-023 for the architectural rationale.
| Surface | Path | Payload family | Default use |
|---|---|---|---|
| OpenAI Chat Completions | POST /v1/chat/completions | OpenAI-compatible chat/completions | All non-Anthropic providers unless an explicit exception is documented |
| Anthropic Messages | POST /v1/messages | Anthropic Messages | Anthropic-family providers and explicit Anthropic-wire exceptions |
| Property | Value |
|---|---|
| Listen port | 0.0.0.0:8080 |
| Base URL (as seen by runner) | http://cllama-<type>:8080/v1 |
| Auth header | Authorization: Bearer <agent-id>:<secure-secret> |
Clawdapus configures each agent's runner to use the proxy URL as its LLM base URL, and the runner targets one of the canonical ingress paths beneath that base URL. Provider identity (google/gemini-*, anthropic/*, etc.) stays in operator-facing model refs — runners must not invent synthetic provider prefixes such as cllama/google. Two distinct code paths handle OpenAI format (messages[]) and Anthropic format (top-level system field).
OpenAI Format
Requests to /v1/chat/completions are handled as OpenAI format. The payload contains a messages[] array and a model field. The proxy rewrites the model field to the operator-assigned provider and model, then forwards the request to the resolved upstream endpoint.
Anthropic Format
Requests to /v1/messages are handled as Anthropic format. The payload uses a top-level system field rather than embedding system messages in the messages array. The proxy forwards Anthropic-specific headers (Anthropic-Version, Anthropic-Beta) and routes directly to the Anthropic provider.
Format Bridging
When the resolved provider uses Anthropic format but the incoming request is OpenAI format (/v1/chat/completions), the proxy routes through OpenRouter instead, which accepts OpenAI format for all models. This transparent bridging means agents do not need to know which provider or format their assigned model requires.
Pure Passthrough
In passthrough mode, the proxy rewrites the model field and forwards. It does not touch the messages array. No prompt decoration, no system message injection -- those capabilities are reserved for the cllama-policy proxy type.
The Interception Pipeline
The cllama specification defines a full bidirectional interception pipeline with five phases. The runner never knows the proxy exists -- it thinks it is talking directly to the model.
Pre-flight
Identity resolution, token validation, and model authorization. Invalid tokens are rejected before any downstream work begins.
Outbound Interception (Agent to Provider)
Before the LLM sees the prompt, the proxy can evaluate and modify the outbound request:
- Context aggregation -- The proxy parses the
enforcerules from the agent-specificAGENTS.md. These rules form the behavioral contract that governs what the agent is allowed to do. - Tool scoping -- If the agent's request contains
tools, the proxy evaluates each tool against the agent's identity and active policy modules. Tools not authorized for the agent's contracted role are silently dropped before the request reaches the provider. - Prompt decoration (pre-prompting) -- The proxy may modify the outbound
messagesarray to inject operator-defined rules, priorities, or warnings. This decoration happens transparently -- the agent has no visibility into what was added. - Policy blocking -- If the outbound prompt violates a loaded policy module or
enforcerule, the proxy may short-circuit the request entirely and return an error or a mock response. The agent never reaches the provider. - Forced model routing and rate limiting -- Even if the agent requests a specific model (e.g.,
gpt-4o), the proxy may seamlessly rewrite themodelfield to use a different, operator-approved model (e.g.,claude-3-haiku). The agent never knows its model was downgraded. Combined with rate limiting via429 Too Many Requestsresponses, this enforces strict compute budgets across the fleet. - Late runtime context assembly -- Volatile context (subscribed feeds, memory recall, the current time line, and live Discord channel deltas) is appended as a late runtime-context message rather than concatenated onto the first system prompt. OpenAI-compatible requests receive a later
systemmessage inserted immediately before the invoking user message; Anthropic requests receive a trailingusercontent block. The stable system contract and the existing first non-system message stay byte-stable across turns, which preserves prompt-cache identity on cache-supported providers and keeps OpenRouter sticky routing pinned to a stable conversation.
Stable Contract, Volatile Tail
Feed headers no longer carry the volatile refreshed <ts> line in model-visible text -- unchanged feed content with a TTL refresh now produces byte-identical bytes. The STALE tag still appears when a feed fetch failed and the rendered text is from the last good fetch.
Channel Context Cursors
Live Discord channel context is fetched as a delta-since-watermark instead of a full tail every turn. The proxy keeps a per-agent vector cursor (one entry per visible channel) and rewrites the channel-context feed URL with after=<channel_id>:<message_id> watermarks before sending it to claw-wall. The cursor is committed only after a successful 2xx response is recorded by the session-history writer, so streaming truncation, 5xx upstream errors, and 4xx rejections all leave the cursor untouched and the same delta replays on the next mention. When claw-wall caps a delta response, cllama appends a coverage_partial=true omitted_after_cursor=N newest_returned=... annotation so partial coverage is visible rather than silently swallowed; the cursor still advances to the newest returned message. The wall backfills Discord history on startup before the first forward poll, and feed headers include backfill_status so a partial or rate-limited backing window is visible to operators. See Social Topology · Channel Context Feed for the wire shape.
The cursor ledger lives at $CLAW_CONTEXT_LEDGER_DIR/<agent-id>/cursor.json. The default path is $CLAW_SESSION_HISTORY_DIR/context-ledger (i.e. inside the existing read-write session-history mount). When session history is disabled, cursors fall back to in-memory only and every cold start re-bootstraps with a 24h tail.
Provider Execution
The proxy strips the dummy token, attaches the real provider API key, and forwards the request upstream.
Inbound Interception (Provider to Agent)
After the provider responds but before the runner sees the result, the proxy can evaluate and amend:
- Response amendment -- The proxy evaluates the provider's response against the
enforcerules in the agent's contract and active policy modules. If the response violates the tone, instructions, or restrictions defined in the contract, the proxy rewrites the content before the agent sees it. - PII leakage blocking -- The proxy can detect and redact personally identifiable information. If the provider's response contains data that should not flow back to the agent (customer names, account numbers, internal identifiers), the proxy strips or masks it.
- Drift scoring -- The proxy quantifies how far the provider's raw response drifted from the agent's ideal behavior as defined in its contract, emitting a structured log of the drift score. The scoring methodology is organization-specific and not defined by the cllama standard.
Egress
The (potentially amended) response is returned to the agent.
Passthrough vs Policy
The reference passthrough implementation currently performs identity resolution, model rewriting, and cost tracking only. It does not touch the messages array -- no prompt decoration, no response amendment. Full bidirectional interception is the cllama-policy proxy type, which is future work.
Context Mount Structure
The proxy needs to know who each agent is and what it is allowed to do. Clawdapus provides this through a shared context mount -- a directory tree with per-agent subdirectories containing the compiled contract and identity metadata.
Host-Side Layout
During claw up, Clawdapus generates context files under the runtime directory:
.claw-runtime/context/
├── crypto-crusher-0/
│ ├── AGENTS.md # Compiled contract (includes, enforce, guide)
│ ├── CLAWDAPUS.md # Infrastructure map (surfaces, skills, topology)
│ └── metadata.json # Identity, bearer token, handles, policy modules
├── crypto-crusher-1/
│ ├── AGENTS.md
│ ├── CLAWDAPUS.md
│ └── metadata.json
└── analyst/
├── AGENTS.md
├── CLAWDAPUS.md
└── metadata.json| File | Purpose |
|---|---|
AGENTS.md | The agent's compiled behavioral contract, including inlined enforce and guide content from INCLUDE directives. |
CLAWDAPUS.md | Infrastructure context: surfaces, mount paths, peer handles, feeds, and available skills. |
metadata.json | Machine-readable identity (handles, allowed models, bearer token auth). |
Container-Side Mount
The host directory is bind-mounted into the cllama container at /claw/context/<agent-id>/. The proxy reads CLAW_CONTEXT_ROOT (defaults to /claw/context) and loads each subdirectory as an agent identity.
The context/ directory segment is required in both host and container paths.
The context/ Segment Is Required
The mount path must include the context/ directory segment. The proxy expects CLAW_CONTEXT_ROOT to point at the directory containing agent subdirectories, not directly at an agent's files.
Context Mount Contents
The agentctx struct currently holds only three fields: AgentsMD, ClawdapusMD, and Metadata (used for bearer token auth). There are no outbound service credentials, no feed manifests, and no decoration config in the context mount today.
Scaled Services
For services with count > 1, context is generated per ordinal. A service named crypto-crusher with count: 3 produces three separate context directories: crypto-crusher-0/, crypto-crusher-1/, crypto-crusher-2/. Each ordinal gets its own bearer token, its own compiled contract, and its own audit trail.
The metadata.json file in each directory contains the bearer token secret used for authentication. The proxy validates incoming tokens against these metadata files to resolve caller identity.
Environment Variables
The cllama container receives its configuration through environment variables injected by claw up.
| Variable | Description |
|---|---|
CLAW_POD | The name of the pod (e.g., crypto-ops). |
CLAW_CONTEXT_ROOT | Path to the shared context mount root (defaults to /claw/context). |
CLAW_SESSION_HISTORY_DIR | Path to the read-write session history mount (defaults to /claw/session-history). When set, also seeds the default CLAW_CONTEXT_LEDGER_DIR. |
CLAW_CONTEXT_LEDGER_DIR | Path where per-agent channel cursors are persisted (defaults to $CLAW_SESSION_HISTORY_DIR/context-ledger). When unset, cursors fall back to in-memory and every restart re-bootstraps with a 24h tail. |
CLLAMA_FEED_MAX_RESPONSE_BYTES | Per-feed byte cap applied at fetch time before formatting. Default 32768. Invalid or non-positive values fall back to the default. |
CLLAMA_FEED_MAX_TOTAL_BYTES | Aggregate cap across all formatted feed blocks injected into one request. Default 65536. Invalid or non-positive values fall back to the default. |
PROVIDER_API_KEY_* | Real provider API keys -- OPENAI_API_KEY, ANTHROPIC_API_KEY, OPENROUTER_API_KEY, GEMINI_API_KEY / GOOGLE_API_KEY, AI_GATEWAY_API_KEY, etc. |
Where Provider Keys Go
Provider keys are configured in the pod YAML under x-claw.cllama-defaults.env. They are injected into the cllama proxy container only. They must not appear in regular agent environment: blocks.
x-claw:
pod: my-fleet
cllama-defaults:
proxy: [passthrough]
env:
OPENROUTER_API_KEY: "${OPENROUTER_API_KEY}"
ANTHROPIC_API_KEY: "${ANTHROPIC_API_KEY}"
GEMINI_API_KEY: "${GEMINI_API_KEY}"For native Gemini routing, declare models as google/<model>. GEMINI_API_KEY is the primary env name; GOOGLE_API_KEY is accepted as a lower-priority alias. GOOGLE_BASE_URL can override the default OpenAI-compatible Google endpoint when needed.
cllama-env, Not environment
Provider API keys belong in x-claw.cllama-defaults.env (or service-level x-claw.cllama-env), never in the service's compose environment: block. Putting real keys in environment: defeats credential starvation -- the agent container would have direct provider access.
Feed Injection Budgets
cllama applies two byte budgets when it injects subscribed feeds into a request: a per-feed cap (read at fetch time, before formatting) and an aggregate cap across all formatted feed blocks. Feeds are injected in manifest order until the aggregate cap is reached.
These budgets are intentionally bounded by default -- 32 KB per feed and 64 KB aggregate -- so a small pod cannot accidentally turn feed subscriptions into unbounded prompt stuffing. The defaults are independent of the feed source window: a claw-wall channel-awareness feed configured with a large x-claw.context.channel.max-chars can return far more than 32 KB, but cllama will still cap what reaches the model unless you raise its budgets too.
Raise both caps together through x-claw.cllama-defaults.env (or service-level x-claw.cllama-env):
x-claw:
pod: trading-desk
cllama-defaults:
proxy: [passthrough]
env:
CLLAMA_FEED_MAX_RESPONSE_BYTES: "262144" # accept up to 256 KB from any one feed
CLLAMA_FEED_MAX_TOTAL_BYTES: "393216" # 384 KB across all injected feeds combinedInvalid or non-positive values fall back to the bounded defaults, so a typo cannot silently unbound injection. Set CLLAMA_FEED_MAX_TOTAL_BYTES high enough to hold the sum of every feed a turn carries (market/style context, scaffolds, memory recall, channel awareness, channel context) -- the aggregate cap is shared across all of them, not per feed.
When the aggregate cap does drop a feed, cllama no longer fails silently: the model sees an explicit --- FEED: <name> skipped (total feed size cap reached; block_bytes=… total_before=… max_total_bytes=…) --- notice in the runtime context, and a structured feed_injection telemetry event records the outcome (see Telemetry Fields). Context snapshots store the actual provider-visible blocks and skip notices.
Skip is in manifest order
The aggregate cap drops whole feeds in manifest order once the budget is exhausted; there is no per-feed priority or reservation yet. If a large feed earlier in the manifest can starve a later one, raise CLLAMA_FEED_MAX_TOTAL_BYTES rather than relying on ordering.
Pod Configuration
Declaring a cllama Proxy
The proxy is declared in claw-pod.yml via the cllama field on a service's x-claw block:
services:
analyst:
x-claw:
agent: analyst
cllama: passthrough
cllama-env:
OPENAI_API_KEY: ${OPENAI_API_KEY}
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}The cllama value specifies the proxy type. Currently only passthrough ships as a reference implementation.
Provider Keys with YAML Anchors
For pods with multiple services using the same provider keys, use YAML anchors to stay DRY:
x-claw-env: &cllama-keys
OPENAI_API_KEY: ${OPENAI_API_KEY}
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
services:
analyst:
x-claw:
agent: analyst
cllama: passthrough
cllama-env: *cllama-keys
researcher:
x-claw:
agent: researcher
cllama: passthrough
cllama-env: *cllama-keysNative Gemini Routing
Direct Gemini works through Google's OpenAI-compatible endpoint. Use the google/<model> provider prefix and seed the key through x-claw.cllama-env.
services:
analyst:
x-claw:
agent: analyst
cllama: passthrough
models:
primary: google/gemini-2.5-flash
cllama-env:
GEMINI_API_KEY: ${GEMINI_API_KEY}
# optional override for proxies or alternate endpoints
GOOGLE_BASE_URL: ${GOOGLE_BASE_URL}If both GEMINI_API_KEY and GOOGLE_API_KEY are present, cllama prefers GEMINI_API_KEY as the active seed key.
Vercel AI Gateway Routing
Vercel AI Gateway works through its OpenAI-compatible endpoint. Use the vercel/<provider>/<model> provider prefix and seed the gateway key through x-claw.cllama-env.
services:
analyst:
x-claw:
agent: analyst
cllama: passthrough
models:
primary: vercel/anthropic/claude-sonnet-4.6
cllama-env:
AI_GATEWAY_API_KEY: ${AI_GATEWAY_API_KEY}
# optional override for proxies or alternate endpoints
AI_GATEWAY_BASE_URL: ${AI_GATEWAY_BASE_URL}The OpenAI-compatible /v1/chat/completions path forwards anthropic/claude-sonnet-4.6 to Vercel as the upstream model. The Anthropic /v1/messages path remains native Anthropic-only.
Count Expansion with cllama
When a service declares both cllama and count > 1, each ordinal gets its own bearer token and context directory. The proxy authenticates each ordinal independently:
services:
analyst:
x-claw:
agent: analyst
cllama: passthrough
count: 3This produces analyst-0, analyst-1, and analyst-2, each with:
- A unique bearer token in format
analyst-N:<secret> - A context directory at
/claw/context/analyst-N/ - Independent telemetry tagged with
claw_id: analyst-N
Cost Accounting
The proxy extracts token usage from every LLM response, multiplies by the pricing table, and tracks cost per agent, per provider, and per model. This gives operators real-time visibility into spend without relying on provider dashboards that aggregate across all API keys.
$ claw ps
TENTACLE STATUS CLLAMA DRIFT
crypto-crusher-0 running healthy 0.02
crypto-crusher-1 running healthy 0.04
crypto-crusher-2 running WARNING 0.31Telemetry and Audit
Every request through the proxy produces a structured JSON log entry on stdout. Clawdapus collects these for the claw audit command and for the Master Claw's fleet governance decisions.
Telemetry Fields
| Field | Description |
|---|---|
ts | ISO-8601 UTC timestamp. |
claw_id | The calling agent's identifier. |
type | Event type: request, response, error, intervention, feed_fetch, feed_injection, provider_pool, memory_op. |
intervention | Why the proxy modified a prompt, dropped a tool, or amended a response. References the specific policy module or rule. |
model | The model used for the request. |
tokens_in | Input token count. |
tokens_out | Output token count. |
cost_usd | Estimated cost for the request/response pair. |
latency_ms | Request duration in milliseconds. |
static_system_hash | sha256 of the stable system contract (messages[0] for OpenAI / top-level system for Anthropic). Should be byte-stable across turns when nothing about the agent's contract changed. |
first_system_hash | sha256 of the first system message in the assembled payload. v1 mirrors static_system_hash; reserved for future Anthropic cache_control differentiation. |
first_non_system_hash | sha256 of the first non-system message. Stable on multi-turn runners; expected to drift on single-turn Discord runners and surfaces that drift via this field. |
dynamic_context_hash | sha256 of the late runtime-context block (memory + feeds + time + channel deltas). Changes per turn when new context arrives. |
tools_hash | sha256 of the canonicalized tools[] payload. |
cached_tokens | Provider-reported usage.prompt_tokens_details.cached_tokens when present. |
cache_write_tokens | Provider-reported usage.prompt_tokens_details.cache_write_tokens when present. |
Event-specific fields may also be present depending on type:
status_code,latency_ms,tokens_in,tokens_out,cost_usd,cached_tokens,cache_write_tokens— request/response/error eventsstatic_system_hash,first_system_hash,first_non_system_hash,dynamic_context_hash,tools_hash— request events (prompt assembly fingerprint)feed_name,feed_url,fetched_at,cached— feed fetch eventsfeed_name,source,feed_status(included/empty/skipped_total_cap),feed_truncated,feed_source_bytes,feed_source_exact,feed_content_bytes,feed_block_bytes,feed_total_before,feed_total_after,feed_max_response_bytes,feed_max_total_bytes—feed_injectionevents (one per manifest entry, recording whether the feed actually reached the provider-visible context after the per-feed and aggregate byte caps)provider,key_id,action,reason,cooldown_until— provider pool eventsmemory_service,memory_op,memory_status,memory_blocks,memory_bytes,memory_removed— memory telemetry events
Every request/response pair produces two log events: one with type: "request" on ingress and one with type: "response" on egress. Error events use type: "error". Intervention events use type: "intervention". Token counts and cost estimates are extracted from the provider's response headers or body and attached to the response event.
Spec Divergences
The reference implementation has a few known divergences from the spec document:
- The
interventionfield is typed as*stringwith noomitemptytag. Every event emits"intervention": null, even when no intervention occurred. This is intentional -- it ensures log parsers can rely on the field always being present. - The implementation uses
tsfor the timestamp field. The spec (section 5) previously listedtimestamp. - The spec (section 5) omits
errorfrom its type enum and usesintervention_reasonwhere the reference logger usesintervention.
These divergences are documented here as practical guidance. The reference implementation is the source of truth for runtime behavior.
Structured, Not Self-Reported
Drift is independently scored from proxy telemetry -- not self-reported by the agent. The proxy provides a verifiable history of exactly what the bot tried to do versus what it was allowed to do.
Operator Dashboard
The cllama proxy serves a real-time web UI for operator visibility.
| Property | Value |
|---|---|
| Host port | 8181 (default) |
| Container port | 8081 |
The dashboard shows:
- Live agent activity -- which agent is calling, which model, right now
- Provider status and error rates
- Cost breakdown per agent, per model, per time window
- Token usage across the pod
The dashboard updates in real time as agents make LLM calls. No polling, no delay.
Ecosystem Implementations
Passthrough Reference
The reference image (ghcr.io/mostlydev/cllama) implements the v1 API contract as a pure transparent proxy:
- Bearer-token identity resolution and validation.
- Environment validation (
CLAW_POD,CLAW_CONTEXT_ROOT, provider credentials). - OpenAI and Anthropic API format passthrough with format bridging.
- Per-agent token usage and cost tracking.
- Structured audit logging of all traffic.
- Real-time operator dashboard.
- No prompt decoration, no response amendment.
This image is used for testing and serves as the starting point for building custom policy engines.
Future: cllama-policy
The next planned implementation is cllama-policy, which adds bidirectional interception -- prompt decoration, tool scoping, response amendment, and drift scoring. The passthrough reference establishes the transport and identity contract; cllama-policy builds the governance logic on top.
Third-Party Engines
Any OpenAI-compatible proxy that consumes the Clawdapus context mount layout can act as a governance layer. The spec defines the contract, not the implementation. Operators can build proprietary engines incorporating advanced DLP, RAG-based context injection, or conversational configuration.
ClawRouter
ClawRouter is a specialized cllama implementation focused on forced model routing, rate limiting, and compute metering. It intercepts model requests, evaluates them against organizational budgets or provider availability, and dynamically routes, downgrades, or rate-limits requests to contain costs across a fleet of untrusted agents.
Security Model
Credential Isolation
The proxy enforces a strict credential boundary. Agent containers never see real provider API keys. The flow is:
claw upgenerates a dummy bearer token for each agent.- The agent's runner is configured with the proxy URL and dummy token.
- The proxy receives the dummy token, validates it, strips it, and attaches the real provider key.
- The agent cannot extract the real key because it only communicates with the proxy, never directly with the provider.
Network Isolation
Within the pod's Docker network, agents can reach the proxy at http://cllama-<type>:8080. They cannot reach the provider directly because no provider credentials exist in their environment. Even if an agent attempted to call the provider API directly, it would lack authentication.
Token Validation
Bearer tokens are validated against the principals field in each agent's metadata.json. A request with an invalid or missing token is rejected before any provider call is made. This is fail-closed: unknown tokens are denied, not passed through.
Implementation Notes
These notes reflect the current state of the reference implementation (cllama/ submodule) and are useful for debugging or extending.
Proxy Handler
The proxy handler (cllama/internal/proxy/handler.go) is pure passthrough. It rewrites the model field in the request body and forwards everything else unchanged. There is no prompt decoration, no system message injection, and no middleware hook system.
Logger Internals
The logger (cllama/internal/logging/logger.go) writes one JSON object per line to stdout. The intervention field is declared as *string (pointer to string) with no omitempty struct tag, so Go's JSON marshaler emits "intervention": null on every event. This is intentional -- it ensures log parsers can rely on the field always being present.
Image Resolution
Operator flow is explicit now:
claw pullfetches the pinned cllama image the currentclawbinary expects.claw upstays strict and tells you to runclaw pullwhen the proxy image is missing.claw up --fixperforms that remediation automatically.
Build and Publish
For end users, prefer claw pull. The raw multi-arch build below is contributor-only release tooling:
docker buildx build \
--platform linux/amd64,linux/arm64 \
-t ghcr.io/mostlydev/cllama:<tag> \
--push cllama/The cllama/ directory is a git submodule pointing to a private SSH repo. Fresh clones leave it empty. The published image on ghcr.io is public, so end users pull the pre-built image rather than building from source.
Limitations
Current constraints to be aware of:
- Single proxy type only. Multi-proxy is represented in the data model, but the runtime currently fails fast if more than one proxy type is declared per pod. Proxy chaining is a Phase 5 feature.
- Passthrough only. The
cllama-policyproxy type (full bidirectional interception with prompt decoration, tool scoping, and response amendment) is future work. The reference implementation does identity, routing, and cost tracking. - No per-turn hooks. The Clawdapus
Driverinterface has four methods (Validate,Materialize,PostApply,HealthProbe) -- all run once at deploy/startup. There is no per-turn or per-request hook. Any per-request context enrichment must go through cllama or a runner-native mechanism. - Intervention field quirk. The cllama logger emits
"intervention": nullon every event (the field has noomitemptytag). This is expected behavior, not a missing value. - Spec divergences. The specification uses
intervention_reasonwhere the reference implementation usesintervention, and omitserrorfrom its type enum. Thetstimestamp field replacedtimestamp. Consumers should handle both forms.
See the full cllama specification on GitHub for the formal standard.
