Model choice is a tiering decision, not a single bet. The public agents run on Cloudflare Workers AI at the edge — cheap, fast, close to the request. Registered partners get Gemini or Claude via API for deep, ad-hoc work. A gateway sits between the agents and the providers, so the choice is configuration, never a rewrite.
Every agent on this surface needs a model behind it. Committing the whole roster to one provider is the mistake: it is a cost, latency and lock-in trap, and in a regulated space it leaves you exposed when a vendor changes terms, a model is retired, or a client demands data residency.
The strategy is deliberately tiered. The default runtime is Cloudflare Workers AI — edge models that run cheaply and close to the request, used for the always-on public work. Registered partners get Gemini or Claude via API for deep, ad-hoc tasks. And the agents never call a provider directly — everything routes through a model gateway, so “which model” stays a configuration decision, not a code rewrite.
No agent in this roster imports a provider SDK. Calls go to a gateway that handles routing, caching, observability and the policy of who gets which model. Swapping edge for frontier — or one frontier vendor for another — is a config change.
// The model gateway — one seam, many models agent → model gateway → Cloudflare Workers AI (edge default) → Claude via API (partner, deep tasks) → Gemini via API (partner, deep tasks) → local / on-prem weights (residency, PCI) // the gateway owns: routing, tier policy, caching, // rate limits and the audit trail (Langfuse)
Model choice maps onto the same access tiers as the rest of the surface. Open and reference work is served by edge models; member and partner depth unlocks frontier APIs for the heavy reasoning.
| Tier | Default model | Used for |
|---|---|---|
| Open | Cloudflare Workers AI (edge) | Monitoring, summaries, public Q&A, tutoring |
| Member | Edge + selective frontier | Decision-content agents, assessment |
| Partner | Gemini / Claude via API | Deep ad-hoc reasoning, drafting, diligence |
The always-on work — monitoring, summarising, public Q&A, tutoring — is high-volume. Edge models keep that economically sane; frontier APIs would not.
Workers AI runs inference close to the request. For interactive public agents that responsiveness is the experience.
The Workers AI catalogue runs open models — Llama, Gemma-family embeddings and more — so the default tier is itself portable.
Long-context regulatory reasoning, complex drafting and diligence are where Claude or Gemini pay for themselves — reserved for partners and deep tasks.
This is the same portability stance as the 2nth-ai/agent-platform control plane these agents run on — the partner-copyable plane of Cerbos policy, Langfuse audit and the model gateway. The rule is explicit: never lock to a single model or SDK. The platform supports the full range from open weights running locally (Ollama, on-prem) through to frontier APIs (Claude, Gemini). The gateway is the seam that makes that a configuration choice, end to end.
Tiering by cost is right, but it has edges. In a regulated, money-adjacent context these are the failure modes to watch:
The default tier trades capability for cost and speed. For nuanced regulatory reasoning, route to a frontier model — do not let a small model bluff a compliance answer.
Models get retired and pricing shifts. The gateway is what stops that from being a re-platform — but someone still has to watch the providers.
Where PCI scope or residency demands it, the right model is the local one — even if a frontier API would reason better. See the local-models leaf.
No tier of model signs off compliance, interprets regulation or moves money. The strategy chooses a model; a human still owns the decision.
Stay on the edge for the bulk of the work: monitoring, summarising, public Q&A, tutoring and any high-volume, latency-sensitive interaction. It is cheaper, faster and entirely good enough for retrieval-and-summarise over a moderated tree.
Route to Claude or Gemini — for partners — when the task is long-context regulatory reasoning, complex drafting, or diligence where the cost of a shallow answer is high. Route to local / on-prem weights when residency or PCI scope makes sending data to any API a non-starter. The point of the gateway is that this routing is a policy you set, not a decision baked into code.