
ai · Model strategy · Leaf

Edge by default, frontier on demand.

Model choice is a tiering decision, not a single bet. The public agents run on Cloudflare Workers AI at the edge — cheap, fast, close to the request. Registered partners get Gemini or Claude via API for deep, ad-hoc work. A gateway sits between the agents and the providers, so the choice is configuration, never a rewrite.


01 · What it is

A tiered, portable model strategy.

Every agent on this surface needs a model behind it. Committing the whole roster to one provider is the mistake: it is a cost, latency and lock-in trap, and in a regulated space it leaves you exposed when a vendor changes terms, a model is retired, or a client demands data residency.

The strategy is deliberately tiered. The default runtime is Cloudflare Workers AI — edge models that run cheaply and close to the request, used for the always-on public work. Registered partners get Gemini or Claude via API for deep, ad-hoc tasks. And the agents never call a provider directly — everything routes through a model gateway, so “which model” stays a configuration decision, not a code rewrite.

02 · How it works

A gateway between the agents and the models.

No agent in this roster imports a provider SDK. Calls go to a gateway that handles routing, caching, observability and the policy of who gets which model. Swapping edge for frontier — or one frontier vendor for another — is a config change.

// The model gateway — one seam, many models

  agent  →  model gateway  →  Cloudflare Workers AI    (edge default)
                           →  Claude via API           (partner, deep tasks)
                           →  Gemini via API           (partner, deep tasks)
                           →  local / on-prem weights  (residency, PCI)

  // the gateway owns: routing, tier policy, caching,
  // rate limits and the audit trail (Langfuse)
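
The seam in the diagram above can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the platform's actual gateway: `ModelProvider`, `GatewayConfig`, `ModelGateway` and the agent name `partner-drafting` are hypothetical, the providers are stubs rather than real SDK adapters, and the in-memory `audit` array only stands in for the Langfuse audit trail.

```typescript
// Hypothetical sketch: every name here is illustrative, not a real API.
type ProviderId = "edge" | "claude" | "gemini" | "local";

interface ModelProvider {
  id: ProviderId;
  complete(prompt: string): Promise<string>;
}

// Routing lives in configuration: agent name -> provider id.
interface GatewayConfig {
  routes: Record<string, ProviderId>;
  fallback: ProviderId; // used when an agent has no explicit route
}

interface AuditEntry {
  agent: string;
  provider: ProviderId;
  promptChars: number;
}

class ModelGateway {
  // Stand-in for an external audit trail.
  readonly audit: AuditEntry[] = [];
  private readonly providers: Map<ProviderId, ModelProvider>;
  private config: GatewayConfig;

  constructor(providers: ModelProvider[], config: GatewayConfig) {
    this.providers = new Map(
      providers.map((p): [ProviderId, ModelProvider] => [p.id, p]),
    );
    this.config = config;
  }

  // Swapping models is a config update; agents are untouched.
  reconfigure(config: GatewayConfig): void {
    this.config = config;
  }

  providerFor(agent: string): ModelProvider {
    const id = this.config.routes[agent] ?? this.config.fallback;
    const provider = this.providers.get(id);
    if (!provider) throw new Error(`No provider registered for "${id}"`);
    return provider;
  }

  // The only call an agent makes. No provider SDK leaks past this method.
  async complete(agent: string, prompt: string): Promise<string> {
    const provider = this.providerFor(agent);
    this.audit.push({ agent, provider: provider.id, promptChars: prompt.length });
    return provider.complete(prompt);
  }
}

// Stub providers; real adapters would wrap each vendor's client here.
const stub = (id: ProviderId): ModelProvider => ({
  id,
  complete: async (prompt: string) => `[${id}] ${prompt}`,
});

const gateway = new ModelGateway(
  [stub("edge"), stub("claude"), stub("gemini"), stub("local")],
  { routes: { "partner-drafting": "claude" }, fallback: "edge" },
);
```

Moving `partner-drafting` from Claude to Gemini in this sketch is a single `reconfigure` call with a new `routes` map. The agent code that calls `gateway.complete` does not change.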


03 · Tiering onto Open / Member / Partner

The model follows the access tier.

Model choice maps onto the same access tiers as the rest of the surface. Open and reference work is served by edge models; the Member and Partner tiers unlock frontier APIs for heavy reasoning.

Tier	Default model	Used for
Open	Cloudflare Workers AI (edge)	Monitoring, summaries, public Q&A, tutoring
Member	Edge + selective frontier	Decision-content agents, assessment
Partner	Gemini / Claude via API	Deep ad-hoc reasoning, drafting, diligence
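
One way to keep the table above as data rather than code is a small policy map. The sketch below is an assumption-laden illustration: `AccessTier`, `ModelClass`, `tierPolicy` and `resolveModel` are hypothetical names, "selective frontier" for members is modelled as an explicit per-request opt-in, and partners are assumed to be allowed to fall back to edge.

```typescript
// Hypothetical sketch of the tier table as configuration.
type AccessTier = "open" | "member" | "partner";
type ModelClass = "edge" | "frontier";

interface TierPolicy {
  defaultModel: ModelClass;
  allowed: readonly ModelClass[];
}

const tierPolicy: Record<AccessTier, TierPolicy> = {
  open: { defaultModel: "edge", allowed: ["edge"] },
  // Assumption: members stay on edge unless a request opts in to frontier.
  member: { defaultModel: "edge", allowed: ["edge", "frontier"] },
  // Assumption: partners default to frontier but may still use edge.
  partner: { defaultModel: "frontier", allowed: ["edge", "frontier"] },
};

// Returns the requested model class if the tier allows it,
// otherwise the tier's default.
function resolveModel(tier: AccessTier, requested?: ModelClass): ModelClass {
  const policy = tierPolicy[tier];
  if (requested !== undefined && policy.allowed.includes(requested)) {
    return requested;
  }
  return policy.defaultModel;
}
```

An Open request that asks for a frontier model is quietly served by the edge default in this sketch; a stricter policy could reject it instead.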

04 · Why edge by default

Cost, latency and footprint.

Cost at scale

The always-on work — monitoring, summarising, public Q&A, tutoring — is high-volume. Edge models keep that economically sane; frontier APIs would not.

Latency & locality

Workers AI runs inference close to the request. For interactive public agents that responsiveness is the experience.

Open weights, not a single vendor

The Workers AI catalogue runs open models — Llama, Gemma-family embeddings and more — so the default tier is itself portable.

Frontier where it earns its keep

Long-context regulatory reasoning, complex drafting and diligence are where Claude or Gemini pay for themselves — reserved for partners and deep tasks.

05 · Portability is the requirement

Never locked to one model.

Model-portable by design

This is the same portability stance as the 2nth-ai/agent-platform control plane these agents run on — the partner-copyable plane of Cerbos policy, Langfuse audit and the model gateway. The rule is explicit: never lock to a single model or SDK. The platform supports the full range from open weights running locally (Ollama, on-prem) through to frontier APIs (Claude, Gemini). The gateway is the seam that makes that a configuration choice, end to end.

06 · Where the model choice bites

Cheaper is not always safer.

Tiering by cost is the right default, but it has sharp edges. In a regulated, money-adjacent context, these are the failure modes to watch:

Edge models are smaller models

The default tier trades capability for cost and speed. For nuanced regulatory reasoning, route to a frontier model — do not let a small model bluff a compliance answer.

Provider terms move

Models get retired and pricing shifts. The gateway is what stops that from being a re-platform — but someone still has to watch the providers.
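
A retirement can stay a configuration event if agents ask for a logical alias rather than a concrete model. The sketch below assumes that design; it is not a description of how the platform does it. The alias `deep-reasoning`, the provider and model identifiers, and the retirement date are all invented for illustration.

```typescript
// Hypothetical sketch: aliases, model names and dates are made up.
interface ModelCandidate {
  provider: string;
  model: string;
  retiresOn?: string; // ISO date (YYYY-MM-DD) announced by the provider, if any
}

// Ordered by preference. Someone still has to keep this list current.
const aliases: Record<string, ModelCandidate[]> = {
  "deep-reasoning": [
    { provider: "frontier-a", model: "example-large-v1", retiresOn: "2025-06-30" },
    { provider: "frontier-b", model: "example-pro-v2" },
  ],
};

// Picks the first candidate that is not yet retired on the given date.
// ISO dates compare correctly as plain strings.
function pickModel(alias: string, today: string): ModelCandidate {
  const candidates = aliases[alias] ?? [];
  const live = candidates.find(
    (c) => c.retiresOn === undefined || c.retiresOn > today,
  );
  if (!live) throw new Error(`No live model configured for alias "${alias}"`);
  return live;
}
```

The code does not remove the need to watch the providers. It only narrows the response to a change in one list.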

Data residency overrides cost

Where PCI scope or residency demands it, the right model is the local one — even if a frontier API would reason better. See the local-models leaf.

A model is never the accountable party

No tier of model signs off compliance, interprets regulation or moves money. The strategy chooses a model; a human still owns the decision.

07 · When to reach for frontier

And when the edge is plenty.

Stay on the edge for the bulk of the work: monitoring, summarising, public Q&A, tutoring and any high-volume, latency-sensitive interaction. It is cheaper, faster and entirely good enough for retrieval-and-summarise over a moderated tree.

Route to Claude or Gemini — for partners — when the task is long-context regulatory reasoning, complex drafting, or diligence where the cost of a shallow answer is high. Route to local / on-prem weights when residency or PCI scope makes sending data to any API a non-starter. The point of the gateway is that this routing is a policy you set, not a decision baked into code.
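
The routing rules above read naturally as an ordered policy. The following is a minimal sketch, assuming a hypothetical `RoutingRequest` shape and task names taken from this page; it follows the partner-only wording here and leaves the Member tier's selective frontier access out.

```typescript
// Hypothetical sketch: type and field names are illustrative.
type Task =
  | "monitoring"
  | "summary"
  | "public-qa"
  | "tutoring"
  | "regulatory-reasoning"
  | "drafting"
  | "diligence";

type RouteTarget = "edge" | "frontier" | "local";

interface RoutingRequest {
  tier: "open" | "member" | "partner";
  task: Task;
  residencyRestricted: boolean; // true when residency or PCI scope applies
}

const DEEP_TASKS: ReadonlySet<Task> = new Set<Task>([
  "regulatory-reasoning",
  "drafting",
  "diligence",
]);

function route(req: RoutingRequest): RouteTarget {
  // 1. Residency or PCI scope wins over cost and capability.
  if (req.residencyRestricted) return "local";
  // 2. Partners get a frontier model for deep tasks.
  if (req.tier === "partner" && DEEP_TASKS.has(req.task)) return "frontier";
  // 3. Everything else stays on the edge.
  return "edge";
}
```

The order of the checks is the policy: residency is tested first so that no later rule can send restricted data to an external API.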

08 · Connections

Where this sits in the tree.

Local & on-prem models

The residency-and-PCI tier of the same strategy — when self-hosting open weights beats any API.

Research & regulatory-watch agent

The agent that lives this strategy: edge for monitoring, frontier for partner drafting.

know.2nth.ai

agent-platform architecture

The partner-copyable control plane — Cerbos policy, Langfuse audit, model gateway — that these agents run on.

09 · Resources

The platforms and model docs.

Platform · Cloudflare Workers AI · developers.cloudflare.com/workers-ai
Catalogue · Workers AI model catalogue · developers.cloudflare.com/workers-ai/models
Platform · Cloudflare AI Gateway · developers.cloudflare.com/ai-gateway
Model docs · Anthropic, Claude API documentation · docs.anthropic.com
Model docs · Google, Gemini API documentation · ai.google.dev/gemini-api