pay.2nth.ai Tree ai model-strategy
ai · Model strategy · Leaf

Edge by default, frontier on demand.

Model choice is a tiering decision, not a single bet. The public agents run on Cloudflare Workers AI at the edge — cheap, fast, close to the request. Registered partners get Gemini or Claude via API for deep, ad-hoc work. A gateway sits between the agents and the providers, so the choice is configuration, never a rewrite.

Model strategy Edge default Partner APIs Gateway Portable

A tiered, portable model strategy.

Every agent on this surface needs a model behind it. Committing the whole roster to one provider is the mistake: it is a cost, latency and lock-in trap, and in a regulated space it leaves you exposed when a vendor changes terms, a model is retired, or a client demands data residency.

The strategy is deliberately tiered. The default runtime is Cloudflare Workers AI — edge models that run cheaply and close to the request, used for the always-on public work. Registered partners get Gemini or Claude via API for deep, ad-hoc tasks. And the agents never call a provider directly — everything routes through a model gateway, so “which model” stays a configuration decision, not a code rewrite.

A gateway between the agents and the models.

No agent in this roster imports a provider SDK. Calls go to a gateway that handles routing, caching, observability and the policy of who gets which model. Swapping edge for frontier — or one frontier vendor for another — is a config change.

// The model gateway — one seam, many models

  agent  →  model gateway  →  Cloudflare Workers AI   (edge default)
                       →  Claude via API          (partner, deep tasks)
                       →  Gemini via API          (partner, deep tasks)
                       →  local / on-prem weights  (residency, PCI)

  // the gateway owns: routing, tier policy, caching,
  // rate limits and the audit trail (Langfuse)

null

The model follows the access tier.

Model choice maps onto the same access tiers as the rest of the surface. Open and reference work is served by edge models; member and partner depth unlocks frontier APIs for the heavy reasoning.

TierDefault modelUsed for
OpenCloudflare Workers AI (edge)Monitoring, summaries, public Q&A, tutoring
MemberEdge + selective frontierDecision-content agents, assessment
PartnerGemini / Claude via APIDeep ad-hoc reasoning, drafting, diligence

Cost, latency and footprint.

Cost at scale

The always-on work — monitoring, summarising, public Q&A, tutoring — is high-volume. Edge models keep that economically sane; frontier APIs would not.

Latency & locality

Workers AI runs inference close to the request. For interactive public agents that responsiveness is the experience.

Open weights, not a single vendor

The Workers AI catalogue runs open models — Llama, Gemma-family embeddings and more — so the default tier is itself portable.

Frontier where it earns its keep

Long-context regulatory reasoning, complex drafting and diligence are where Claude or Gemini pay for themselves — reserved for partners and deep tasks.

Never locked to one model.

Model-portable by design

This is the same portability stance as the 2nth-ai/agent-platform control plane these agents run on — the partner-copyable plane of Cerbos policy, Langfuse audit and the model gateway. The rule is explicit: never lock to a single model or SDK. The platform supports the full range from open weights running locally (Ollama, on-prem) through to frontier APIs (Claude, Gemini). The gateway is the seam that makes that a configuration choice, end to end.

Cheaper is not always safer.

Tiering by cost is right, but it has edges. In a regulated, money-adjacent context these are the failure modes to watch:

Edge models are smaller models

The default tier trades capability for cost and speed. For nuanced regulatory reasoning, route to a frontier model — do not let a small model bluff a compliance answer.

Provider terms move

Models get retired and pricing shifts. The gateway is what stops that from being a re-platform — but someone still has to watch the providers.

Data residency overrides cost

Where PCI scope or residency demands it, the right model is the local one — even if a frontier API would reason better. See the local-models leaf.

A model is never the accountable party

No tier of model signs off compliance, interprets regulation or moves money. The strategy chooses a model; a human still owns the decision.

And when the edge is plenty.

Stay on the edge for the bulk of the work: monitoring, summarising, public Q&A, tutoring and any high-volume, latency-sensitive interaction. It is cheaper, faster and entirely good enough for retrieval-and-summarise over a moderated tree.

Route to Claude or Gemini — for partners — when the task is long-context regulatory reasoning, complex drafting, or diligence where the cost of a shallow answer is high. Route to local / on-prem weights when residency or PCI scope makes sending data to any API a non-starter. The point of the gateway is that this routing is a policy you set, not a decision baked into code.

Where this sits in the tree.

The platforms and model docs.