Use Case

Hard enforcement, not advisory warnings

Most token-budget tooling is advisory: it tells you after the fact that you've exceeded a threshold. OrchVynt intercepts invocations that would breach the budget before they reach the model, so the limit is enforced rather than merely reported.
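The difference between advisory and enforced budgets comes down to where the check runs. The following is a minimal sketch of a pre-flight check, not OrchVynt's actual API: `Budget`, `BudgetExceeded`, and `guard_invocation` are hypothetical names, and the sketch assumes a cost estimate is available before the call is made.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Hypothetical error raised before the model is ever called."""


@dataclass
class Budget:
    max_usd: float
    spent_usd: float = 0.0


def guard_invocation(budget: Budget, estimated_usd: float, call_model):
    """Run call_model only if the projected total stays within the budget."""
    projected = budget.spent_usd + estimated_usd
    if projected > budget.max_usd:
        # Enforced: the call is refused up front instead of reported afterwards.
        raise BudgetExceeded(
            f"projected {projected:.2f} USD exceeds cap of {budget.max_usd:.2f} USD"
        )
    result = call_model()
    budget.spent_usd = projected  # record spend only for calls that ran
    return result
```

An advisory tool would run the comparison after `call_model()` returns. Here the comparison happens first, so a rejected invocation costs nothing.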

Budget scopes you can configure

Per-workflow: cap the total spend allowed for a specific workflow. For example, a document analysis workflow can be limited to $2.50 in total.

Per-session: a rolling budget for a conversation session. Usage accumulates across turns until the cap is hit.

Per-user-tier: for example, free-tier users get 10k tokens per session and premium users get 100k. Limits are defined in policy, not in application code.

Per-invocation cost: reject or downgrade any single invocation that would cost more than a configured ceiling.
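To make the session and tier scopes concrete, here is a rough sketch of a rolling per-session token budget whose cap depends on the user's tier. The `SessionBudgets` class is a hypothetical in-memory illustration, not part of OrchVynt. The caps are the 10k and 100k figures from the text, hard-coded here only to keep the example self-contained.

```python
from collections import defaultdict

# Caps from the tier description above; in practice these would come from policy.
TIER_TOKEN_CAPS = {"free": 10_000, "premium": 100_000}


class SessionBudgets:
    """Hypothetical in-memory tracker of tokens used per session."""

    def __init__(self):
        self._used = defaultdict(int)

    def try_consume(self, session_id: str, tier: str, tokens: int) -> bool:
        """Record the tokens and return True, or return False if the cap would be breached."""
        cap = TIER_TOKEN_CAPS[tier]
        if self._used[session_id] + tokens > cap:
            return False  # nothing is recorded for a refused turn
        self._used[session_id] += tokens
        return True

    def remaining(self, session_id: str, tier: str) -> int:
        return TIER_TOKEN_CAPS[tier] - self._used[session_id]
```

Usage accumulates turn by turn: a free-tier session that has used 6,000 tokens can still take a 4,000-token turn, but not a 5,000-token one.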

budgets.yaml

budgets:
  scopes:
    - name: doc-analysis-workflow
      scope: workflow
      max_usd: 2.50
      on_breach: reject
    - name: free-tier-session
      scope: session
      user_tier: free
      max_tokens: 10000
      on_breach: downgrade_model
      downgrade_target: gpt-4o-mini
    - name: invocation-cap
      scope: invocation
      max_usd: 0.10
      on_breach: reject
  telemetry: true
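A per-invocation cap such as the `invocation-cap` entry can only be checked before the call if the cost is estimated up front. One plausible way to do that, sketched here under assumptions, is to price the input tokens plus the maximum output allowance. The prices below are placeholders rather than real rates, and `estimate_cost_usd` and `within_invocation_cap` are hypothetical helpers.

```python
# Placeholder prices in USD per 1,000 tokens; illustrative only, not real rates.
PRICE_PER_1K = {"input": 0.005, "output": 0.015}

INVOCATION_CAP_USD = 0.10  # max_usd of the invocation-cap scope


def estimate_cost_usd(input_tokens: int, max_output_tokens: int) -> float:
    """Worst-case estimate: assume the model uses its full output allowance."""
    return (
        input_tokens / 1000 * PRICE_PER_1K["input"]
        + max_output_tokens / 1000 * PRICE_PER_1K["output"]
    )


def within_invocation_cap(input_tokens: int, max_output_tokens: int) -> bool:
    return estimate_cost_usd(input_tokens, max_output_tokens) <= INVOCATION_CAP_USD
```

Using the output allowance rather than the eventual output length makes the estimate conservative: some invocations that would have fit are refused, but none that pass can exceed the cap.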

What happens on budget breach

Reject

The invocation is rejected before reaching the model, and OrchVynt returns a budget-exceeded error to the caller. Use this for hard financial caps: once the budget is gone, no further invocations run.

Downgrade

When an invocation on the primary model would exceed the budget, OrchVynt routes it to a cheaper model configured as the downgrade target. The invocation still completes, but at reduced cost and quality.
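The routing decision can be sketched as a simple fallback. This is an illustration under assumptions rather than OrchVynt's implementation: `choose_model` is a hypothetical helper, and returning `None` when even the downgrade target is too expensive is a choice made for this example that the text does not specify.

```python
from typing import Callable, Optional


def choose_model(
    primary: str,
    downgrade_target: str,
    estimate_usd: Callable[[str], float],
    remaining_usd: float,
) -> Optional[str]:
    """Return the first model whose estimated cost fits the remaining budget."""
    for model in (primary, downgrade_target):
        if estimate_usd(model) <= remaining_usd:
            return model
    # Assumption: if neither model fits, the caller falls back to rejecting.
    return None
```

With `gpt-4o-mini` as the downgrade target, an invocation that no longer fits the remaining budget on the primary model is retried against the cheaper model's estimate before anything is sent.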

Alert only

OrchVynt emits a telemetry event, and optionally notifies a webhook, when a threshold is reached. The invocation is not blocked. This is useful for cost visibility without hard enforcement.
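Alert-only behaviour might look like the sketch below. The event shape, the `record_spend` helper, and the 80% default threshold are all assumptions made for illustration. The `notify` callback stands in for whatever would deliver the webhook.

```python
import json
import logging

logger = logging.getLogger("budget")


def record_spend(spent_usd, cost_usd, max_usd, alert_ratio=0.8, notify=None):
    """Add cost to the running total; alert once when the threshold is crossed, never block."""
    new_total = spent_usd + cost_usd
    threshold = max_usd * alert_ratio
    # Fire only on the invocation that crosses the threshold, not on every later one.
    if spent_usd < threshold <= new_total:
        event = {
            "type": "budget_threshold_reached",  # hypothetical event name
            "spent_usd": round(new_total, 4),
            "max_usd": max_usd,
        }
        logger.warning(json.dumps(event))
        if notify is not None:
            notify(event)  # e.g. a function that POSTs the event to a webhook
    return new_total
```

The function always returns the new total, even past `max_usd`, which is what separates this mode from reject and downgrade.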

Set your first budget in five minutes

Read the Docs
Get Early Access