The OrchVynt Platform

A control plane that sits between your agents and the model APIs

Routing, fallback, budget enforcement, and human oversight, all managed declaratively. No changes to agent logic beyond swapping the model client call for orchvynt.route(). Full telemetry out of the box.


How the control plane fits in your stack

Just as a service mesh sits between services and handles cross-cutting concerns — mTLS, retries, load balancing — OrchVynt sits between your agents and the model APIs, handling orchestration concerns that don't belong in agent code.

Figure: OrchVynt control plane architecture. The application layer routes through OrchVynt to multiple LLM providers, with feedback and observability paths.

OrchVynt intercepts agent invocations at the API boundary. Your agents call orchvynt.route() instead of the model client directly. OrchVynt applies your policy, routes to the right model, enforces budgets, records telemetry — then proxies the response back unchanged.
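The order of operations described above can be sketched roughly as follows. This is an illustration only, not the OrchVynt API: the ControlPlane class, its fields, and the route() signature are all hypothetical names chosen for the example, and the model call is injected as a plain function so the sketch runs without any provider SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ControlPlane:
    """Hypothetical stand-in for the control plane; not the OrchVynt API."""

    model_for_workload: Dict[str, str]       # routing policy: workload -> model
    tokens_remaining: int                    # simple token budget
    call_model: Callable[[str, str], str]    # (model, prompt) -> response
    events: List[dict] = field(default_factory=list)  # recorded telemetry

    def route(self, workload: str, prompt: str, estimated_tokens: int) -> str:
        # 1. Apply the policy to pick a model for this workload.
        model = self.model_for_workload[workload]
        # 2. Enforce the budget before anything reaches the model.
        if estimated_tokens > self.tokens_remaining:
            self.events.append({"event": "budget_rejected", "workload": workload})
            raise RuntimeError("budget exceeded; the model was not called")
        # 3. Call the model and account for the tokens spent.
        response = self.call_model(model, prompt)
        self.tokens_remaining -= estimated_tokens
        # 4. Record telemetry, then hand the response back unchanged.
        self.events.append({"event": "routed", "workload": workload, "model": model})
        return response
```

In this sketch the agent only ever sees the return value of route(); the policy lookup, budget check, and telemetry stay on the control-plane side.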

Policy-driven model selection per invocation

Not every agent invocation needs GPT-4. Route to cheaper models for drafting tasks, expensive models for final review. Define the policy once; OrchVynt executes it on every call.

Per-workload routing rules (route by task type, user tier, content sensitivity)
A/B routing with traffic splits — evaluate model quality empirically
Latency-aware fallback triggers — route away from slow models automatically
Custom routing plugins via webhook for complex scoring logic
routing-policy.yaml
routing:
  policy: cost_then_quality
  rules:
    - workload: classification
      model: gpt-4o-mini
      max_latency_ms: 3000
    - workload: extraction
      model: claude-3-haiku
    - workload: synthesis
      model: gpt-4o
      quality_threshold: 0.85
  ab_split:
    enabled: true
    variants:
      - model: gpt-4o
        weight: 70
      - model: claude-3-5-sonnet
        weight: 30
  timeout_ms: 8000
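To make the rule lookup and the 70/30 traffic split concrete, here is a minimal sketch of how such a policy might be evaluated. It is not OrchVynt's implementation. The function names and the ab_workloads parameter are hypothetical, and the choice of which workloads take part in the split is an assumption, since the config above does not say.

```python
import random
from typing import FrozenSet, List, Tuple

# Values mirror routing-policy.yaml; the data layout itself is an assumption.
RULES = {
    "classification": "gpt-4o-mini",
    "extraction": "claude-3-haiku",
    "synthesis": "gpt-4o",
}
AB_VARIANTS: List[Tuple[str, int]] = [("gpt-4o", 70), ("claude-3-5-sonnet", 30)]


def pick_variant(variants: List[Tuple[str, int]], rng: random.Random) -> str:
    """Pick one model at random, proportionally to its weight."""
    models = [model for model, _ in variants]
    weights = [weight for _, weight in variants]
    return rng.choices(models, weights=weights, k=1)[0]


def select_model(
    workload: str,
    rng: random.Random,
    ab_workloads: FrozenSet[str] = frozenset(),
) -> str:
    """Return the model for a workload, applying the A/B split where enabled."""
    if workload in ab_workloads:
        return pick_variant(AB_VARIANTS, rng)
    return RULES[workload]  # raises KeyError for a workload with no rule
```

Passing the random generator in explicitly keeps the split reproducible in tests; a production router would more likely hash a stable request or user ID so the same caller always lands on the same variant.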

Define the survival path once

OrchVynt activates the fallback chain automatically on 429s, 5xx errors, timeout thresholds, or explicit cost caps. One definition, one place to change, one place to observe.

Provider-level fallback (OpenAI → Anthropic → local model via Ollama)
Model-level fallback (GPT-4o → GPT-4o-mini when cost cap hit)
Configurable activation triggers — status codes, latency, cost thresholds
Fallback activation events emitted as structured telemetry
fallback-chain.yaml
fallback:
  chain:
    - tier: primary
      model: gpt-4o
      provider: openai
    - tier: secondary
      model: claude-3-5-sonnet
      provider: anthropic
    - tier: emergency
      model: mistral-7b
      provider: ollama
      endpoint: http://local-gpu:11434
  triggers:
    on_status_code: [429, 500, 502, 503]
    on_latency_p99_ms: 9000
    on_cost_per_invocation_usd: 0.08
  emit_telemetry: true
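A fallback chain like the one above boils down to trying each tier in order and moving on when a trigger fires. The sketch below illustrates that loop under stated simplifications: all names are hypothetical, the latency trigger is checked per call rather than as a p99 over a window, and the cost trigger is left out.

```python
from typing import Callable, List, Tuple

# Values mirror fallback-chain.yaml.
CHAIN = ["gpt-4o", "claude-3-5-sonnet", "mistral-7b"]
TRIGGER_STATUS_CODES = {429, 500, 502, 503}
LATENCY_LIMIT_MS = 9000

# A model call returns (status_code, latency_ms, body); this shape is an assumption.
ModelCall = Callable[[str], Tuple[int, float, str]]


def call_with_fallback(
    chain: List[str], call: ModelCall, events: List[dict]
) -> Tuple[str, int, str]:
    """Try each tier in order; return (model, status, body) from the first
    tier that does not hit a fallback trigger."""
    for tier, model in enumerate(chain):
        status, latency_ms, body = call(model)
        triggered = status in TRIGGER_STATUS_CODES or latency_ms > LATENCY_LIMIT_MS
        if not triggered:
            return model, status, body
        # Record the activation as a structured event before moving on.
        events.append(
            {"event": "fallback_activated", "tier": tier, "model": model,
             "status": status, "latency_ms": latency_ms}
        )
    raise RuntimeError("every tier in the fallback chain hit a trigger")
```

Note that a status code outside the trigger list (a 400, for example) is returned to the caller as-is rather than causing a fallback, which matches the idea that only the configured triggers activate the chain.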

Not advisory limits — hard enforcement

OrchVynt intercepts invocations that would exceed per-workflow or per-session budgets before they reach the model. The invocation doesn't happen; the budget is respected.

Per-workflow token caps — different budget ceilings for different workflows
Per-session rolling budgets — track accumulation across a conversation
Cost-per-invocation accounting at the provider rate
Budget breach telemetry with configurable rejection or degradation actions
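The difference between advisory and hard limits is that the check happens before the call and can refuse it. A minimal sketch of that pre-call check follows. The class and method names are hypothetical, and it assumes the per-workflow cap is a ceiling on a single invocation while the per-session budget accumulates across a conversation.

```python
from collections import defaultdict
from typing import Dict


class BudgetExceeded(Exception):
    """Raised when an invocation would exceed a configured budget."""


class TokenBudget:
    def __init__(self, workflow_caps: Dict[str, int], session_cap: int) -> None:
        self.workflow_caps = workflow_caps          # per-invocation ceiling per workflow
        self.session_cap = session_cap              # cumulative ceiling per session
        self.session_used: Dict[str, int] = defaultdict(int)

    def reserve(self, workflow: str, session_id: str, tokens: int) -> None:
        """Reserve tokens for one invocation, or raise before the model is called."""
        if tokens > self.workflow_caps[workflow]:
            raise BudgetExceeded(f"{workflow}: {tokens} tokens exceeds the workflow cap")
        if self.session_used[session_id] + tokens > self.session_cap:
            raise BudgetExceeded(f"session {session_id}: budget would be exceeded")
        # Only count the tokens once both checks have passed.
        self.session_used[session_id] += tokens
```

A caller would invoke reserve() first and only contact the provider if it returns without raising, so a rejected invocation never consumes budget or incurs cost.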

Insert approval checkpoints that pause workflow execution

Human-in-the-loop (HITL) gates are a governance primitive, not a safety net. Enterprise teams need them for compliance (financial decisions, PII handling, regulated content), not just because they don't trust the model.

Configurable trigger conditions — confidence threshold, content flag, explicit rule
Webhook + Slack notification on gate open
Timeout handling — auto-approve or auto-reject after configurable window
Full audit trail per HITL event — timestamps, reviewer, decision, reason
hitl-gate.yaml
hitl:
  gates:
    - id: compliance-review
      trigger:
        confidence_below: 0.70
      action: pause_and_notify
      notify:
        webhook: https://hooks.company.com/hitl
        slack_channel: "#ai-review-queue"
      timeout:
        duration_minutes: 30
        on_timeout: auto_reject
      audit_log: true
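The gate logic in this config has two parts: deciding whether a result needs review, and deciding what happens if nobody responds in time. The sketch below shows one way those two decisions might look. The Gate class and both functions are hypothetical, the "auto_approve" value is assumed as the counterpart of the "auto_reject" shown in the config, and notification delivery and audit logging are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gate:
    # Defaults mirror hitl-gate.yaml.
    confidence_below: float = 0.70
    timeout_minutes: int = 30
    on_timeout: str = "auto_reject"   # assumed alternative: "auto_approve"


def needs_review(gate: Gate, confidence: float) -> bool:
    """The gate opens when model confidence falls below the threshold."""
    return confidence < gate.confidence_below


def resolve(gate: Gate, reviewer_decision: Optional[str], minutes_waited: float) -> str:
    """Return "approve", "reject", or "pending" for an open gate."""
    if reviewer_decision is not None:
        return reviewer_decision            # a human decision always wins
    if minutes_waited >= gate.timeout_minutes:
        return "reject" if gate.on_timeout == "auto_reject" else "approve"
    return "pending"                        # the workflow stays paused
```

With the defaults above, a result scored at 0.65 opens the gate, and an unanswered review resolves to "reject" once 30 minutes have passed.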

Workflow observability — in the stack you already use

Every routing decision, fallback activation, budget enforcement event, and HITL resolution is emitted as structured telemetry. OrchVynt doesn't ask you to learn a new observability tool — it drops into Datadog, Grafana, Honeycomb, or any OpenTelemetry-compatible backend.

OpenTelemetry trace export

Emits OTLP traces compatible with any OpenTelemetry backend. Every agent hop is a span with routing decision metadata attached.

Prometheus metrics endpoint

Exposes a /metrics endpoint. Scrape it with your existing Prometheus instance. A pre-built Grafana dashboard is included.

Structured JSON event log

Every orchestration event is logged as structured JSON. Ship logs to S3, GCS, or the local filesystem. Compliance export is available in CSV format.
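As a rough picture of what a structured event log with a CSV export could look like, here is a small sketch using only the Python standard library. The field names (timestamp, event, and the keyword fields) are illustrative assumptions, not OrchVynt's actual event schema.

```python
import csv
import io
import json
from datetime import datetime, timezone
from typing import Dict, List


def make_event(event_type: str, **fields) -> Dict[str, object]:
    """Build one orchestration event with a UTC timestamp."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event_type,
        **fields,
    }


def to_json_line(event: Dict[str, object]) -> str:
    """Serialize an event as a single JSON line, suitable for an append-only log."""
    return json.dumps(event, sort_keys=True)


def to_csv(events: List[Dict[str, object]], columns: List[str]) -> str:
    """Flatten events into CSV, keeping only the requested columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(events)   # missing fields are written as empty cells
    return buffer.getvalue()
```

One JSON object per line keeps the log easy to ship to object storage and to query later, while the CSV view trades the flexible schema for a fixed set of columns that auditors can open in a spreadsheet.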

Get the control plane running in under 10 minutes

Pull the Docker image, write a three-line config, point your agents at localhost:4821.