The OrchVynt Platform

A control plane that sits between your agents and the model APIs

Routing, fallback, budget enforcement, and human oversight, all managed declaratively. No orchestration logic in agent code. Full telemetry out of the box.


Architecture

How the control plane fits in your stack

Just as a service mesh sits between services and handles cross-cutting concerns — mTLS, retries, load balancing — OrchVynt sits between your agents and the model APIs, handling orchestration concerns that don't belong in agent code.

Figure: OrchVynt control plane architecture. The application layer routes through OrchVynt to multiple LLM providers, with feedback and observability paths.

OrchVynt intercepts agent invocations at the API boundary. Your agents call orchvynt.route() instead of the model client directly. OrchVynt applies your policy, routes to the right model, enforces budgets, records telemetry — then proxies the response back unchanged.
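The interception flow above can be sketched in a few lines of Python, assuming a simple in-process wrapper. `ControlPlane`, its fields, the provider callable signature, and the word-count token estimate are all hypothetical illustrations, not OrchVynt's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical provider signature: (model name, prompt) -> response text.
Provider = Callable[[str, str], str]


@dataclass
class ControlPlane:
    """Hypothetical interception layer; not OrchVynt's actual API."""
    policy: Dict[str, str]           # workload -> model name
    providers: Dict[str, Provider]   # model name -> provider callable
    budget_tokens: int               # tokens remaining for this session
    events: List[dict] = field(default_factory=list)

    def route(self, workload: str, prompt: str) -> str:
        model = self.policy[workload]            # 1. apply the routing policy
        estimated = len(prompt.split())          # 2. crude token estimate (assumption)
        if estimated > self.budget_tokens:       # 3. enforce the budget before the call
            self.events.append({"event": "budget_reject", "workload": workload})
            raise RuntimeError("token budget exceeded")
        response = self.providers[model](model, prompt)
        self.budget_tokens -= estimated
        # 4. record telemetry for the routing decision
        self.events.append({"event": "routed", "workload": workload, "model": model})
        return response                          # 5. proxy the response back unchanged


def echo(model: str, prompt: str) -> str:
    """Stub provider standing in for a real model client."""
    return f"{model}: {prompt}"


cp = ControlPlane(
    policy={"classification": "gpt-4o-mini"},
    providers={"gpt-4o-mini": echo},
    budget_tokens=100,
)
print(cp.route("classification", "is this spam"))  # gpt-4o-mini: is this spam
```

The agent only ever sees the provider's response; policy lookup, budget checks, and telemetry all happen inside `route()`.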

Routing Engine

Policy-driven model selection per invocation

Not every agent invocation needs GPT-4. Route to cheaper models for drafting tasks, expensive models for final review. Define the policy once; OrchVynt executes it on every call.

Per-workload routing rules (route by task type, user tier, content sensitivity)

A/B routing with traffic splits — evaluate model quality empirically

Latency-aware fallback triggers — route away from slow models automatically

Custom routing plugins via webhook for complex scoring logic

routing-policy.yaml

routing:
  policy: cost_then_quality
  rules:
    - workload: classification
      model: gpt-4o-mini
      max_latency_ms: 3000
    - workload: extraction
      model: claude-3-haiku
    - workload: synthesis
      model: gpt-4o
      quality_threshold: 0.85
      ab_split:
        enabled: true
        variants:
          - model: gpt-4o
            weight: 70
          - model: claude-3-5-sonnet
            weight: 30
  timeout_ms: 8000
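A sketch of how a routing policy like this could be resolved per invocation. It assumes the `ab_split` block applies to the synthesis rule, and it mirrors the YAML as a Python dict to avoid a YAML parser dependency. The hash-based bucketing is one common way to make A/B assignment sticky per request id; `select_model` and `bucket` are hypothetical names, not part of any OrchVynt API.

```python
import hashlib
from typing import Optional

# Mirrors routing-policy.yaml (assumption: ab_split belongs to the synthesis rule).
ROUTING = {
    "policy": "cost_then_quality",
    "rules": [
        {"workload": "classification", "model": "gpt-4o-mini", "max_latency_ms": 3000},
        {"workload": "extraction", "model": "claude-3-haiku"},
        {
            "workload": "synthesis",
            "model": "gpt-4o",
            "quality_threshold": 0.85,
            "ab_split": {
                "enabled": True,
                "variants": [
                    {"model": "gpt-4o", "weight": 70},
                    {"model": "claude-3-5-sonnet", "weight": 30},
                ],
            },
        },
    ],
    "timeout_ms": 8000,
}


def bucket(request_id: str) -> int:
    """Map a request id to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def select_model(workload: str, request_id: str, config: dict = ROUTING) -> str:
    rule: Optional[dict] = next(
        (r for r in config["rules"] if r["workload"] == workload), None
    )
    if rule is None:
        raise KeyError(f"no routing rule for workload {workload!r}")
    split = rule.get("ab_split")
    if not split or not split.get("enabled"):
        return rule["model"]
    # Scale the bucket onto the total weight, then walk cumulative weights.
    # With 70/30 weights, buckets 0-69 pick the first variant and 70-99 the second.
    total = sum(v["weight"] for v in split["variants"])
    point = bucket(request_id) * total // 100
    cumulative = 0
    for variant in split["variants"]:
        cumulative += variant["weight"]
        if point < cumulative:
            return variant["model"]
    return split["variants"][-1]["model"]  # only reached if weights are not positive


print(select_model("classification", "req-1"))  # gpt-4o-mini
```

Hashing the request id, rather than drawing a random number, keeps a retried request on the same variant, which makes quality comparisons between variants less noisy.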

Fallback Chains

Define the survival path once

OrchVynt activates the fallback chain automatically on 429s, 5xx errors, timeout thresholds, or explicit cost caps. One definition, one place to change, one place to observe.

Provider-level fallback (OpenAI → Anthropic → local model via Ollama)

Model-level fallback (GPT-4o → GPT-4o-mini when cost cap hit)

Configurable activation triggers — status codes, latency, cost thresholds

Fallback activation events emitted as structured telemetry

fallback-chain.yaml

fallback:
  chain:
    - tier: primary
      model: gpt-4o
      provider: openai
    - tier: secondary
      model: claude-3-5-sonnet
      provider: anthropic
    - tier: emergency
      model: mistral-7b
      provider: ollama
      endpoint: http://local-gpu:11434
  triggers:
    on_status_code: [429, 500, 502, 503]
    on_latency_p99_ms: 9000
    on_cost_per_invocation_usd: 0.08
  emit_telemetry: true
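A minimal sketch of walking a fallback chain like this one, under simplifying assumptions: each tier is a stub callable, latency is compared per call instead of as a rolling p99, and the cost trigger is left out. `Result`, `fallback_trigger`, and `call_with_fallback` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Values copied from the triggers block in fallback-chain.yaml.
TRIGGER_STATUS = {429, 500, 502, 503}
LATENCY_LIMIT_MS = 9000  # simplification: compared per call, not as a rolling p99


@dataclass
class Result:
    status: int
    latency_ms: float
    text: str = ""


# A tier is (tier name, callable that sends the prompt to that tier's model).
Tier = Tuple[str, Callable[[str], Result]]


def fallback_trigger(result: Result) -> str:
    """Return the name of the trigger that fired, or "" if the result is acceptable."""
    if result.status in TRIGGER_STATUS:
        return f"status_{result.status}"
    if result.latency_ms > LATENCY_LIMIT_MS:
        return "latency"
    return ""


def call_with_fallback(chain: List[Tier], prompt: str, events: List[dict]) -> Result:
    """Try each tier in order; record a telemetry event every time a tier is skipped."""
    for tier_name, call in chain:
        result = call(prompt)
        trigger = fallback_trigger(result)
        if not trigger:
            return result
        events.append({"event": "fallback_activated", "tier": tier_name, "trigger": trigger})
    raise RuntimeError("every tier in the fallback chain was rejected")


# Usage with stubbed tiers: primary is rate-limited, secondary is too slow.
def primary(prompt: str) -> Result:
    return Result(status=429, latency_ms=120)


def secondary(prompt: str) -> Result:
    return Result(status=200, latency_ms=12_000, text="late answer")


def emergency(prompt: str) -> Result:
    return Result(status=200, latency_ms=800, text="local answer")


events: List[dict] = []
answer = call_with_fallback(
    [("primary", primary), ("secondary", secondary), ("emergency", emergency)], "hi", events
)
print(answer.text)  # local answer
```

Status codes outside the trigger list (a 400, for example) are returned to the caller unchanged, since retrying a malformed request on another provider would not help. A production version would also cancel a slow call at the threshold rather than wait for it to finish.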

Token Budget Enforcement

Not advisory limits — hard enforcement

OrchVynt intercepts invocations that would exceed per-workflow or per-session budgets before they reach the model. The invocation doesn't happen; the budget is respected.

Per-workflow token caps — different budget ceilings for different workflows

Per-session rolling budgets — track accumulation across a conversation

Cost-per-invocation accounting at the provider rate

Budget breach telemetry with configurable rejection or degradation actions
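A sketch of hard, pre-call budget enforcement under stated assumptions: the per-session budget is modeled as a sliding time window, token counts are known before the call, and the prices are hypothetical placeholders rather than real provider rates. `RollingBudget`, `cost_usd`, and `degrade_for_cost` are illustrative names.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Tuple

# Hypothetical per-1K-token prices for illustration only; real provider rates differ.
PRICE_PER_1K_TOKENS_USD = {"gpt-4o": 0.005, "gpt-4o-mini": 0.00015}


class BudgetExceeded(Exception):
    """Raised when an invocation is rejected before it reaches the model."""


def cost_usd(model: str, tokens: int) -> float:
    """Cost-per-invocation accounting at the (assumed) provider rate."""
    return tokens / 1000 * PRICE_PER_1K_TOKENS_USD[model]


@dataclass
class RollingBudget:
    """Per-session budget: at most max_tokens within the trailing window_s seconds."""
    max_tokens: int
    window_s: float
    entries: Deque[Tuple[float, int]] = field(default_factory=deque)

    def admit(self, tokens: int, now: float) -> None:
        # Forget usage that has aged out of the window.
        while self.entries and self.entries[0][0] <= now - self.window_s:
            self.entries.popleft()
        used = sum(t for _, t in self.entries)
        # Hard enforcement: reject before the call, so the invocation never happens.
        if used + tokens > self.max_tokens:
            raise BudgetExceeded(f"{used} + {tokens} tokens exceeds {self.max_tokens}")
        self.entries.append((now, tokens))


def degrade_for_cost(model: str, cheaper: str, tokens: int, cap_usd: float) -> str:
    """Degradation action: swap to a cheaper model instead of rejecting outright."""
    if cost_usd(model, tokens) <= cap_usd:
        return model
    if cost_usd(cheaper, tokens) <= cap_usd:
        return cheaper
    raise BudgetExceeded(f"no model fits the ${cap_usd} cap")


session = RollingBudget(max_tokens=1000, window_s=60)
session.admit(600, now=0)
try:
    session.admit(500, now=10)   # 600 + 500 > 1000: rejected, nothing recorded
except BudgetExceeded:
    print("rejected")
session.admit(500, now=61)       # the 600 tokens from t=0 have aged out
print(degrade_for_cost("gpt-4o", "gpt-4o-mini", 20_000, cap_usd=0.08))  # gpt-4o-mini
```

A per-workflow cap could reuse the same class with one instance per workflow; the choice between raising and degrading maps to the "rejection or degradation" actions listed above.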


Human-in-the-Loop Gates

Insert approval checkpoints that pause workflow execution

HITL gates are a governance primitive, not a safety net. Enterprise teams need them for compliance — financial decisions, PII handling, regulated content — not just because they don't trust the model.

Configurable trigger conditions — confidence threshold, content flag, explicit rule

Webhook + Slack notification on gate open

Timeout handling — auto-approve or auto-reject after configurable window

Full audit trail per HITL event — timestamps, reviewer, decision, reason

hitl-gate.yaml

hitl:
  gates:
    - id: compliance-review
      trigger:
        confidence_below: 0.70
      action: pause_and_notify
      notify:
        webhook: https://hooks.company.com/hitl
        slack_channel: "#ai-review-queue"
      timeout:
        duration_minutes: 30
        on_timeout: auto_reject
      audit_log: true
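A sketch of the gate lifecycle this config describes: pause on low confidence, accept one reviewer decision, apply the timeout action, and keep an audit trail. It assumes the caller supplies timestamps in seconds and omits the actual webhook and Slack calls; `HitlGate` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HitlGate:
    """Hypothetical approval gate; field names loosely mirror hitl-gate.yaml."""
    gate_id: str
    confidence_below: float
    timeout_minutes: float
    on_timeout: str = "auto_reject"          # or "auto_approve"
    audit_log: List[dict] = field(default_factory=list)
    opened_at: Optional[float] = None        # seconds, from the caller's clock
    decision: Optional[str] = None           # None while the gate is pending

    def should_pause(self, confidence: float) -> bool:
        return confidence < self.confidence_below

    def open(self, now: float) -> None:
        self.opened_at = now
        self._audit(now, None, "opened", "confidence below threshold")
        # A real gate would notify the webhook and Slack channel here (omitted).

    def resolve(self, now: float, reviewer: str, decision: str, reason: str) -> bool:
        """Record a reviewer's decision; ignored if the gate already closed."""
        if self.decision is not None:
            return False
        self.decision = decision
        self._audit(now, reviewer, decision, reason)
        return True

    def poll(self, now: float) -> Optional[str]:
        """Return the decision, applying the timeout action once the window expires."""
        if self.opened_at is not None and self.decision is None:
            if now - self.opened_at >= self.timeout_minutes * 60:
                self.decision = "rejected" if self.on_timeout == "auto_reject" else "approved"
                self._audit(now, None, self.decision, "timeout")
        return self.decision

    def _audit(self, now: float, reviewer: Optional[str], decision: str, reason: str) -> None:
        self.audit_log.append(
            {"gate": self.gate_id, "at": now, "reviewer": reviewer,
             "decision": decision, "reason": reason}
        )


gate = HitlGate("compliance-review", confidence_below=0.70, timeout_minutes=30)
if gate.should_pause(0.55):
    gate.open(now=0)
print(gate.poll(now=600))    # None (still pending after 10 minutes)
print(gate.poll(now=1800))   # rejected (30-minute window elapsed)
```

Because the first decision wins, a reviewer who responds after the timeout cannot silently overturn an auto-rejection; the audit log records exactly which path closed the gate.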

Observability

Workflow observability — in the stack you already use

Every routing decision, fallback activation, budget enforcement event, and HITL resolution is emitted as structured telemetry. OrchVynt doesn't ask you to learn a new observability tool — it drops into Datadog, Grafana, Honeycomb, or any OpenTelemetry-compatible backend.

OpenTelemetry trace export

Emits OTLP traces compatible with any OpenTelemetry backend. Every agent hop is a span with routing decision metadata attached.

Prometheus metrics endpoint

Exposes a /metrics endpoint. Scrape it with your existing Prometheus instance. A pre-built Grafana dashboard is included.

Structured JSON event log

Every orchestration event is logged as structured JSON. Ship logs to S3, GCS, or the local filesystem. Compliance exports are available in CSV format.
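A sketch of the two log formats using only the Python standard library: one JSON object per line for the event stream, and a flattened CSV for compliance export. The event fields below are hypothetical sample data, not OrchVynt's actual event schema.

```python
import csv
import io
import json
from typing import Iterable, List


def to_json_line(event: dict) -> str:
    """Serialize one orchestration event as a single JSON line."""
    return json.dumps(event, sort_keys=True)


def to_csv(events: Iterable[dict], columns: List[str]) -> str:
    """Flatten events to CSV; missing fields become empty cells, extra fields are dropped."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, restval="", extrasaction="ignore")
    writer.writeheader()
    for event in events:
        writer.writerow(event)
    return buf.getvalue()


# Hypothetical events; real event fields may differ.
sample_events = [
    {"ts": "2024-05-01T12:00:00Z", "type": "fallback_activated",
     "tier": "primary", "trigger": "status_429"},
    {"ts": "2024-05-01T12:00:05Z", "type": "hitl_resolved",
     "gate": "compliance-review", "decision": "rejected"},
]
for event in sample_events:
    print(to_json_line(event))  # one line per event, ready to append to a log file
print(to_csv(sample_events, ["ts", "type", "decision"]))
```

Fixing the CSV columns up front keeps the export stable even as new event types add fields to the JSON stream.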

Get the control plane running in under 10 minutes

Pull the Docker image, write a three-line config, and point your agents at localhost:4821.
