Policy-driven routing across multiple agents and models
Different tasks have different cost and quality profiles. OrchVynt routes each invocation to the right model based on workload type, user tier, latency requirements, or any combination of signals — defined declaratively, not hardcoded in agent code.
The problem with hardcoded model selection
When model selection is hardcoded into agent logic, every change requires a code deploy. When a new model comes out, you update 12 files. When you want to A/B test two providers, you add branching logic to agent code that should only contain business logic.
OrchVynt separates routing policy from agent logic entirely. The agent calls orchvynt.route(). The policy lives in YAML. You change the routing strategy without touching your agents.
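To make the separation concrete, here is a minimal Python sketch. It is not OrchVynt's actual API. `make_router`, `summarize_agent`, and the `default_model` policy key are hypothetical, and JSON stands in for the YAML policy file so the example needs only the standard library. The local `route()` plays the role of `orchvynt.route()`, whose real signature is not shown here.

```python
import json

# Hypothetical policies. OrchVynt keeps policies in YAML; JSON stands in here
# because Python's standard library can parse it without extra packages.
# The "default_model" key is an illustrative schema, not OrchVynt's.
POLICY_V1 = json.loads('{"default_model": "model-a"}')
POLICY_V2 = json.loads('{"default_model": "model-b"}')


def make_router(policy: dict):
    """Return a route() callable bound to a policy that was loaded as data."""
    def route() -> str:
        # The policy, not the agent, decides which model serves the call.
        return policy["default_model"]
    return route


def summarize_agent(text: str, route) -> str:
    """Business logic only: no model name appears in the agent."""
    model = route()
    return f"[{model}] summary of {len(text)} chars"
```

Passing `make_router(POLICY_V2)` instead of `make_router(POLICY_V1)` changes which model serves `summarize_agent` without editing a line of the agent.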
What you can express in a routing policy
Workload-based routing
Route classification tasks to cheaper models and synthesis tasks to higher-quality models. Tag each invocation with a workload type and define the mapping in YAML.
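A workload mapping amounts to a lookup table with a fallback. The Python sketch below assumes a hypothetical mapping and placeholder model names; in OrchVynt the equivalent table would live in the YAML policy, and its exact schema is not shown here.

```python
# Hypothetical workload-to-model mapping. Model names are placeholders.
WORKLOAD_ROUTES = {
    "classification": "small-cheap-model",
    "synthesis": "large-quality-model",
}
# Used when an invocation carries a workload tag the policy does not list.
DEFAULT_MODEL = "mid-tier-model"


def route_by_workload(workload: str) -> str:
    """Pick a model from the invocation's workload tag, with a fallback."""
    return WORKLOAD_ROUTES.get(workload, DEFAULT_MODEL)
```

The explicit default means a newly introduced workload tag still gets routed somewhere sensible before anyone updates the policy.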
Traffic splits & A/B routing
Split traffic across models by weight. For example, send 30% of a workflow's traffic to GPT-4o and 70% to Claude 3.5 Sonnet, compare quality metrics in your telemetry backend, then shift the split based on the evidence.
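One common way to implement a weighted split is hash-based bucketing: hash a stable key (such as a user or session ID) into a bucket and assign buckets to models in proportion to their weights, so the same key always lands in the same arm. This is a minimal Python sketch of that technique, not a description of how OrchVynt assigns traffic; `pick_model`, `SPLIT`, and the model labels are hypothetical.

```python
import hashlib

# Hypothetical 30/70 split mirroring the example above. Labels are placeholders.
SPLIT = [("gpt-4o", 30), ("claude-3-5-sonnet", 70)]


def pick_model(invocation_key: str, split=SPLIT) -> str:
    """Deterministically assign a key to an arm in proportion to its weight."""
    total = sum(weight for _, weight in split)
    digest = hashlib.sha256(invocation_key.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, total).
    bucket = int.from_bytes(digest[:8], "big") % total
    for model, weight in split:
        if bucket < weight:
            return model
        bucket -= weight
    raise ValueError("split weights must be positive")
```

Sticky assignment keeps each user's experience consistent during an A/B test; changing the weights from 30/70 to, say, 50/50 moves only the buckets between the old and new boundary.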
Latency-aware routing
Set per-workload latency targets. OrchVynt monitors each model's p99 latency and automatically reroutes traffic when a model exceeds its target, without waiting for a human to notice.
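The mechanism can be sketched as a sliding window of recent latencies per model, a p99 computed over that window, and a fallback whenever the primary's p99 exceeds the workload's target. This is a minimal Python sketch under those assumptions, not OrchVynt's monitoring logic; `LatencyAwareRouter`, the window size, and the nearest-rank percentile method are illustrative choices.

```python
import math
from collections import deque


class LatencyAwareRouter:
    """Hypothetical router for one workload: primary model plus a fallback."""

    def __init__(self, primary: str, fallback: str, p99_target_ms: float, window: int = 200):
        self.primary = primary
        self.fallback = fallback
        self.p99_target_ms = p99_target_ms
        # Keep only the most recent `window` samples per model.
        self.samples = {primary: deque(maxlen=window), fallback: deque(maxlen=window)}

    def record(self, model: str, latency_ms: float) -> None:
        self.samples[model].append(latency_ms)

    def p99(self, model: str) -> float:
        data = sorted(self.samples[model])
        if not data:
            return 0.0
        rank = math.ceil(0.99 * len(data))  # nearest-rank percentile
        return data[rank - 1]

    def choose(self) -> str:
        # Reroute as soon as the primary's recent p99 breaches the target.
        if self.p99(self.primary) > self.p99_target_ms:
            return self.fallback
        return self.primary
```

In this sketch the primary stops receiving traffic once it breaches the target, so its window never refreshes. A production router would also need a recovery path, such as expiring old samples or sending a small share of probe traffic to the primary.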
Context-aware selection
Pass structured context with each invocation, such as user tier, a content-sensitivity flag, or geographic region. Routing rules can then express conditions like "use GPT-4o for enterprise-tier users."
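Context-aware rules can be modeled as an ordered list of conditions evaluated against the invocation's context, where the first match wins and a condition-free rule acts as the default. The Python sketch below assumes that first-match semantics; `Rule`, `select_model`, the context keys, and every model name other than GPT-4o are hypothetical placeholders, not OrchVynt's rule syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical routing rule: every key in `when` must match the context."""
    model: str
    when: dict = field(default_factory=dict)

    def matches(self, context: dict) -> bool:
        return all(context.get(key) == value for key, value in self.when.items())


# Evaluated in order; the first matching rule wins. Placeholder model names.
RULES = [
    Rule(model="self-hosted-model", when={"sensitive": True}),
    Rule(model="eu-hosted-model", when={"region": "eu"}),
    Rule(model="gpt-4o", when={"tier": "enterprise"}),
    Rule(model="default-model"),  # empty `when` matches everything
]


def select_model(context: dict, rules=RULES) -> str:
    for rule in rules:
        if rule.matches(context):
            return rule.model
    raise LookupError("no routing rule matched")
```

Ordering matters: placing the sensitivity rule first means a sensitive request from an enterprise user is kept on the placeholder self-hosted model rather than sent to GPT-4o.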