Every multi-agent system starts clean. The first version has three agents, clear purpose separation, and readable code. The routing logic is a function that's 15 lines. The fallback is a single try/except. The budget tracking is a comment that says "TODO: add real tracking."
Six months later, the same codebase has 12 agents. The routing function is 200 lines with conditions accumulated from five sprint cycles. The fallback logic has been copy-pasted to four agents because there was "no time to refactor." The budget tracking TODO is still a TODO. There's a Slack bot that pages the on-call when spend looks unusual, but no one fully trusts it. A new engineer joined last month and still doesn't fully understand how the routing decision gets made.
This is code spaghetti. It doesn't start as spaghetti. It accretes into it. And the accretion pattern in multi-agent AI systems is highly predictable: orchestration logic — routing decisions, fallback strategies, budget rules — starts in one place, gets copied to multiple places as the system grows, and each copy diverges under independent maintenance pressure until the codebase has multiple different implementations of the same policy.
Why orchestration logic accretes
The fundamental reason is that orchestration concerns are cross-cutting — they apply to every agent invocation — but the easiest place to implement them is inside each agent. It's easier to add a try/except to the agent you're working on right now than to refactor the shared routing abstraction. It's easier to copy the budget check from agent A to agent B than to extract it to a centralized service.
This is exactly the same accretion pattern that leads to duplicated authentication logic, duplicated retry logic, and duplicated rate-limiting logic in traditional microservice architectures. The mature solution for those concerns is infrastructure: auth middleware, retry policies in the service mesh, rate limiting at the ingress layer. The mature solution for AI orchestration concerns is a control plane.
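Here is the middleware comparison in code: a minimal Python sketch of a retry policy written once, as a decorator, and applied to any agent call instead of being pasted into each agent. `retry_policy` and `flaky_agent` are illustrative names and do not come from any library. A control plane or service mesh would enforce the same policy outside the agent process entirely.

```python
import functools
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_policy(
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Build one retry policy that can wrap any agent call."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the last error
                    sleep(base_delay_s * 2 ** attempt)  # exponential backoff
            raise AssertionError("unreachable")  # the loop always returns or raises

        return wrapper

    return decorator


# Hypothetical agent that times out twice before succeeding.
calls = {"n": 0}


@retry_policy(max_attempts=3, sleep=lambda seconds: None)  # skip real sleeps in the demo
def flaky_agent(prompt: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("provider timed out")
    return f"answer to {prompt!r}"


print(flaky_agent("q"))  # answer to 'q'
```

Every wrapped agent gets the same attempts and backoff because only one definition exists to change.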
But AI systems face an additional pressure that traditional services don't: the pace of change. Major providers release new models every few months. Provider pricing changes regularly. Regulatory requirements for AI systems are still evolving. Every change in the external environment becomes a change request against every agent that hardcodes the affected policy.
The properties of declarative orchestration
Declarative orchestration means the orchestration policy is declared — stated explicitly, in a structured form — rather than encoded imperatively in application code. The properties that flow from this distinction are worth enumerating concretely:
Single source of truth. One YAML file declares the routing policy for the entire system. When you want to know "what model does workflow X use for synthesis tasks?" you look in one place. You don't grep through eight agent files to see if they all agree.
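In code, "one place to look" reduces to a single lookup over declared data. This is a minimal sketch in which a Python dict stands in for the parsed YAML, so no YAML library is needed. The workflow names, task types, and `model_for` are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical routing table: (workflow, task type) -> model identifier.
# In a real system this would be loaded from the single YAML policy file.
ROUTING_POLICY: Dict[Tuple[str, str], str] = {
    ("research_report", "synthesis"): "gpt-4o",
    ("research_report", "extraction"): "gpt-4o-mini",
    ("support_triage", "synthesis"): "gpt-4o-mini",
}


def model_for(workflow: str, task_type: str) -> str:
    """Answer 'what model does workflow X use for task Y?' from the one policy."""
    try:
        return ROUTING_POLICY[(workflow, task_type)]
    except KeyError:
        raise LookupError(f"no routing rule for {workflow}/{task_type}") from None


print(model_for("research_report", "synthesis"))  # gpt-4o
```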
Change without deployment. Updating the routing policy doesn't require changing, testing, and redeploying agent code. It's a config change applied to the control plane, and the agents are unaware the policy changed.
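One way to picture "the agents are unaware" is an agent that asks a policy store for a model by task type on every call. This is a minimal sketch that assumes the control plane pushes new policy into an in-process `PolicyStore`. The class, its methods, and the agent function are hypothetical.

```python
import threading
from typing import Dict


class PolicyStore:
    """Hypothetical in-process view of the control plane's current routing policy."""

    def __init__(self, policy: Dict[str, str]) -> None:
        self._lock = threading.Lock()
        self._policy = dict(policy)

    def apply(self, new_policy: Dict[str, str]) -> None:
        # Swap the whole policy at once so readers never see a half-applied update.
        with self._lock:
            self._policy = dict(new_policy)

    def model_for(self, task_type: str) -> str:
        with self._lock:
            return self._policy[task_type]


def synthesis_agent(store: PolicyStore, prompt: str) -> str:
    # The agent asks for "synthesis"; it never names a model itself.
    model = store.model_for("synthesis")
    return f"[{model}] {prompt}"


store = PolicyStore({"synthesis": "gpt-4o"})
print(synthesis_agent(store, "summarize Q3"))  # [gpt-4o] summarize Q3
store.apply({"synthesis": "claude-3-5-sonnet"})  # config change, no agent redeploy
print(synthesis_agent(store, "summarize Q3"))  # [claude-3-5-sonnet] summarize Q3
```

This sketch leaves open how new policy reaches the store, whether by polling or a push channel. The agent function itself is identical before and after the change.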
Reviewable and auditable changes. The change to "switch synthesis tasks from GPT-4o to Claude 3.5 Sonnet" is a single line in a YAML file. It has a git diff. It can be reviewed in a pull request, questioned in a code review comment, and reverted with a single command if it has an unexpected effect. The equivalent change in an imperative codebase is scattered across several agent files, harder to see in aggregate, and harder to reverse cleanly.
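This sketch roughly shows what a reviewer would see. It uses Python's standard `difflib` to diff two versions of a hypothetical routing file, and the model switch appears as exactly one removed line and one added line.

```python
import difflib

# Two versions of a hypothetical routing.yaml, before and after the model switch.
before = """routing:
  synthesis:
    model: gpt-4o
  extraction:
    model: gpt-4o-mini
"""
after = """routing:
  synthesis:
    model: claude-3-5-sonnet
  extraction:
    model: gpt-4o-mini
"""

diff = list(
    difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="a/routing.yaml",
        tofile="b/routing.yaml",
    )
)
print("".join(diff), end="")

# Count only real changes, skipping the '---'/'+++' file headers.
changed = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("---", "+++"))
]
print(len(changed))  # 2
```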
Consistent enforcement. The same routing policy applies to every invocation that matches the rule — not "the policy as implemented in agent A" and "the policy as implemented in agent B." Consistency is a property of centralized enforcement, not of well-intentioned duplication.
Separation of concerns. Agent code does what agents should do: problem solving, reasoning, tool use. Orchestration policy (routing, fallback, budget, human-in-the-loop (HITL) checkpoints) lives in the control plane. Engineers working on agent behavior don't need to think about routing strategy. Engineers working on routing strategy don't need to understand agent implementation details.
The refactor cost is real
An honest objection to this framing: we already have a system in production, and extracting orchestration logic from a mature codebase is expensive and risky. That's true. It is also the most compelling argument for doing the extraction early.
Every sprint cycle of deferred extraction is another sprint cycle of accreted complexity. The routing logic in agent A gets a new condition. Agent B's copy of that logic doesn't get the corresponding update. The divergence compounds. By the time you decide to extract the logic, it's not a refactor — it's an archaeology project, because before you can extract it, you have to figure out what the intended behavior actually is from the divergent implementations.
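Part of that archaeology can at least be made systematic. This minimal sketch assumes someone has already read each agent's copy of the logic and written down the values it actually uses. The agent names and values are hypothetical. The sketch reports each setting the copies disagree on, which amounts to the list of questions to settle before extraction.

```python
from collections import defaultdict
from typing import Dict, List

MISSING = "<missing>"

# Hypothetical values recovered by reading each agent's copy of the routing logic.
extracted: Dict[str, Dict[str, object]] = {
    "agent_synthesis": {"quality_threshold": 0.75, "max_attempts": 3, "premium_model": "gpt-4o"},
    "agent_draft": {"quality_threshold": 0.70, "max_attempts": 3, "premium_model": "gpt-4o"},
    "agent_review": {"quality_threshold": 0.75, "max_attempts": 5},
}


def find_divergence(settings: Dict[str, Dict[str, object]]) -> Dict[str, Dict[object, List[str]]]:
    """For every setting the agents disagree on, group the agents by the value they use."""
    all_keys = sorted({key for values in settings.values() for key in values})
    report: Dict[str, Dict[object, List[str]]] = {}
    for key in all_keys:
        groups: Dict[object, List[str]] = defaultdict(list)
        for agent in sorted(settings):
            # An agent that never sets the value is recorded as MISSING, so gaps show up too.
            groups[settings[agent].get(key, MISSING)].append(agent)
        if len(groups) > 1:  # more than one distinct value: the copies have diverged
            report[key] = dict(groups)
    return report


for key, groups in find_divergence(extracted).items():
    print(key, groups)
```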
Teams that extract orchestration logic at agent #2 or #3, before the divergence accumulates, tend to find that the extraction takes a day or two and leaves the codebase cleaner. Teams that wait until agent #10 tend to find that it takes weeks and introduces regressions.
A concrete before/after
Imperative routing in agent code (before):
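```python
# In agent_synthesis.py
import time

from openai import RateLimitError  # the OpenAI SDK's HTTP 429 error


class MaxRetriesExceeded(Exception):
    """Raised when the final attempt still falls below the quality bar."""


# openai_client: a client wrapper defined elsewhere (hypothetical); its invoke()
# is assumed to return a response object with a quality_score attribute.
def invoke_synthesis(prompt, user_tier):
    if user_tier == "premium":
        model = "gpt-4o"
        timeout = 30
    elif user_tier == "standard":
        model = "gpt-4o-mini"
        timeout = 15
    else:
        model = "gpt-4o-mini"
        timeout = 10

    for attempt in range(3):
        try:
            response = openai_client.invoke(prompt, model=model, timeout=timeout)
            # Is this the right quality threshold? Not sure. Copied from agent_draft.py
            if response.quality_score > 0.75:
                return response
            elif attempt < 2:
                model = "gpt-4o"  # upgrade on quality failure
        except (RateLimitError, TimeoutError):
            if attempt == 2:
                raise
            time.sleep(2 ** attempt)  # back off 1s, then 2s
    raise MaxRetriesExceeded()
```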
Roughly the same behavior, declared in control plane config, with a cross-provider fallback added (after):
```yaml
routing:
  policy: user_tier_with_quality_cascade
  rules:
    - tier: premium
      model: gpt-4o
      quality_threshold: 0.75
    - tier: standard
      model: gpt-4o-mini
      quality_threshold: 0.75
      upgrade_on_below_threshold: gpt-4o
    - tier: default
      model: gpt-4o-mini
      quality_threshold: 0.75
      upgrade_on_below_threshold: gpt-4o
fallback:
  triggers:
    on_status_code: [429]
    on_latency_ms: [10000, 15000, 30000]  # per tier: default, standard, premium
  chain:
    - model: gpt-4o-mini
    - model: gpt-4o          # fallback to quality upgrade
    - model: claude-3-haiku  # provider fallback
```
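Here is one plausible reading of this config as a minimal Python sketch covering only the `routing.rules` section. It selects the rule for the caller's tier, calls the model, and cascades once to `upgrade_on_below_threshold` when the score misses the threshold. `Rule`, `Response`, `route`, and the `call_model` callable are hypothetical names. Retries and the `fallback` section are left out, and none of this describes orchvynt's actual internals.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Rule:
    tier: str
    model: str
    quality_threshold: float
    upgrade_on_below_threshold: Optional[str] = None


@dataclass(frozen=True)
class Response:
    model: str
    text: str
    quality_score: float


# Mirrors routing.rules in the config above.
RULES: List[Rule] = [
    Rule("premium", "gpt-4o", 0.75),
    Rule("standard", "gpt-4o-mini", 0.75, upgrade_on_below_threshold="gpt-4o"),
    Rule("default", "gpt-4o-mini", 0.75, upgrade_on_below_threshold="gpt-4o"),
]

ModelCall = Callable[[str, str], Response]  # (model, prompt) -> Response


def select_rule(rules: List[Rule], user_tier: str) -> Rule:
    by_tier: Dict[str, Rule] = {rule.tier: rule for rule in rules}
    # Unknown tiers use the "default" rule, like the else branch of the imperative code.
    return by_tier.get(user_tier, by_tier["default"])


def route(prompt: str, user_tier: str, call_model: ModelCall) -> Response:
    rule = select_rule(RULES, user_tier)
    response = call_model(rule.model, prompt)
    if response.quality_score > rule.quality_threshold:
        return response
    if rule.upgrade_on_below_threshold is None:
        return response  # no cascade configured for this tier
    # Quality cascade: one more attempt on the stronger model named in the rule.
    return call_model(rule.upgrade_on_below_threshold, prompt)


def fake_model(model: str, prompt: str) -> Response:
    # Stand-in provider that scores the mini model below the threshold.
    score = 0.6 if model == "gpt-4o-mini" else 0.9
    return Response(model=model, text=f"{model}: {prompt}", quality_score=score)


print(route("draft the summary", "standard", fake_model).model)  # gpt-4o
```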
The agent code becomes:
```python
def invoke_synthesis(prompt, user_tier):
    return orchvynt.route("synthesis", prompt, context={"user_tier": user_tier})
```
Two lines. The routing logic, all of it, lives in config. The agent doesn't know about model names, timeout values, quality thresholds, or retry behavior. It calls the control plane and gets a response; the control plane is responsible for making sure that response comes from the right model under the right conditions.
The maintenance argument is the strongest one
Architectural arguments for declarative orchestration often focus on elegance and correctness. The strongest practical argument is maintenance cost over time.
In an imperative codebase, adding a new workload type requires identifying every agent that implements routing logic and updating each one. Adding a new model provider requires updating every agent that has provider configuration. Changing a timeout value requires finding every place that value is set and verifying they all get updated consistently.
In a declarative codebase with a control plane, adding a new workload type is a new rule in one config file. Adding a new provider is a new entry in the fallback chain. Changing a timeout is one line in one file.
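The fallback side can be sketched minimally under two assumptions: the control plane walks `fallback.chain` in order, and it advances only on the configured triggers, which are HTTP 429 or a timeout surfaced as Python's `TimeoutError`. `ProviderError`, `call_with_fallback`, and the stand-in provider are hypothetical. Supporting another provider changes the `chain` data. It does not change this function or any agent.

```python
from typing import Callable, FrozenSet, List


class ProviderError(Exception):
    """Hypothetical error for a failed model call; `status` mirrors the HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def call_with_fallback(
    chain: List[str],
    prompt: str,
    call_model: Callable[[str, str], str],
    fallback_statuses: FrozenSet[int] = frozenset({429}),
) -> str:
    """Try each model in order, moving on only for the configured fallback triggers."""
    last_error: Exception = RuntimeError("fallback chain is empty")
    for model in chain:
        try:
            return call_model(model, prompt)
        except ProviderError as err:
            if err.status not in fallback_statuses:
                raise  # not a configured trigger: fail fast
            last_error = err
        except TimeoutError as err:
            last_error = err  # latency trigger: try the next model
    raise last_error


def rate_limited_openai(model: str, prompt: str) -> str:
    # Stand-in provider: every gpt-* model is rate limited right now.
    if model.startswith("gpt-"):
        raise ProviderError(429)
    return f"{model}: {prompt}"


chain = ["gpt-4o-mini", "gpt-4o"]
chain.append("claude-3-haiku")  # adding a provider: one more entry, no code change
print(call_with_fallback(chain, "hello", rate_limited_openai))  # claude-3-haiku: hello
```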
Multiply the maintenance cost difference by the number of routing changes your system sees over a year. The engineering hours saved are the ROI on the architectural investment.
