Agent Architectures¶
To benchmark planning, we implemented and compared five distinct architectural paradigms, ranging from a baseline single-agent system to several multi-agent systems (MAS).
Building on multi-agent formulations (Zhou et al., 2024; Guo et al., 2024), we denote an agent system by $S = (A, E, C, \Omega)$, where $A = \{a_1, \ldots, a_n\}$ with $n \ge 1$ is a set of agents, $E$ is a shared environment, $C$ is the communication topology, and $\Omega$ is the orchestration policy.
Each agent is represented by a tuple $a_i = (\phi_i, A_i, M_i, \pi_i)$, where:
- $\phi_i$ is the reasoning policy (typically an LLM)
- $A_i = \{ \mathrm{ToolCall}(t, \theta) \mid t \in T,\ \theta \in \Theta_t \}$ is the action space of tool calls, where $T$ is the set of available tools (e.g., web search, code execution) and $\Theta_t$ is the set of valid parameter configurations for tool $t$
- $M_i$ is the internal memory
- $\pi_i: H \to A_i$ is the decision function mapping observation histories to actions
The observation history space $H$ contains sequences of action-observation pairs. The decision function $\pi_i$ is instantiated by reasoning policy $\phi_i$ (the LLM): given a history $h_{i,t}$, the LLM generates a reasoning trace and selects the next action.
For example, a history
$$ h_{i,t} = \Big[ \big(\texttt{zotero\_search\_item}(query=\text{"List all multi-agents papers in zotero"}), \text{"Found 5 files"}\big), \dots \Big] $$
is processed by $\phi_i$ to produce the next tool call $\alpha_{i,t+1}$.
At timestep $t$, agent $a_i$ selects an action $\alpha_{i,t} \in A_i$ according to:
$$ \alpha_{i,t} = \pi_i(h_{i,t}), \quad o_{i,t} = E(\alpha_{i,t}), \quad h_{i,t+1} = f_i(h_{i,t}, \alpha_{i,t}, o_{i,t}) $$
where $E$ denotes the environment and $h_{i,0} = \{s_0\}$ contains the initial task specification. The history update function $f_i: H \times A_i \times O \to H$ appends the new action-observation pair to the agent's history: $h_{i,t+1} = f_i(h_{i,t}, \alpha_{i,t}, o_{i,t}) = h_{i,t} \oplus (\alpha_{i,t}, o_{i,t})$, subject to context window truncation when $|h_{i,t+1}| > \text{MAX\_TOKENS}$. This update mechanism applies uniformly to both single-agent (SAS) and multi-agent (MAS) configurations. Communication between agents happens through explicit message passing in the orchestration layer.
Multi-Agent System (MAS): A Multi-Agent System is an agent system $S$ with $|A| > 1$, where agents interact through communication topology $C$ and orchestration policy $\Omega$.
# Agent system implementation - define base agent structure here
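The base agent structure above can be sketched as follows. This is a minimal illustration of the tuple $(\phi_i, A_i, M_i, \pi_i)$ and the history update $h_{i,t+1} = h_{i,t} \oplus (\alpha_{i,t}, o_{i,t})$; the `toy_phi` policy and the length-based truncation are stand-ins for the real LLM call and token-based `MAX_TOKENS` truncation.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToolCall:
    """An action ToolCall(t, theta): a tool name plus its parameters."""
    tool: str
    params: dict

@dataclass
class Agent:
    """One agent a_i = (phi_i, A_i, M_i, pi_i) from the formalism.

    `phi` is the reasoning policy (an LLM in practice; any callable
    mapping a history to the next ToolCall works for this sketch).
    """
    phi: Callable[[list], ToolCall]              # reasoning policy phi_i
    history: list = field(default_factory=list)  # memory M_i / history h_{i,t}
    max_len: int = 50                            # stand-in for MAX_TOKENS truncation

    def act(self) -> ToolCall:
        # pi_i(h_{i,t}): the decision function delegates to phi_i
        return self.phi(self.history)

    def observe(self, action: ToolCall, observation: str) -> None:
        # h_{i,t+1} = h_{i,t} (+) (alpha_{i,t}, o_{i,t}), with truncation
        self.history.append((action, observation))
        if len(self.history) > self.max_len:
            self.history = self.history[-self.max_len:]

# Toy policy standing in for the LLM: always searches Zotero.
def toy_phi(history):
    return ToolCall("zotero_search_item", {"query": "multi-agent papers"})

agent = Agent(phi=toy_phi)
action = agent.act()
agent.observe(action, "Found 5 files")
```

In a real system `phi` would prompt the model with the serialized history and parse the reply into a `ToolCall`; everything else in the loop stays the same.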
Single Agent¶
A Single-Agent System (SAS) contains a single reasoning locus ($|A| = 1$). All perception, reasoning, and action occur within one sequential loop, giving computational complexity $O(k)$, where $k$ is the number of reasoning iterations. A SAS has zero communication overhead and minimal memory $O(k)$, but limited capacity for decomposition or verification.
The Single Agent acts as our baseline architecture. At each environment step, the agent reads the latest observation, appends it to its ongoing conversation history, and invokes the LLM to output exactly one action. If a search is triggered, the simulation buffers the state, resolves the search oracle, and provides the recipe to the agent before executing physical changes.
A single LLM agent (gpt-5-mini) maintains a conversation history and interacts with the environment in a loop: it receives environment observations, appends them to the conversation history, calls the LLM with the full history and the system prompt, and finally returns the model's action to the environment.
If the agent outputs a search: action, the recipe oracle tool (Dagan et al., 2024) is invoked and the result is fed back into the conversation for a follow-up LLM call.
Because there is a single agent, no communication is involved; orchestration $\Omega$ is direct, and the complexity is $O(k)$ per step.
# Single Agent implementation
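The single-agent loop described above can be sketched as below. `call_llm` and `recipe_oracle` are hypothetical stand-ins for the gpt-5-mini chat call and the recipe oracle tool; they are stubbed here so the control flow (observe, act, optionally resolve a search, re-query) is runnable.

```python
def call_llm(system_prompt, conversation):
    # Stub: a real implementation would call the chat API with the full
    # history plus the system prompt and return the next action string.
    return "search: iron_ingot" if len(conversation) == 1 else "craft iron_ingot"

def recipe_oracle(query):
    # Stub for the recipe oracle tool (Dagan et al., 2024).
    return "iron_ingot: smelt iron_ore in a furnace"

SYSTEM_PROMPT = "You are a crafting agent."  # assumed prompt text

def single_agent_step(conversation, observation):
    # Append the latest observation to the ongoing conversation history.
    conversation.append({"role": "user", "content": observation})
    action = call_llm(SYSTEM_PROMPT, conversation)
    if action.startswith("search:"):
        # Search triggered: resolve the oracle, feed the recipe back,
        # and make a follow-up LLM call before acting physically.
        recipe = recipe_oracle(action.split(":", 1)[1].strip())
        conversation.append({"role": "user", "content": recipe})
        action = call_llm(SYSTEM_PROMPT, conversation)
    conversation.append({"role": "model", "content": action})
    return action

conv = []
final = single_agent_step(conv, "Inventory: iron ore")
```

The buffering of simulator state during the oracle call is omitted; only the conversational control flow is shown.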
Independent MAS¶
In this architecture $|A| > 1$, and agents interact through communication topology $C$ and orchestration policy $\Omega$.
Communication topology $C$ defines information flow patterns between agents. In the Independent topology:
The Independent Architecture mitigates hallucination and variance by executing $n$ independent agent threads in parallel. All workers receive identical prompt states and generate action proposals simultaneously; an aggregation step then determines the single action executed at each step, providing stability through redundancy. If the aggregated action is a search, the outcome is returned to all worker instances.
This configuration maximises parallelisation and minimises coordination overhead, making it suitable for ensemble-style reasoning baselines.
In this configuration, we have $n$ agents whose outputs are aggregated.
The design has two phases:
Phase 1 — Parallel Exploration
The $n$ agents call the LLM in parallel (using asyncio.gather). Each agent samples with temperature=0.7 for diverse proposals, and there is no peer communication: agents cannot see each other's outputs.
Phase 2 — Synthesis Aggregation
All $n$ proposals are concatenated into a single context string. One final LLM call (the aggregator, temperature=0.0) reads all proposals and synthesises a single concrete action. There is no voting, no comparison, and no error-correction, only synthesis.
The topology is agent-to-aggregator only, with no peer communication:
$$ C = \{(a_i, a_{\text{agg}}) : 1 \le i \le n\} $$
# Independent MAS implementation
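The two phases above can be sketched as follows. `call_llm` is a hypothetical async stand-in for the real chat API (here deterministic so the example runs); the structure, parallel proposals via asyncio.gather followed by one temperature-0.0 aggregator call, matches the description.

```python
import asyncio

async def call_llm(prompt, temperature):
    await asyncio.sleep(0)  # stand-in for network latency
    # Stub: workers (temperature > 0) propose a fixed action; the
    # aggregator (temperature == 0) echoes the last proposal it read.
    return "craft iron_ingot" if temperature > 0 else prompt.splitlines()[-1]

async def independent_step(observation, n=3):
    # Phase 1: n workers see the identical prompt; no peer communication.
    proposals = await asyncio.gather(
        *(call_llm(observation, temperature=0.7) for _ in range(n))
    )
    # Phase 2: one aggregator reads all proposals and synthesises
    # a single concrete action (no voting, no comparison).
    context = observation + "\nProposals:\n" + "\n".join(proposals)
    return await call_llm(context, temperature=0.0)

action = asyncio.run(independent_step("Inventory: iron ore"))
```

Swapping the stub for a real API client changes nothing structural: the `gather` call is the whole parallelisation story.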
Centralized MAS¶
The Centralised Architecture introduces a two-step sequential hierarchy: an Orchestrator and a Worker. During a single environment step, the Orchestrator evaluates the observation and dictates a high-level, one-sentence strategic plan (e.g., "We need to craft wooden planks before making sticks."). The context and the explicit plan are then passed to an executor Worker, which grounds them into low-level token-action syntax.
# Centralized MAS implementation
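A minimal sketch of the two-step hierarchy, with `call_llm` as a stubbed stand-in for the model so the Orchestrator-then-Worker flow is runnable:

```python
def call_llm(system_prompt, context):
    # Stub: the orchestrator returns a one-sentence plan; the worker
    # grounds the plan into low-level action syntax.
    if "Orchestrator" in system_prompt:
        return "We need to craft wooden planks before making sticks."
    return "craft wooden_planks"

def centralized_step(observation):
    # Step 1: the Orchestrator evaluates the observation and issues
    # a high-level one-sentence plan.
    plan = call_llm("[Orchestrator] Issue a 1-sentence plan.", observation)
    # Step 2: the Worker receives the context plus the explicit plan
    # and outputs the concrete action.
    return call_llm("[Worker] Execute the plan.",
                    observation + "\nPlan: " + plan)

action = centralized_step("Inventory: oak log")
```

The role split is carried entirely by the system prompt; both calls could hit the same underlying model.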
Decentralized MAS¶
The Decentralised Architecture facilitates iterative reasoning. Multiple agents propose initial actions in parallel, then engage in $R$ debate rounds. During a debate round, each agent's context window is updated with the peer proposals, allowing agents to criticise peers and refine their subsequent proposals. After the final round, a majority vote determines the final action.
This system introduces peer information fusion without hierarchy and proceeds in multiple phases:
Observation Phase
The Plancraft environment sends the context, for example: Inventory: iron ore, Environment: crafting table nearby.
Phase 1 — Independent Proposals
Each agent independently proposes an action without any influence from the others. For example: Agent 1 replies craft iron_ingot, Agent 2 replies craft sword, and Agent 3 replies craft iron_ingot.
Phase 2 — Debate Rounds
In this phase, each agent sees the others' answers and may update its proposal. This process repeats for $R$ rounds.
Phase 3 — Voting
Now the system picks the most common answer, i.e. craft iron_ingot.
This design exists because single agents are noisy and inconsistent; multiple agents can catch each other's mistakes, which adds fault tolerance and improves reasoning quality.
Communication and coordination are intertwined: the debate rounds serve both purposes simultaneously, as each agent updates its own position based on peer information (coordination via peer consensus, $\Omega = \text{Consensus}$).
All-to-all communication is represented by:
$$ C = \{(a_i, a_j) : \forall\, i,j,\ i \neq j\} $$
There is no separate planning or directive step. The debate round serves both as an information exchange and as a steering mechanism. Final coordination is a majority vote.
Simulating two rounds:
Call 1–3: Initial Proposals (parallel)
Input context: self.conversation (just the raw observation)
System prompt: SYSTEM_PROMPT
Temperature: 0.7
→ 3 proposals: [P1, P2, P3]
Call 4–6: Debate round 0 (parallel)
Input context: self.conversation + [
user: "Other agents proposed:
- Agent 1: P1
- Agent 2: P2
- Agent 3: P3
Given these proposals, output your updated action."
]
→ 3 updated proposals: [P1', P2', P3']
Call 7–9: Debate round 1 (parallel)
Input context: self.conversation + [
user: "Other agents proposed:
- Agent 1: P1'
- Agent 2: P2'
- Agent 3: P3'
Given these proposals, output your updated action."
]
→ 3 final proposals: [P1'', P2'', P3'']
No more LLM calls. Mechanical vote:
Counter([P1'', P2'', P3''])
→ most common wins
Total: 9 LLM calls, no role differentiation, and the same prompt every time.
# Decentralized MAS implementation
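The nine-call trace above can be condensed into the following sketch: independent proposals, $R$ debate rounds where each agent sees its peers' proposals, then a mechanical Counter vote with no further LLM calls. `call_llm` is a stub standing in for the model; it is scripted so one agent initially disagrees and then converges during debate.

```python
from collections import Counter

def call_llm(conversation, agent_id):
    # Stub: on the first pass agent 1 proposes a different action;
    # once peer proposals appear in context, everyone converges.
    if "Other agents proposed" in conversation[-1]:
        return "craft iron_ingot"
    return "craft sword" if agent_id == 1 else "craft iron_ingot"

def decentralized_step(observation, n=3, rounds=2):
    # Phase 1: independent proposals, same prompt for every agent.
    conversations = [[observation] for _ in range(n)]
    proposals = [call_llm(conversations[i], i) for i in range(n)]
    # Phase 2: R debate rounds; each agent sees all peer proposals.
    for _ in range(rounds):
        peer_msg = "Other agents proposed:\n" + "\n".join(
            f"- Agent {i + 1}: {p}" for i, p in enumerate(proposals)
        )
        for i in range(n):
            conversations[i].append(peer_msg)
        proposals = [call_llm(conversations[i], i) for i in range(n)]
    # Phase 3: mechanical majority vote, no more LLM calls.
    return Counter(proposals).most_common(1)[0][0]

action = decentralized_step("Inventory: iron ore")
```

In the real system the proposal calls in each round run in parallel; they are sequential here only to keep the sketch synchronous.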
Hybrid MAS¶
The Hybrid Architecture intertwines the Centralised and Decentralised approaches. An overarching Orchestrator first reads the scenario and issues a directive. In response, a pool of concurrent Workers asynchronously generates their specific sub-action proposals. Finally, the Orchestrator aggregates these worker proposals, reviews them against the original directive, and mandates a final, authoritative action.
Hybrid MAS: $A = \{a_{\text{orch}}, a_1, \ldots, a_n\}$, $C =$ star $+$ peer edges, $\Omega =$ hierarchical $+$ lateral.
It combines an orchestrated hierarchy with limited peer communication ($O(rnk + pn)$, where $p$ is the number of peer rounds), inheriting orchestrator control while enabling lateral exchange between agents.
Simulating one round with Hybrid and peer_rounds=1:
Call 1: Orchestrator directive (single)
Input context: self.conversation (same raw observation)
System prompt: SYSTEM_PROMPT + "[Orchestrator] Analyse the situation and
issue a directive... Do NOT output the final action yet."
→ directive: D
Call 2–4: Workers propose (parallel)
Input context: self.conversation + [
model: D ← orchestrator's directive injected
user: "Worker: propose an action based on the directive."
]
System prompt: SYSTEM_PROMPT
Temperature: 0.7
→ 3 proposals: [P1, P2, P3]
Call 5–7: Peer round 0 (parallel)
Input context: self.conversation + [
model: D ← directive STILL in context
user: "Worker: propose an action based on the directive."
user: "Peer proposals:
- Worker 1: P1
- Worker 2: P2
- Worker 3: P3
Considering your peers' proposals, output your refined action."
]
→ 3 refined proposals: [P1', P2', P3']
Call 8: Orchestrator synthesis (single)
Input context: self.conversation + [
model: D
user: "Worker: propose an action based on the directive."
model: "- Worker 1: P1'
- Worker 2: P2'
- Worker 3: P3'"
user: "Orchestrator: workers have exchanged proposals.
Select the single best action."
]
→ final_action
# Hybrid MAS implementation
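The eight-call trace above (directive, parallel proposals, one peer round, synthesis) can be sketched as below. `call_llm` is a stubbed async stand-in for the model, dispatching on a role tag instead of full system prompts.

```python
import asyncio

async def call_llm(role, context):
    await asyncio.sleep(0)  # stand-in for network latency
    # Stub: fixed replies per role so the flow is runnable.
    if role == "orchestrator_directive":
        return "Focus on smelting the iron ore first."
    if role == "orchestrator_synthesis":
        return context[-1]        # select from the refined proposals
    return "craft iron_ingot"     # worker proposal / refinement

async def hybrid_step(observation, n=3, peer_rounds=1):
    # Call 1: the Orchestrator issues a directive (no final action yet).
    directive = await call_llm("orchestrator_directive", [observation])
    context = [observation, directive]   # directive stays in context
    # Calls 2..n+1: Workers propose in parallel under the directive.
    proposals = await asyncio.gather(
        *(call_llm("worker", context) for _ in range(n))
    )
    # Peer rounds: workers see each other's proposals and refine them.
    for _ in range(peer_rounds):
        proposals = await asyncio.gather(
            *(call_llm("worker", context + list(proposals)) for _ in range(n))
        )
    # Final call: the Orchestrator reviews the exchange and selects
    # the single best action.
    return await call_llm("orchestrator_synthesis", context + list(proposals))

action = asyncio.run(hybrid_step("Inventory: iron ore"))
```

The star topology lives in the two orchestrator calls; the lateral edges are the peer rounds, exactly as in $C =$ star $+$ peer edges.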