HMAS-2: Hybrid Multi-agent System - v2
HMAS-2 implements a hierarchical multi-agent system with a central planner and iterative plan refinement through agent feedback. The algorithm emphasizes collaborative plan validation before execution.
Main Algorithm Pseudocode

In our HMAS-2 implementation, a leader agent is first chosen, as shown in line 1. Then, at each timestep, each agent parses its observations (\(o_a\)) from the environment state and generates a text perception (\(p_a\)) from those observations through the Perception module. This perception (\(p_a\)) is then appended to the \(\texttt{GlobalState}\).
Next, in line 13, the leader agent generates a plan for the team through Action Proposal, given the team's task, the \(\texttt{GlobalState}\), the Step History, and any reviews from the other agents. Together, these inputs give the leader every agent's current perception as well as its past perceptions and actions.
Each agent then reviews the plan and provides feedback through Plan Review in line 16. If an agent rejects the plan, its feedback is appended to \(\texttt{Review}\) and \(\texttt{valid}\) is set to \(\texttt{False}\). This causes lines 12-19 to repeat, so the leader revises the plan using the collected feedback.
Once no agent rejects the plan, the finalized plan is parsed into tasks for each agent and translated into a standardized action format. The resulting action (\(\texttt{action}_a\)) becomes that agent's active action.
Now that every agent has an active action, each agent generates an executable vector (\(e_a\)) through our Execution Module in line 23. If the current action (\(\texttt{action}_a\)) is complete or single-step, it is appended to the agent's action history and the active action is reset to \(\texttt{None}\).
Lastly, all executable vectors are combined and executed within the Wildfire Environment in line 28. If the maximum score has been reached, the algorithm ends.
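The propose-and-review loop described above can be sketched in Python as follows. This is a minimal illustration, not the project's implementation: the function name `run_planning_round`, the callables `perceive`, `propose`, and `review`, and the `max_rounds` cap are all assumptions introduced here. It also assumes that only non-leader agents review the plan and that a review equal to `ACCEPT` means approval.

```python
from typing import Callable, Dict, List, Tuple

ACCEPT = "ACCEPT"  # assumed approval token, matching the feedback format


def run_planning_round(
    agents: List[str],
    leader: str,
    perceive: Callable[[str], str],
    propose: Callable[[Dict[str, str], List[str]], Dict[str, str]],
    review: Callable[[str, Dict[str, str], Dict[str, str]], str],
    max_rounds: int = 5,  # hypothetical safety cap, not stated in the source
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Perceive, then propose and review until no agent rejects the plan."""
    # Each agent's text perception is added to the shared global state.
    global_state = {agent: perceive(agent) for agent in agents}

    reviews: List[str] = []
    plan: Dict[str, str] = {}
    for _ in range(max_rounds):
        # The leader proposes a plan using the state and any prior rejections.
        plan = propose(global_state, reviews)

        # Collect rejections from every non-leader agent.
        reviews = []
        for agent in agents:
            if agent == leader:
                continue
            feedback = review(agent, global_state, plan)
            if feedback.strip().upper() != ACCEPT:
                reviews.append(f"{agent}: {feedback}")

        # A plan with no rejections is final.
        if not reviews:
            break
    return plan, global_state
```

In this sketch, rejections from one round are passed to the next call of `propose`, which mirrors how rejected feedback is fed back to the leader before the plan is regenerated.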
Key Methods
Action Proposal
Enables the central planner to generate coordinated action plans for all agents. Maintains conversation history for iterative refinement.
def propose_actions(global_data: dict, past_conversation: list) -> tuple:
    """
    Generates action proposals for all agents:
    1. Considers global state and history
    2. Handles initial and revision rounds
    3. Ensures complete team coverage
    4. Validates action feasibility
    """
    ...
Plan Review
Allows agents to evaluate and provide feedback on proposed plans. Ensures plans are feasible and aligned with agent capabilities.
def provide_feedback(agent: Agent, global_data: dict, proposed_actions: dict) -> str:
    """
    Reviews proposed action plans:
    1. Evaluates plan feasibility
    2. Considers agent capabilities
    3. Provides constructive feedback
    4. Accepts or rejects proposals
    """
    ...
Communication Protocol
Message Format:
For proposals:
<reasoning>Plan rationale</reasoning>
<AGENT>Action description</AGENT>
For feedback:
<reasoning>Review rationale</reasoning>
<feedback>ACCEPT or rejection reason</feedback>
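As an illustration of how messages in this format could be consumed, the sketch below parses both message types with Python's standard `re` module. The helper names `parse_proposal` and `parse_feedback` are hypothetical. The sketch assumes that `<AGENT>` is a placeholder replaced by each agent's name, and it treats a reply without a `<feedback>` tag as a rejection, which is a design choice made here rather than documented behavior.

```python
import re
from typing import Dict, Tuple

# Matches <tag>body</tag> pairs; the closing tag must repeat the opening name.
_TAG = re.compile(r"<(?P<tag>[A-Za-z_][\w-]*)>(?P<body>.*?)</(?P=tag)>", re.DOTALL)


def parse_proposal(message: str) -> Tuple[str, Dict[str, str]]:
    """Split a proposal into its reasoning and a per-agent action mapping."""
    reasoning = ""
    actions: Dict[str, str] = {}
    for match in _TAG.finditer(message):
        tag, body = match.group("tag"), match.group("body").strip()
        if tag == "reasoning":
            reasoning = body
        else:
            # Assumption: every other tag is named after an agent.
            actions[tag] = body
    return reasoning, actions


def parse_feedback(message: str) -> Tuple[bool, str]:
    """Return (accepted, feedback_text) for a review message."""
    match = re.search(r"<feedback>(.*?)</feedback>", message, re.DOTALL)
    if match is None:
        # Assumption: a malformed reply counts as a rejection.
        return False, "missing <feedback> tag"
    text = match.group(1).strip()
    return text.upper() == "ACCEPT", text
```

For example, `parse_feedback("<feedback>Too far from water</feedback>")` yields a rejection carrying the reason text, which can then be passed back to the planner for the next revision round.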
Action Execution
Manages the translation of high-level action descriptions into executable commands and handles their execution in the environment.
- Action Generation: Converts natural language action descriptions into structured action objects that can be executed by the environment.
- Execution Flow: Maps agent types to their specific action libraries and handles execution errors. Each agent type has specialized actions based on their capabilities.
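One way to picture the mapping from agent types to action libraries is a nested dictionary lookup that returns an error message instead of raising. The sketch below is illustrative only: the agent types (`firefighter`, `bulldozer`), the action names, the integer vector encoding, and the function `to_executable` are all hypothetical and are not taken from the Wildfire Environment.

```python
from typing import Callable, Dict, List, Optional, Tuple

ActionFn = Callable[[List[int]], List[int]]

# Hypothetical agent types, action names, and vector encodings.
ACTION_LIBRARIES: Dict[str, Dict[str, ActionFn]] = {
    "firefighter": {
        "move": lambda args: [0, *args],
        "extinguish": lambda args: [1, *args],
    },
    "bulldozer": {
        "move": lambda args: [0, *args],
        "cut_trees": lambda args: [2, *args],
    },
}


def to_executable(
    agent_type: str, action: str, args: List[int]
) -> Tuple[Optional[List[int]], Optional[str]]:
    """Return (vector, None) on success or (None, error message) on failure."""
    library = ACTION_LIBRARIES.get(agent_type)
    if library is None:
        return None, f"unknown agent type: {agent_type}"
    handler = library.get(action)
    if handler is None:
        # The action exists for no library of this agent type.
        return None, f"{agent_type} cannot perform '{action}'"
    return handler(args), None
```

Returning the error as a value keeps one agent's invalid action from aborting the whole timestep; the caller can decide whether to skip that agent or request a new plan.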
Decision Flow
- Plan Generation:
- Feedback Collection: