CREW Wildfire: A Benchmark for Large-Scale Multi-Agent Agentic AI
Introduction
CREW Wildfire is an open-source benchmark designed to evaluate and advance large language model (LLM)-based multi-agent systems in complex, dynamic, real-world tasks. Built atop the human-AI teaming CREW simulation platform, CREW Wildfire offers procedurally generated wildfire response scenarios that probe the limits of current multi-agent Agentic AI frameworks.
Video
Components
CREW Wildfire consists of two main subcomponents that work together to create a comprehensive multi-agent simulation and benchmarking platform:
The Environment component is built on the Crew Dojo framework, which uses Unity to provide the core simulation infrastructure. It handles procedural generation of large, complex wildfire scenarios and supports simulating teams of heterogeneous agents.
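To make the idea of procedurally generated wildfire scenarios concrete, here is a minimal, self-contained sketch of how such generation could look. It is purely illustrative: the cell types, grid representation, and function names are assumptions for this example and do not reflect the actual Crew Dojo implementation.

```python
import random
from dataclasses import dataclass

# Illustrative terrain types for a toy wildfire map (an assumption,
# not the actual CREW Wildfire scenario schema).
TERRAIN_TYPES = ["grass", "forest", "water", "rock"]

@dataclass
class Cell:
    terrain: str
    on_fire: bool = False

def generate_wildfire_map(width: int, height: int, n_fire_seeds: int, seed: int = 0):
    """Procedurally generate a toy wildfire scenario as a 2D grid of cells."""
    rng = random.Random(seed)
    grid = [
        [Cell(terrain=rng.choices(TERRAIN_TYPES, weights=[5, 3, 1, 1])[0])
         for _ in range(width)]
        for _ in range(height)
    ]
    # Ignite a few random flammable cells to seed the fire.
    flammable = [(x, y) for y in range(height) for x in range(width)
                 if grid[y][x].terrain in ("grass", "forest")]
    for x, y in rng.sample(flammable, k=min(n_fire_seeds, len(flammable))):
        grid[y][x].on_fire = True
    return grid

if __name__ == "__main__":
    grid = generate_wildfire_map(width=8, height=8, n_fire_seeds=3, seed=42)
    for row in grid:
        print("".join("*" if c.on_fire else c.terrain[0] for c in row))
```

Changing the seed yields a different map, which is the basic mechanism that lets a procedural benchmark generate an effectively unlimited supply of distinct evaluation scenarios.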
The Algorithms component, in turn, is built on the Crew Algorithms package and operates on environments built with Crew Dojo. It provides a modular framework for rapidly developing multi-LLM-agent algorithms and includes implementations of several state-of-the-art multi-LLM-agent methods.
Together with a premade benchmarking suite, these two subcomponents form a robust, user-friendly platform for advancing scalable multi-agent Agentic intelligence in complex, real-world scenarios.
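The sketch below illustrates the Environment/Algorithms split described above: an algorithm only needs an observe/act interface from the environment, and a baseline method prompts each agent's LLM independently. All class and method names here are assumptions for illustration; they are not the actual Crew Dojo or Crew Algorithms APIs, and the LLM and environment are replaced with toy stand-ins so the example runs on its own.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

class WildfireEnvironment(ABC):
    """What an Algorithms-side method needs from the Environment side (assumed interface)."""

    @abstractmethod
    def observe(self, agent_id: str) -> str:
        """Return a text observation for one agent."""

    @abstractmethod
    def act(self, agent_id: str, action: str) -> None:
        """Apply an agent's action to the simulation."""


class MultiLLMAgentAlgorithm(ABC):
    """Base class for a modular multi-LLM-agent algorithm."""

    @abstractmethod
    def step(self, env: WildfireEnvironment, agent_ids: List[str]) -> None:
        ...


class IndependentPrompting(MultiLLMAgentAlgorithm):
    """Simple baseline: each agent is prompted independently at every step."""

    def __init__(self, llm):
        self.llm = llm  # any callable: prompt string -> response string

    def step(self, env, agent_ids):
        for agent_id in agent_ids:
            obs = env.observe(agent_id)
            action = self.llm(f"You are {agent_id} fighting a wildfire.\n"
                              f"Observation: {obs}\nNext action:")
            env.act(agent_id, action)


# Toy stand-ins so the sketch runs end to end without a real simulator or LLM.
class ToyEnv(WildfireEnvironment):
    def __init__(self):
        self.log: Dict[str, str] = {}

    def observe(self, agent_id):
        return "fire spreading to the north-east"

    def act(self, agent_id, action):
        self.log[agent_id] = action


if __name__ == "__main__":
    env = ToyEnv()
    algo = IndependentPrompting(llm=lambda prompt: "cut a firebreak to the north-east")
    algo.step(env, agent_ids=["agent_0", "agent_1"])
    print(env.log)
```

Keeping the algorithm behind a small abstract interface like this is what makes the framework modular: new coordination schemes (shared memory, centralized planners, communication protocols) can be swapped in without touching the simulation side.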
Getting Started
- See the Quick Installation guide to set up the platform.
- Work through Getting Started for basic examples.
- See Tutorials for advanced usage.
Paper
Codebase
Authors
CREW is a fully open-source project developed by the General Robotics Lab at Duke University.
