# Analyzing Collected Data
This tutorial explains how to analyze the performance data collected from running different algorithms in the wildfire environment. We'll cover how to generate performance metrics, visualize results, and calculate Behavioral Competency Scores (BCS).
## Prerequisites

- You have run some algorithms in the wildfire environment and collected data in the `results/logs` directory
- You have Python installed with pandas, numpy, and matplotlib
- You have the analysis scripts from the `data` directory
## Directory Structure

The analysis scripts expect your data to be organized in the following structure:
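As an illustration only: the exact layout depends on your setup, and the per-algorithm folder and log file names below are assumptions, not verified against the scripts. A layout along these lines, one folder per algorithm under `results/logs`, is what the scripts are organized around:

```
results/
└── logs/
    ├── CAMON/
    │   └── ... (per-level, per-seed log files)
    ├── COELA/
    │   └── ...
    └── DO-NOTHING/
        └── ...
```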
## Available Analysis Tools

### 1. Performance Analysis (`data_analysis.py`)
This script generates various plots and statistics about algorithm performance:
- Line plots showing performance metrics over time
- Bar plots comparing final scores across algorithms
- Run length comparisons
- Rate metrics (per timestep) analysis
- Rate vs number of agents analysis
- CSV summaries of final scores and rates
To run the analysis:
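The original command was not preserved here; assuming the script lives in the `data` directory mentioned in the prerequisites, the invocation would look something like:

```shell
# Assumed script path -- adjust to where data_analysis.py actually lives in your checkout
python data/data_analysis.py
```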
The script will create a `plots` directory containing:

- Line plots: `{level}_{metric}.png`
- Final-value bars: `{level}_final_{metric}_bar.png`
- Rate bars: `{level}_rate_{metric}_bar.png`
- Run-length bars: `{level}_run_length_bar.png`
- Rate vs Agents plots: `{metric}_vs_agents.png`
- CSV summaries: `final_score_stats.csv` and `rate_stats.csv`
### 2. Behavioral Competency Score (BCS) Analysis (`bcs.py`)
The BCS analysis evaluates algorithms based on their performance across different behavioral goals:
- Task Designation (TD)
- Adaptive Coordination (AC)
- Search and Rescue (SR)
- Open-ended Suppression (OS)
- Real-time Communication (RC)
- Planning and Allocation (PA)
To run the BCS analysis:
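The original command was not preserved here either; assuming `bcs.py` also lives in the `data` directory, the invocation would look something like:

```shell
# Assumed script path -- adjust to where bcs.py actually lives in your checkout
python data/bcs.py
```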
This will generate:
- `bcs_stats.csv`: Detailed BCS scores for each algorithm and behavioral goal
- `bcs_radar.png`: A radar chart visualizing the BCS scores
## Understanding Behavioral Competency Score (BCS)
The Behavioral Competency Score (BCS) is a metric designed to evaluate how well algorithms perform across different high-level behavioral goals in the wildfire environment. It provides a standardized way to compare algorithms' capabilities in specific areas of multi-agent coordination.
### What is BCS?
BCS is a normalized score (ranging from 0 to 1) that measures an algorithm's competency in specific behavioral goals. Each level in the environment is associated with one or more behavioral goals, such as Task Designation or Real-time Communication. The BCS aggregates performance across all levels that test a particular behavioral goal.
### How is BCS Calculated?
BCS calculation involves two main steps:
#### 1. Level Normalization
For each level, we calculate a normalized score by comparing the algorithm's performance against:
- Baseline (B): The worst possible score (usually from the DO-NOTHING algorithm)
- Target (T): The best possible score for that level
The normalization differs for two types of tasks:
Finite (reward) tasks (e.g., Cut Trees):

\[
\text{NS}_{a,\ell} = \frac{s_{a,\ell} - B_\ell}{T_\ell - B_\ell}
\]

Open-ended (penalty) tasks (e.g., Suppress Fire):

\[
\text{NS}_{a,\ell} = \frac{\log s_{a,\ell} - \log B_\ell}{\log T_\ell - \log B_\ell}
\]

where:
- \(s_{a,\ell}\) is the raw score of algorithm \(a\) on level \(\ell\)
- \(B_\ell\) is the baseline score for level \(\ell\)
- \(T_\ell\) is the target score for level \(\ell\)

Both formulas map the baseline to 0 and the target to 1, but the logarithmic form for open-ended tasks better reflects small improvements over the baseline.
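To make the two normalizations concrete, here is a small sketch. These are a linear and a logarithmic interpolation with the stated baseline-to-0, target-to-1 property; the exact formulas `bcs.py` implements may differ in details, so treat the function names and forms here as illustrative:

```python
import math

def normalize_finite(score, baseline, target):
    """Linear normalization for finite (reward) tasks: baseline -> 0, target -> 1."""
    return (score - baseline) / (target - baseline)

def normalize_open_ended(penalty, baseline, target):
    """Logarithmic normalization for open-ended (penalty) tasks: baseline -> 0, target -> 1.

    Assumes penalties are positive and the target penalty is below the baseline.
    """
    return (math.log(penalty) - math.log(baseline)) / (math.log(target) - math.log(baseline))

print(normalize_finite(30, baseline=10, target=50))        # 0.5
print(normalize_open_ended(100, baseline=100, target=10))  # 0.0
# Halving the penalty from the baseline already earns a substantial score:
print(round(normalize_open_ended(50, baseline=100, target=10), 3))  # 0.301
```

Note how under the logarithmic form, cutting the penalty in half already yields about 0.3 of the normalized score, which is what "better reflects small improvements" means in practice.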
#### 2. Behavioral Aggregation

For each behavioral goal \(g\), we calculate the BCS by averaging the normalized scores across all levels that test that goal:

\[
\text{BCS}_{a,g} = \frac{1}{|\mathcal{T}_g|} \sum_{\ell \in \mathcal{T}_g} \text{NS}_{a,\ell}
\]
where:
- \(\mathcal{T}_g\) is the set of levels associated with behavioral goal \(g\)
- \(|\mathcal{T}_g|\) is the number of levels in that set
- \(\text{NS}_{a,\ell}\) is the normalized score for algorithm \(a\) on level \(\ell\)
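The aggregation step is then just a mean over the levels tagged with each goal. A minimal sketch, where the level names, scores, and goal-to-level mapping are invented for illustration:

```python
# Normalized scores per level for one algorithm (illustrative numbers)
normalized_scores = {"cut_trees_small": 0.75, "cut_trees_large": 0.5, "suppress_fire": 0.25}

# Which levels test which behavioral goal (hypothetical mapping)
goal_levels = {
    "TD": ["cut_trees_small", "cut_trees_large"],
    "OS": ["suppress_fire"],
}

def bcs(goal):
    """Average the normalized scores over all levels that test `goal`."""
    levels = goal_levels[goal]
    return sum(normalized_scores[l] for l in levels) / len(levels)

print(bcs("TD"))  # 0.625
print(bcs("OS"))  # 0.25
```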
### Why is BCS Important?

- **Standardized Comparison**: BCS provides a scale-free way to compare algorithms across different tasks and behavioral goals.
- **Behavioral Insight**: Instead of just looking at raw scores, BCS helps identify which high-level behaviors an algorithm is good at or struggles with.
- **Future Research**: The normalized nature of BCS makes it useful for comparing algorithms across different studies and environments.
- **Algorithm Development**: BCS can guide the development of new algorithms by highlighting which behavioral competencies need improvement.
## Baseline Algorithm BCS Scores
Here are the BCS scores for the baseline algorithms:
| Behavior | CAMON | COELA | Embodied | HMAS-2 |
|---|---|---|---|---|
| Task Designation (TD) | 0.39 | 0.29 | 0.37 | 0.38 |
| Adaptive Coordination (AC) | 0.50 | 0.41 | 0.46 | 0.45 |
| Search and Rescue (SR) | 0.29 | 0.16 | 0.29 | 0.28 |
| Open-ended Suppression (OS) | 0.11 | 0.11 | 0.15 | 0.16 |
| Real-time Communication (RC) | 0.49 | 0.36 | 0.43 | 0.41 |
| Planning and Allocation (PA) | 0.23 | 0.14 | 0.22 | 0.22 |
## Important Notes
- The analysis scripts assume specific level names and agent counts - make sure your data follows the expected format
- For rate vs agents analysis, certain levels are excluded due to variable agent counts
- BCS analysis requires running the DO-NOTHING algorithm on the same levels and seeds you want to test
## Related Resources
- Levels Documentation - Details about available levels and their requirements
- Running Algorithms - How to run different algorithms in the environment