Analyzing Collected Data

This tutorial explains how to analyze the performance data collected from running different algorithms in the wildfire environment. We'll cover how to generate performance metrics, visualize results, and calculate Behavioral Competency Scores (BCS).

Prerequisites

You have run some algorithms in the wildfire environment and collected data in the results/logs directory
You have Python installed with pandas, numpy, and matplotlib
You have the analysis scripts from the data directory

Directory Structure

The analysis scripts expect your data to be organized in the following structure:

results/logs/
  algorithm_name/
    level_name/
      seed/
        timestamp/
          action_reward.csv

Available Analysis Tools

1. Performance Analysis (`data_analysis.py`)

This script generates various plots and statistics about algorithm performance:

Line plots showing performance metrics over time
Bar plots comparing final scores across algorithms
Run length comparisons
Rate metrics (per timestep) analysis
Rate vs number of agents analysis
CSV summaries of final scores and rates

To run the analysis:

python data_analysis.py

The script will create a plots directory containing: - Line plots: {level}_{metric}.png - Final-value bars: {level}_final_{metric}_bar.png - Rate bars: {level}_rate_{metric}_bar.png - Run-length bars: {level}_run_length_bar.png - Rate vs Agents plots: {metric}_vs_agents.png - CSV summaries: final_score_stats.csv and rate_stats.csv

2. Behavioral Competency Score (BCS) Analysis (`bcs.py`)

The BCS analysis evaluates algorithms based on their performance across different behavioral goals:

Task Designation (TD)
Adaptive Coordination (AC)
Search and Rescue (SR)
Open-ended Suppression (OS)
Real-time Communication (RC)
Planning and Allocation (PA)

To run the BCS analysis:

python bcs.py

This will generate: - bcs_stats.csv: Detailed BCS scores for each algorithm and behavioral goal - bcs_radar.png: A radar chart visualizing the BCS scores

Understanding Behavioral Competency Score (BCS)

The Behavioral Competency Score (BCS) is a metric designed to evaluate how well algorithms perform across different high-level behavioral goals in the wildfire environment. It provides a standardized way to compare algorithms' capabilities in specific areas of multi-agent coordination.

What is BCS?

BCS is a normalized score (ranging from 0 to 1) that measures an algorithm's competency in specific behavioral goals. Each level in the environment is associated with one or more behavioral goals, such as Task Designation or Real-time Communication. The BCS aggregates performance across all levels that test a particular behavioral goal.

How is BCS Calculated?

BCS calculation involves two main steps:

1. Level Normalization

For each level, we calculate a normalized score by comparing the algorithm's performance against:

Baseline (B): The worst possible score (usually from DO-NOTHING algorithm)
Target (T): The best possible score for that level

The normalization differs for two types of tasks:

Finite (reward) tasks (e.g., Cut Trees):

\[ \text{NS}_{a,\ell} = \frac{s_{a,\ell} - B_\ell}{T_\ell - B_\ell} \]

Open-ended (penalty) tasks (e.g., Suppress Fire):

\[ \text{NS}_{a,\ell} = \frac{\log\left(1 + \frac{s_{a,\ell} - B_\ell}{T_\ell - B_\ell}\right)}{\log(2)} \]

where:

\(s_{a,\ell}\) is the raw score of algorithm \(a\) on level \(\ell\)
\(B_\ell\) is the baseline score for level \(\ell\)
\(T_\ell\) is the target score for level \(\ell\)

Both formulas map the baseline to 0 and target to 1, but the logarithmic form for open-ended tasks better represents small improvements.

2. Behavioral Aggregation

For each behavioral goal \(g\), we calculate the BCS by averaging the normalized scores across all levels that test that goal:

\[ \text{BCS}_{a,g} = \frac{1}{|\mathcal{T}_g|} \sum_{\ell \in \mathcal{T}_g} \text{NS}_{a,\ell} \]

where:

\(\mathcal{T}_g\) is the set of levels associated with behavioral goal \(g\)
\(|\mathcal{T}_g|\) is the number of levels in that set
\(\text{NS}_{a,\ell}\) is the normalized score for algorithm \(a\) on level \(\ell\)

Why is BCS Important?

Standardized Comparison: BCS provides a scale-free way to compare algorithms across different tasks and behavioral goals.
Behavioral Insight: Instead of just looking at raw scores, BCS helps understand which high-level behaviors an algorithm is good at or struggles with.
Future Research: The normalized nature of BCS makes it useful for comparing algorithms across different studies and environments.
Algorithm Development: BCS can guide the development of new algorithms by highlighting which behavioral competencies need improvement.

Baseline Algorithm BCS Scores

Here are the BCS scores for the baseline algorithms:

Behavior	CAMON	COELA	Embodied	HMAS-2
Task Designation (TD)	0.39	0.29	0.37	0.38
Agent Capitalization (AC)	0.50	0.41	0.46	0.45
Spatial Reasoning (SR)	0.29	0.16	0.29	0.28
Observation Sharing (OS)	0.11	0.11	0.15	0.16
Realtime Coordination (RC)	0.49	0.36	0.43	0.41
Plan Adaptation (PA)	0.23	0.14	0.22	0.22

Important Notes

The analysis scripts assume specific level names and agent counts - make sure your data follows the expected format
For rate vs agents analysis, certain levels are excluded due to variable agent counts
BCS analysis requires running the DO-NOTHING algorithm on the same levels and seeds you want to test

Levels Documentation - Details about available levels and their requirements
Running Algorithms - How to run different algorithms in the environment