
Train a Reinforcement Learning Baseline

RL algorithms are common baselines for measuring AI agent performance in human-AI teaming research. Once CREW is installed and the crew conda environment is activated, navigate to the crew-algorithms directory and run:

python crew_algorithms/ddpg envs=hide_and_seek_1v1 collector.frames_per_batch=240 batch_size=240 train_batches=30
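The key=value arguments after the script path are configuration overrides, so the same settings can be adjusted from the command line. For example, a run with twice as many training batches (the other values are left at the defaults shown above):

python crew_algorithms/ddpg envs=hide_and_seek_1v1 collector.frames_per_batch=240 batch_size=240 train_batches=60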

The script will first open and close a dummy environment to read the environment specs needed to create the agent. It will then launch a server instance of the 1v1 Hide and Seek game and train a DDPG agent as the seeker.
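For readers unfamiliar with DDPG, the sketch below shows the core of the algorithm: a deterministic actor trained against a learned Q-function, with slowly updated target networks. It is a generic, self-contained illustration with placeholder network sizes and a hypothetical ddpg_update helper, not CREW's actual implementation.

import copy
import torch
import torch.nn as nn

obs_dim, act_dim = 16, 2  # placeholder sizes; the real values come from the environment specs

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma, tau = 0.99, 0.005

def ddpg_update(obs, act, rew, next_obs, done):
    # rew and done are expected to have shape [batch, 1].
    # Critic: regress Q(s, a) toward the one-step TD target.
    with torch.no_grad():
        next_q = critic_target(torch.cat([next_obs, actor_target(next_obs)], dim=-1))
        target_q = rew + gamma * (1.0 - done) * next_q
    q = critic(torch.cat([obs, act], dim=-1))
    critic_loss = nn.functional.mse_loss(q, target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: maximize the critic's value of the actor's own actions.
    actor_loss = -critic(torch.cat([obs, actor(obs)], dim=-1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Target networks slowly track the online networks.
    with torch.no_grad():
        for net, tgt in ((actor, actor_target), (critic, critic_target)):
            for p, tp in zip(net.parameters(), tgt.parameters()):
                tp.mul_(1.0 - tau).add_(tau * p)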

By default, CREW uses wandb for experiment logging. If you have not used wandb before, you can create an account by following the prompt that appears after running the script. You can also disable wandb logging by adding WANDB_MODE=disabled before the python command:

WANDB_MODE=disabled python crew_algorithms/ddpg envs=hide_and_seek_1v1 collector.frames_per_batch=240 batch_size=240 train_batches=30
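If you prefer to disable logging for the whole shell session rather than a single command, the same environment variable can be exported first:

export WANDB_MODE=disabled
python crew_algorithms/ddpg envs=hide_and_seek_1v1 collector.frames_per_batch=240 batch_size=240 train_batches=30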

The above script will train the agent for 60 minutes. You should be able to observe decent performance within 20-30 minutes.

By default, the model weights will be saved after every training batch to CREW/Data/00/ddpg/<exp name>, where <exp name> is an automatically generated experiment name containing the time, environment, algorithm, and seed of the experiment. To evaluate and visualize trained models, use crew_algorithms/ddpg/eval.py with the arguments exp_path and eval_weights to choose the experiment and weights to evaluate and visualize. exp_path starts with the subject ID, which is 00 by default, and eval_weights is a list of integers specifying which saved weights to evaluate. For instance:

python crew_algorithms/ddpg/eval.py envs=hide_and_seek_1v1 exp_path='00/ddpg/0722_1419_hide_and_seek_1v1__ddpg_seed_42_' eval_weights=[10]
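Because eval_weights is a list, several checkpoints can be evaluated in one run and compared, assuming those checkpoints were saved during training:

python crew_algorithms/ddpg/eval.py envs=hide_and_seek_1v1 exp_path='00/ddpg/0722_1419_hide_and_seek_1v1__ddpg_seed_42_' eval_weights=[10,20,30]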