Train a Reinforcement Learning Baseline
RL algorithms are common baselines for measuring AI agent performance in Human-AI teaming research. Once CREW is installed and the crew
conda environment is activated, navigate to crew-algorithms and run:
python crew_algorithms/ddpg envs=hide_and_seek_1v1 collector.frames_per_batch=240 batch_size=240 train_batches=30
The script first opens and closes a dummy environment to read the environment specs needed to create an agent. It then launches a server instance of the 1v1 Hide and Seek game and trains a DDPG agent as the seeker.
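To make the spec-reading step concrete, here is a minimal generic sketch of the pattern (gymnasium, the environment ID, and the helper name are stand-ins used purely for illustration; this is not CREW's actual implementation):

# Minimal sketch of the dummy-environment pattern: open a short-lived
# environment, read its observation/action specs to size the agent's
# networks, then close it before the real training environment launches.
# gymnasium and "Pendulum-v1" are placeholders here, not part of CREW.
import gymnasium as gym

def read_env_specs(env_id: str):
    env = gym.make(env_id)             # throwaway environment
    obs_space = env.observation_space  # determines network input shape
    act_space = env.action_space       # determines network output shape
    env.close()                        # closed again before training starts
    return obs_space, act_space

obs_space, act_space = read_env_specs("Pendulum-v1")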
By default, CREW uses wandb to handle experiment logging. If you have not used wandb before, you can create an account by following its prompts after running the script. You can also disable wandb logging by adding WANDB_MODE=disabled
before the python command:
WANDB_MODE=disabled python crew_algorithms/ddpg envs=hide_and_seek_1v1 collector.frames_per_batch=240 batch_size=240 train_batches=30
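If you prefer to disable logging for the whole shell session rather than a single run, you can also export the variable once (standard wandb behavior):
export WANDB_MODE=disabled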
The above script will train the agent for 60 minutes. You should be able to observe decent performance after 20-30 minutes.
By default, the model weights will be saved after every training batch to CREW/Data/00/ddpg/<exp name>, where <exp name> is an automatically generated experiment name containing the time, environment, algorithm, and seed information of the experiment. To evaluate and visualize trained models, use crew_algorithms/ddpg/eval.py, and add the arguments exp_path and eval_weights to choose the experiment and weights to evaluate and visualize. exp_path starts with the subject ID, which is 00 by default, and eval_weights is a list of integers identifying the weights you wish to evaluate. For instance:
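(The command below is only an illustration: the experiment name stays a placeholder, the weight indices are arbitrary, and the exact argument syntax may differ in your version of CREW.)
python crew_algorithms/ddpg/eval.py exp_path=00/ddpg/<exp name> eval_weights=[5,10,15]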