Running an Algorithm

All CREW algorithms live under CREW/crew-algorithms/crew_algorithms/. We provide implementations of state-of-the-art human-guided RL algorithms and baseline RL algorithms. To launch a training session, first configure the training settings, including the environment, the number of AI agents, and the hyperparameters. Then run the commands on this page, and Python will automatically start a Unity server instance and initialize the AI agents.

Configure training files

To make things easy, we use Hydra to configure training sessions. The default configurations for environments are defined in CREW/crew-algorithms/crew_algorithms/envs/configs.py, where you will find the game file path, the number of agents, and other environment-specific parameters.

The default configurations for algorithm-specific parameters are under CREW/crew-algorithms/crew_algorithms/conf/.

Run-specific parameters can be configured at the top of each algorithm's __main__.py file, in the Config class.
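
As a rough illustration of what such a Config class can look like, here is a minimal sketch using plain Python dataclasses. The field names batch_size, train_batches, collector.frames_per_batch, entity and project come from this page. The wandb field name, the default values and the exact class layout are assumptions made for illustration, so check the actual __main__.py of the algorithm you are running.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class WandbConfig:
    # Placeholder values: replace with your own wandb entity and project.
    entity: str = "my-team"
    project: str = "crew-experiments"


@dataclass
class CollectorConfig:
    # Number of time steps of data collected in every batch.
    frames_per_batch: int = 16


@dataclass
class Config:
    # Illustrative run-specific fields; the real Config class may differ.
    batch_size: int = 16
    train_batches: int = 100
    collector: CollectorConfig = field(default_factory=CollectorConfig)
    wandb: WandbConfig = field(default_factory=WandbConfig)


cfg = Config()
print(asdict(cfg))  # nested dict view of all run-specific settings
```

Nesting the settings this way is what makes dotted command-line keys such as collector.frames_per_batch possible.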

To enable wandb logging, set the entity and project arguments in WandbConfig. To customize the data to be logged, follow the wandb documentation. To disable wandb logging, prefix the python command with WANDB_MODE=disabled.

Launch a training session

First enter the crew-algorithms directory:

cd crew-algorithms

Then run the command python crew_algorithms/{ALGORITHM}, where {ALGORITHM} is the chosen algorithm. For instance:

python crew_algorithms/ddpg

You can always override the default configurations by adding arguments on the command line:

python crew_algorithms/ddpg envs=hide_and_seek_1v1 collector.frames_per_batch=16 batch_size=16
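
If you launch many sessions with different overrides, it can be convenient to build the command from a script. The sketch below assembles the same command as above and optionally sets WANDB_MODE=disabled in the child environment. The helper build_launch is hypothetical and not part of CREW, and the script assumes it is run from the crew-algorithms directory.

```python
import os
import sys


def build_launch(algorithm, overrides=None, disable_wandb=False):
    """Return (command, env) for launching one training session.

    Hypothetical helper for illustration; not part of CREW.
    """
    # Overrides are passed as key=value arguments after the algorithm path.
    args = [f"{key}={value}" for key, value in (overrides or {}).items()]
    command = [sys.executable, f"crew_algorithms/{algorithm}", *args]

    # Copy the current environment so the parent process is left untouched.
    env = dict(os.environ)
    if disable_wandb:
        env["WANDB_MODE"] = "disabled"
    return command, env


command, env = build_launch(
    "ddpg",
    overrides={
        "envs": "hide_and_seek_1v1",
        "collector.frames_per_batch": 16,
        "batch_size": 16,
    },
    disable_wandb=True,
)
print(" ".join(command[1:]))

# To actually start training, uncomment the following lines:
# import subprocess
# subprocess.run(command, env=env, check=True)
```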

Controlling training time

The decision frequency of all CREW games is set to 2 Hz, so each time step lasts 0.5 s. The parameter collector.frames_per_batch determines how many time steps of data are collected in each batch, and train_batches determines how many batches of environment experience to train on. The total training time of a session is therefore:

0.5s * collector.frames_per_batch * train_batches
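
The formula above can be turned into a small calculator, for example to pick train_batches for a fixed time budget. This is a minimal sketch. The function names are made up for illustration, and the result is only the estimate given by the formula.

```python
DECISION_PERIOD_S = 0.5  # 2 Hz decision frequency, so 0.5 s per time step


def training_time_seconds(frames_per_batch, train_batches):
    """Estimated session length: 0.5 s * frames_per_batch * train_batches."""
    return DECISION_PERIOD_S * frames_per_batch * train_batches


def batches_for_budget(budget_seconds, frames_per_batch):
    """Largest train_batches whose estimated time fits within the budget."""
    return int(budget_seconds // (DECISION_PERIOD_S * frames_per_batch))


# With 16 frames per batch, 75 batches take about 10 minutes.
print(training_time_seconds(frames_per_batch=16, train_batches=75))  # 600.0
print(batches_for_budget(budget_seconds=600, frames_per_batch=16))   # 75
```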