Armen Kasparian edited this page Mar 28, 2024 · 4 revisions

Framework

The framework is designed to be composable, making it easy to add new agents and environments. We use a registration mechanism so that users and developers can easily add or use new components. Registered components can be found in the <component>/__init__.py files throughout the repository. We use the logging package to control the verbosity of jobs. Because we use asserts liberally in the code, we recommend running longer jobs with python -O (which disables them).

Drivers

The driver is the main execution code that runs the optimization. The current continuous driver, drivers/run_continuous.py, takes the following parameters:

  • --index: Index for tracking (default=0)
  • --nepisodes: Number of episodes (default=1000)
  • --nsteps: Number of steps (default=-1 --> use environment default)
  • --agent: Agent used for RL (default='KerasTD3-v0')
  • --env: Environment used for RL (default='HalfCheetah-v4')
  • --logdir: Directory to save results (default='None')
  • --bsize: Buffer size (default= In Config)
  • --btype: Buffer type (default= In Config)
  • --inference: Inference only run flag (default=False)
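The flag list above maps directly onto a standard argparse parser. The sketch below mirrors those flags with the defaults listed; it is not the driver's actual source, and the buffer flags default to None here to stand for "taken from the config":

```python
import argparse

# Sketch of the continuous driver's command line, mirroring the flags above.
parser = argparse.ArgumentParser(description="Continuous RL driver (sketch)")
parser.add_argument("--index", type=int, default=0, help="Index for tracking")
parser.add_argument("--nepisodes", type=int, default=1000, help="Number of episodes")
parser.add_argument("--nsteps", type=int, default=-1, help="-1 uses the environment default")
parser.add_argument("--agent", default="KerasTD3-v0", help="Agent used for RL")
parser.add_argument("--env", default="HalfCheetah-v4", help="Environment used for RL")
parser.add_argument("--logdir", default=None, help="Directory to save results")
parser.add_argument("--bsize", type=int, default=None, help="Buffer size (None -> config default)")
parser.add_argument("--btype", default=None, help="Buffer type (None -> config default)")
parser.add_argument("--inference", action="store_true", help="Inference-only run")

# Example invocation equivalent to: --env Pendulum-v1 --nepisodes 100
args = parser.parse_args(["--env", "Pendulum-v1", "--nepisodes", "100"])
```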

Agents

The agent classes inherit from the core/agent_core.py class, which defines the functions required to interoperate with the drivers. Default configurations for agents are located in cfgs/<agent_id>. Current agent algorithms include:

  • KerasDQN-v0 (discrete action space)
  • KerasDDPG-v0 (continuous action space)
  • KerasTD3-v0 (continuous action space)

A complete list can be found in: agents/__init__.py
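The exact contract in core/agent_core.py is not reproduced here, but the kind of base-class interface a driver needs can be sketched as follows. The method names `action` and `train`, and the `ConstantAgent` subclass, are hypothetical:

```python
from abc import ABC, abstractmethod

class AgentCore(ABC):
    """Hypothetical sketch of an agent interface a driver could rely on;
    the framework's real contract is defined in core/agent_core.py."""

    @abstractmethod
    def action(self, state):
        """Return an action for the given state."""

    @abstractmethod
    def train(self, batch):
        """Update the policy from a batch of transitions; return metrics."""

class ConstantAgent(AgentCore):
    """Trivial agent used only to show the subclassing pattern."""
    def action(self, state):
        return 0.0

    def train(self, batch):
        return {"loss": 0.0}

agent = ConstantAgent()
```

Because every agent exposes the same interface, the driver can run KerasDQN, KerasDDPG, or KerasTD3 without agent-specific code.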

Buffers

The buffer classes inherit from the core/replay_core.py class, which defines the functions required to interoperate with the drivers. Default configurations for buffers are located in cfgs/<buffer_id>. Current replay buffers include:

  • ER-v0 Uniform Random Sampling Experience Replay Buffer
  • PER-v0 Prioritized Experience Replay Buffer

A complete and up-to-date list can be found in: buffers/__init__.py
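An ER-v0-style uniform-sampling buffer can be sketched in a few lines. The class and method names below are illustrative; the real interface is defined in core/replay_core.py:

```python
import random
from collections import deque

class UniformReplayBuffer:
    """Sketch of a uniform-random-sampling experience replay buffer
    (the ER-v0 idea); not the framework's actual implementation."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently evicts the oldest transition at capacity
        self._data = deque(maxlen=capacity)

    def store(self, transition):
        self._data.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement
        return random.sample(list(self._data), batch_size)

    def __len__(self):
        return len(self._data)

buf = UniformReplayBuffer(capacity=100)
for i in range(10):
    buf.store((i, i + 1))
batch = buf.sample(4)
```

A PER-v0-style buffer would replace the uniform `sample` with one that draws transitions in proportion to their TD-error priorities.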

Environments

The framework can use any existing environment based on the gymnasium package. We provide some simple environments to illustrate research projects, such as:

  • DnC2s-Circle2D-Statefull-v0
  • DnC2s-Circle2D-Stateless-v0

A complete and up-to-date list of custom environments can be found in: envs/__init__.py

Calling baseline gymnasium environments is available and can be done via the command line.
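Any environment following the gymnasium API can plug into the drivers: reset() returns (observation, info), and step() returns (observation, reward, terminated, truncated, info). The dependency-free sketch below shows that shape; `CountdownEnv` is a made-up example, and a real custom environment would subclass gymnasium.Env and be registered in envs/__init__.py:

```python
class CountdownEnv:
    """Minimal gymnasium-style environment sketch (no gymnasium import).
    The episode ends when the counter reaches zero."""

    def __init__(self, start=3):
        self.start = start
        self.state = start

    def reset(self, seed=None):
        self.state = self.start
        return self.state, {}  # (observation, info)

    def step(self, action):
        self.state -= 1
        terminated = self.state <= 0
        reward = 1.0 if terminated else 0.0
        # (observation, reward, terminated, truncated, info)
        return self.state, reward, terminated, False, {}

obs, info = CountdownEnv().reset()
```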

Monitoring

The current code leverages tensorboard to monitor the training process. This currently includes key learning parameters in the TD3 agent and the driver. Examples include:

  • Actor loss: the loss values from the actor during training
  • Critic losses: the loss values from the critic during training
  • Actions: the actions from the policy during training (w/ noise)
  • Training Reward: provides the reward based on the policy during training (actions w/ noise)
  • Inference Reward: provides the reward from the policy from inference (actions w/o noise)
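The training/inference reward distinction above comes from whether exploration noise is added to the policy's actions. A toy illustration of that split (the policy, noise scale, and function names are all made up for the example):

```python
import random

def policy(state):
    """Deterministic toy policy standing in for the trained actor."""
    return 0.5 * state

def training_action(state, noise_std=0.1, rng=random.Random(0)):
    # Training rollouts add Gaussian exploration noise;
    # these actions feed the "Actions" and "Training Reward" curves.
    return policy(state) + rng.gauss(0.0, noise_std)

def inference_action(state):
    # Inference rollouts use the raw policy output (no noise);
    # these actions feed the "Inference Reward" curve.
    return policy(state)
```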

After executing the driver, run the following command to see the results from tensorboard:

tensorboard --logdir ./results/

Example application execution

Circle Optimization Problem

python -O drivers/run_continuous.py --env DnC2s-Circle2D-Statefull-v0 --nepisodes 100000

Farama Foundation Pendulum-v1 Control Problem

python -O drivers/run_continuous.py --env Pendulum-v1 --nepisodes 100

Farama Foundation Mujoco HalfCheetah-v4 Control Problem

python -O drivers/run_continuous.py --env HalfCheetah-v4 --nepisodes 1000
