Home
The framework is designed to be composable, making it easy to add new agents and environments. We use a registration mechanism so that users and developers can easily add or use new components. Registered components can be found in the `<component>/__init__.py` files throughout the repository.

We use the `logging` package to control the verbosity of job output. We use asserts liberally in the code, so we recommend running longer jobs with `python -O` (which disables them).
The driver is the main execution code that runs the optimization loop. The current continuous driver, `drivers/run_continuous.py`, takes the following parameters:
- `--index`: Index for tracking (default=0)
- `--nepisodes`: Number of episodes (default=1000)
- `--nsteps`: Number of steps (default=-1, i.e. use the environment default)
- `--agent`: Agent used for RL (default='KerasTD3-v0')
- `--env`: Environment used for RL (default='HalfCheetah-v4')
- `--logdir`: Directory to save results (default=None)
- `--bsize`: Buffer size (default: set in config)
- `--btype`: Buffer type (default: set in config)
- `--inference`: Inference-only run flag (default=False)
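The flag handling above can be sketched with `argparse`; this is an illustrative stand-in, and the actual parser in `drivers/run_continuous.py` may differ in details.

```python
import argparse

def build_parser():
    # Hypothetical sketch of the driver's command-line interface;
    # defaults mirror the parameter list documented above.
    p = argparse.ArgumentParser(description="Continuous RL driver (sketch)")
    p.add_argument("--index", type=int, default=0, help="Index for tracking")
    p.add_argument("--nepisodes", type=int, default=1000, help="Number of episodes")
    p.add_argument("--nsteps", type=int, default=-1,
                   help="-1 means use the environment default")
    p.add_argument("--agent", default="KerasTD3-v0", help="Registered agent id")
    p.add_argument("--env", default="HalfCheetah-v4", help="Registered environment id")
    p.add_argument("--logdir", default=None, help="Directory to save results")
    p.add_argument("--bsize", type=int, default=None, help="None -> from config")
    p.add_argument("--btype", default=None, help="None -> from config")
    p.add_argument("--inference", action="store_true", help="Inference-only run")
    return p

# Parse a sample command line (same shape as the examples later on this page)
args = build_parser().parse_args(["--env", "Pendulum-v1", "--nepisodes", "100"])
```

Unspecified flags fall back to their defaults, so `args.agent` above is still `KerasTD3-v0`.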
The agent classes inherit from the `core/agent_core.py` base class, which defines the functions required to interoperate with the drivers. Default configurations for agents are located in `cfgs/<agent_id>`. Current agent algorithms include:
- `KerasDQN-v0` (discrete action space)
- `KerasDDPG-v0` (continuous action space)
- `KerasTD3-v0` (continuous action space)
A complete list can be found in:
agents/__init__.py
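The interface in `core/agent_core.py` is not reproduced here; the sketch below shows what such a driver-facing base class typically looks like. Method names are hypothetical and may not match the real interface.

```python
from abc import ABC, abstractmethod

class AgentCore(ABC):
    # Hypothetical sketch of an agent base class; the real interface
    # in core/agent_core.py may define different methods.
    @abstractmethod
    def action(self, state):
        """Return an action for the given state."""

    @abstractmethod
    def train(self, batch):
        """Update the agent from a batch of transitions; return metrics."""

class ConstantAgent(AgentCore):
    """Toy agent: always returns the same action, learns nothing."""
    def action(self, state):
        return 0.0

    def train(self, batch):
        return {"loss": 0.0}

agent = ConstantAgent()
```

Because every agent exposes the same methods, the driver can run any registered agent without knowing its algorithm.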
The buffer classes inherit from the `core/replay_core.py` base class, which defines the functions required to interoperate with the drivers. Default configurations for buffers are located in `cfgs/<buffer_id>`. Current replay buffers include:
- `ER-v0`: Uniform Random Sampling Experience Replay Buffer
- `PER-v0`: Prioritized Experience Replay Buffer
A complete and up-to-date list can be found in:
buffers/__init__.py
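For intuition, a uniform-sampling replay buffer like `ER-v0` can be sketched in a few lines. This is an illustrative stand-in, not the implementation in `buffers/`.

```python
import random
from collections import deque

class UniformReplayBuffer:
    # Illustrative sketch of uniform experience replay; the actual
    # ER-v0 implementation lives in the buffers/ package.
    def __init__(self, capacity):
        # A bounded deque: the oldest transitions fall off when full.
        self.storage = deque(maxlen=capacity)

    def store(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        # Uniform random sampling without replacement.
        return random.sample(list(self.storage), batch_size)

buf = UniformReplayBuffer(capacity=3)
for i in range(5):
    buf.store((i, i * 0.1))  # toy (state, reward) transitions
```

After five stores into a capacity-3 buffer, only the three most recent transitions remain; a prioritized buffer like `PER-v0` instead samples in proportion to a per-transition priority.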
The framework can use any existing environment based on the `gymnasium` package. We also provide some simple environments to illustrate research projects, such as:

- `DnC2s-Circle2D-Statefull-v0`
- `DnC2s-Circle2D-Stateless-v0`
A complete and up-to-date list of custom environments can be found in:
envs/__init__.py
Baseline gymnasium environments can also be selected directly from the command line.
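Any such environment follows the standard gymnasium `reset`/`step` protocol. The dependency-free sketch below illustrates that interface with toy dynamics; the class name and behavior are hypothetical, not one of the `envs/` implementations.

```python
class ToyEnv:
    """Dependency-free sketch of the gymnasium reset/step protocol."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.t = 0

    def reset(self, seed=None):
        # gymnasium-style reset: returns (observation, info)
        self.t = 0
        return 0.0, {}

    def step(self, action):
        # gymnasium-style step:
        # returns (obs, reward, terminated, truncated, info)
        self.t += 1
        obs = float(self.t)
        reward = -abs(action)   # toy reward: prefer small actions
        terminated = False      # this toy task has no terminal state
        truncated = self.t >= self.max_steps
        return obs, reward, terminated, truncated, {}
```

Because every environment, custom or baseline, exposes this same five-tuple `step` contract, the driver loop is identical regardless of which `--env` id is passed.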
The current code leverages TensorBoard to monitor the training process. This currently includes key learning parameters from the TD3 agent and the driver. Examples include:
- Actor loss: the loss values from the actor during training
- Critic losses: the loss values from the critics during training
- Actions: the actions from the policy during training (w/ noise)
- Training Reward: the reward earned by the policy during training (actions w/ noise)
- Inference Reward: the reward earned by the policy during inference (actions w/o noise)
After executing the driver, run the following command to view the results in TensorBoard:

```shell
tensorboard --logdir ./results/
```
Example runs:

```shell
python -O drivers/run_continuous.py --env DnC2s-Circle2D-Statefull-v0 --nepisodes 100000
python -O drivers/run_continuous.py --env Pendulum-v1 --nepisodes 100
python -O drivers/run_continuous.py --env HalfCheetah-v4 --nepisodes 1000
```