How to use Arena

Whether you are developing models for delivery route optimisation or energy distribution, Arena enables you to build and deploy faster than ever before and at a fraction of the cost.

How can I upload and validate a custom environment?
To upload a custom environment, first navigate to the Custom Environments section on the sidebar and select the New Custom Env button (or, if you are creating your first custom environment, the Create Custom Environment button).

This will bring up a pop-up window where you can enter the environment name and description.

Once saved, a tile will be created for your environment. You can now navigate into the file upload section by clicking the View button on the tile.

You can upload individual environment files or a folder containing the environment files by selecting the Upload button and choosing either the Upload files button or the Upload a folder button.

Note: When uploading a new environment, you must upload the custom environment file first and then define the path to your main environment class in the Environment class path input field. The format used should be module.sub_module:CustomEnvironment.
For example, if you upload a file named pusher_v5.py containing your environment class PusherEnv, the environment class path will be pusher_v5:PusherEnv.

Note: If your environment is more complex than a single file, you can upload a folder containing nested folders and scripts. At the highest level within the directory, a requirements.txt file should be included to specify dependencies, and a config.yaml file should also be included within a configs directory to outline any arguments that the environment will require.
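As an illustration only, the sketch below shows one way such a config.yaml could map onto an environment's constructor arguments; the file contents, the my_package.my_module path and the CustomEnvironment class are hypothetical, and the exact way Arena passes these arguments to your environment may differ.

```python
# Hypothetical configs/config.yaml contents (arguments the environment requires):
#   max_steps: 200
#   goal_threshold: 0.05
import yaml  # PyYAML, which would be listed in requirements.txt

from my_package.my_module import CustomEnvironment  # hypothetical environment module

with open("configs/config.yaml") as f:
    env_kwargs = yaml.safe_load(f)  # -> {"max_steps": 200, "goal_threshold": 0.05}

# Assumption: the arguments outlined in config.yaml are passed to the
# environment's constructor as keyword arguments.
env = CustomEnvironment(**env_kwargs)
```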

Once you have uploaded your environment, select Commit Version to kick off a series of environment validation checks. These checks verify that your environment conforms to the Gymnasium API and is therefore compatible with our algorithms.
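If you would like to catch obvious API problems before uploading, Gymnasium ships a local environment checker you can run yourself; the sketch below uses the pusher_v5 example from the note above and is not a substitute for Arena's own validation checks.

```python
from gymnasium.utils.env_checker import check_env

from pusher_v5 import PusherEnv  # the example file and class from the note above

env = PusherEnv()
# Raises an error (or prints warnings) if the environment deviates from the
# Gymnasium API, e.g. wrong reset()/step() signatures or observations that
# fall outside the declared observation space.
check_env(env)
```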

If you want to make changes after uploading your file(s), you can select any file to bring up its code. Once you have made changes, click the (x) icon in the top right-hand corner of the code box to save the changes.

If you make changes to any of the files after committing a previous version, simply click the Commit Version button. This will create a new version of your environment and kick off a new round of environment validation checks. Whilst the validation is happening, pending will be displayed next to the environment version in the Versions dropdown.

Upon completion of the environment checks, the status will change from pending to either passed or failed depending on whether the checks were successful. You can view the results of the checks by selecting the version from the Versions dropdown and selecting the Validation Results tab.

Once the environment has been successfully validated, you can select it as an environment in the Environment section of the training workflow (the "How do I train an agent?" FAQ walks through the training workflow). When you navigate to this section, you can locate your custom environment by searching for it or looking through the Environments table.

After selecting the environment, you will be shown a summary of the environment, the validation checks, and a graph showing the reward distribution of a random agent for your reference, as well as a rendered GIF of your environment if it has a render function and an rgb_array render mode.
What format should my custom environment be in?
Custom environments must be implemented in Python and must follow the Gymnasium API. Uploaded environments will be validated to ensure they follow the Gymnasium API, and you will not be able to commence training until these checks have all completed successfully.

If uploaded environments use a render method, the environment will only be rendered during validation if an rgb_array render mode is provided.

Custom environment packages should be uploaded as a folder, containing all necessary files. At the highest level within the directory, a requirements.txt file should be included to specify dependencies, and a config.yaml file within a configs directory to outline any arguments that the environment will require. You can also upload a single script if your environment is contained entirely within this script.
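For reference, a minimal single-file environment that follows the Gymnasium API and declares an rgb_array render mode might look like the sketch below; the spaces, placeholder reward and blank frames are illustrative only, not a template required by Arena.

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces


class CustomEnvironment(gym.Env):
    metadata = {"render_modes": ["rgb_array"], "render_fps": 30}

    def __init__(self, render_mode=None, max_steps=200):
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self.render_mode = render_mode
        self.max_steps = max_steps
        self._steps = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._steps = 0
        return self.observation_space.sample(), {}

    def step(self, action):
        self._steps += 1
        observation = self.observation_space.sample()
        reward = 1.0 if action == 1 else 0.0  # placeholder reward
        terminated = False
        truncated = self._steps >= self.max_steps
        return observation, reward, terminated, truncated, {}

    def render(self):
        if self.render_mode == "rgb_array":
            # Returning an RGB frame allows a rendered GIF to be produced during validation.
            return np.zeros((64, 64, 3), dtype=np.uint8)
```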
How do I train an agent?
Start by creating a group, then set up a project within that group. Inside the project, you can create an experiment, where you can train an agent. This structure helps you keep your work organised.

Note:
Group: A Group is the top-level organisation unit. It helps you manage and categorise multiple projects, allowing teams to collaborate and share resources.
Project: Within a Group, you create Projects. Each Project is a workspace where you can manage related experiments and keep all the work for a specific goal or task together.
Experiment: An Experiment is part of a Project and is where you define and run your tests. It's the environment where you configure the settings, parameters, and data to train an agent.
Agent: The Agent is what you train in an Experiment. It's the entity that learns and improves based on the data and parameters you've set in the Experiment.

To start, you will need to create a group by selecting the Groups tab on the sidebar and then selecting the New Group button.


Once you have created a group, select the View button on the new tile which will take you to a new page allowing you to create projects for your experiments.

Select the New Project button (or the Create Project button if you are creating the first project within the group). A pop-up will appear allowing you to enter a project name and description.


Once a project has been created, you can host multiple experiments within it. To create an experiment in your project, select the New Experiment button on the project page. A pop-up will appear allowing you to enter an experiment name and description.

Once an experiment is created, you will be taken into the experiment creation flow. To train an agent you will have to choose an environment. This can either be one of the many popular environments available on the platform or a custom environment which has been previously uploaded.

Once you have selected an environment, the validation results will be displayed. The Environment Summary table contains a summary of the operational parameters of the selected environment, as well as its category (if it is not a custom environment) and the saved version. The Environment Test Results from the environment validation are displayed in the top right quadrant of the page. Finally, a random agent rollout of the environment is displayed, with the scores plotted over 100 episodes and the environment rendering if available.

Next, you will be prompted to select an algorithm that is compatible with your chosen environment. DQN, DQN Rainbow and PPO are used to train agents with discrete action spaces; DDPG, TD3 and PPO are used to train agents with continuous action spaces. If you need to learn about any of the given parameters or specifications, you can hover over the information icons provided. The AgileRL framework documentation also offers a great overview of SOTA reinforcement learning algorithms and techniques, along with references.
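If you are unsure which family your environment falls into, its action space tells you; the sketch below uses Gymnasium's bundled CartPole-v1 purely as an example.

```python
import gymnasium as gym
from gymnasium import spaces

env = gym.make("CartPole-v1")  # example environment

if isinstance(env.action_space, spaces.Discrete):
    print("Discrete action space: DQN, DQN Rainbow or PPO are suitable")
elif isinstance(env.action_space, spaces.Box):
    print("Continuous action space: DDPG, TD3 or PPO are suitable")
```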

An MLP (linear layers) or a CNN (convolutional layers stacked with linear layers) is loaded depending on whether the environment has vector or image observations, respectively. These architectures can be modified by adding or deleting layers directly on the page.
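The shape of the observation space is what distinguishes the two cases; the check below is purely indicative, and the environment name is again just an example.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs_shape = env.observation_space.shape

if len(obs_shape) == 1:
    print(f"Vector observations {obs_shape}: an MLP (linear layers) applies")
elif len(obs_shape) == 3:
    print(f"Image observations {obs_shape} (height, width, channels): a CNN applies")
```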

After completing the Algorithm section, you will need to click the Next button to proceed to the Training setup. All experiment configurations are pre-populated with template values. The various training sections include environment vectorisation (which each agent in the population trains on separately) and, for off-policy algorithms, the replay buffer and its parameters.
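Environment vectorisation simply means running several copies of the environment in parallel so that experience is collected faster; in Gymnasium terms it is roughly equivalent to the sketch below, where the number of copies and the environment are arbitrary.

```python
import gymnasium as gym

# Eight copies of the same environment, stepped together in lock-step.
envs = gym.vector.SyncVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(8)])

observations, infos = envs.reset(seed=0)
# A single call steps all eight copies; actions, observations and rewards are batched.
observations, rewards, terminations, truncations, infos = envs.step(envs.action_space.sample())
```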

AgileRL Arena uses evolutionary population training with agent mutations. At the end of each epoch, a tournament selection takes place in which the fittest agents are preserved, cloned and mutated. Tournament selection uses a random subset of the population to select each next fittest agent, optionally with elitism (whether to keep the overall fittest agent). The selected agents are then mutated according to random probabilities that you can set manually. The mutations are applied to evolve the neural network architectures and the algorithms' learning hyperparameters.
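Conceptually, tournament selection with elitism works along the lines of the sketch below; this is a simplified illustration of the idea, not Arena's exact procedure.

```python
import random

def tournament_selection(population, fitnesses, tournament_size=3, elitism=True):
    """Build the next generation by repeatedly sampling small random subsets of
    the population and keeping the fittest member of each subset."""
    next_generation = []
    if elitism:
        # Carry over the overall fittest agent unchanged.
        best = max(range(len(population)), key=lambda i: fitnesses[i])
        next_generation.append(population[best])
    while len(next_generation) < len(population):
        contenders = random.sample(range(len(population)), tournament_size)
        winner = max(contenders, key=lambda i: fitnesses[i])
        # In Arena, the cloned winner would then be mutated (architecture or
        # hyperparameter changes) according to the user-set probabilities.
        next_generation.append(population[winner])
    return next_generation
```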

When you are done with the Training section, select the Next button to proceed to the Resources page, which displays a summary of the experiment settings. On this page, you can select the size of the experiment's compute node.

Note: For the Beta, all compute nodes are set to Medium and cannot be changed.

You can either start training immediately by clicking the Train button in the top-right corner shown in the figure above, or simply save the experiment and submit a training job later. Once you click Train or Save, you will be redirected to the Experiments page, which shows the new experiment as well as previous experiments.
How can I monitor an experiment?

Visualising experiments and experiment controls
Running and Completed experiments can be viewed on the Results tab. You can apply further controls to your experiments (as shown below): Halt experiment and Edit mutation parameters while an experiment is running, and View logs both during and after an experiment; the logs can be helpful when making decisions for a live experiment.

Experiments currently in training will be listed as Running, and you will be able to visualise their performance results in real time. You can show or hide the data for any experiment using the eye-shaped icon (shown whole or crossed, respectively). If you think an experiment has converged early, or that the mutation probabilities and ranges should be changed, click the further actions (three dots) icon at the end of its row in the Running table. This gives you the option to Halt or Edit the experiment.

Updating Hyperparameter Optimisation
Over the course of an experiment, you may want to adjust the mutations to boost the learning of the agents. You can do this in two ways: by adjusting the probability of specific mutations, and by adjusting the range of the targeted changes, such as the learning step. To do this, select the Edit mutation parameters button in the experiment menu. A pop-up will appear which allows you to update the mutation parameters in real time.

Note: You can also disable the mutation process for the agents by setting probabilities to 0 for all mutation types except None.
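As a purely illustrative example (the parameter names below are hypothetical, not Arena's exact field names), disabling mutations amounts to a setting like the following.

```python
# Hypothetical mutation settings: only "no mutation" can ever be selected.
mutation_probabilities = {
    "none": 1.0,                # keep each selected agent unchanged
    "architecture": 0.0,        # no layer or node changes
    "parameters": 0.0,          # no weight perturbations
    "activation": 0.0,          # no activation-function swaps
    "rl_hyperparameters": 0.0,  # no learning-rate or batch-size changes
}
```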

Looking at the logs
To match a particular mutation and agent selection to a performance change in one of the visualisation plots, you can filter the logs by time range using the Run Query button at the top right together with the start/end date and time selection tool. You can then inspect the fitness evaluations for each agent and judge the quality of the current hyperparameter selection. You can also search the logs directly for a particular keyword or substring.

Resuming a stopped experiment
An experiment which has run its course (succeeded) or has been halted can be resumed from the Experiments page. It will run again for the same number of steps it was originally configured with when it was created. In other words, an experiment of 1 million steps will always be scheduled to run for that many steps each time it is run.

How can I analyse the performance of my agent?
Once training has been started, population performance can be tracked via the Results page. Performance metrics of an experiment will be updated periodically, until the experiment completes (and the visualisations reach their final state).
How to visualise results of an experiment
One or more experiments may be selected for visualisation. When more than one experiment is selected, it is possible to directly compare the performance of distinct algorithms (or environments) simultaneously, on the same plots. A visualisation will load the data from an experiment run; the eye-shaped icon is used to show or hide an experiment's data on the plots.

Note: Plots only appear if an experiment has already produced data to show. Experiments which have just been created will not immediately have data to display.

Visualising a single run
Visualising a single run on its own yields insight into the evolutionary HPO. To do so, select or expose only one of the experiments. This will reveal metrics for each agent in the population: the score (taken from the active training or learning episodes), the fitness (evaluated separately from training), the learning algorithm losses, the episode time (calculated from the evaluations) and, finally, the number of steps through time (to compare the speed of different agents).

In any of the plots, clicking on one of the legends allows you to hide or display the entity in the plot (in the case of this single experiment inspection: each agent in the population).

Comparing two or more runs
When comparing two or more runs, the visualisations switch from the agent-level to population-level. Now, for each experiment, you’ll be able to see the averaged score across agents within the population, the best fitness, the average learning algorithm losses, the average episode time, and the number of steps taken.

Experiment summaries and comparison
There will be a table beneath the performance plots, which can be expanded for each domain (e.g. Environment, Algorithm…), to display specification summaries of the selected visualisation experiments side-by-side.

A concise summary can be extracted by filtering the display to show only the differences across experiments, which can be useful for determining the best training configuration.
How can I deploy and use my trained agent?
Deployment is done from the Experiments page by selecting the Deploy button on the row of the experiment you wish to deploy. A pop-up will then appear from which you can pick the checkpoint you wish to deploy. By default, the latest checkpoint is selected.

After you have selected Confirm, the agent will be added to the Deployments page permanently. There, it can be activated and de-activated via the ‘Enable’ and ‘Disable’ buttons on its corresponding row.

The Deployments page comes with a default API (script) that you should use for querying deployed agents. A connection is established to a deployed agent via its URL. An API key token is also provided for each experiment, and this is passed to the API for authentication (with OAuth2). A query must supply a batch of one or more environment states for which the agent returns actions (with the generic objective of stepping through the corresponding environments) while the agent remains deployed natively in AgileRL Arena. The expected HTTP response is 200 (success), returned along with the action value(s) and an acknowledgement of the input batch size. Otherwise, an error code is returned along with a JSON response if available.
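As an illustrative sketch only, a query could look like the following; the endpoint path, payload field names and response fields are assumptions rather than the documented Arena API, so the default script on the Deployments page remains the authoritative reference.

```python
import requests

DEPLOYMENT_URL = "https://<your-deployment-url>/act"  # assumed path; use the URL shown on the Deployments page
API_KEY = "YOUR_API_KEY"  # the API key token provided for the experiment

# A batch of one or more environment states (observations) to request actions for.
payload = {"states": [[0.01, -0.02, 0.03, 0.00]]}  # assumed field name and shape

response = requests.post(
    DEPLOYMENT_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},  # OAuth2 bearer-token style authentication
    timeout=10,
)

if response.status_code == 200:
    body = response.json()
    print(body)  # expected to contain the action value(s) and an acknowledgement of the batch size
else:
    print(response.status_code, response.text)  # error code, plus a JSON body if available
```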