Single-agent and Multi-agent Support
On-policy, off-policy, offline, multi-agent and contextual multi-armed bandit reinforcement learning with unmatched speed and state-of-the-art performance
Evolutionary Hyperparameter Optimization
Converge automatically on optimal performance in a single training run through hyperparameter and neural network evolution
Distributed Training
Train even faster by using your entire compute stack, with multi-GPU support for both online and offline reinforcement learning
Hierarchical Skills
Teach agents to solve complex problems by breaking down tasks into smaller, learnable sub-tasks with the AgileRL Skills wrapper
Sign up for early access to Arena