In the highly competitive world of financial AI, where milliseconds can mean millions and model efficiency directly impacts profitability, selecting the right reinforcement learning framework can fundamentally transform a company's ability to deliver superior trading performance. For Warburg AI, a fintech company specialising in modular, self-improving financial prediction models, the adoption of AgileRL marked a breakthrough in training efficiency, cost optimisation, and development velocity.
"AgileRL really gave us the tools we needed to build upon that. The performance is way higher, we're scaling way less horizontally, experiments that would take five to six days with RLlib can take 24 to 48 hours with AgileRL."
- Lancelot de Briey, Founder & Engineering Lead at Warburg AI
Pioneering Self-Improving Financial Models
Warburg AI develops sophisticated asset management solutions powered by self-improving AI technology that predicts market trends with high accuracy. Their approach combines advanced machine learning and reinforcement learning techniques with deep networks and selective memory systems, creating models that adapt continuously to changing market conditions.
The company's technology stack is built around several key innovations:
- Modular Architecture: Each AI model is designed to be highly customisable and tailored to specific requirements depending on the experiment.
- Continuous Learning: Models update daily through continuous learning mechanisms, constantly improving performance based on real-time market feedback and further research.
- Proprietary Algorithms: Warburg AI creates its RL algorithms in-house and builds on them with modules that increase the likelihood of specific behaviours or exploration.
- World-class Simulators & WarburgAI DB: Extremely scalable, low-latency simulators support more than 1M learning steps per second, alongside a database that serves fresh data in microseconds, aggregated from more than 2 petabytes of partner data.
Breaking Through Infrastructure Limitations
Before implementing AgileRL, Warburg AI's reinforcement learning infrastructure relied heavily on RLlib, which presented several challenges that limited development velocity and cost efficiency:
- CPU-GPU Inefficiency: RLlib’s architecture often pushed the team toward CPU-heavy scaling because GPU utilisation was suboptimal. Eight to nine CPU-bound operations were still required even on GPU machines, creating bottlenecks as tensors bounced between CPU and accelerators.
- Training Latency: Each learning step took 45–60 seconds with RLlib, slowing experimentation cycles and model development.
- Cost Structure: Inefficient resource utilisation combined with the need for horizontal scaling resulted in operational costs of $50–70 per hour during intensive training.
- Limited Framework Control: RLlib’s higher-level abstractions made it difficult to access and modify core framework functionality, constraining the team’s ability to implement custom optimisations.
These limitations were particularly problematic for a company operating in fast-moving financial markets, where rapid iteration is crucial.
Transformative Implementation with AgileRL
The decision to implement AgileRL represented a strategic shift toward a more efficient and controllable reinforcement learning infrastructure. Several immediate advantages stood out:
- Custom Integration Approach: Warburg AI built custom distributed systems on top of AgileRL's flexible architecture, enabling sophisticated financial modeling while keeping full control of the training process.
- Multiple Algorithm Support: AgileRL’s library allows implementation of multiple PPO variants (Recurrent PPO, intrinsic curiosity modules, conditional value at risk, etc.).
- Rust Engine Optimisation: The team optimised their Rust simulators and data prep pipelines specifically for AgileRL, moving multi-step data transformations from Python to Rust and removing bottlenecks.
- GPU-First Architecture: AgileRL’s design enabled efficient GPU utilisation, shifting workloads from CPU-heavy to GPU-optimised processes.
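Supporting several PPO variants behind one interface, as described above, can be sketched with a simple registry pattern. This is an illustration only: the class names, variant keys, and parameters below are hypothetical, not AgileRL's actual API.

```python
from typing import Callable, Dict

# Hypothetical trainer constructors keyed by variant name.
PPO_VARIANTS: Dict[str, Callable[..., object]] = {}

def register(name: str):
    """Decorator that adds a trainer class to the variant registry."""
    def wrap(cls):
        PPO_VARIANTS[name] = cls
        return cls
    return wrap

@register("recurrent")
class RecurrentPPO:
    """PPO with a recurrent policy (hypothetical sketch)."""
    def __init__(self, hidden_size: int = 64):
        self.hidden_size = hidden_size

@register("curiosity")
class CuriosityPPO:
    """PPO with an intrinsic-curiosity bonus (hypothetical sketch)."""
    def __init__(self, intrinsic_weight: float = 0.01):
        self.intrinsic_weight = intrinsic_weight

@register("cvar")
class CVaRPPO:
    """PPO optimising conditional value at risk (hypothetical sketch)."""
    def __init__(self, alpha: float = 0.05):
        self.alpha = alpha  # CVaR tail level

def make_trainer(variant: str, **kwargs):
    """Instantiate a registered variant by name."""
    return PPO_VARIANTS[variant](**kwargs)

trainer = make_trainer("cvar", alpha=0.1)
```

The point of the pattern is that experiment code selects a variant by configuration string, so swapping algorithms requires no changes to the surrounding training loop.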
Dramatic Performance and Cost Improvements
The impact of AgileRL was immediate and substantial:
Training Speed Acceleration
- Learning step time dropped from 45–60 seconds to 17 seconds (≈70% faster).
- Experiments that took 5–6 days with RLlib now finish in 24–48 hours.
- Faster cycles allow quicker iteration on trading strategies.
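The headline speedup can be sanity-checked arithmetically from the per-step figures quoted above (45-60 s with RLlib versus 17 s with AgileRL):

```python
# Per-step times reported in the case study (seconds).
rllib_step_s = (45, 60)
agilerl_step_s = 17

# Relative reduction against each end of the RLlib range.
for baseline in rllib_step_s:
    reduction = (baseline - agilerl_step_s) / baseline
    print(f"{baseline}s -> {agilerl_step_s}s: {reduction:.0%} faster per step")
# The 45 s baseline gives ~62% faster; the 60 s baseline gives ~72%,
# consistent with the ~70% headline figure.
```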
Cost Optimisation
- Operational costs dropped from $50–70/hr to ~$24/hr (50–60% lower).
- Transition to GPU-first eliminated the need for >400 Graviton CPU cores.
- Existing GPU infra (AWS A100 at $12/hr) became fully utilised.
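The cost figures check out the same way (using the $50-70/hr and ~$24/hr rates quoted above):

```python
# Hourly training costs reported in the case study (USD).
old_cost_per_hr = (50, 70)
new_cost_per_hr = 24

# Relative saving against each end of the old cost range.
for baseline in old_cost_per_hr:
    saving = (baseline - new_cost_per_hr) / baseline
    print(f"${baseline}/hr -> ${new_cost_per_hr}/hr: {saving:.0%} lower")
# The $50 baseline gives a 52% saving; the $70 baseline gives ~66%,
# bracketing the quoted 50-60% reduction.
```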
Operational Excellence
- Developer productivity improved, supported by faster problem resolution from AgileRL’s team.
- Framework flexibility allowed custom optimisations not possible before.
- Built-in hyperparameter optimisation proved powerful, with room for even larger-scale population searches.
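Population-based hyperparameter search of the kind mentioned above can be sketched in a few lines of plain Python. This is a toy illustration of the general technique, not AgileRL's actual API: each generation evaluates a population, keeps the elite half, and refills the rest with mutated copies.

```python
import random

def evaluate(lr: float) -> float:
    """Toy fitness function: pretend the best learning rate is 3e-4."""
    return -abs(lr - 3e-4)

def evolve(pop_size: int = 8, generations: int = 20, seed: int = 0) -> float:
    """Evolve a population of learning rates toward the best fitness."""
    rng = random.Random(seed)
    # Sample initial learning rates log-uniformly in [1e-5, 1e-2].
    population = [10 ** rng.uniform(-5, -2) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=evaluate, reverse=True)
        elite = population[: pop_size // 2]
        # Refill by mutating elites: perturb each rate by up to +/-20%.
        population = elite + [lr * rng.uniform(0.8, 1.2) for lr in elite]
    return max(population, key=evaluate)

best = evolve()
print(f"best learning rate found: {best:.2e}")
```

In a real setup the fitness would be an actual training run's return, and mutations would cover many hyperparameters at once, but the select-and-mutate loop is the same.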
Headline results: 70% faster training and 60% cost reduction vs. existing baselines.
Future-Ready Financial AI Infrastructure
Warburg AI’s move to AgileRL demonstrates how the right framework can transform the economics and capabilities of financial AI. The combination of improved efficiency, reduced costs, and enhanced developer productivity positions the company to scale while maintaining a technological edge.
Key lessons for financial AI teams:
- Infrastructure efficiency matters: framework choice is a strategic business decision, not just technical.
- GPU optimisation is critical for modern financial workloads.
- Developer experience impacts innovation: responsive support and flexible design accelerate progress.
- Cost control enables scale, as shown by Warburg AI’s 50% training cost reduction.
As Warburg AI expands its strategies, AgileRL provides the foundation for sustained innovation, delivering performance while staying efficient.
If you would like to take part in a case-study with AgileRL, please reach out on LinkedIn.