Sijun Tan, Michael Luo, Colin Cai*, Tarun Venkat, Kyle Montgomery, Aaron Hao, Tianhao Wu, Arnav Balyan, Manan Roongta, Chenguang Wang, Li Erran Li, Raluca Ada Popa, Ion Stoica**
*: Project Leads
<aside> 🔥
TL;DR
We release rLLM-v0.1, a scalable framework for post-training language agents with reinforcement learning. rLLM enables users to easily build their custom agents and environments, train them with reinforcement learning, and deploy them for real-world workloads.
With rLLM, we built and trained DeepSWE, our SOTA software engineering agent, which achieves 42.2% Pass@1 (59.0% with test-time scaling) on SWE-Bench Verified.
We are committed to maintaining and growing rLLM. Join our community to build agents that “learn from experience”!
👨‍💻 GitHub, 📖 Docs, 🌐 Website, 💬 Discord
</aside>
The first half of 2025 has been defined by a surge of interest in reinforcement learning (RL) post-training for reasoning models. The release of DeepSeek-R1 ignited a wave of innovation, motivating many industry labs and academic research groups to train their own math and coding reasoners.
Riding this wave, we at Agentica introduced two fully open-source models: DeepScaleR, a 1.5B math reasoning model, and DeepCoder, a 14B code reasoning model.
We open-sourced everything: models, training recipes, datasets, codebase, and training logs. These efforts mark a major milestone towards democratizing RL training for reasoning models.
But reasoning models are just the beginning.
At Agentica, our mission is to democratize RL post-training for general-purpose language agents. Reasoning models are essentially the simplest form of language agents — single-step, domain-specific solvers interacting with static environments like math exams or Codeforces problems. But real-world agents must go further: they must reason, act, and interact dynamically with complex, often uncertain environments.
This is where language reasoners evolve into agents.
The second half of 2025 marks our shift into the era of language agents. Our first milestone in this transition consists of two major releases: the rLLM framework and the DeepSWE agent.
In the rest of this post, we’ll dive into the rLLM framework. (Check out our DeepSWE blog post for more on our SOTA SWE agent.)
As Richard Sutton and David Silver wrote in “Welcome to the Era of Experience”, the future of AI lies not just in learning from static, human-curated datasets, but in learning from experience—from continuous interaction with dynamic environments.
To achieve this, agents must be live systems: deployed in the real world, collecting feedback from their interactions, and evolving through ongoing training. This shift demands a new class of frameworks—ones that support both inference-time execution and training-time adaptation within the same unified pipeline.
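To make the unified-pipeline idea concrete, here is a minimal, illustrative sketch in Python. It is not rLLM’s actual API — all class and function names below are hypothetical — but it shows the pattern the paragraph describes: the same agent/environment rollout code produces trajectories at deployment time, and those same trajectories feed a training step.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    """One episode of agent-environment interaction plus its reward."""
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    reward: float = 0.0

class CounterEnv:
    """Toy environment: the agent must repeat back the digit it is shown."""
    def reset(self) -> str:
        self.target = random.randint(0, 9)
        return str(self.target)

    def step(self, action: str):
        reward = 1.0 if action == str(self.target) else 0.0
        return reward, True  # (reward, done) -- single-step episode

class EchoAgent:
    """Toy 'policy' that echoes the observation; a real agent would call an LLM."""
    def act(self, observation: str) -> str:
        return observation

def rollout(agent, env) -> Trajectory:
    """Inference-time execution: run one episode and record the experience."""
    traj = Trajectory()
    obs, done = env.reset(), False
    while not done:
        action = agent.act(obs)
        reward, done = env.step(action)
        traj.observations.append(obs)
        traj.actions.append(action)
        traj.reward = reward
    return traj

def train_step(agent, trajectories):
    """Training-time adaptation: consume the collected experience.
    Here we only report the mean reward; an RL trainer would update the policy."""
    mean_reward = sum(t.reward for t in trajectories) / len(trajectories)
    print(f"mean reward over {len(trajectories)} episodes: {mean_reward:.2f}")

if __name__ == "__main__":
    agent, env = EchoAgent(), CounterEnv()
    experience = [rollout(agent, env) for _ in range(8)]  # deploy & collect
    train_step(agent, experience)                          # learn from it
```

The point of the sketch is the shape, not the specifics: when deployment and training share the same agent and environment abstractions, experience collected in production can flow directly back into post-training.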
Most existing agentic frameworks focus solely on orchestration and inference, offering little to no support for post-deployment learning. Our goal with rLLM is to bridge this gap and make experience-driven learning accessible. We aim to provide: