Welcome to verl’s documentation!

verl is a flexible, efficient, and production-ready RL training framework for post-training large language models (LLMs). It is an open-source implementation of the HybridFlow paper.

verl is flexible and easy to use with:

- Easy extension of diverse RL algorithms: The hybrid programming model combines the strengths of the single-controller and multi-controller paradigms to enable flexible representation and efficient execution of complex post-training dataflows, allowing users to build RL dataflows in a few lines of code.
- Seamless integration of existing LLM infra with modular APIs: verl decouples computation and data dependencies, which lets it integrate with existing LLM frameworks such as PyTorch FSDP, Megatron-LM, and vLLM. Users can also extend it to other LLM training and inference frameworks.
- Flexible device mapping and parallelism: Supports various placements of models onto different sets of GPUs for efficient resource utilization and scalability across cluster sizes.
- Ready integration with popular HuggingFace models.
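
As a rough illustration of the single-controller idea and of device mapping described in the list above, here is a minimal toy sketch in plain Python. It is not verl's API: `WorkerGroup`, `run_step`, the GPU id lists, and the stand-in lambdas are all hypothetical names invented for this example. It only shows how a single controller can express an RL dataflow as ordinary sequential code while each stage is delegated to a group of workers placed on some set of GPUs.

```python
# Toy illustration only: every name below is hypothetical, not verl's API.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WorkerGroup:
    """Stands in for a group of processes placed on a set of GPUs."""
    name: str
    gpu_ids: List[int]
    fn: Callable[[list], list]

    def __call__(self, batch: list) -> list:
        # A real multi-controller group would shard `batch` across its
        # GPUs and run in parallel; here we simply call the function.
        return self.fn(batch)


def run_step(prompts, rollout, reward, actor):
    """Single controller: the dataflow reads as plain sequential code."""
    responses = rollout(prompts)                 # generation phase
    scores = reward(responses)                   # scoring phase
    return actor(list(zip(responses, scores)))   # training phase


# Rollout and actor share GPUs 0-1 (colocated); the reward model uses GPU 2.
rollout = WorkerGroup("rollout", [0, 1], lambda ps: [p + " -> answer" for p in ps])
reward = WorkerGroup("reward", [2], lambda rs: [float(len(r)) for r in rs])
# The "update" just averages the scores so the example stays runnable.
actor = WorkerGroup("actor", [0, 1], lambda pairs: [sum(s for _, s in pairs) / len(pairs)])

metrics = run_step(["2+2?", "3+3?"], rollout, reward, actor)
```

Changing the placement in this sketch means editing only the `gpu_ids` lists; `run_step` stays the same, which is the kind of separation between dataflow and placement the list above describes.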

verl is fast with:

- State-of-the-art throughput: By seamlessly integrating existing SOTA LLM training and inference frameworks, verl achieves high generation and training throughput.
- Efficient actor model resharding with 3D-HybridEngine: Eliminates memory redundancy and significantly reduces communication overhead during transitions between training and generation phases.

Quickstart

Programming guide

HybridFlow Programming Guide

Data Preparation

Configurations

Config Explanation
- ppo_trainer.yaml for FSDP Backend

PPO Example

PPO Trainer and Workers

Performance Tuning Guide

Experimental Results

Algorithm Baselines

Advanced Usage and Extension

API References

Data interface

FAQ

Frequently Asked Questions

Contribution

verl is free software; you can redistribute it and/or modify it under the terms of the Apache License 2.0. We welcome contributions. Join us on GitHub, Slack, and WeChat for discussions.

Code formatting

We use yapf (Google style) to enforce strict code formatting when reviewing MRs. Run yapf from the top level of the verl repo:

pip3 install yapf
yapf -ir -vv --style ./.style.yapf verl examples tests
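
If you would rather check formatting before it is applied, a small wrapper along these lines may help. This is only a sketch, not a script shipped with verl: `build_yapf_command` and `main` are hypothetical helper names, and the paths and style file are copied from the command above. It relies on yapf's `--diff` flag, which prints the changes instead of rewriting files.

```python
import subprocess
from typing import List

# Paths and style file are taken from the yapf command above.
DEFAULT_PATHS = ["verl", "examples", "tests"]


def build_yapf_command(paths: List[str], check_only: bool = True) -> List[str]:
    """Return the yapf argv: --diff reports changes, --in-place applies them."""
    mode = "--diff" if check_only else "--in-place"
    return ["yapf", mode, "--recursive", "--style", "./.style.yapf", *paths]


def main() -> int:
    """Run yapf in check-only mode; return 1 if anything would be reformatted."""
    result = subprocess.run(
        build_yapf_command(DEFAULT_PATHS), capture_output=True, text=True
    )
    if result.stdout:
        # yapf --diff prints a unified diff for each file it would change.
        print(result.stdout)
    return 1 if (result.stdout or result.returncode) else 0
```

Call `main()` from the top level of the repo with yapf installed; a return value of 0 means no file would be changed.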