verl

Ascend (NPU) Tutorial

Last updated: 06/05/2026.

Getting Started

  • Ascend Tutorial
  • Ascend Dockerfile Build Guidance
  • Ascend Install Guidance
  • Ascend Quickstart
  • Ascend Quickstart with vLLM Backend
  • Ascend Quickstart with SGLang Backend

Feature Support

  • Ascend Backend Features Guide
  • NPU Advanced Features Guide

Model Support

  • NPU Model & Algorithms Support Status
  • Ascend Retool Best Practice
  • Ascend SGLang Best Practice
  • DAPO multi model optimization practice
  • NPU Qwen3-32B GSPO Optimization Practice
  • Qwen3.5-122B-A10B NPU Usage Guide

Developer Guide

  • Model Evaluation
  • Training Configuration Parameters and Metrics Explained
  • Transfer to NPU guide
  • Precision Alignment
  • Precision Debugger (msprobe) in verl
  • Ascend Performance Analysis Guide
  • Performance Tuning Guide on Ascend
  • Profiling Collection Guide (Chinese version)
  • Profiling Data Collection Guide

FAQ & Contributing

  • NPU Frequently Asked Questions
  • Guide to Adding NPU-CI

© Copyright 2024 ByteDance Seed Foundation MLSys Team.