Search — verl documentation

verl

Quickstart

Installation
Quickstart: PPO training on GSM8K dataset
Multinode Training

Programming guide

HybridFlow Programming Guide

Data Preparation

Prepare Data for Post-Training
Implement Reward Function for Dataset

Configurations

Config Explanation

PPO Example

PPO Example Architecture
GSM8K Example

PPO Trainer and Workers

PPO Ray Trainer
PyTorch FSDP Backend
Megatron-LM Backend

Performance Tuning Guide

Performance Tuning Guide
Upgrading to vLLM >= 0.8

Experimental Results

Algorithm Baselines

Advanced Usage and Extension

Ray API Design Tutorial
Extend to other RL(HF) algorithms
Add models with the FSDP backend
Add models with the Megatron-LM backend
Using Checkpoints to Support Fault Tolerance Training

API References

Data interface

FAQ

Frequently Asked Questions

Search

© Copyright 2024 ByteDance Seed Foundation MLSys Team.
