verl

Quickstart

  • Installation
  • Quickstart: PPO training on GSM8K dataset
  • Multinode Training

Programming guide

  • HybridFlow Programming Guide

Data Preparation

  • Prepare Data for Post-Training
  • Implement Reward Function for Dataset

Configurations

  • Config Explanation

PPO Example

  • PPO Example Architecture
  • GSM8K Example

PPO Trainer and Workers

  • PPO Ray Trainer
  • PyTorch FSDP Backend
  • Megatron-LM Backend

Performance Tuning Guide

  • Performance Tuning Guide
  • Upgrading to vLLM >= 0.8

Experimental Results

  • Algorithm Baselines

Advanced Usage and Extension

  • Ray API Design Tutorial
  • Extend to other RL(HF) algorithms
  • Add models with the FSDP backend
  • Add models with the Megatron-LM backend
  • Using Checkpoints to Support Fault-Tolerant Training

API References

  • Data interface

FAQ

  • Frequently Asked Questions


© Copyright 2024 ByteDance Seed Foundation MLSys Team.
