TensorRT-LLM Backend
Last updated: 5/6/2026.
Authored By NVIDIA TensorRT-LLM Team
Introduction
TensorRT-LLM is a high-performance LLM inference engine with state-of-the-art optimizations for NVIDIA GPUs. The verl integration of TensorRT-LLM is based on TensorRT-LLM’s Ray orchestrator, with more features and performance optimizations to come.
For synchronous training, the TensorRT-LLM rollout adopts a mixed design combining aspects of the hybrid engine and colocated mode, instead of relying purely on standard colocated mode.
For asynchronous training, the TensorRT-LLM rollout follows other rollout backends and uses standalone mode for trainer and rollout placement.
TensorRT-LLM rollout supports the following key features, primarily tested on Qwen3 dense and MoE variants:
Synchronous training (GRPO, DAPO, etc.)
Cross-node inference
FP8 refit
Asynchronous training (further optimizations planned)
Preliminary support for VLM
You can track progress and share feedback on the TensorRT-LLM rollout roadmap.
Installation
We recommend using docker/Dockerfile.stable.trtllm to build a Docker image with TensorRT-LLM pre-installed. The verl integration is supported starting from nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6; you can select another TensorRT-LLM version from the NGC Catalog via TRTLLM_BASE_IMAGE. The image is updated periodically to track TensorRT-LLM’s weekly releases.
Alternatively, refer to the TensorRT-LLM installation guide for compatible environments if you want to build your own.
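As a rough sketch, an image build might look like the following. It assumes that TRTLLM_BASE_IMAGE is exposed as a Docker build argument by docker/Dockerfile.stable.trtllm and that the command is run from the verl repository root. The image tag verl-trtllm:local and the RUN_BUILD switch are hypothetical names used only for illustration.

```shell
# Sketch: build a verl image on top of a chosen TensorRT-LLM release.
# Assumption: TRTLLM_BASE_IMAGE is a build argument of the Dockerfile.
TRTLLM_BASE_IMAGE="${TRTLLM_BASE_IMAGE:-nvcr.io/nvidia/tensorrt-llm/release:1.2.0rc6}"
IMAGE_TAG="${IMAGE_TAG:-verl-trtllm:local}"   # hypothetical tag

# Print the command first so it can be reviewed before anything runs.
build_cmd="docker build -f docker/Dockerfile.stable.trtllm --build-arg TRTLLM_BASE_IMAGE=${TRTLLM_BASE_IMAGE} -t ${IMAGE_TAG} ."
echo "${build_cmd}"

# Build only when explicitly requested (RUN_BUILD is a hypothetical switch).
if [ "${RUN_BUILD:-0}" = "1" ]; then
  docker build \
    -f docker/Dockerfile.stable.trtllm \
    --build-arg "TRTLLM_BASE_IMAGE=${TRTLLM_BASE_IMAGE}" \
    -t "${IMAGE_TAG}" \
    .
fi
```

Setting TRTLLM_BASE_IMAGE in the environment before running the snippet selects a different base image; with RUN_BUILD unset, the snippet only prints the command.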
Install verl with TensorRT-LLM:
pip install --upgrade pip
pip install -e ".[trtllm]"
Note
Using the TensorRT-LLM rollout requires clearing the SLURM, MPI, and PMIx environment variables before launching the Ray cluster. This cleanup is already included in all the example scripts:
# Clear all SLURM/MPI/PMIx environment variables to avoid a PMIx mismatch error.
for v in $(env | awk -F= '/^(PMI|PMIX|MPI|OMPI|SLURM)_/{print $1}'); do
    unset "$v"
done
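When writing a custom launch script, the same cleanup can be wrapped in a function and followed by a check that nothing was left behind. This is a minimal sketch: list_launcher_env and clean_launcher_env are hypothetical helper names, not part of verl or TensorRT-LLM.

```shell
# Print the names of exported variables that start with a launcher prefix.
list_launcher_env() {
  env | awk -F= '/^(PMI|PMIX|MPI|OMPI|SLURM)_/{print $1}'
}

# Unset every matching variable in the current shell.
clean_launcher_env() {
  for v in $(list_launcher_env); do
    unset "$v"
  done
}

clean_launcher_env

# Count what is left; tr strips the padding some wc implementations add.
remaining=$(list_launcher_env | wc -l | tr -d ' ')
echo "launcher variables remaining: ${remaining}"
```

The function must run in the same shell that later starts Ray, for example by sourcing the file, since a child process cannot unset variables in its parent.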
Using TensorRT-LLM rollout for GRPO
# For FSDP training engine
INFER_BACKEND=trtllm bash examples/grpo_trainer/run_qwen3_8b_fsdp.sh
# For Megatron-Core training engine
INFER_BACKEND=trtllm bash examples/grpo_trainer/run_qwen3_8b_megatron.sh
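The two commands differ only in the script name, so a small wrapper can pick the script from the training engine. The sketch below assumes it is run from the verl repository root. TRAIN_ENGINE, RUN_TRAINING, and select_grpo_script are hypothetical names introduced here for illustration, not verl options.

```shell
# Map a training engine name to the matching example script.
select_grpo_script() {
  case "$1" in
    fsdp)     echo "examples/grpo_trainer/run_qwen3_8b_fsdp.sh" ;;
    megatron) echo "examples/grpo_trainer/run_qwen3_8b_megatron.sh" ;;
    *)        echo "unknown training engine: $1" >&2; return 1 ;;
  esac
}

TRAIN_ENGINE="${TRAIN_ENGINE:-fsdp}"   # hypothetical selector variable
script="$(select_grpo_script "${TRAIN_ENGINE}")" || exit 1

# Show the command; launch only when RUN_TRAINING=1 (hypothetical switch).
echo "INFER_BACKEND=trtllm bash ${script}"
if [ "${RUN_TRAINING:-0}" = "1" ]; then
  INFER_BACKEND=trtllm bash "${script}"
fi
```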
Using TensorRT-LLM rollout for DAPO with FP8
# For Megatron-Core training engine with FP8 rollout
INFER_BACKEND=trtllm ROLLOUT_QUANTIZATION=fp8 bash examples/grpo_trainer/run_qwen3_30b_a3b_megatron.sh
Using TensorRT-LLM rollout in fully async with GRPO
# Fully async policy with Megatron-Core training engine
bash verl/experimental/fully_async_policy/shell/grpo_30b_a3b_base_math_megatron_4_4_mis_trtllm.sh