Add models with the Megatron-LM backend

Model

If you use the latest verl, the Megatron backend directly supports GPTModel. You can follow much the same workflow as pretraining a custom model with Megatron. The steps are listed below (a sketch of the resulting initializer follows the list):

  1. Find model_initializer.py

  2. If your model can be configured through a TransformerLayerSpec , you can use GPTModel directly. Otherwise, please implement a new ModelLayerSpec and ModelLayer here.

  3. Use the right LayerSpec, TransformerConfig, and HuggingfaceConfig as arguments to initialize the GPTModel.

  4. Finally, return the model.
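
Below is a minimal sketch of such an initializer, assuming recent megatron.core APIs. The exact GPTModel constructor arguments and the layer-spec helper vary across Megatron-Core versions, and the function name init_gpt_model as well as the way the HuggingFace config is consumed are only illustrative.

```python
# Sketch only: argument names follow recent Megatron-Core releases and may differ in yours.
from megatron.core.models.gpt import GPTModel
from megatron.core.models.gpt.gpt_layer_specs import get_gpt_layer_with_transformer_engine_spec
from megatron.core.transformer import TransformerConfig


def init_gpt_model(hf_config, tf_config: TransformerConfig, pre_process: bool, post_process: bool):
    # Step 2: pick (or implement) the layer spec that matches your architecture.
    layer_spec = get_gpt_layer_with_transformer_engine_spec()

    # Step 3: combine the layer spec, TransformerConfig, and HuggingFace config.
    model = GPTModel(
        config=tf_config,                                   # Megatron TransformerConfig
        transformer_layer_spec=layer_spec,
        vocab_size=hf_config.vocab_size,                    # from the HuggingFace config
        max_sequence_length=hf_config.max_position_embeddings,
        pre_process=pre_process,                            # True on the first pipeline stage
        post_process=post_process,                          # True on the last pipeline stage
        share_embeddings_and_output_weights=getattr(hf_config, "tie_word_embeddings", False),
        position_embedding_type="rope",
    )

    # Step 4: return the model.
    return model
```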

Add models with an older version of verl

The most challenging aspect of using the Megatron-LM backend is implementing the models for training. Currently, we implement the Llama model with support for data parallelism, tensor parallelism, pipeline parallelism (including vPP), and sequence parallelism. We also implement remove-padding (sequence packing) for the Llama model, which can be found in modeling_llama_megatron.py.

To support other models, users are required to implement the following (illustrative sketches follow the list):

  1. A model, similar to modeling_llama_megatron.py, that satisfies the parallelism requirements of Megatron-LM. Then register your model in registry.py.

  2. Checkpoint utils that can load a full checkpoint (e.g. a HuggingFace checkpoint) into the partitioned models at runtime. Then register your loader in weight_loader_registry in weight_loader_registry.py.

  3. A weight loader that synchronizes the weights from the Megatron actor model to the rollout (vLLM) model. Note that both the actor model and the rollout model are partitioned at runtime, so it is advisable to reuse the rollout model's parameter names in the actor model implementation; otherwise, you may need an additional name mapping and even weight transformation. The weight loader implementation is in megatron_weight_loaders.py.
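
For item 1, registration is essentially a lookup table from the HuggingFace architecture name to your Megatron-parallel model class. The sketch below is illustrative only: the module path, class name, and exact structure of registry.py depend on your verl version.

```python
# Illustrative sketch; the real registry.py may store (module, class-name) tuples
# or wrap this in a ModelRegistry helper.
from my_models.modeling_mymodel_megatron import ParallelMyModelForCausalLMRmPadPP  # hypothetical

MODEL_REGISTRY = {
    # key: HuggingFace architecture name, i.e. hf_config.architectures[0]
    "MyModelForCausalLM": ParallelMyModelForCausalLMRmPadPP,
}


def get_parallel_model_cls(architecture: str):
    # Resolve the Megatron-parallel implementation for a HuggingFace architecture.
    return MODEL_REGISTRY[architecture]
```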
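
For item 2, the loader receives the full (unpartitioned) HuggingFace state dict and copies only the shards owned by the current tensor-parallel rank into the Megatron model. A hedged sketch, assuming a plain dict-based registry and a hypothetical megatron_to_hf_name helper for renaming parameters:

```python
import torch
from megatron.core import parallel_state as mpu  # Megatron-Core parallel state


def megatron_to_hf_name(megatron_param_name: str) -> str:
    """Hypothetical helper: translate a Megatron parameter name to the
    corresponding key in the HuggingFace state dict."""
    raise NotImplementedError


def load_hf_weights_to_mymodel(state_dict, megatron_model):
    """Copy this tensor-parallel rank's shards from a full HuggingFace
    state dict into the partitioned Megatron model (sketch)."""
    tp_rank = mpu.get_tensor_model_parallel_rank()
    tp_size = mpu.get_tensor_model_parallel_world_size()
    for name, param in megatron_model.named_parameters():
        full_weight = state_dict[megatron_to_hf_name(name)]
        if getattr(param, "tensor_model_parallel", False):
            # Megatron tags tensor-parallel params with a partition_dim attribute;
            # take the shard that belongs to this rank along that dimension.
            dim = getattr(param, "partition_dim", 0)
            full_weight = torch.chunk(full_weight, tp_size, dim=dim)[tp_rank]
        with torch.no_grad():
            param.copy_(full_weight)


# Registration is illustrative; the structure of weight_loader_registry may differ.
WEIGHT_LOADER_REGISTRY = {"MyModelForCausalLM": load_hf_weights_to_mymodel}
```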
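
For item 3, the direction is reversed: actor parameters are iterated on each rank and handed to the rollout engine. The sketch assumes the vLLM model exposes a load_weights method over (name, tensor) pairs, which vLLM model classes do; the name-mapping helper is hypothetical and becomes the identity if the actor implementation reuses the rollout model's parameter names as suggested above.

```python
def megatron_to_vllm_name(megatron_param_name: str) -> str:
    """Hypothetical helper: rename an actor parameter to the rollout model's
    naming scheme. Identity if the actor already uses matching names."""
    return megatron_param_name


def sync_megatron_to_vllm(megatron_model, vllm_model):
    """Push the (partitioned) actor weights into the rollout (vLLM) model (sketch)."""
    named_weights = [
        (megatron_to_vllm_name(name), param.detach())
        for name, param in megatron_model.named_parameters()
    ]
    # vLLM model classes implement load_weights over an iterable of (name, tensor) pairs.
    vllm_model.load_weights(named_weights)
```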