Multi-Chip Support

Last updated: 06/03/2026.

Overview

verl supports RL training across multiple hardware platforms through a unified plugin system. The architecture consists of two main subsystems:

Platform Plugin System (verl.plugin.platform) — A hardware abstraction layer with auto-detection and a unified device API.
Engine Plugin System (verl.workers.engine.base) — Training engine extensions that add chip-specific optimizations on top of existing FSDP/Megatron engines.

Hardware Support

Built-in (verl core):

NVIDIA GPU (CUDA)
Huawei Ascend NPU

Via verl-hardware-plugin (reference implementations):

Other hardware platforms are supported through the external verl-hardware-plugin package, which provides reference implementations for vendors to adapt:

Intel XPU (Data Center GPU Max / Arc)
Cambricon MLU (MLU370 / MLU590)
MetaX (CUDA-compatible)

Note

The implementations in verl-hardware-plugin are examples only. Full production support requires collaboration with the respective hardware vendors. Vendors can use these as templates to build and maintain their own plugins.

Design Principles

Plugin Architecture: Platform backends and engine extensions register via decorator-based registries (PlatformRegistry, EngineRegistry), requiring no modifications to verl core code.
Auto-Detection + Manual Override: The platform layer auto-detects the hardware type by probing is_available(use_smi_check=True) on each registered platform. The detected platform can be explicitly overridden with the VERL_PLATFORM environment variable.
Two-Dimensional Engine Lookup: Engines register with both a device (the torch device type) and a vendor (the hardware vendor). Lookup priority:
- Exact (device, vendor) match: the vendor-specific engine
- Fall back to the device-only key: the base engine for that device type
- For CUDA-compatible devices, fall back to the base CUDA engine
Backward Compatibility: The legacy verl.utils.device API is preserved as a thin wrapper over the platform plugin system. Existing code continues to work without modification.
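The lookup order can be illustrated with a small, self-contained sketch. It is not verl's EngineRegistry: ToyEngineRegistry, its get method, and the cuda_compatible flag are hypothetical names, and modeling CUDA compatibility as a boolean supplied by the caller is an assumption made only to show the three-step fallback.

```python
from typing import Callable, Dict, Optional, Tuple


class ToyEngineRegistry:
    """Hypothetical two-dimensional registry: (device, vendor) -> engine class."""

    _engines: Dict[Tuple[str, Optional[str]], type] = {}

    @classmethod
    def register(cls, device: str, vendor: Optional[str] = None) -> Callable[[type], type]:
        def decorator(engine_cls: type) -> type:
            cls._engines[(device, vendor)] = engine_cls
            return engine_cls

        return decorator

    @classmethod
    def get(cls, device: str, vendor: Optional[str], cuda_compatible: bool = False) -> type:
        # 1. Exact (device, vendor) match: vendor-specific engine.
        if (device, vendor) in cls._engines:
            return cls._engines[(device, vendor)]
        # 2. Device-only key: base engine for this device type.
        if (device, None) in cls._engines:
            return cls._engines[(device, None)]
        # 3. CUDA-compatible hardware (flagged by the caller, an assumption) uses the base CUDA engine.
        if cuda_compatible and ("cuda", None) in cls._engines:
            return cls._engines[("cuda", None)]
        raise KeyError(f"no engine registered for device={device!r}, vendor={vendor!r}")


@ToyEngineRegistry.register(device="cuda")
class BaseCUDAEngine:
    pass


@ToyEngineRegistry.register(device="cuda", vendor="metax")
class MetaXEngine:
    pass


@ToyEngineRegistry.register(device="npu")
class BaseNPUEngine:
    pass


print(ToyEngineRegistry.get("cuda", "metax").__name__)  # MetaXEngine
print(ToyEngineRegistry.get("cuda", "nvidia").__name__)  # BaseCUDAEngine
```

With these registrations, a MetaX device resolves through the exact match, an NVIDIA device with no vendor-specific entry resolves through the device-only key, and an unregistered device type reaches the base CUDA engine only when it is marked CUDA-compatible.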

Architecture Overview

```
+--------------------------------------------------------------------+
|                    verl Multi-Chip Architecture                    |
+--------------------------------------------------------------------+
|                                                                    |
|  +----------------------------------------------------------+      |
|  |                  Platform Plugin System                  |      |
|  |                  (verl.plugin.platform)                  |      |
|  |                                                          |      |
|  |  PlatformRegistry                                        |      |
|  |    ├─ "nvidia"    → PlatformCUDA      (built-in)         |      |
|  |    ├─ "huawei"    → PlatformNPU       (built-in)         |      |
|  |    ├─ "intel"     → PlatformXPU       (plugin)           |      |
|  |    ├─ "cambricon" → PlatformMLU       (plugin)           |      |
|  |    └─ "metax"     → PlatformMetaX     (plugin)           |      |
|  |                                                          |      |
|  +----------------------------------------------------------+      |
|                                                                    |
|  +----------------------------------------------------------+      |
|  |                   Engine Plugin System                   |      |
|  |                (verl.workers.engine.base)                |      |
|  |                                                          |      |
|  |  EngineRegistry  (device, vendor) → Engine class         |      |
|  |    |                                                     |      |
|  |    +-- ("cuda", None)       → FSDPEngineWithLMHead       |      |
|  |    +-- ("npu", None)        → FSDPNPUEngineWithLMHead    |      |
|  |    +-- ("cuda", "metax")    → FSDPMetaXEngineWithLMHead  |      |
|  |    +-- ("xpu", "intel")     → FSDPXPUEngineWithLMHead    |      |
|  |    +-- ("mlu", "cambricon") → FSDPMLUEngineWithLMHead    |      |
|  |                                                          |      |
|  +----------------------------------------------------------+      |
|                                                                    |
+--------------------------------------------------------------------+
```

Plugin Loading

verl discovers plugins through two mechanisms:

setuptools entry points (verl.plugins group): the standard Python packaging mechanism. Once the plugin package is installed with pip, it is discovered automatically.
VERL_USE_EXTERNAL_MODULES environment variable: intended for development or non-packaged plugins, for example:
```
export VERL_USE_EXTERNAL_MODULES=verl_hardware_plugin
```

Platform Registration

Each platform class registers via decorator:

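```python
# Import path assumed from the module name above; adjust to your verl version.
from verl.plugin.platform import PlatformBase, PlatformRegistry


@PlatformRegistry.register(platform="my_vendor")
class PlatformMyDevice(PlatformBase):
    # Other PlatformBase members (e.g. is_available) omitted for brevity.

    @property
    def device_name(self) -> str:
        return "my_device"  # torch device type

    @property
    def vendor_name(self) -> str:
        return "my_vendor"  # used for engine lookup
```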
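The decorator pattern behind this registration can be sketched without verl installed. ToyPlatformBase and ToyPlatformRegistry below are hypothetical stand-ins for PlatformBase and PlatformRegistry; the sketch only shows that the decorator stores the class under its platform name and that the two properties form the (device, vendor) pair used for engine lookup.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class ToyPlatformBase(ABC):
    """Hypothetical stand-in for PlatformBase with only the two properties shown above."""

    @property
    @abstractmethod
    def device_name(self) -> str: ...

    @property
    @abstractmethod
    def vendor_name(self) -> str: ...


class ToyPlatformRegistry:
    """Hypothetical stand-in for PlatformRegistry: platform name -> platform class."""

    _platforms: Dict[str, type] = {}

    @classmethod
    def register(cls, platform: str) -> Callable[[type], type]:
        def decorator(platform_cls: type) -> type:
            cls._platforms[platform] = platform_cls  # store under the platform name
            return platform_cls  # the class itself is returned unchanged

        return decorator


@ToyPlatformRegistry.register(platform="my_vendor")
class PlatformMyDevice(ToyPlatformBase):
    @property
    def device_name(self) -> str:
        return "my_device"  # torch device type

    @property
    def vendor_name(self) -> str:
        return "my_vendor"  # used for engine lookup


platform = ToyPlatformRegistry._platforms["my_vendor"]()
engine_key = (platform.device_name, platform.vendor_name)
print(engine_key)  # ('my_device', 'my_vendor')
```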

Platform selection priority:

VERL_PLATFORM environment variable (explicit override)
Auto-detection via is_available(use_smi_check=True)
Fallback to "nvidia"
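A minimal sketch of this selection order, assuming each registered platform exposes a zero-argument probe that stands in for is_available(use_smi_check=True). select_platform is a hypothetical helper; probing in registration order and returning the VERL_PLATFORM value without validation are assumptions of the sketch.

```python
import os
from typing import Callable, Dict, Mapping, Optional


def select_platform(
    probes: Dict[str, Callable[[], bool]],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: pick a platform name using the priority order above."""
    env = os.environ if env is None else env
    # 1. Explicit override (returned as-is; no validation in this sketch).
    override = env.get("VERL_PLATFORM")
    if override:
        return override
    # 2. Auto-detection: the first platform, in registration order, whose probe succeeds.
    for name, is_available in probes.items():
        if is_available():
            return name
    # 3. Default when nothing is detected.
    return "nvidia"


probes = {"nvidia": lambda: False, "huawei": lambda: True}
print(select_platform(probes, env={}))  # huawei
```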

Engine Registration

Engine classes register with device and vendor:

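```python
# Import paths assumed; adjust to your verl version.
from verl.workers.engine.base import EngineRegistry
from verl.workers.engine.fsdp import FSDPEngineWithLMHead


@EngineRegistry.register(
    model_type="language_model",
    backend=["fsdp", "fsdp2"],
    device="cuda",           # torch device type
    vendor="my_vendor",      # vendor name
)
class FSDPMyVendorEngineWithLMHead(FSDPEngineWithLMHead):
    def initialize(self):
        super().initialize()
        # vendor-specific initialization goes here
```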

Engine lookup calls get_device_name() and get_vendor() on the active platform, then resolves the engine by the (device_name, vendor_name) key, following the fallback order described under Design Principles.

Environment variable overrides for engine selection:

VERL_ENGINE_DEVICE — override detected device name
VERL_ENGINE_VENDOR — override detected vendor name
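The override step can be sketched as a small function applied before the (device, vendor) lookup. resolve_engine_key is a hypothetical name, and treating an empty variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_engine_key(
    detected_device: str,
    detected_vendor: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: apply VERL_ENGINE_DEVICE / VERL_ENGINE_VENDOR overrides."""
    env = os.environ if env is None else env
    # An unset or empty variable keeps the value detected from the active platform.
    device = env.get("VERL_ENGINE_DEVICE") or detected_device
    vendor = env.get("VERL_ENGINE_VENDOR") or detected_vendor
    return device, vendor


print(resolve_engine_key("cuda", "metax", env={"VERL_ENGINE_VENDOR": "nvidia"}))
# ('cuda', 'nvidia')
```

Under the fallback order in Design Principles, overriding the vendor with one that has no dedicated engine would select the base engine for the device instead.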

Adding New Hardware

For a step-by-step guide on adding support for a new hardware platform, see the verl-hardware-plugin Development Guide.

The core platform and engine registry mechanism is implemented in PR #6086.