SkipManager: Skip everything in the RL pipeline.
===========

Last updated: 2026-05-23

.. contents:: :local:
   :depth: 1

1. Overview
-----------

**SkipManager** (``verl.utils.skip.SkipManager``) is a general-purpose framework for **skipping
selected steps** in verl training flows. By bypassing expensive stages on configured steps, it helps
save **time**, **memory**, or other resources and improves **developer iteration speed** during
debugging and experimentation.

Skip behavior is centralized under the top-level Hydra key ``skip``. Modules register by **role**
(for example ``"rollout"`` or ``"async_rollout"``) and are attached with
``@SkipManager.annotate(role=...)``. Each role declares which integer **steps** in config are
eligible for skip logic. **Today only rollout-related roles are implemented**; the same mechanism
can be extended to other pipeline stages (see section 5).

Typical use cases
~~~~~~~~~~~~~~~~~

SkipManager is intended for development workflows where repeating full training is costly:

1. **Faster iteration**: skip heavy stages on chosen steps (e.g. generation) while exercising the
   rest of the pipeline.
2. **Deterministic replay**: cache and reload intermediate results to reproduce a prior run on
   specific steps.
3. **Resource savings**: avoid recomputing or holding large tensors when bisecting bugs or tuning
   downstream logic.

The built-in ``rollout`` / ``async_rollout`` modules apply this to sequence generation; other
roles can follow the same pattern as they are added.

Supported entry points today
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 28 36 36

   * - Training entry
     - Skip role / config
     - Status
   * - ``main_ppo.py`` (``RayPPOTrainer``)
     - ``skip.rollout``
     - **Supported**
   * - ``main_ppo_sync.py`` (TransferQueue + ReplayBuffer)
     - ``skip.rollout``
     - **Not supported** (see section 3)
   * - ``fully_async_main`` (``FullyAsyncRollouter``)
     - ``skip.async_rollout``
     - **Supported**


2. Shared configuration (``skip.rollout`` / ``skip.async_rollout``)
---------------------------------------------------------------------

Both roles use the same Hydra fields (``RolloutSkipConfig`` / ``AsyncRolloutSkipConfig`` in
``verl/utils/skip/config.py``). Defaults live in ``verl/trainer/config/ppo_trainer.yaml`` under
``skip.rollout`` and ``skip.async_rollout``.

Parameters
~~~~~~~~~~

- **enable** (bool): Master switch for this role.
- **dump_dir** (str): Root directory for cached ``DataProto`` shards (``~`` is expanded).
- **steps** (list[int]): Steps on which skip logic is *eligible*. Outside this list, the decorated
  function always runs normally.

  - For ``skip.rollout``: trainer **global_steps** (via ``SkipManager.set_step``).
  - For ``skip.async_rollout``: the feed-order index parsed from ``sample_id`` (see section 4) —
    **not** trainer ``global_steps``.

- **action** (``cache`` \| ``repeat``):

  - **cache**: If a valid dump exists for the current step, load it and skip generation; otherwise
    run generation and write under that step directory.
  - **repeat**: If any valid dump exists, load from a **substitute** step chosen by the algorithm
    below; otherwise run generation and dump as usual.

.. note::

   Only ``cache`` and ``repeat`` are validated in config today, even though ``SkipAction`` in
   ``verl.utils.skip.base_skip`` lists additional enum values for future modules.

``repeat`` step selection (``RolloutSkip._find_latest_step``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When ``action=repeat`` and the current step directory is missing or incomplete:

1. If the directory for the **current** step is valid, use the current step.
2. Else use the **largest** available step **strictly less than** the current step.
3. Else use the **smallest** available step **strictly greater than** the current step.
4. If no valid dump exists, skip does not apply: the wrapped function runs and may dump afterward.

``repeat`` does **not** guarantee the cached batch matches the current prompt or trainer step—use
it for debugging and iteration, and prefer ``cache`` when you need step-aligned replay.

Hydra CLI examples
~~~~~~~~~~~~~~~~~~

Colocated PPO (``skip.rollout``):

.. code-block:: bash

   skip.rollout.enable=True
   skip.rollout.dump_dir=/path/to/rollout_dump
   skip.rollout.steps=[1,2,3,10]
   skip.rollout.action=cache

Fully async (``skip.async_rollout``):

.. code-block:: bash

   skip.async_rollout.enable=True
   skip.async_rollout.dump_dir=/path/to/rollout_dump
   skip.async_rollout.steps=[1,2,3,4,5]
   skip.async_rollout.action=cache

To pass a long step list from **bash** only (not valid inside static YAML):

.. code-block:: bash

   skip.async_rollout.steps="[$(seq -s, 1 128)]"

On-disk layout
~~~~~~~~~~~~~~

.. code-block:: text

   {dump_dir}/{experiment_name}_{project_name}/
       └── GBS{gbs}_N{n}_in{prompt_len}_out{response_len}/
           ├── {step}/
           │   ├── gen_batch.dp
           │   └── meta.json
           └── ...

- **experiment_name** / **project_name**: from ``trainer.experiment_name`` and
  ``trainer.project_name`` in the run config.
- **gbs**, **n**, **prompt_len**, **response_len**: from ``data.gen_batch_size`` (or train batch
  size), ``actor_rollout_ref.rollout.n``, ``data.max_prompt_length``, and
  ``data.max_response_length``.

Caches from colocated ``main_ppo`` (larger **GBS**) and fully async streaming (typically **GBS=1**)
are generally **not** interchangeable unless these metadata match.

Minimal workflow (``cache``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. **First run** with ``enable=True``, ``action=cache``, and ``steps`` listing the steps you care
   about. Empty ``dump_dir`` → generation runs and writes ``gen_batch.dp`` + ``meta.json`` per step.
2. **Second run** with the same config and compatible trainer metadata → listed steps load from
   disk instead of regenerating.
3. **Partial caches** (some step dirs missing): those steps regenerate on the next run; other steps
   still load if present.

Relationship to legacy RolloutSkip
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If **both** ``skip.rollout.enable`` and legacy ``actor_rollout_ref.rollout.skip.enable`` are true,
SkipManager emits a ``DeprecationWarning`` and **forces** the legacy flag to ``False`` so only one
mechanism runs.


3. Rollout quick start (``rollout`` role)
-----------------------------------------

Use ``skip.rollout`` when training with ``main_ppo.py`` / ``RayPPOTrainer`` and the standard
``AgentLoopManager.generate_sequences`` path. Configuration fields and ``cache`` / ``repeat``
semantics are in section 2.

**``main_ppo.py`` (supported)**

- ``RayPPOTrainer.fit()`` calls ``SkipManager.init(self.config)`` and
  ``SkipManager.set_step(self.global_steps)`` each training step.
- ``AgentLoopManager.generate_sequences`` is decorated with
  ``@SkipManager.annotate(role="rollout")``.

**``main_ppo_sync.py`` (not supported yet)**

``main_ppo_sync`` replaces the Agent Loop integration with ``AgentLoopManagerTQ``. The main reason
rollout skip is not supported today is **logic coupling** in
``AgentLoopManagerTQ.generate_sequences``: it not only drives sequence generation, but also marks
samples in the ReplayBuffer and **writes generated data into TransferQueue (TQ)**. Skipping
``generate_sequences`` would therefore skip both generation and the TQ handoff, which breaks the
downstream training loop that consumes data from TQ.

Decoupling “generate” from “enqueue to TQ” is non-trivial under the current design, so SkipManager
adaptation for ``main_ppo_sync`` is **deferred** until the TransferQueue-based training path is
further stabilized.


4. Fully async quick start (``async_rollout`` role)
---------------------------------------------------

In :doc:`advance/fully_async`, Trainer and Rollouter run in separate processes. Rollout generation
happens on the Rollouter via streaming single-sample dispatch. Use ``skip.async_rollout`` (not
``skip.rollout``) when launching ``fully_async_main``. Shared Hydra fields and on-disk layout are
in section 2.

.. important::

   In ``async_rollout``, a step is **not** the trainer timeline. It is only the **prompt request /
   feed order** on the Rollouter: the monotonic index in ``sample_{epoch}_{index}`` when
   ``FullyAsyncRollouter`` enqueues the next prompt. Under concurrent rollout, completion order can
   differ from feed order; do not treat these indices as trainer ``global_steps`` or parameter-sync
   boundaries when configuring ``skip.async_rollout.steps``.

Step key from ``sample_id``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each fed sample carries an id of the form ``sample_{epoch}_{index}`` (for example
``sample_0_42``). The integer matched against ``skip.async_rollout.steps`` and used for on-disk
directories is the **last segment** — Rollouter feed-order index at enqueue time.

**Wiring**

- ``FullyAsyncRollouter`` calls ``SkipManager.init(self.config)`` in the Rollouter process.
- ``FullyAsyncAgentLoopManager.generate_sequences_single`` is decorated with
  ``@SkipManager.annotate(role="async_rollout")`` and receives ``sample_id`` for online step
  resolution.


5. Design and implementation
----------------------------

SkipManager API
~~~~~~~~~~~~~~~

``SkipManager`` (``verl.utils.skip.skip_manager``) is a class-level registry:

- **``init(config)``**: Parse ``config.skip`` into ``SkipManagerConfig``, instantiate one skip module
  per registered role, and store them in ``SkipManager.skip_instances``.
- **``set_step(step: int)``**: Set ``SkipManager.step`` for roles with ``support_online_step =
  False`` (trainer ``global_steps`` in ``main_ppo``).
- **``annotate(role, **kwargs)``**: Decorator factory for sync or async functions.

Decorator flow
~~~~~~~~~~~~~~

.. code-block:: text

   call decorated function
        │
        ▼
   skip disabled or role missing? ──yes──► run original function
        │no
        ▼
   resolve step (set_step vs extract_step)
        │
        ▼
   step ∉ config.steps? ──yes──► run original function
        │no
        ▼
   meet_precondition (cache/repeat)? ──yes──► warp_function (load cache)
        │no
        ▼
   run original function → prepare_data (dump)

BaseSkip interface
~~~~~~~~~~~~~~~~~~

Each skip module subclasses ``BaseSkip`` (``verl.utils.skip.base_skip``) and registers via
``@register_skip("role_name")``.

- **``support_actions``**: Allowed ``SkipAction`` values for this module.
- **``support_online_step``**: When ``True``, use ``extract_step`` per call instead of
  ``SkipManager.step``.

Instance methods: ``is_enabled``, ``meet_precondition``, ``warp_function``, ``prepare_data``, and
``extract_step`` (required when ``support_online_step`` is ``True``).

``RolloutSkip`` / ``AsyncRolloutSkip`` (``verl.utils.skip.rollout_skip``) implement generation
caching for the ``rollout`` and ``async_rollout`` roles.

Intercepted functions
~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 16 34 28 22

   * - Role
     - Decorated function
     - Defined in
     - Step source
   * - ``rollout``
     - ``AgentLoopManager.generate_sequences``
     - ``verl/experimental/agent_loop/agent_loop.py``
     - ``SkipManager.set_step`` → trainer ``global_steps``
   * - ``async_rollout``
     - ``FullyAsyncAgentLoopManager.generate_sequences_single``
     - ``verl/experimental/fully_async_policy/fully_async_rollouter.py``
     - ``extract_step`` → ``sample_id`` suffix → **prompt feed order**

**``rollout``** wraps the full batch Agent Loop RPC (chunk dispatch, concat, timing) as one skip
unit.

**``async_rollout``** wraps one streaming sample's ``generate_sequences_single(self, prompts,
sample_id)`` so concurrent samples resolve step independently.

Step resolution: ``set_step`` vs ``support_online_step``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

See section 2 for ``steps`` semantics per role.

- **Shared ``SkipManager.step``**: One class-level slot per process. Fits sequential trainer loops
  (``main_ppo``): ``set_step(global_steps)`` before rollout.
- **Online step**: ``AsyncRolloutSkip`` sets ``support_online_step = True`` and parses
  ``sample_id`` on each call so in-flight async samples do not share a single counter. For
  ``repeat``, ``RolloutSkip`` recomputes ``_find_latest_step`` on every ``meet_precondition`` and
  ``warp_function`` call (no shared mutable step field on the skip instance).

Extending with custom skip modules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1. Subclass ``BaseSkip`` from ``verl.utils.skip.base_skip``.
2. Decorate the class with ``@register_skip("your_role_name")``.
3. Add a matching field under ``SkipManagerConfig``.
4. Attach ``@SkipManager.annotate(role="your_role_name")``. For concurrent pipelines, prefer
   ``support_online_step = True`` and pass step identity through call arguments.