Sandbox Fusion Example

Introduction

Sandbox Fusion is a remote code sandbox service that provides a secure environment for running and evaluating code generated by Large Language Models (LLMs). This example shows how to train an LLM while using Sandbox Fusion to verify the generated code, which improves both security and performance.

Because the remote sandbox service can dedicate more CPU resources to concurrent code verification, it can reduce the reward stage time by 10-30%, depending on the quality of the generated code.

Step 1: Prepare the Dataset

We use the Eurus-2-RL-Data dataset for training. This dataset combines math and code questions, making it suitable for LLM training tasks. You can download it from HuggingFace: Eurus-2-RL-Data Dataset.

Step 2: Set Up the Sandbox Fusion Service

The service must be deployed and reachable before training starts. To set it up:

  1. Access Full Documentation: For detailed setup instructions, refer to the Sandbox Fusion Documentation.

  2. Deploy the Service: Choose one of the deployment methods described in the Sandbox Fusion documentation.

After deployment, you will receive an API endpoint in the format: https://<ip-address-or-domain-name>/run_code.
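
Before starting a training job, it can help to confirm that the endpoint responds. The following is a minimal sketch, not an official client. The host name is a placeholder, the helper names are hypothetical, and the JSON body with `code` and `language` fields is an assumption that should be checked against the Sandbox Fusion documentation.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the one returned by your deployment.
ENDPOINT = "https://sandbox.example.com/run_code"


def build_run_code_request(
    endpoint: str, code: str, language: str = "python"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the sandbox endpoint."""
    if not endpoint.endswith("/run_code"):
        raise ValueError("endpoint must end with /run_code")
    # Assumed payload shape; verify the field names in the service docs.
    body = json.dumps({"code": code, "language": language}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_smoke_test(endpoint: str = ENDPOINT) -> str:
    """Send a trivial snippet and return the raw response text."""
    request = build_run_code_request(endpoint, 'print("hello")')
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling `run_smoke_test()` with your real endpoint sends one request. The response format is not assumed here, so the function returns the raw text for manual inspection.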

Step 3: Configure the Training Script

To integrate Sandbox Fusion into your training script, configure the following parameters:

Key Settings for Sandbox Fusion

  • reward_model.sandbox_fusion.url='<API-endpoint>': Enable Sandbox Fusion by specifying the API endpoint (must end with /run_code).

  • reward_model.sandbox_fusion.max_concurrent=256: Set the maximum number of concurrent API requests to the Sandbox Fusion service.

  • reward_model.sandbox_fusion.memory_limit_mb=1024: Set the memory limit (in MB) for each sandbox instance. Defaults to 1024 MB if not specified.
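
The three settings above are plain `key=value` overrides, so they can be assembled and sanity-checked before being passed to the training command. The helper below is a hypothetical sketch and not part of the training code. Only the parameter names, the default values, and the `/run_code` suffix rule come from this guide.

```python
from typing import List


def sandbox_fusion_overrides(
    url: str, max_concurrent: int = 256, memory_limit_mb: int = 1024
) -> List[str]:
    """Return the Sandbox Fusion settings as key=value override strings."""
    if not url.endswith("/run_code"):
        raise ValueError("the API endpoint must end with /run_code")
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    if memory_limit_mb < 1:
        raise ValueError("memory_limit_mb must be at least 1")
    return [
        f"reward_model.sandbox_fusion.url={url}",
        f"reward_model.sandbox_fusion.max_concurrent={max_concurrent}",
        f"reward_model.sandbox_fusion.memory_limit_mb={memory_limit_mb}",
    ]


if __name__ == "__main__":
    # Placeholder host; replace it with your own endpoint.
    for override in sandbox_fusion_overrides("https://sandbox.example.com/run_code"):
        print(override)
```

Each printed line can be appended as one argument to the training command in your script.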

Additional Optimization

To further reduce code verification time, enable parallel processing with:

  • reward_model.reward_manager=prime: The Prime reward manager verifies code across multiple subprocesses concurrently.
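
The general pattern of checking several generated programs in separate subprocesses at once can be illustrated locally. This sketch is not the Prime reward manager's implementation. The function names are hypothetical, and the snippets run on the local machine with no isolation, so it should only be tried with trusted toy inputs.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List


def passes(snippet: str, timeout_s: float = 10.0) -> bool:
    """Run one snippet in its own Python subprocess; True if it exits with 0."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Treat a snippet that runs too long as a failure.
        return False
    return result.returncode == 0


def verify_all(snippets: List[str], max_workers: int = 4) -> List[bool]:
    """Check snippets concurrently; results keep the input order."""
    # Each worker thread only waits on its own child process, so up to
    # max_workers subprocesses run at the same time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(passes, snippets))


if __name__ == "__main__":
    print(verify_all(["assert 1 + 1 == 2", "assert 1 + 1 == 3"]))
```

The worker count plays a role similar to a concurrency cap: raising it shortens wall-clock time only while spare CPU cores remain.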

Example Script

For a practical implementation, refer to the example script:

examples/ppo_trainer/run_deepseek7b_llm_sandbox_fusion.sh

Once you’ve set your API endpoint in the script, you can start the training job.