Miles RL Training
Status: Verified, Active
Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.
The goal is to enable enterprise-grade RL training for large-scale MoE models, optimized for stability, low-precision compute, and train-inference alignment.
Features
- Unified FP8 training and inference
- INT4 Quantization-Aware Training for large models
- Rollout Routing Replay (R3) for MoE alignment (see the sketch after this list)
- Speculative RL for increased throughput
- Comprehensive documentation and troubleshooting guides
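The idea behind Rollout Routing Replay is that the expert-routing decisions taken during rollout are recorded and then reused during the training forward pass, so that MoE dispatch matches what the inference engine actually executed. The sketch below is a minimal illustration of that idea, not miles code; the helper names `route_topk` and `route_with_replay` are hypothetical.

```python
import torch

# Hypothetical sketch of Rollout Routing Replay (R3): capture the top-k expert
# indices chosen at rollout time, then reuse them in the training forward pass
# instead of recomputing them, so rollout and training dispatch tokens identically.

def route_topk(router_logits: torch.Tensor, k: int = 2):
    """Standard MoE top-k routing: returns expert indices and gate weights."""
    gates = torch.softmax(router_logits, dim=-1)
    weights, indices = torch.topk(gates, k, dim=-1)
    return indices, weights

def route_with_replay(router_logits: torch.Tensor, recorded_indices: torch.Tensor):
    """Replay routing: keep the recorded expert indices, but take the gate
    weights from the current (training) router so gradients still flow."""
    gates = torch.softmax(router_logits, dim=-1)
    weights = torch.gather(gates, -1, recorded_indices)
    return recorded_indices, weights

# Rollout side: record the routing decisions alongside the generated tokens.
rollout_logits = torch.randn(4, 8)          # [tokens, num_experts]
recorded_indices, _ = route_topk(rollout_logits)

# Training side: small numerical differences would normally flip some top-k
# choices; replaying the recorded indices keeps the dispatch identical.
train_logits = rollout_logits + 1e-3 * torch.randn_like(rollout_logits)
replayed_indices, _ = route_with_replay(train_logits, recorded_indices)
assert torch.equal(replayed_indices, recorded_indices)
```

Whether miles replays gate weights as well as indices is an implementation detail of the framework; the point of the sketch is only that the dispatch seen by the optimizer matches what the rollout engine produced.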
Use Cases
- Training 1TB+ MoE models
- FP8 or INT4 quantization-aware training
- Achieving bit-wise identical train-inference alignment
- Utilizing speculative RL for maximum throughput
Non-Goals
- Research-grade original slime framework (use slime directly)
- Flexible backend swapping (use verl)
- PyTorch-native abstractions (use torchforge)
Workflow
- Set up the environment with FP8 block scaling and CUDA device connections.
- Configure training parameters, including GPU allocation, model checkpoints, parallelism sizes, and data paths.
- Verify that the model loads without errors, routing decisions are consistent, and no NaN/Inf values appear in the loss (see the stability check after this list).
- Enable speculative decoding via SGLang flags for faster rollout.
- Optionally enable online MTP training for draft model alignment.
- Monitor for expected speedup and verify training stability.
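As a concrete example of the verification step, here is a minimal loss-stability check, assuming the training loop exposes the per-step loss as a tensor; the `check_loss_finite` helper is illustrative and not part of miles.

```python
import math
import torch

def check_loss_finite(loss: torch.Tensor, step: int) -> None:
    """Abort early if the loss is NaN/Inf, which with FP8/INT4 training usually
    points at a scaling or quantization problem rather than normal divergence."""
    value = loss.detach().float().item()
    if math.isnan(value) or math.isinf(value):
        raise RuntimeError(f"non-finite loss {value} at step {step}; "
                           "check FP8 block scaling and quantization config")

# Example usage inside a training loop:
for step in range(3):
    loss = torch.tensor(0.42)   # placeholder for the real RL loss
    check_loss_finite(loss, step)
```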
Prerequisites
- H100/H200 GPUs with FP8 support (see the capability check after this list)
- MoE model (DeepSeek V3, Qwen3-MoE)
- Docker environment with miles
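A quick way to confirm the GPU prerequisite before launching anything is a plain PyTorch capability check (this is generic PyTorch, not a miles utility): FP8 tensor cores require compute capability 8.9 (Ada) or 9.0 (Hopper, i.e. H100/H200).

```python
import torch

# Sanity-check that the visible GPUs can run FP8 kernels.
# FP8 tensor cores require compute capability >= 8.9 (Ada) / 9.0 (Hopper: H100/H200).
def assert_fp8_capable() -> None:
    if not torch.cuda.is_available():
        raise RuntimeError("no CUDA device visible")
    for i in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(i)
        name = torch.cuda.get_device_name(i)
        if (major, minor) < (8, 9):
            raise RuntimeError(f"{name} (sm_{major}{minor}) lacks FP8 tensor cores")
        print(f"{name}: sm_{major}{minor}, FP8-capable")

assert_fp8_capable()
```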
Code Execution
- Logging: the documentation mentions `--log-level DEBUG` and `export MILES_DEBUG=1`, indicating that logging capabilities exist, but a dedicated audit file for destructive actions is not explicitly mentioned.
Installation
First, add the marketplace:
/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
Then install the plugin:
/plugin install AI-Research-SKILLs@ai-research-skills
Similar Extensions
Miles RL Training (92)
Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.
Slime RL Training (98)
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.
Slime RL Training (78)
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.
TensorRT-LLM (98)
Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.
AgentDB Learning (99)
Create and train AI learning plugins with AgentDB's 9 reinforcement learning algorithms. Includes Decision Transformer, Q-Learning, SARSA, Actor-Critic, and more. Use when building self-learning agents, implementing RL, or optimizing agent behavior through experience.
Verl RL Training (99)
Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.