Miles RL Training
Status: Verified, Active
Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.
The goal is to enable enterprise-grade RL training for large-scale MoE models, optimized for stability, low-precision compute, and train-inference alignment.
Features
- Unified FP8 training and inference
- INT4 Quantization-Aware Training for large models
- Rollout Routing Replay (R3) for MoE alignment (see the sketch after this list)
- Speculative RL for increased throughput
- Comprehensive documentation and troubleshooting guides
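The idea behind Rollout Routing Replay is that the expert-routing decisions taken during rollout are recorded and then reused during the training forward pass, so that MoE dispatch matches what the inference engine actually executed. The sketch below is a minimal illustration of that idea, not miles code; the helper names `route_topk` and `route_with_replay` are hypothetical.

```python
import torch

# Hypothetical sketch of Rollout Routing Replay (R3): capture the top-k expert
# indices chosen at rollout time, then reuse them in the training forward pass
# instead of recomputing them, so rollout and training dispatch tokens identically.

def route_topk(router_logits: torch.Tensor, k: int = 2):
    """Standard MoE top-k routing: returns expert indices and gate weights."""
    gates = torch.softmax(router_logits, dim=-1)
    weights, indices = torch.topk(gates, k, dim=-1)
    return indices, weights

def route_with_replay(router_logits: torch.Tensor, recorded_indices: torch.Tensor):
    """Replay routing: keep the recorded expert indices, but take the gate
    weights from the current (training) router so gradients still flow."""
    gates = torch.softmax(router_logits, dim=-1)
    weights = torch.gather(gates, -1, recorded_indices)
    return recorded_indices, weights

# Rollout side: record the routing decisions alongside the generated tokens.
rollout_logits = torch.randn(4, 8)          # [tokens, num_experts]
recorded_indices, _ = route_topk(rollout_logits)

# Training side: small numerical differences would normally flip some top-k
# choices; replaying the recorded indices keeps the dispatch identical.
train_logits = rollout_logits + 1e-3 * torch.randn_like(rollout_logits)
replayed_indices, _ = route_with_replay(train_logits, recorded_indices)
assert torch.equal(replayed_indices, recorded_indices)
```

Whether miles replays gate weights as well as indices is an implementation detail of the framework; the point of the sketch is only that the dispatch seen by the optimizer matches what the rollout engine produced.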
Use Cases
- Training 1TB+ MoE models
- FP8 or INT4 quantization-aware training
- Achieving bit-wise identical train-inference alignment
- Utilizing speculative RL for maximum throughput
Non-Goals
- Research-grade original slime framework (use slime directly)
- Flexible backend swapping (use verl)
- PyTorch-native abstractions (use torchforge)
Workflow
- Set up the environment with FP8 block scaling and CUDA device connections.
- Configure training parameters, including GPU allocation, model checkpoints, parallelism sizes, and data paths.
- Verify that the model loads without errors, routing decisions are consistent, and no NaN/Inf values appear in the loss (see the stability check after this list).
- Enable speculative decoding via SGLang flags for faster rollout.
- Optionally enable online MTP training for draft model alignment.
- Monitor for expected speedup and verify training stability.
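As a concrete example of the verification step, here is a minimal loss-stability check, assuming the training loop exposes the per-step loss as a tensor; the `check_loss_finite` helper is illustrative and not part of miles.

```python
import math
import torch

def check_loss_finite(loss: torch.Tensor, step: int) -> None:
    """Abort early if the loss is NaN/Inf, which with FP8/INT4 training usually
    points at a scaling or quantization problem rather than normal divergence."""
    value = loss.detach().float().item()
    if math.isnan(value) or math.isinf(value):
        raise RuntimeError(f"non-finite loss {value} at step {step}; "
                           "check FP8 block scaling and quantization config")

# Example usage inside a training loop:
for step in range(3):
    loss = torch.tensor(0.42)   # placeholder for the real RL loss
    check_loss_finite(loss, step)
```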
Prerequisites
- H100/H200 GPUs with FP8 support (see the capability check after this list)
- MoE model (DeepSeek V3, Qwen3-MoE)
- Docker environment with miles
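A quick way to confirm the GPU prerequisite before launching anything is a plain PyTorch capability check (this is generic PyTorch, not a miles utility): FP8 tensor cores require compute capability 8.9 (Ada) or 9.0 (Hopper, i.e. H100/H200).

```python
import torch

# Sanity-check that the visible GPUs can run FP8 kernels.
# FP8 tensor cores require compute capability >= 8.9 (Ada) / 9.0 (Hopper: H100/H200).
def assert_fp8_capable() -> None:
    if not torch.cuda.is_available():
        raise RuntimeError("no CUDA device visible")
    for i in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(i)
        name = torch.cuda.get_device_name(i)
        if (major, minor) < (8, 9):
            raise RuntimeError(f"{name} (sm_{major}{minor}) lacks FP8 tensor cores")
        print(f"{name}: sm_{major}{minor}, FP8-capable")

assert_fp8_capable()
```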
Code Execution
- Logging: the documentation mentions `--log-level DEBUG` and `export MILES_DEBUG=1`, indicating that logging capabilities exist, but a dedicated audit file for destructive actions is not explicitly mentioned.
Installation
First, add the marketplace:
/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
Then install the plugin:
/plugin install AI-Research-SKILLs@ai-research-skills
Similar Extensions
Miles RL Training (92)
Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.
Slime RL Training (98)
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.
Slime RL Training (78)
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.
TensorRT-LLM (98)
Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.
AgentDB Learning (99)
Create and train AI learning plugins with AgentDB's 9 reinforcement learning algorithms. Includes Decision Transformer, Q-Learning, SARSA, Actor-Critic, and more. Use when building self-learning agents, implementing RL, or optimizing agent behavior through experience.
Verl RL Training (99)
Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.