Slime RL Training
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.
Provides a production-ready framework for post-training LLMs with reinforcement learning, pairing Megatron-LM for scalable distributed training with SGLang for efficient rollout generation.
Features
- Megatron-LM integration for distributed training
- SGLang for high-throughput generation rollouts
- Flexible data buffer and custom generation/reward functions (see the reward sketch after this list)
- Support for multiple LLM families (GLM, Qwen, Llama, etc.)
- Detailed workflows for various training scenarios
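As a rough illustration of the custom-reward hook, the following is a minimal sketch of a rule-based reward function. The function name, the dict-based sample layout, and the `\boxed{...}` answer convention are assumptions for illustration, not slime's actual interface; consult the slime documentation for the real hook signature.

```python
# A minimal sketch of a rule-based reward function for RL rollouts.
# The hook name and the sample fields are assumptions, not slime's real API.
import re

def boxed_answer_reward(sample: dict) -> float:
    """Return 1.0 if the \\boxed{...} answer in the response matches the label."""
    match = re.search(r"\\boxed\{(.+?)\}", sample["response"])
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == str(sample["label"]).strip() else 0.0

# Example: boxed_answer_reward({"response": r"... \boxed{42}", "label": 42}) -> 1.0
```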
Use Cases
- Training GLM models with RL
- Implementing custom data generation pipelines for LLM fine-tuning (see the rollout sketch after this list)
- Integrating Megatron-LM with SGLang for RL scaling
- Fine-tuning large language models on custom datasets using RL algorithms
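To make the custom data-generation use case concrete, here is a minimal sketch of an async rollout pipeline that fans prompts out to a locally running SGLang server. The `generate_rollout` name, the dict-based samples, and the server address are hypothetical; only the `/generate` endpoint taking `text` and `sampling_params` reflects SGLang's native HTTP API.

```python
# Hypothetical sketch of a custom rollout pipeline against an SGLang server.
# generate_rollout, the sample dicts, and the URL are assumptions for
# illustration; only the /generate payload shape is SGLang's native HTTP API.
import asyncio
import aiohttp

SGLANG_URL = "http://localhost:30000/generate"  # assumed local SGLang server

async def _complete(session: aiohttp.ClientSession, sample: dict) -> dict:
    payload = {
        "text": sample["prompt"],
        "sampling_params": {"max_new_tokens": 512, "temperature": 0.8},
    }
    async with session.post(SGLANG_URL, json=payload) as resp:
        sample["response"] = (await resp.json())["text"]
    return sample

async def generate_rollout(prompts: list[str]) -> list[dict]:
    """Fan prompts out to the inference server and collect completed samples."""
    samples = [{"prompt": p} for p in prompts]
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(_complete(session, s) for s in samples))
```

Because requests are issued concurrently, a pipeline along these lines keeps the inference server saturated during rollouts, which is the high-throughput pattern the SGLang integration targets.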
Non-Goals
- Providing a simple prompt-based agent for basic LLM tasks
- Replacing core LLM inference engines without framework integration
- Generic model training outside the RL post-training context
Installation
First, add the marketplace, then install the plugin:
/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills
Similar Extensions
Miles RL Training (score 97)
Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.
Verl RL Training (score 99)
Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.
Fine Tuning With TRL (score 96)
Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward-model training. Use when you need RLHF, want to align a model with preferences, or train from human feedback. Works with HuggingFace Transformers.