
Slime RL Training

Skill · Verified · Active

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Purpose

To provide a comprehensive, production-ready framework for training LLMs with reinforcement learning, enabling users to leverage Megatron-LM and SGLang for scalable and efficient model development.

Features

  • Megatron-LM integration for distributed training
  • SGLang for high-throughput generation rollouts
  • Flexible data buffer and custom generation/reward functions
  • Support for multiple LLM families (GLM, Qwen, Llama, etc.)
  • Detailed workflows for various training scenarios
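To illustrate the custom reward-function capability above: rule-based rewards are common in RL post-training for tasks with checkable answers. The sketch below is illustrative only — the function name and signature are assumptions for this example, not slime's documented plugin API; consult the slime repository for the actual interface for registering generation and reward functions.

```python
# Hypothetical rule-based reward for RL post-training (e.g., math tasks).
# NOTE: name and signature are illustrative assumptions, not slime's API.

def exact_match_reward(prompt: str, response: str, reference: str) -> float:
    """Return 1.0 if the response's final line matches the reference answer.

    Treating the last non-empty line of the rollout as the model's answer
    is a common convention for math-style verifiable-reward tasks.
    """
    stripped = response.strip()
    answer = stripped.splitlines()[-1].strip() if stripped else ""
    return 1.0 if answer == reference.strip() else 0.0
```

A reward like this would typically be applied to each rollout produced by the SGLang generation side before the resulting (sample, reward) pairs enter the data buffer for the Megatron-LM training step.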

Use Cases

  • Training GLM models with RL
  • Implementing custom data generation pipelines for LLM fine-tuning
  • Integrating Megatron-LM with SGLang for RL scaling
  • Fine-tuning large language models on custom datasets using RL algorithms

Non-Goals

  • Providing a simple prompt-based agent for basic LLM tasks
  • Replacing core LLM inference engines without framework integration
  • Generic model training outside the RL post-training context

Installation

First, add the marketplace:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

Verified
98/100
Analyzed 1 day ago

Trust Signals

Last commit: 17 days ago
Stars: 8.3k
License: MIT

Similar Extensions

Miles RL Training

97

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Skill
Orchestra-Research

Miles RL Training

92

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Skill
davila7

Slime RL Training

78

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Skill
davila7

Verl RL Training

99

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill
Orchestra-Research

Fine Tuning With Trl

96

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or train from human feedback. Works with HuggingFace Transformers.

Skill
Orchestra-Research

Verl RL Training

95

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill
davila7