
Slime RL Training

Skill · Verified · Active

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Purpose

To provide a comprehensive, production-ready framework for post-training LLMs with reinforcement learning, combining Megatron-LM and SGLang for scalable, efficient model development.

Features

  • Megatron-LM integration for distributed training
  • SGLang for high-throughput generation rollouts
  • Flexible data buffer and custom generation/reward functions
  • Support for multiple LLM families (GLM, Qwen, Llama, etc.)
  • Detailed workflows for various training scenarios
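The custom generation/reward hook mentioned above can be sketched roughly as follows. This is a minimal illustration, not slime's actual API: the `Sample` dataclass and the `reward_func` signature here are assumptions made for exposition; consult slime's documentation for the real interface and how such a function is registered.

```python
# Hypothetical sketch of a custom reward function for RL rollouts.
# `Sample` and the `reward_func` signature are illustrative assumptions,
# not slime's actual interface.
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str    # the question sent to the policy model
    response: str  # the model's generated answer
    label: str     # ground-truth answer used for reward computation


def reward_func(sample: Sample) -> float:
    """Exact-match-style reward: 1.0 if the ground-truth answer
    appears in the response, else 0.0."""
    return 1.0 if sample.label in sample.response else 0.0


sample = Sample(prompt="2+3=?", response="The answer is 5.", label="5")
print(reward_func(sample))  # → 1.0
```

A rule-based reward like this is common for math or exact-answer tasks; preference-based tasks would instead call a reward model inside the same hook.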

Use cases

  • Training GLM models with RL
  • Implementing custom data generation pipelines for LLM fine-tuning
  • Integrating Megatron-LM with SGLang for RL scaling
  • Fine-tuning large language models on custom datasets using RL algorithms

Non-goals

  • Providing a simple prompt-based agent for basic LLM tasks
  • Replacing core LLM inference engines without framework integration
  • Generic model training outside the RL post-training context

Installation

Add the Marketplace first:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality score

Verified
98/100
Analyzed 1 day ago

Trust signals

Last commit: 17 days ago
Stars: 8.3k
License: MIT
Status

Similar extensions

Miles RL Training

97

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Skill
Orchestra-Research

Miles RL Training

92

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Skill
davila7

Slime RL Training

78

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Skill
davila7

Verl RL Training

99

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill
Orchestra-Research

Fine Tuning with TRL

96

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or train from human feedback. Works with HuggingFace Transformers.

Skill
Orchestra-Research

Verl RL Training

95

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill
davila7