
Slime RL Training

Skill · Verified · Active

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Purpose

To provide a comprehensive, production-ready framework for post-training LLMs with reinforcement learning, combining Megatron-LM and SGLang for scalable, efficient model development.

Features

  • Megatron-LM integration for distributed training
  • SGLang for high-throughput generation rollouts
  • Flexible data buffer and custom generation/reward functions
  • Support for multiple LLM families (GLM, Qwen, Llama, etc.)
  • Detailed workflows for various training scenarios
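The custom generation/reward hook mentioned above can be sketched roughly as follows. This is a minimal illustration, not slime's actual API: the `Sample` dataclass and the `reward_func` signature here are assumptions made for exposition; consult slime's documentation for the real interface and how such a function is registered.

```python
# Hypothetical sketch of a custom reward function for RL rollouts.
# `Sample` and the `reward_func` signature are illustrative assumptions,
# not slime's actual interface.
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str    # the question sent to the policy model
    response: str  # the model's generated answer
    label: str     # ground-truth answer used for reward computation


def reward_func(sample: Sample) -> float:
    """Exact-match-style reward: 1.0 if the ground-truth answer
    appears in the response, else 0.0."""
    return 1.0 if sample.label in sample.response else 0.0


sample = Sample(prompt="2+3=?", response="The answer is 5.", label="5")
print(reward_func(sample))  # → 1.0
```

A rule-based reward like this is common for math or exact-answer tasks; preference-based tasks would instead call a reward model inside the same hook.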

Use cases

  • Training GLM models with RL
  • Implementing custom data generation pipelines for LLM fine-tuning
  • Integrating Megatron-LM with SGLang for RL scaling
  • Fine-tuning large language models on custom datasets using RL algorithms

Non-goals

  • Providing a simple prompt-based agent for basic LLM tasks
  • Replacing core LLM inference engines without framework integration
  • Generic model training outside the RL post-training context

Installation

Add the Marketplace first:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality score

Verified
98/100
Analyzed 1 day ago

Trust signals

Last commit: 17 days ago
Stars: 8.3k
License: MIT
Status

Similar extensions

Miles RL Training

97

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Skill
Orchestra-Research

Miles RL Training

92

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Skill
davila7

Slime RL Training

78

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Skill
davila7

Verl RL Training

99

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill
Orchestra-Research

Fine Tuning with TRL

96

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or train from human feedback. Works with HuggingFace Transformers.

Skill
Orchestra-Research

Verl RL Training

95

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill
davila7