
Slime RL Training

Skill · Verified · Active

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Purpose

To enable users to perform advanced LLM post-training with Reinforcement Learning using a specific, integrated framework (slime), facilitating custom data generation and scaling training efforts.

Features

  • LLM post-training with RL using slime framework
  • Integration of Megatron-LM and SGLang
  • Support for various LLM families (GLM, Qwen, DeepSeek, Llama 3)
  • Multiple training workflows (standard, async, multi-turn)
  • Detailed configuration, installation, and troubleshooting guides

Use Cases

  • Training GLM models with RL
  • Implementing custom data generation workflows for LLM training
  • Achieving tight Megatron-LM integration for RL scaling
  • Fine-tuning large language models with reinforcement learning techniques

Non-Goals

  • Enterprise-grade stability features (consider 'miles' instead)
  • Flexible backend swapping (consider 'verl' instead)
  • PyTorch-native abstractions (consider 'torchforge' instead)
  • General LLM pre-training or inference outside of RL post-training

Workflow

  1. Prepare Data
  2. Configure Model
  3. Launch Training (Standard/Async/Multi-Turn)
  4. Monitor Training

Practices

  • Reinforcement Learning
  • Model Training
  • LLM Fine-tuning
  • Distributed Systems

Prerequisites

  • Docker environment or Megatron-LM + SGLang installed
  • Model checkpoint (HuggingFace or Megatron format)
  • Training data in JSONL format
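Since the prerequisites call for training data in JSONL format (one JSON object per line), a minimal sketch of producing and validating such a file is shown below. The `prompt`/`label` field names are illustrative assumptions, not slime's documented schema; the fields your workflow actually needs depend on the skill's data-preparation guide.

```python
import json

# Hypothetical records; "prompt" and "label" are assumed field names
# for illustration only, not slime's required schema.
records = [
    {"prompt": "What is 2 + 2?", "label": "4"},
    {"prompt": "Name the capital of France.", "label": "Paris"},
]

# JSONL: serialize each record as one JSON object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Sanity check: read it back line by line and confirm the round trip.
with open("train.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

print(len(loaded))  # → 2
```

Keeping one object per line (rather than a single JSON array) lets training pipelines stream the file without loading it all into memory.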

Trust

  • Issues attention: with 17 open and 4 closed issues in the last 90 days, the closure rate is 19%, indicating slow maintainer response to ongoing issues.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.), and assumes the repository follows the agentskills.io format.

Quality Score

Verified
78/100
Analyzed about 22 hours ago

Trust Signals

Last commit: 1 day ago
Stars: 27.2k
License: MIT

Similar Extensions

Slime RL Training

98

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Skill
Orchestra-Research

Miles RL Training

97

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Skill
Orchestra-Research

Miles RL Training

92

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Skill
davila7

PyTorch Lightning

100

Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, implement data pipelines, callbacks, logging (W&B, TensorBoard), distributed training (DDP, FSDP, DeepSpeed), for scalable neural network training.

Skill
K-Dense-AI

TimesFM Forecasting

100

Zero-shot time series forecasting with Google's TimesFM foundation model. Use for any univariate time series (sales, sensors, energy, vitals, weather) without training a custom model. Supports CSV/DataFrame/array inputs with point forecasts and prediction intervals. Includes a preflight system checker script to verify RAM/GPU before first use.

Skill
K-Dense-AI

nnsight Remote Interpretability

99

Provides guidance for interpreting and manipulating neural network internals using nnsight with optional NDIF remote execution. Use when needing to run interpretability experiments on massive models (70B+) without local GPU resources, or when working with any PyTorch architecture.

Skill
davila7