
Slime RL Training

Skill · Verified · Active

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Purpose

To enable users to perform advanced LLM post-training with Reinforcement Learning using a specific, integrated framework (slime), facilitating custom data generation and scaling training efforts.

Features

  • LLM post-training with RL using slime framework
  • Integration of Megatron-LM and SGLang
  • Support for various LLM families (GLM, Qwen, DeepSeek, Llama 3)
  • Multiple training workflows (standard, async, multi-turn)
  • Detailed configuration, installation, and troubleshooting guides

Use Cases

  • Training GLM models with RL
  • Implementing custom data generation workflows for LLM training
  • Achieving tight Megatron-LM integration for RL scaling
  • Fine-tuning large language models with reinforcement learning techniques

Non-Goals

  • Enterprise-grade stability features (consider 'miles' instead)
  • Flexible backend swapping (consider 'verl' instead)
  • PyTorch-native abstractions (consider 'torchforge' instead)
  • General LLM pre-training or inference outside of RL post-training

Workflow

  1. Prepare Data
  2. Configure Model
  3. Launch Training (Standard/Async/Multi-Turn)
  4. Monitor Training

Practices

  • Reinforcement Learning
  • Model Training
  • LLM Fine-tuning
  • Distributed Systems

Prerequisites

  • Docker environment or Megatron-LM + SGLang installed
  • Model checkpoint (HuggingFace or Megatron format)
  • Training data in JSONL format
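Since the prerequisites call for training data in JSONL format (one JSON object per line), a minimal sketch of producing and validating such a file is shown below. The `prompt`/`label` field names are illustrative assumptions, not slime's documented schema; the fields your workflow actually needs depend on the skill's data-preparation guide.

```python
import json

# Hypothetical records; "prompt" and "label" are assumed field names
# for illustration only, not slime's required schema.
records = [
    {"prompt": "What is 2 + 2?", "label": "4"},
    {"prompt": "Name the capital of France.", "label": "Paris"},
]

# JSONL: serialize each record as one JSON object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Sanity check: read it back line by line and confirm the round trip.
with open("train.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]

print(len(loaded))  # → 2
```

Keeping one object per line (rather than a single JSON array) lets training pipelines stream the file without loading it all into memory.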

Trust

  • Issues attention: with 17 open and 4 closed issues in the last 90 days, the closure rate is 19%, indicating slow maintainer response to ongoing issues.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.), and assumes the repository follows the agentskills.io format.

Quality Score

Verified
78/100
Analyzed about 22 hours ago

Trust Signals

Last commit: 1 day ago
Stars: 27.2k
License: MIT

Similar Extensions

Slime RL Training

98

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Skill
Orchestra-Research

Miles RL Training

97

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Skill
Orchestra-Research

Miles RL Training

92

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Skill
davila7

PyTorch Lightning

100

Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, implement data pipelines, callbacks, logging (W&B, TensorBoard), distributed training (DDP, FSDP, DeepSpeed), for scalable neural network training.

Skill
K-Dense-AI

TimesFM Forecasting

100

Zero-shot time series forecasting with Google's TimesFM foundation model. Use for any univariate time series (sales, sensors, energy, vitals, weather) without training a custom model. Supports CSV/DataFrame/array inputs with point forecasts and prediction intervals. Includes a preflight system checker script to verify RAM/GPU before first use.

Skill
K-Dense-AI

nnsight Remote Interpretability

99

Provides guidance for interpreting and manipulating neural network internals using nnsight with optional NDIF remote execution. Use when needing to run interpretability experiments on massive models (70B+) without local GPU resources, or when working with any PyTorch architecture.

Skill
davila7