Slime RL Training
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.
Enables advanced LLM post-training with reinforcement learning through slime, an integrated Megatron-LM + SGLang framework, with support for custom data generation and scaled-up RL training.
Features
- LLM post-training with RL using slime framework
- Integration of Megatron-LM and SGLang
- Support for various LLM families (GLM, Qwen, DeepSeek, Llama 3)
- Multiple training workflows (standard, async, multi-turn)
- Detailed configuration, installation, and troubleshooting guides
Use Cases
- Training GLM models with RL
- Implementing custom data generation workflows for LLM training (see the sketch after this list)
- Achieving tight Megatron-LM integration for RL scaling
- Fine-tuning large language models with reinforcement learning techniques
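As referenced above, here is a minimal sketch of what a custom data-generation (rollout) step can look like. The `Sample` type, function names, and signatures are illustrative assumptions, not slime's documented API; slime wires custom generation logic in through its own configuration, so consult its docs for the actual interface.

```python
# Hypothetical sketch of a custom rollout step for RL post-training.
# None of these names come from slime's API; they only illustrate the
# generate -> score -> collect pattern a custom workflow implements.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    prompt: str
    response: str
    reward: float


def generate_rollouts(
    prompts: List[str],
    generate_fn: Callable[[str], str],      # stand-in for an SGLang-backed call
    reward_fn: Callable[[str, str], float], # stand-in for a reward model/verifier
) -> List[Sample]:
    """Produce (prompt, response, reward) triples for one training step."""
    samples = []
    for prompt in prompts:
        response = generate_fn(prompt)
        samples.append(Sample(prompt, response, reward_fn(prompt, response)))
    return samples


# Toy usage with stand-in generation and reward functions.
if __name__ == "__main__":
    batch = generate_rollouts(
        ["What is 2 + 2?"],
        generate_fn=lambda p: "4",
        reward_fn=lambda p, r: 1.0 if r.strip() == "4" else 0.0,
    )
    print(batch)
```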
Non-Goals
- Enterprise-grade stability features (consider 'miles' instead)
- Flexible backend swapping (consider 'verl' instead)
- PyTorch-native abstractions (consider 'torchforge' instead)
- General LLM pre-training or inference outside of RL post-training
Workflow
- Prepare Data
- Configure Model
- Launch Training (Standard/Async/Multi-Turn), as sketched below
- Monitor Training
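A conceptual sketch of how the standard and async launch modes differ in scheduling. This is not slime's code; it only illustrates overlapping rollout generation with training, with stand-in functions in place of SGLang generation and Megatron-LM updates.

```python
# Conceptual sketch only: the async workflow overlaps rollout generation
# with training, while the standard workflow strictly alternates them.
# This is NOT slime's API; rollout/train_on are illustrative stand-ins.
from concurrent.futures import ThreadPoolExecutor


def rollout(step: int) -> str:      # stand-in for SGLang generation
    return f"batch-{step}"


def train_on(batch: str) -> None:   # stand-in for a Megatron-LM update
    print(f"training on {batch}")


# Standard: generate, then train, strictly alternating.
for step in range(3):
    train_on(rollout(step))

# Async: generation for step N+1 overlaps training on step N.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(rollout, 0)
    for step in range(3):
        batch = future.result()
        future = pool.submit(rollout, step + 1)  # prefetch next batch
        train_on(batch)
    # The final prefetched batch is simply discarded here.
```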
Practices
- Reinforcement Learning
- Model Training
- LLM Fine-tuning
- Distributed Systems
Prerequisites
- Docker environment or Megatron-LM + SGLang installed
- Model checkpoint (HuggingFace or Megatron format)
- Training data in JSONL format (see the example below)
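A minimal example of the expected JSONL shape: one JSON object per line. The field names below ("prompt", "label") are assumptions for illustration; check slime's data documentation for the exact schema your rollout and reward functions expect.

```python
# Write two toy training records as JSONL: one JSON object per line.
# The "prompt"/"label" keys are illustrative, not a documented schema.
import json

records = [
    {"prompt": "What is 12 * 7?", "label": "84"},
    {"prompt": "Name the capital of France.", "label": "Paris"},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```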
Trust
- Issues attention: with 17 open and 4 closed issues in the last 90 days, the closure rate is 19%, indicating slow maintainer response to ongoing issues.
Installation
npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx; requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.
Similar Extensions
Miles RL Training (score 97)
Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.
PyTorch Lightning (score 100)
Deep learning framework. Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, and implement data pipelines, callbacks, logging (W&B, TensorBoard), and distributed training (DDP, FSDP, DeepSpeed) for scalable neural network training.
TimesFM Forecasting (score 100)
Zero-shot time series forecasting with Google's TimesFM foundation model. Use for any univariate time series (sales, sensors, energy, vitals, weather) without training a custom model. Supports CSV/DataFrame/array inputs with point forecasts and prediction intervals. Includes a preflight system checker script to verify RAM/GPU before first use.
nnsight Remote Interpretability (score 99)
Provides guidance for interpreting and manipulating neural network internals using nnsight with optional NDIF remote execution. Use when needing to run interpretability experiments on massive models (70B+) without local GPU resources, or when working with any PyTorch architecture.