verl RL Training
Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.
To enable users to implement advanced LLM post-training techniques like RLHF, GRPO, and PPO at scale using the verl library.
Features
- Guidance for RLHF, GRPO, PPO, and other RL algorithms
- Support for large-scale LLM post-training
- Flexible infrastructure backend configurations
- Detailed installation and quick start examples
- Comprehensive configuration reference
Use Cases
- Implementing RLHF for LLM fine-tuning
- Training LLMs with GRPO for reasoning tasks
- Scaling PPO training for large language models
- Leveraging flexible backends like FSDP, Megatron-LM, and vLLM
Non-Goals
- Megatron-native training (other tools such as slime are recommended)
- PyTorch-native abstractions with Monarch (other tools such as torchforge are recommended)
- Simple SFT/DPO (other tools such as TRL are recommended)
Workflow
- Prepare Dataset
- Define Reward Function (sketched in the example after this list)
- Create Training Config
- Launch Training
- Monitor and Validate
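The sketch below ties steps 2 through 4 together in one file. It is a minimal illustration, not a verified recipe: it assumes verl's `custom_reward_function.path` / `custom_reward_function.name` Hydra overrides and the `verl.trainer.main_ppo` entry point, and the exact override keys, the `compute_score` signature, and the `#### <answer>` output convention are assumptions based on recent verl releases that should be checked against your installed version's docs.

```python
# reward_and_launch.py -- illustrative sketch, not a drop-in script.
import os
import subprocess


def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    """Rule-based reward assumed by verl's custom_reward_function hook:
    1.0 if the model's final answer matches the ground truth stored in
    the dataset's reward_model column, else 0.0."""
    # Hypothetical convention: the model ends its response with "#### <answer>".
    answer = solution_str.split("####")[-1].strip()
    return 1.0 if answer == str(ground_truth).strip() else 0.0


if __name__ == "__main__":
    data_dir = os.path.expanduser("~/data")
    # Step 3: the training config, expressed as Hydra overrides.
    overrides = [
        "algorithm.adv_estimator=grpo",                # GRPO advantage estimator
        f"data.train_files={data_dir}/train.parquet",
        f"data.val_files={data_dir}/val.parquet",
        "actor_rollout_ref.model.path=Qwen/Qwen2.5-7B-Instruct",  # any HF Hub model
        "actor_rollout_ref.rollout.name=vllm",         # vLLM generation backend
        "actor_rollout_ref.rollout.n=5",               # rollouts per prompt (GRPO groups)
        f"custom_reward_function.path={os.path.abspath(__file__)}",
        "custom_reward_function.name=compute_score",
        "trainer.n_gpus_per_node=8",
        "trainer.nnodes=1",
    ]
    # Step 4: launch the trainer.
    subprocess.run(["python3", "-m", "verl.trainer.main_ppo", *overrides], check=True)
```

For step 5, recent verl versions can route metrics to the console or Weights & Biases via a logger override (e.g. `trainer.logger=['console','wandb']`), and validation runs periodically against `data.val_files`.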
Practices
- Reinforcement Learning
- LLM Post-Training
- Distributed Systems
Prerequisites
- GPU cluster with 8+ GPUs (H100 recommended for math tasks)
- Dataset in parquet format with 'prompt' and 'reward_model' columns (see the sketch after this list)
- Base model from HuggingFace Hub
- Megatron-LM bridge installed (Megatron workflow only)
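To make the parquet prerequisite concrete, here is a minimal dataset-writer sketch. The 'prompt' and 'reward_model' columns come from the list above; the chat-message layout of 'prompt', the 'ground_truth' key inside 'reward_model', and the 'data_source'/'extra_info' columns are assumptions modeled on verl's bundled data-prep examples, so verify them against your version.

```python
# make_dataset.py -- writes a toy parquet file in the layout sketched above.
# Requires pandas + pyarrow. Fields beyond 'prompt'/'reward_model' are assumed.
import pandas as pd

rows = [
    {
        "data_source": "toy_math",                      # assumed bookkeeping field
        "prompt": [{"role": "user", "content": "What is 6 * 7?"}],
        "reward_model": {"style": "rule", "ground_truth": "42"},
        "extra_info": {"split": "train", "index": 0},   # assumed optional field
    },
    {
        "data_source": "toy_math",
        "prompt": [{"role": "user", "content": "What is 12 + 30?"}],
        "reward_model": {"style": "rule", "ground_truth": "42"},
        "extra_info": {"split": "train", "index": 1},
    },
]

pd.DataFrame(rows).to_parquet("train.parquet")
print(pd.read_parquet("train.parquet").head())  # sanity-check the written file
```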
Trust
- Warning: In the last 90 days, 17 issues were opened and 4 were closed, indicating a low closure rate and potentially slow maintainer response.
Installation
npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx; requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.
Similar Extensions
Fine Tuning With TRL (96)
Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or need to train from human feedback. Works with HuggingFace Transformers.
Openrlhf Training (99)
High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, and DPO training of large models (7B-70B+). Built on Ray, vLLM, and ZeRO-3; 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.
Slime RL Training (98)
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.
Torchforge (96)
Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or scalable training with Monarch and TorchTitan.