Dieser Inhalt ist noch nicht in Ihrer Sprache verfügbar und wird auf Englisch angezeigt.

Verl Rl Training

Skill Aktiv

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Zweck

To enable users to implement advanced LLM post-training techniques like RLHF, GRPO, and PPO at scale using the verl library.

Funktionen

Guidance for RLHF, GRPO, PPO, and other RL algorithms
Support for large-scale LLM post-training
Flexible infrastructure backend configurations
Detailed installation and quick start examples
Comprehensive configuration reference

Anwendungsfälle

Implementing RLHF for LLM fine-tuning
Training LLMs with GRPO for reasoning tasks
Scaling PPO training for large language models
Leveraging flexible backends like FSDP, Megatron-LM, and vLLM

Nicht-Ziele

Megatron-native training (recommends other tools)
PyTorch-native abstractions with Monarch (recommends other tools)
Simple SFT/DPO (recommends other tools)

Workflow

Prepare Dataset
Define Reward Function
Create Training Config
Launch Training
Monitor and Validate

Praktiken

Reinforcement Learning
LLM Post-Training
Distributed Systems

Voraussetzungen

GPU cluster with 8+ GPUs (H100 recommended for math tasks)
Dataset in parquet format with 'prompt' and 'reward_model' columns
Base model from HuggingFace Hub
Install Megatron-LM bridge (for Megatron workflow)

Trust

warning:Issues AttentionIn the last 90 days, 17 issues were opened and 4 were closed, indicating a low closure rate and potentially slow maintainer response.

Installation

npx skills add davila7/claude-code-templates

Führt das Vercel skills CLI (skills.sh) via npx aus — benötigt Node.js lokal und mindestens einen installierten skills-kompatiblen Agent (Claude Code, Cursor, Codex, …). Setzt voraus, dass das Repo dem agentskills.io-Format folgt.

Qualitätspunktzahl

95 /100

Analysiert about 18 hours ago

Vertrauenssignale

Letzter Commitabout 20 hours ago

GitHub-Inhaber davila7

Sterne27.2k

Downloads 23k

LizenzMIT

Websiteaitmpl.com

Status

Quellcode ansehen

Verl Rl Training

Funktionen

Anwendungsfälle

Nicht-Ziele

Workflow

Praktiken

Voraussetzungen

Trust

Qualitätspunktzahl

Vertrauenssignale

Ähnliche Erweiterungen

Verl Rl Training

Fine Tuning With Trl

Openrlhf Training

Slime Rl Training

Openrlhf Training

Torchforge