
Verl Rl Training

Skill (Active)

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Purpose

To enable users to implement advanced LLM post-training techniques like RLHF, GRPO, and PPO at scale using the verl library.

Features

  • Guidance for RLHF, GRPO, PPO, and other RL algorithms
  • Support for large-scale LLM post-training
  • Flexible infrastructure backend configurations
  • Detailed installation and quick start examples
  • Comprehensive configuration reference

Use Cases

  • Implementing RLHF for LLM fine-tuning
  • Training LLMs with GRPO for reasoning tasks
  • Scaling PPO training for large language models
  • Leveraging flexible backends like FSDP, Megatron-LM, and vLLM

Non-Goals

  • Megatron-native training (recommends other tools)
  • PyTorch-native abstractions with Monarch (recommends other tools)
  • Simple SFT/DPO (recommends other tools)

Workflow

  1. Prepare Dataset
  2. Define Reward Function
  3. Create Training Config
  4. Launch Training
  5. Monitor and Validate
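
Step 2 of the workflow (Define Reward Function) can be sketched as below. This is a hedged sketch: verl supports pluggable rule-based rewards, and the `compute_score(data_source, solution_str, ground_truth, extra_info)` signature follows verl's custom-reward convention, but the exact signature and the config keys used to register it (e.g. `custom_reward_function.path` / `custom_reward_function.name`) should be verified against the verl docs for your version.

```python
# Hypothetical rule-based reward for a math task, following verl's
# custom-reward convention: the trainer calls this once per rollout.
# Signature and registration keys may differ across verl versions.
import re


def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    """Return 1.0 if the model's final numeric answer matches the
    ground truth, else 0.0 (a binary reward, which suits GRPO well)."""
    # Take the last number in the completion as the final answer.
    numbers = re.findall(r"-?\d+\.?\d*", solution_str)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == str(ground_truth) else 0.0
```

Such a function would then be referenced on the training command line, e.g. `custom_reward_function.path=my_reward.py custom_reward_function.name=compute_score` (key names assumed; check your verl version).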

Practices

  • Reinforcement Learning
  • LLM Post-Training
  • Distributed Systems

Prerequisites

  • GPU cluster with 8+ GPUs (H100 recommended for math tasks)
  • Dataset in parquet format with 'prompt' and 'reward_model' columns
  • Base model from HuggingFace Hub
  • Install Megatron-LM bridge (for Megatron workflow)
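
The Parquet schema required above can be sketched as a minimal record builder. The field shapes below (chat-style `prompt` messages, a `reward_model` dict carrying the ground truth) follow verl's common GSM8K-style data recipe, but are an assumption here; verify the column layout against the verl data-preparation docs for your version. Writing the actual file would use e.g. `pandas.DataFrame(records).to_parquet(...)`, shown only as a comment to keep the sketch stdlib-only.

```python
# Sketch of one training record in the shape verl's data pipeline
# commonly expects: a chat-format 'prompt' plus a 'reward_model' dict.
# Field names follow the GSM8K recipe and are assumptions; check them
# against your verl version before relying on them.

def make_record(question: str, answer: str) -> dict:
    return {
        "data_source": "openai/gsm8k",       # selects which reward fn applies
        "prompt": [                          # chat-format prompt messages
            {"role": "user", "content": question},
        ],
        "ability": "math",
        "reward_model": {
            "style": "rule",                 # rule-based scoring
            "ground_truth": answer,          # compared by the reward fn
        },
        "extra_info": {"split": "train", "index": 0},
    }


records = [make_record("What is 2 + 3?", "5")]
# To produce the Parquet file verl reads (requires pandas + pyarrow):
#   pandas.DataFrame(records).to_parquet("train.parquet")
```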

Trust

  • Warning (issue attention): In the last 90 days, 17 issues were opened and 4 were closed, indicating a low closure rate and potentially slow maintainer response.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx; requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repository follows the agentskills.io format.

Quality Score

95/100
Analyzed about 18 hours ago

Trust Signals

Last commit: about 20 hours ago
Stars: 27.2k
License: MIT
Status
View source code

Similar Extensions

Verl Rl Training

99

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill
Orchestra-Research

Fine Tuning With Trl

96

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or train from human feedback. Works with HuggingFace Transformers.

Skill
Orchestra-Research

Openrlhf Training

99

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

Skill
Orchestra-Research

Slime Rl Training

98

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Skill
Orchestra-Research

Openrlhf Training

97

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

Skill
davila7

Torchforge

96

Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or scalable training with Monarch and TorchTitan.

Skill
Orchestra-Research