此内容尚未提供您的语言版本,正在以英文显示。

Verl Rl Training

技能活跃

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

目的

To enable users to implement advanced LLM post-training techniques like RLHF, GRPO, and PPO at scale using the verl library.

功能

Guidance for RLHF, GRPO, PPO, and other RL algorithms
Support for large-scale LLM post-training
Flexible infrastructure backend configurations
Detailed installation and quick start examples
Comprehensive configuration reference

使用场景

Implementing RLHF for LLM fine-tuning
Training LLMs with GRPO for reasoning tasks
Scaling PPO training for large language models
Leveraging flexible backends like FSDP, Megatron-LM, and vLLM

非目标

Megatron-native training (recommends other tools)
PyTorch-native abstractions with Monarch (recommends other tools)
Simple SFT/DPO (recommends other tools)

工作流

Prepare Dataset
Define Reward Function
Create Training Config
Launch Training
Monitor and Validate

实践

Reinforcement Learning
LLM Post-Training
Distributed Systems

先决条件

GPU cluster with 8+ GPUs (H100 recommended for math tasks)
Dataset in parquet format with 'prompt' and 'reward_model' columns
Base model from HuggingFace Hub
Install Megatron-LM bridge (for Megatron workflow)

Trust

warning:Issues AttentionIn the last 90 days, 17 issues were opened and 4 were closed, indicating a low closure rate and potentially slow maintainer response.

安装

npx skills add davila7/claude-code-templates

通过 npx 运行 Vercel skills CLI(skills.sh)— 需要本地安装 Node.js,以及至少一个兼容 skills 的智能体(Claude Code、Cursor、Codex 等)。前提是仓库遵循 agentskills.io 格式。

质量评分

95 /100

about 18 hours ago 分析

信任信号

最近提交about 20 hours ago

GitHub 所有者 davila7

星标27.2k

下载量 23k

许可证MIT

网站aitmpl.com

状态

查看源代码

类似扩展

Verl Rl Training

技能

Orchestra-Research

Fine Tuning With Trl

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when need RLHF, align model with preferences, or train from human feedback. Works with HuggingFace Transformers.

技能

Orchestra-Research

Openrlhf Training

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

技能

Orchestra-Research

Slime Rl Training

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

技能

Orchestra-Research

Openrlhf Training

技能

davila7

Torchforge

Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or scalable training with Monarch and TorchTitan.

技能

Orchestra-Research