
Verl Rl Training

Skill · Verified · Active

Part of: Agent Native Research Artifact (ARA) Tooling

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Purpose

Help users train large language models at scale with reinforcement learning in the verl framework, with guidance aimed at production use.

Features

Guidance on the verl RL training library
Support for RLHF, GRPO, PPO, and other RL algorithms
Flexible infrastructure backend configurations (FSDP, Megatron)
Detailed troubleshooting and common issue resolution
Examples for various training workflows

Use cases

Implementing RLHF for LLM post-training
Training LLMs at scale with flexible infrastructure
Leveraging GRPO for math and reasoning tasks
Configuring PPO with a critic model for dense reward tasks
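
As background for the GRPO use case above, here is a minimal sketch of the group-relative advantage that GRPO relies on in place of a learned critic: each sampled response to a prompt is scored, and its reward is normalized against the other responses in the same group. This is plain Python for illustration only, not verl's API. The function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions; implementations differ on these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Returns (r - group_mean) / (group_std + eps) for each reward.
    The population standard deviation is an assumption of this sketch.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: four sampled answers to one math prompt,
# scored 1.0 if the final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, and a group where every response receives the same reward yields all zeros, so it contributes no learning signal. PPO with a critic, by contrast, estimates a baseline per token with a separate value model, which is why it suits dense reward tasks.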

Non-goals

Implementing Megatron-native training directly (use slime or miles)
Simple SFT/DPO tasks (use TRL or Axolotl)
Core LLM architecture development
Basic language model fine-tuning

Installation

Add the marketplace first:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs

/plugin install AI-Research-SKILLs@ai-research-skills

Quality score

Verified

99/100

Analyzed 1 day ago

Trust signals

Last commit: 17 days ago

GitHub owner: Orchestra-Research

Stars: 8.3k

Downloads: 0

License: MIT

Website: orchestra-research.com


Similar extensions

Verl Rl Training

Skill

davila7

Openrlhf Training

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

Skill

Orchestra-Research

Slime Rl Training

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

Skill

Orchestra-Research

Openrlhf Training

Skill

davila7

Torchforge

Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or scalable training with Monarch and TorchTitan.

Skill

Orchestra-Research

Fine Tuning With Trl

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

Skill

Orchestra-Research