OpenRLHF Training


Part of: Agent Native Research Artifact (ARA) Tooling

High-performance RLHF framework accelerated by Ray and vLLM. Use it for PPO, GRPO, RLOO, and DPO training of large models (7B to 70B+ parameters). Built on Ray, vLLM, and DeepSpeed ZeRO-3. Its distributed architecture and GPU resource sharing make it 2× faster than DeepSpeedChat.

Purpose

Enable efficient, scalable RLHF training of large language models on a high-performance distributed architecture.

Features

High-performance RLHF training
Ray + vLLM acceleration
Support for PPO, GRPO, RLOO, DPO
Distributed architecture for large models
GPU resource sharing via Hybrid Engine
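
To make the difference between two of the listed algorithms concrete, here is a minimal sketch of how GRPO and RLOO can turn the rewards of several sampled responses to one prompt into per-sample advantages. It is an illustration in plain Python, not OpenRLHF's implementation: the function names, the `eps` constant, and the use of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r - group mean) / (group std + eps).

    `eps` and the population standard deviation are illustrative choices,
    not values taken from OpenRLHF.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards):
    """Leave-one-out advantages: each sample's baseline is the mean
    reward of the other samples drawn for the same prompt."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


# Four sampled responses to one prompt, scored by a hypothetical reward model.
group_rewards = [1.0, 0.0, 0.0, 1.0]
grpo = grpo_advantages(group_rewards)
rloo = rloo_advantages(group_rewards)
```

Both estimators avoid a learned critic by using the other samples in the group as the baseline, which is why they need several responses per prompt.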

Use Cases

Training large language models (7B-70B+) with RLHF
Achieving 2× faster training compared to DeepSpeedChat
Leveraging multi-node GPU clusters for distributed training
Fine-tuning models with advanced RL algorithms in a unified framework
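
DPO, one of the algorithms this skill covers, trains directly on preference pairs without a separate reward model or rollout phase. The sketch below computes the standard DPO loss for a single chosen/rejected pair from summed log-probabilities. It is a plain-Python illustration under stated assumptions, not OpenRLHF code: the function name, argument names, and the default `beta=0.1` are hypothetical.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin)).

    margin = (policy - reference) log-ratio of the chosen response
             minus the same log-ratio of the rejected response.
    `beta` here is an illustrative default, not a recommended setting.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)); branch to avoid overflow.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Policy identical to the reference: margin is 0, loss is log(2).
loss_at_init = dpo_loss(-1.0, -2.0, -1.0, -2.0)
# Policy prefers the chosen response more than the reference does.
loss_improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss starts at log(2) when the policy equals the reference and falls as the policy raises the chosen response's likelihood relative to the rejected one.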

Non-Goals

General-purpose model fine-tuning outside of RLHF
Inference serving or deployment orchestration
Model architecture definition or modification

Installation

First, add the marketplace:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs

Then install the plugin:

/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

Verified

99/100

Analyzed about 20 hours ago

Trust Signals

Last commit: 16 days ago

GitHub owner: Orchestra-Research

Stars: 8.3k

Downloads: 0

License: MIT

Website: orchestra-research.com

Similar Extensions

Ray Train

Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.

Skill

Orchestra-Research

OpenRLHF Training

Skill

davila7

PyTorch Lightning

High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with same code. Use when you want clean training loops with built-in best practices.

Skill

Orchestra-Research

verl RL Training

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill

Orchestra-Research

TorchTitan Distributed LLM Pretraining

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

Skill

Orchestra-Research

Hugging Face Accelerate

Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard.

Skill

davila7