Openrlhf Training

Skill · Verified · Active

Part of: Agent Native Research Artifact (ARA) Tooling

High-performance RLHF framework accelerated by Ray and vLLM. Use it for PPO, GRPO, RLOO, and DPO training of large models (7B to 70B+ parameters). Built on Ray, vLLM, and ZeRO-3. Listed as 2× faster than DeepSpeedChat, thanks to its distributed architecture and GPU resource sharing.

Purpose

To enable efficient, scalable training of large language models with advanced RLHF techniques on a high-performance distributed architecture.

Features

High-performance RLHF training
Ray + vLLM acceleration
Support for PPO, GRPO, RLOO, DPO
Distributed architecture for large models
GPU resource sharing via Hybrid Engine
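
Two of the algorithms listed above, GRPO and RLOO, differ mainly in how they turn the rewards of several sampled responses per prompt into advantages. The sketch below is a plain-Python illustration of those two formulas, not OpenRLHF's code. The function names `grpo_advantages` and `rloo_advantages`, the `eps` constant, and the choice of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: standardize rewards within one prompt's group.

    Assumption: population standard deviation plus a small eps for stability.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards):
    """Leave-one-out advantages: each sample's baseline is the mean of the others."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


# Hypothetical rewards for four responses sampled from the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))
print(rloo_advantages(rewards))
```

Both functions return one advantage per sampled response and need no learned value model, which is the usual reason these variants are cheaper to run than PPO with a critic.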

Use cases

Training large language models (7B-70B+) with RLHF
Achieving 2× faster training compared to DeepSpeedChat
Leveraging multi-node GPU clusters for distributed training
Fine-tuning models with advanced RL algorithms in a unified framework
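
DPO, one of the supported methods, replaces the reward model and sampling loop with a single loss over preference pairs. The following is a minimal sketch of the standard DPO loss for one pair, written in plain Python for illustration only. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions, and the inputs are taken to be summed log-probabilities of each response under the policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    # How much more the policy prefers the chosen response than the reference does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # Numerically stable form of -log(sigmoid(z)) = log(1 + exp(-z)).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical log-probabilities: the policy has shifted toward the chosen response.
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

The loss equals log 2 when the policy and reference agree, and falls as the policy raises the chosen response's likelihood relative to the rejected one.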

Non-goals

General-purpose model fine-tuning outside of RLHF
Inference serving or deployment orchestration
Model architecture definition or modification

Installation

Add the marketplace first, then install the plugin:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs

/plugin install AI-Research-SKILLs@ai-research-skills

Quality score

Verified

99/100

Analyzed 1 day ago

Trust signals

Last commit: 17 days ago

GitHub owner: Orchestra-Research

Stars: 8.3k

Downloads: 0

License: MIT

Website: orchestra-research.com


Similar extensions

Ray Train

Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.

Skill

Orchestra-Research

Openrlhf Training

Skill

davila7

Pytorch Lightning

High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with same code. Use when you want clean training loops with built-in best practices.

Skill

Orchestra-Research

Verl Rl Training

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill

Orchestra-Research

TorchTitan Distributed LLM Pretraining

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

Skill

Orchestra-Research

Huggingface Accelerate

Simple distributed training API: about 4 lines add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement and mixed precision (FP16/BF16/FP8). Interactive config and a single launch command. The HuggingFace ecosystem standard.

Skill

davila7