跳转到主要内容
此内容尚未提供您的语言版本,正在以英文显示。

Openrlhf Training

技能 已验证 活跃

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

目的

To enable efficient and scalable training of large language models using advanced RLHF techniques with high-performance distributed architecture.

功能

  • High-performance RLHF training
  • Ray + vLLM acceleration
  • Support for PPO, GRPO, RLOO, DPO
  • Distributed architecture for large models
  • GPU resource sharing via Hybrid Engine

使用场景

  • Training large language models (7B-70B+) with RLHF
  • Achieving 2x faster training compared to DeepSpeedChat
  • Leveraging multi-node GPU clusters for distributed training
  • Fine-tuning models with advanced RL algorithms in a unified framework

非目标

  • General-purpose model fine-tuning outside of RLHF
  • Inference serving or deployment orchestration
  • Model architecture definition or modification

安装

请先添加 Marketplace

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

质量评分

已验证
99 /100
1 day ago 分析

信任信号

最近提交17 days ago
星标8.3k
许可证MIT
状态
查看源代码

类似扩展

Ray Train

99

Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.

技能
Orchestra-Research

Openrlhf Training

97

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

技能
davila7

Pytorch Lightning

99

High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with same code. Use when you want clean training loops with built-in best practices.

技能
Orchestra-Research

Verl Rl Training

99

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

技能
Orchestra-Research

TorchTitan Distributed LLM Pretraining

99

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

技能
Orchestra-Research

Huggingface Accelerate

99

Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard.

技能
davila7