
Openrlhf Training

Skill · Verified · Active

High-performance RLHF framework with Ray + vLLM acceleration. Use it for PPO, GRPO, RLOO, and DPO training of large models (7B-70B+). Built on Ray, vLLM, and ZeRO-3. It is 2× faster than DeepSpeed-Chat, thanks to its distributed architecture and GPU resource sharing.
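The listing names the algorithms but does not state their objectives. For reference, the textbook PPO clipped surrogate loss and the DPO loss can be sketched in plain Python. This is a minimal illustration, not OpenRLHF's implementation. It assumes per-token log-probabilities for PPO and summed sequence log-probabilities for DPO. The function names and the defaults (`clip_eps=0.2`, `beta=0.1`) are illustrative assumptions.

```python
import math

def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss over tokens (a loss, so lower is better).

    Illustrative sketch; clip_eps=0.2 is an assumed default.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new / pi_old for this token
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # PPO maximizes min(ratio * A, clip(ratio) * A); negate it to get a loss
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities.

    Illustrative sketch; beta=0.1 is an assumed default.
    """
    # Implicit reward margin: how much more the policy (vs. the frozen
    # reference model) prefers the chosen response over the rejected one
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))
```

With a positive advantage, the clip caps how much a single update can reward increasing a token's probability. DPO needs no reward model or rollouts: it only compares the policy against a frozen reference on fixed preference pairs.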

Purpose

To enable efficient and high-performance Reinforcement Learning from Human Feedback (RLHF) training for large language models using a distributed architecture with advanced acceleration techniques.

Features

High-performance RLHF training framework
Support for PPO, GRPO, RLOO, DPO algorithms
Ray + vLLM acceleration for large models (7B-70B+)
Distributed architecture with multi-node GPU cluster support
Hybrid Engine for GPU resource sharing
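GRPO and RLOO, both in the feature list, are commonly described as dropping PPO's learned value model. Instead, they compare several sampled responses to the same prompt. The sketch below shows the two baseline schemes, assuming one scalar reward per response and a group of responses per prompt. It does not reproduce OpenRLHF's code. The function names and the `eps` value are assumptions, and implementations differ on whether GRPO uses the population or the sample standard deviation.

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: normalize each reward by its group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]

def rloo_advantages(rewards):
    """Leave-one-out advantages: sample i's baseline is the mean of the other k-1 rewards."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]
```

For four responses with rewards `[1.0, 0.0, 0.0, 1.0]`, GRPO gives roughly `[1, -1, -1, 1]` and RLOO gives `[2/3, -2/3, -2/3, 2/3]`. In both cases, responses that beat their siblings get positive advantages. A group with identical rewards yields zero advantage under either scheme.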

Use cases

Training large language models with RLHF
Fine-tuning models on custom reward functions
Leveraging distributed computing for faster training
Accelerating inference during RLHF rollout phases
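This listing does not describe how a custom reward function is wired into OpenRLHF, so the sketch below covers only the scoring logic. It is a hypothetical rule-based reward of the kind often used for math-style prompts: a response earns 1.0 if its last number matches a reference answer. The names `exact_match_reward` and `batch_rewards` and the last-number heuristic are assumptions for illustration. They are not part of OpenRLHF's API.

```python
import re

# Matches integers and decimals, optionally negative (e.g. "42", "-3.5")
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def exact_match_reward(response: str, reference: str) -> float:
    """Hypothetical reward: 1.0 if the last number in the response equals the reference."""
    numbers = _NUMBER.findall(response)
    if not numbers:
        return 0.0  # no parseable answer earns no reward
    return 1.0 if float(numbers[-1]) == float(reference) else 0.0

def batch_rewards(responses, references):
    """Score a batch of rollouts against their reference answers."""
    return [exact_match_reward(r, ref) for r, ref in zip(responses, references)]
```

A deterministic reward like this avoids training a separate reward model. However, the policy can learn to exploit a heuristic this simple, so real setups usually parse a stricter answer format.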

Non-goals

Single-node or basic model fine-tuning
Environments without GPU acceleration capabilities
Inference-only model serving outside of the training loop

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) through npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.), and assumes the repository follows the agentskills.io format.

Quality score

Verified

97/100

Analyzed about 18 hours ago

Trust signals

Last commit: about 20 hours ago

GitHub owner: davila7

Stars: 27.2k

Downloads: 23k

License: MIT

Website: aitmpl.com


Similar extensions

Openrlhf Training

Skill

Orchestra-Research

Verl Rl Training

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill

Orchestra-Research

Moe Training

Train Mixture of Experts (MoE) models using DeepSpeed or HuggingFace. Use when training large-scale models with limited compute (5× cost reduction vs dense models), implementing sparse architectures like Mixtral 8x7B or DeepSeek-V3, or scaling model capacity without proportional compute increase. Covers MoE architectures, routing mechanisms, load balancing, expert parallelism, and inference optimization.

Skill

davila7

Ray Data

Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.

Skill

Orchestra-Research

Verl Rl Training

Skill

davila7

PyTorch Lightning

100

Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, implement data pipelines, callbacks, logging (W&B, TensorBoard), distributed training (DDP, FSDP, DeepSpeed), for scalable neural network training.

Skill

K-Dense-AI