
Torchforge

Skill: Active

Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library that separates infrastructure from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or scalable training with Monarch and TorchTitan.

Purpose

To enable researchers and engineers to conduct PyTorch-native agentic RL experiments with clean abstractions, easy algorithm implementation, and scalable distributed training capabilities.

Features

  • PyTorch-native RL abstractions
  • Separation of RL algorithms from infrastructure
  • Scalable training with Monarch and TorchTitan
  • Easy algorithm experimentation (GRPO, SFT examples; see the GRPO sketch after this list)
  • High-throughput inference with vLLM
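
To make the separation of RL algorithms from infrastructure concrete, the following is a framework-agnostic sketch of the group-relative advantage computation at the heart of GRPO, written in plain PyTorch. This is not torchforge's API; the helper name and tensor layout are assumptions for illustration only.

import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical helper (not torchforge API): normalize each completion's
    reward against the other completions sampled from the same prompt.

    rewards: shape (num_prompts, samples_per_prompt).
    """
    mean = rewards.mean(dim=-1, keepdim=True)  # per-prompt baseline
    std = rewards.std(dim=-1, keepdim=True)    # per-prompt spread
    return (rewards - mean) / (std + eps)      # group-relative advantage

# Example: 2 prompts, 4 sampled completions each
adv = grpo_advantages(torch.tensor([[1.0, 0.0, 0.0, 1.0],
                                    [0.0, 0.0, 1.0, 0.0]]))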

Use Cases

  • When you need clean RL abstractions independent of infrastructure
  • When experimenting with new RL algorithms in PyTorch
  • For scalable RL training on distributed systems such as Monarch
  • When integrating with Meta's TorchTitan for model parallelism

Non-Goals

  • Production-ready stability (considered experimental)
  • Megatron-native training (use alternative skills)
  • Replacing fully mature RL frameworks for production deployment

Workflow

  1. Define Configuration (YAML)
  2. Define Reward Function (Python; see the sketch after this list)
  3. Launch Training (Python script)
  4. Monitor Progress (W&B, metrics)
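
A minimal sketch of step 2, assuming a reward function that scores a single completion and returns a float. The name and signature here are hypothetical; consult torchforge's GRPO example for the exact interface it expects. Per the workflow above, steps 1 and 3 then wire this function into a YAML-configured training launch.

# Hypothetical reward function (step 2). The signature is illustrative
# only; torchforge's actual reward interface may differ.
def exact_match_reward(prompt: str, completion: str, target: str) -> float:
    """Return 1.0 if the completion's final line matches the expected
    answer, 0.0 otherwise."""
    lines = completion.strip().splitlines()
    if not lines:
        return 0.0  # an empty completion earns no reward
    return 1.0 if lines[-1].strip() == target.strip() else 0.0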

Practices

  • RL Algorithm Implementation
  • Distributed Training
  • Model Experimentation
  • Scalable ML Systems

Prerequisites

  • Python 3.12+
  • 3+ GPUs recommended for GRPO training
  • PyTorch >= 2.9.0 (nightly)
  • Monarch, TorchTitan, vLLM (a sanity-check sketch follows this list)
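
A quick sanity check against these prerequisites, using only standard Python and PyTorch calls (the thresholds are the ones listed above):

import sys
import torch

# Interpreter and framework versions named in the prerequisites.
assert sys.version_info >= (3, 12), "torchforge requires Python 3.12+"
print(f"PyTorch: {torch.__version__} (>= 2.9.0 nightly expected)")

# GRPO training is recommended with three or more GPUs.
gpus = torch.cuda.device_count()
print(f"CUDA GPUs visible: {gpus} (3+ recommended for GRPO)")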

Practical Utility

  • Warning (production readiness): The SKILL.md explicitly states that torchforge is experimental and APIs may change, so it is not fully production-ready for all use cases, although it covers the stated research purpose.

Installation

First, add the marketplace:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

96/100 (analyzed 1 day ago)

Trust Signals

  • Last commit: 17 days ago
  • Stars: 8.3k
  • License: MIT

Similar Extensions

Pytorch Lightning (Skill by Orchestra-Research, score 99)

High-level PyTorch framework with a Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), a callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with the same code. Use when you want clean training loops with built-in best practices.

Huggingface Accelerate (Skill by davila7, score 99)

Simplest distributed training API: four lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP, automatic device placement, mixed precision (FP16/BF16/FP8), interactive config, and a single launch command. The HuggingFace ecosystem standard.

Ray Train (Skill by Orchestra-Research, score 99)

Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from a laptop to thousands of nodes, with built-in hyperparameter tuning via Ray Tune, fault tolerance, and elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.

Verl Rl Training (Skill by Orchestra-Research, score 99)

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Openrlhf Training (Skill by Orchestra-Research, score 99)

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, or DPO training of large models (7B-70B+). Built on Ray, vLLM, and ZeRO-3; its distributed architecture and GPU resource sharing make it 2× faster than DeepSpeedChat.

TorchTitan Distributed LLM Pretraining (Skill by Orchestra-Research, score 99)

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.