TorchTitan Distributed LLM Pretraining
Skill status: Verified, Active
Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.
Enables efficient and scalable pretraining of large language models using PyTorch's native distributed training capabilities.
Features
- 4D parallelism (FSDP2, TP, PP, CP); see the composition sketch after this list
- PyTorch-native distributed training
- Float8 training for H100 GPUs
- Support for Llama 3.1, DeepSeek V3, and custom models
- Distributed checkpointing and interoperability
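The parallelism layers compose through PyTorch's native DeviceMesh, FSDP2, and tensor-parallel APIs. Below is a minimal illustrative sketch (not torchtitan's internal code): the mesh sizes, the toy model, and the sharding plan are placeholders, and pipeline/context parallelism would add further mesh dimensions. It is meant to run under torchrun with dp*tp processes.

```python
# Illustrative only: how FSDP2 and tensor parallelism stack on a 2D DeviceMesh.
# Mesh sizes, the toy model, and the sharding plan are placeholders, not
# torchtitan's actual configuration. Run under torchrun with 4*2 = 8 processes.
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import fully_shard                  # FSDP2, PyTorch >= 2.6
from torch.distributed.tensor.parallel import (
    ColwiseParallel, RowwiseParallel, parallelize_module,
)

# 2D mesh: data-parallel shards x tensor-parallel ranks (PP and CP add more dims).
mesh = init_device_mesh("cuda", (4, 2), mesh_dim_names=("dp", "tp"))

model = nn.Sequential(nn.Linear(1024, 4096), nn.Linear(4096, 1024)).cuda()

# Shard the two linear layers column-/row-wise across the "tp" sub-mesh.
parallelize_module(model, mesh["tp"], {"0": ColwiseParallel(), "1": RowwiseParallel()})

# Shard parameters across the "dp" sub-mesh with FSDP2.
fully_shard(model, mesh=mesh["dp"])
```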
Use Cases
- Pretraining LLMs from scratch at scale (8 to 512+ GPUs)
- Leveraging PyTorch-native solutions for distributed training
- Optimizing training performance with Float8 on H100 GPUs
- Achieving interoperable checkpoints with torchtune/HuggingFace
Non-Goals
- Fine-tuning LLMs (focus is pretraining)
- Supporting training outside PyTorch and its ecosystem
- Matching Megatron-LM's peak performance on NVIDIA-only deployments
- Offering inference support (focus is training)
Workflow
- Download tokenizer
- Configure training (TOML file)
- Launch training (script or torchrun)
- Monitor training (TensorBoard)
- Manage checkpoints (see the checkpointing sketch after this list)
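For the checkpoint-management step, torchtitan's distributed checkpoints build on PyTorch Distributed Checkpoint (DCP). The sketch below saves and reloads a sharded model state dict with DCP; the stand-in model, step number, and output path are placeholders rather than torchtitan's actual checkpoint layout.

```python
# Minimal DCP sketch: save and reload a (possibly sharded) model state dict.
# The model, step number, and checkpoint path are placeholders.
import torch.nn as nn
import torch.distributed.checkpoint as dcp
from torch.distributed.checkpoint.state_dict import (
    get_model_state_dict, set_model_state_dict,
)

model = nn.Linear(1024, 1024)              # stand-in for the training model
ckpt_dir = "outputs/checkpoint/step-1000"  # assumed path, not torchtitan's layout

# Save: each rank writes its own shards; works for FSDP2-sharded models too.
dcp.save({"model": get_model_state_dict(model)}, checkpoint_id=ckpt_dir)

# Load: DCP reshards at load time if the world size changed.
state = {"model": get_model_state_dict(model)}
dcp.load(state, checkpoint_id=ckpt_dir)
set_model_state_dict(model, state["model"])
```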
Practices
- Model Architecture
- Distributed Training
- Optimization
- LLM Pretraining
Prerequisites
- PyTorch >= 2.6.0
- TorchTitan >= 0.2.0
- TorchAO >= 0.5.0
- HuggingFace token for asset download (see the environment check after this list)
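A quick sanity check for these prerequisites. The package distribution names (torch, torchtitan, torchao) and the HF_TOKEN environment-variable convention are assumptions about how the environment is set up; adjust to your installation.

```python
# Assumed distribution names and the HF_TOKEN convention; adjust to your setup.
import os
import importlib.metadata as md

for pkg, minimum in (("torch", "2.6.0"), ("torchtitan", "0.2.0"), ("torchao", "0.5.0")):
    try:
        print(f"{pkg} {md.version(pkg)} (need >= {minimum})")
    except md.PackageNotFoundError:
        print(f"{pkg} not installed (need >= {minimum})")

print("HuggingFace token present:", bool(os.environ.get("HF_TOKEN")))
```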
Installation
First, add the marketplace, then install the skill:
/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills
Quality Score
Verified
Trust Signals
Similar Extensions
Distributed Llm Pretraining Torchtitan
Score 98. Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.
Ray Train
Score 99. Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.
Pytorch Lightning
Score 99. High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with same code. Use when you want clean training loops with built-in best practices.
Openrlhf Training
Score 99. High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.
Huggingface Accelerate
Score 99. Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard.