Training LLMs with Megatron

Skill · Verified · Active

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, when maximum GPU efficiency is needed (47% MFU on H100), or when tensor/pipeline/sequence/context/expert parallelism is required. A production-ready framework used for Nemotron, LLaMA, and DeepSeek.

Purpose

To enable users to effectively train large language models using NVIDIA Megatron-Core by providing detailed configurations, best practices, and advanced parallelism strategies for maximum GPU efficiency and scale.

Features

  • Trains LLMs from 2B to 462B parameters
  • Utilizes NVIDIA Megatron-Core framework
  • Implements advanced parallelism strategies (TP, PP, SP, CP, EP); see the initialization sketch after this list
  • Optimizes for maximum GPU efficiency (e.g., 47% MFU on H100)
  • Provides production-ready configurations for LLaMA, Mixtral, Nemotron, and DeepSeek models
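
The parallelism dimensions above are configured when Megatron-Core's process groups are initialized. Below is a minimal sketch, not taken from this skill's source: it assumes a torchrun launch and a recent megatron-core release, and the keyword arguments follow megatron.core.parallel_state.initialize_model_parallel (the context/expert arguments are newer additions, so check your installed version).

```python
# Minimal sketch (illustrative): initialize Megatron-Core process groups
# for a TP=2, PP=2 layout on 8 GPUs. Assumes a torchrun launch so that
# RANK/WORLD_SIZE/LOCAL_RANK are set in the environment.
import os

import torch
from megatron.core import parallel_state

torch.distributed.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

parallel_state.initialize_model_parallel(
    tensor_model_parallel_size=2,    # TP: shard each layer's weights across 2 GPUs
    pipeline_model_parallel_size=2,  # PP: split the layer stack into 2 stages
    context_parallel_size=1,         # CP: shard the sequence dimension
    expert_model_parallel_size=1,    # EP: shard MoE experts across GPUs
)

# Data parallelism takes whatever ranks remain:
# DP = WORLD_SIZE / (TP * PP * CP)
print("DP size:", parallel_state.get_data_parallel_world_size())
```

With 8 GPUs this layout leaves DP = 8 / (2 × 2 × 1) = 2. Sequence parallelism (SP) reuses the TP groups and is typically enabled through a sequence_parallel flag in the transformer config rather than here.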

Use Cases

  • Training models larger than 1B parameters
  • Needing maximum GPU efficiency (target >40% MFU; see the MFU estimate after this list)
  • Using NVIDIA GPUs (A100, H100)
  • Implementing fine-grained parallelism control for large-scale training
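
MFU (model FLOPs utilization) is the fraction of the hardware's peak FLOPs that a training run actually sustains. A common back-of-the-envelope estimate uses ~6N FLOPs per token (forward plus backward) for a dense N-parameter transformer. The sketch below is illustrative only: the throughput number is hypothetical, and the assumed peak (~989 TFLOPS dense BF16 on an H100 SXM) should be replaced with your hardware's spec.

```python
# Back-of-the-envelope MFU estimate for a dense transformer.
# Assumptions (not from this page): ~6 * n_params FLOPs per token for
# forward + backward, and an H100 SXM dense BF16 peak of ~989 TFLOPS.
def estimate_mfu(n_params: float, tokens_per_sec_per_gpu: float,
                 peak_flops: float = 989e12) -> float:
    achieved_flops = 6 * n_params * tokens_per_sec_per_gpu
    return achieved_flops / peak_flops

# Hypothetical example: a 70B-parameter model sustaining ~1,100 tokens/s
# per GPU works out to roughly 47% MFU, matching the figure quoted above.
print(f"{estimate_mfu(70e9, 1100):.0%}")
```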

Non-Goals

  • Training models smaller than 1B parameters
  • Using non-NVIDIA GPUs
  • Prototyping or educational purposes for very small models
  • Simple model fine-tuning without distributed strategies

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.), and assumes the repository follows the agentskills.io format.

Quality Score

Verified
97/100
Analyzed 1 day ago

Trust Signals

Last commit: 1 day ago
Stars: 27.2k
License: MIT
Status: Active

Similar Extensions

PyTorch Lightning (Score: 99/100)

High-level PyTorch framework with a Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), a callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with the same code. Use when you want clean training loops with built-in best practices.

Skill · Orchestra-Research

Megatron Core LLM Training (Score: 95/100)

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, when maximum GPU efficiency is needed (47% MFU on H100), or when tensor/pipeline/sequence/context/expert parallelism is required. A production-ready framework used for Nemotron, LLaMA, and DeepSeek.

Skill · Orchestra-Research

Chat Format (Score: 100/100)

Formats prompts for different LLM providers with chat templates and HNSW-powered context retrieval.

Skill · ruvnet

Oh My Claudecode (Score: 100/100)

Process-first advisor routing for Claude, Codex, or Gemini via `omc ask`, with artifact capture and no raw CLI assembly.

Skill · Yeachan-Heo

Wrap Up Ritual (Score: 100/100)

End-of-session ritual that audits changes, runs quality checks, captures learnings, and produces a session summary. Use when saying "wrap up", "done for the day", "finish coding", or ending a coding session.

Skill · rohitg00

Project Development (Score: 100/100)

Use this skill when the user asks to "start an LLM project", "design batch pipeline", "evaluate task-model fit", "structure agent project", or mentions pipeline architecture, agent-assisted development, cost estimation, or choosing between LLM and traditional approaches.

Skill · muratcankoylan