Training LLMs with Megatron

Skill · Verified · Active

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, when maximum GPU efficiency is needed (47% MFU on H100), or when tensor/pipeline/sequence/context/expert parallelism is required. A production-ready framework used for Nemotron, LLaMA, and DeepSeek.

Purpose

To enable users to effectively train large language models using NVIDIA Megatron-Core by providing detailed configurations, best practices, and advanced parallelism strategies for maximum GPU efficiency and scale.

Features

  • Trains LLMs from 2B to 462B parameters
  • Utilizes NVIDIA Megatron-Core framework
  • Implements advanced parallelism strategies (TP, PP, SP, CP, EP); see the initialization sketch after this list
  • Optimizes for maximum GPU efficiency (e.g., 47% MFU on H100)
  • Provides production-ready configurations for LLaMA, Mixtral, Nemotron, and DeepSeek models
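
The parallelism dimensions above are configured when Megatron-Core's process groups are initialized. Below is a minimal sketch, not taken from this skill's source: it assumes a torchrun launch and a recent megatron-core release, and the keyword arguments follow megatron.core.parallel_state.initialize_model_parallel (the context/expert arguments are newer additions, so check your installed version).

```python
# Minimal sketch (illustrative): initialize Megatron-Core process groups
# for a TP=2, PP=2 layout on 8 GPUs. Assumes a torchrun launch so that
# RANK/WORLD_SIZE/LOCAL_RANK are set in the environment.
import os

import torch
from megatron.core import parallel_state

torch.distributed.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

parallel_state.initialize_model_parallel(
    tensor_model_parallel_size=2,    # TP: shard each layer's weights across 2 GPUs
    pipeline_model_parallel_size=2,  # PP: split the layer stack into 2 stages
    context_parallel_size=1,         # CP: shard the sequence dimension
    expert_model_parallel_size=1,    # EP: shard MoE experts across GPUs
)

# Data parallelism takes whatever ranks remain:
# DP = WORLD_SIZE / (TP * PP * CP)
print("DP size:", parallel_state.get_data_parallel_world_size())
```

With 8 GPUs this layout leaves DP = 8 / (2 × 2 × 1) = 2. Sequence parallelism (SP) reuses the TP groups and is typically enabled through a sequence_parallel flag in the transformer config rather than here.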

Use Cases

  • Training models larger than 1B parameters
  • Needing maximum GPU efficiency (target >40% MFU; see the MFU estimate after this list)
  • Using NVIDIA GPUs (A100, H100)
  • Implementing fine-grained parallelism control for large-scale training
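
MFU (model FLOPs utilization) is the fraction of the hardware's peak FLOPs that a training run actually sustains. A common back-of-the-envelope estimate uses ~6N FLOPs per token (forward plus backward) for a dense N-parameter transformer. The sketch below is illustrative only: the throughput number is hypothetical, and the assumed peak (~989 TFLOPS dense BF16 on an H100 SXM) should be replaced with your hardware's spec.

```python
# Back-of-the-envelope MFU estimate for a dense transformer.
# Assumptions (not from this page): ~6 * n_params FLOPs per token for
# forward + backward, and an H100 SXM dense BF16 peak of ~989 TFLOPS.
def estimate_mfu(n_params: float, tokens_per_sec_per_gpu: float,
                 peak_flops: float = 989e12) -> float:
    achieved_flops = 6 * n_params * tokens_per_sec_per_gpu
    return achieved_flops / peak_flops

# Hypothetical example: a 70B-parameter model sustaining ~1,100 tokens/s
# per GPU works out to roughly 47% MFU, matching the figure quoted above.
print(f"{estimate_mfu(70e9, 1100):.0%}")
```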

Non-Goals

  • Training models smaller than 1B parameters
  • Using non-NVIDIA GPUs
  • Prototyping or educational purposes for very small models
  • Simple model fine-tuning without distributed strategies

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.), and assumes the repository follows the agentskills.io format.

Quality Score

Verified
97/100
Analyzed 1 day ago

Trust Signals

Last commit: 1 day ago
Stars: 27.2k
License: MIT
Status: Active

Similar Extensions

PyTorch Lightning (Score: 99/100)

High-level PyTorch framework with a Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), a callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with the same code. Use when you want clean training loops with built-in best practices.

Skill · Orchestra-Research

Megatron Core LLM Training (Score: 95/100)

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, when maximum GPU efficiency is needed (47% MFU on H100), or when tensor/pipeline/sequence/context/expert parallelism is required. A production-ready framework used for Nemotron, LLaMA, and DeepSeek.

Skill · Orchestra-Research

Chat Format (Score: 100/100)

Formats prompts for different LLM providers with chat templates and HNSW-powered context retrieval.

Skill · ruvnet

Oh My Claudecode (Score: 100/100)

Process-first advisor routing for Claude, Codex, or Gemini via `omc ask`, with artifact capture and no raw CLI assembly.

Skill · Yeachan-Heo

Wrap Up Ritual (Score: 100/100)

End-of-session ritual that audits changes, runs quality checks, captures learnings, and produces a session summary. Use when saying "wrap up", "done for the day", "finish coding", or ending a coding session.

Skill · rohitg00

Project Development (Score: 100/100)

Use this skill when the user asks to "start an LLM project", "design batch pipeline", "evaluate task-model fit", "structure agent project", or mentions pipeline architecture, agent-assisted development, cost estimation, or choosing between LLM and traditional approaches.

Skill · muratcankoylan