Training LLMs with Megatron
Skill · Verified · Active

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, when you need maximum GPU efficiency (47% MFU on H100), or when you require tensor/pipeline/sequence/context/expert parallelism. Production-ready framework used for Nemotron, LLaMA, and DeepSeek.
To enable users to train large language models effectively with NVIDIA Megatron-Core by providing detailed configurations, best practices, and advanced parallelism strategies for maximum GPU efficiency at scale.
Features
- Trains LLMs from 2B to 462B parameters
- Utilizes NVIDIA Megatron-Core framework
- Implements advanced parallelism strategies (TP, PP, SP, CP, EP); a minimal configuration sketch follows this list
- Optimizes for maximum GPU efficiency (e.g., 47% MFU on H100)
- Provides production-ready configurations for LLaMA, Mixtral, Nemotron, and DeepSeek models
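
To illustrate how these strategies compose, here is a minimal sketch of initializing Megatron-Core's parallel state. The group sizes are illustrative assumptions rather than a recommended configuration, and the `context_parallel_size` / `expert_model_parallel_size` arguments assume a recent Megatron-Core release:

```python
# Minimal sketch, assuming a recent Megatron-Core release and a job
# launched with torchrun (so RANK/WORLD_SIZE env vars are already set).
import torch
from megatron.core import parallel_state

torch.distributed.init_process_group(backend="nccl")

# With these illustrative sizes, each model replica spans
# TP x PP x CP = 8 x 4 x 2 = 64 GPUs; any remaining factor of the
# world size becomes data parallelism.
parallel_state.initialize_model_parallel(
    tensor_model_parallel_size=8,    # TP: shard each weight matrix across 8 GPUs
    pipeline_model_parallel_size=4,  # PP: split the layer stack into 4 stages
    context_parallel_size=2,         # CP: shard the sequence dimension
    expert_model_parallel_size=1,    # EP: >1 only for MoE models
)
```

Note that sequence parallelism (SP) does not get its own group size: it is layered on top of TP and is typically enabled as a boolean option (for example, `sequence_parallel=True` on `TransformerConfig`).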
Use cases
- Training models larger than 1B parameters
- Needing maximum GPU efficiency (target >40% MFU; a back-of-envelope check follows this list)
- Using NVIDIA GPUs (A100, H100)
- Implementing fine-grained parallelism control for large-scale training
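
As a rough way to check that MFU target, recall that MFU is achieved model FLOP/s divided by aggregate peak FLOP/s. The sketch below assumes the common approximation of about 6N training FLOPs per token for a dense N-parameter transformer and an H100 BF16 peak of roughly 989 TFLOP/s; the example throughput numbers are hypothetical:

```python
# Back-of-envelope MFU, assuming ~6 * N training FLOPs per token for a
# dense N-parameter transformer (this ignores attention FLOPs, which
# matter at long context) and an H100 BF16 peak of ~989 TFLOP/s.
def mfu(params: float, tokens_per_sec: float, num_gpus: int,
        peak_tflops: float = 989.0) -> float:
    achieved = 6.0 * params * tokens_per_sec  # model FLOP/s across the job
    peak = num_gpus * peak_tflops * 1e12      # aggregate peak FLOP/s
    return achieved / peak

# Hypothetical run: a 70B-parameter model at 280,000 tokens/s on 256 H100s.
print(f"MFU: {mfu(70e9, 280_000, 256):.1%}")  # -> roughly 46%, near the 47% figure above
```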
Non-goals
- Training models smaller than 1B parameters
- Using non-NVIDIA GPUs
- Prototyping or educational use with very small models
- Simple model fine-tuning without distributed strategies
Installation
npx skills add davila7/claude-code-templates

This runs the Vercel skills CLI (skills.sh) via npx. It requires Node.js installed locally and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.), and it assumes the repository follows the agentskills.io format.
Quality score: Verified

Similar extensions
PyTorch Lightning (score 99)
High-level PyTorch framework with a Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), a callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with the same code. Use when you want clean training loops with built-in best practices.

Megatron Core LLM Training (score 95)
Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, when you need maximum GPU efficiency (47% MFU on H100), or when you require tensor/pipeline/sequence/context/expert parallelism. Production-ready framework used for Nemotron, LLaMA, and DeepSeek.

Chat Format (score 100)
Formats prompts for different LLM providers with chat templates and HNSW-powered context retrieval.

Oh My Claudecode (score 100)
Process-first advisor routing for Claude, Codex, or Gemini via `omc ask`, with artifact capture and no raw CLI assembly.

Wrap Up Ritual (score 100)
End-of-session ritual that audits changes, runs quality checks, captures learnings, and produces a session summary. Use when saying "wrap up", "done for the day", "finish coding", or ending a coding session.

Project Development (score 100)
This skill should be used when the user asks to "start an LLM project", "design batch pipeline", "evaluate task-model fit", "structure agent project", or mentions pipeline architecture, agent-assisted development, cost estimation, or choosing between LLM and traditional approaches.