
Training LLMs Megatron

Skill · Verified · Active

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, when you need maximum GPU efficiency (47% MFU on H100), or when you require tensor/pipeline/sequence/context/expert parallelism. A production-ready framework used for Nemotron, LLaMA, and DeepSeek.

Purpose

To enable users to effectively train large language models using NVIDIA Megatron-Core by providing detailed configurations, best practices, and advanced parallelism strategies for maximum GPU efficiency and scale.

Features

  • Trains LLMs from 2B to 462B parameters
  • Utilizes NVIDIA Megatron-Core framework
  • Implements advanced parallelism strategies (TP, PP, SP, CP, EP); see the launch sketch after this list
  • Optimizes for maximum GPU efficiency (e.g., 47% MFU on H100)
  • Provides production-ready configurations for LLaMA, Mixtral, Nemotron, and DeepSeek models
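
As a rough sketch of how these strategies combine in practice, here is a minimal launch command assuming Megatron-LM's pretrain_gpt.py entry point; the model sizes and hyperparameters are illustrative, not a tuned recipe:

# TP splits each layer's weights across GPUs, PP splits the layer stack into
# stages, --sequence-parallel shards activations along the sequence dimension,
# CP shards long contexts, and EP distributes MoE experts (1 disables CP/EP here).
# TP x PP = 4 x 2 = 8 ranks, filling one 8-GPU node; data/tokenizer flags omitted.
torchrun --nproc_per_node=8 pretrain_gpt.py \
  --tensor-model-parallel-size 4 \
  --pipeline-model-parallel-size 2 \
  --sequence-parallel \
  --context-parallel-size 1 \
  --expert-model-parallel-size 1 \
  --num-layers 32 --hidden-size 4096 --num-attention-heads 32 \
  --seq-length 4096 --max-position-embeddings 4096 \
  --micro-batch-size 1 --global-batch-size 256 \
  --train-iters 500000 --lr 3.0e-4 --bf16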

Use Cases

  • Training models larger than 1B parameters
  • Needing maximum GPU efficiency (target >40% MFU; a worked estimate follows this list)
  • Using NVIDIA GPUs (A100, H100)
  • Implementing fine-grained parallelism control for large-scale training
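
For reference, MFU (model FLOPs utilization) is the fraction of the hardware's peak throughput spent on useful model math. A rough back-of-envelope, using the standard ~6 FLOPs per parameter per token estimate and assuming an H100 dense BF16 peak of roughly 989 TFLOPS:

MFU ≈ (6 × N_params × tokens/sec per GPU) / (peak FLOPS per GPU)
e.g. 47% MFU on an H100 ⇒ roughly 0.47 × 989 ≈ 465 TFLOPS of useful model compute per GPU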

Non-Goals

  • Training models smaller than 1B parameters
  • Using non-NVIDIA GPUs
  • Prototyping or educational use with very small models
  • Simple model fine-tuning without distributed strategies

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx; it requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.
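
For orientation, a skill in that format is essentially a directory containing a SKILL.md whose YAML frontmatter carries the name and the trigger description. A minimal sketch (the slug is assumed from the title above; the description is trimmed from the listing; exact layout per the agentskills.io spec):

---
name: training-llms-megatron
description: >
  Trains large language models (2B-462B parameters) using NVIDIA
  Megatron-Core with advanced parallelism strategies.
---

The Markdown body below the frontmatter holds the instructions the agent loads when the skill triggers.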

Quality Score

Verified · 97/100 · Analyzed 1 day ago

Trust Signals

  • Last commit: 1 day ago
  • Stars: 27.2k
  • License: MIT

Similar Extensions

PyTorch Lightning (99) · Skill by Orchestra-Research
High-level PyTorch framework with a Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), a callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with the same code. Use when you want clean training loops with built-in best practices.

Megatron Core LLM Training (95) · Skill by Orchestra-Research
Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, when you need maximum GPU efficiency (47% MFU on H100), or when you require tensor/pipeline/sequence/context/expert parallelism. A production-ready framework used for Nemotron, LLaMA, and DeepSeek.

Chat Format (100) · Skill by ruvnet
Format prompts for different LLM providers with chat templates and HNSW-powered context retrieval.

Oh My Claudecode (100) · Skill by Yeachan-Heo
Process-first advisor routing for Claude, Codex, or Gemini via `omc ask`, with artifact capture and no raw CLI assembly.

Wrap Up Ritual (100) · Skill by rohitg00
End-of-session ritual that audits changes, runs quality checks, captures learnings, and produces a session summary. Use when saying "wrap up", "done for the day", "finish coding", or ending a coding session.

Project Development (100) · Skill by muratcankoylan
Use when the user asks to "start an LLM project", "design batch pipeline", "evaluate task-model fit", "structure agent project", or mentions pipeline architecture, agent-assisted development, cost estimation, or choosing between LLM and traditional approaches.
