
Training LLMs Megatron

Skill · Verified · Active

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, when you need maximum GPU efficiency (47% MFU on H100), or when you require tensor/pipeline/sequence/context/expert parallelism. A production-ready framework used for Nemotron, LLaMA, and DeepSeek.

Purpose

To enable users to effectively train large language models using NVIDIA Megatron-Core by providing detailed configurations, best practices, and advanced parallelism strategies for maximum GPU efficiency and scale.

Features

  • Trains LLMs from 2B to 462B parameters
  • Utilizes NVIDIA Megatron-Core framework
  • Implements advanced parallelism strategies (TP, PP, SP, CP, EP); see the launch sketch after this list
  • Optimizes for maximum GPU efficiency (e.g., 47% MFU on H100)
  • Provides production-ready configurations for LLaMA, Mixtral, Nemotron, and DeepSeek models
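
As a rough sketch of how these strategies combine in practice, here is a minimal launch command assuming Megatron-LM's pretrain_gpt.py entry point; the model sizes and hyperparameters are illustrative, not a tuned recipe:

# TP splits each layer's weights across GPUs, PP splits the layer stack into
# stages, --sequence-parallel shards activations along the sequence dimension,
# CP shards long contexts, and EP distributes MoE experts (1 disables CP/EP here).
# TP x PP = 4 x 2 = 8 ranks, filling one 8-GPU node; data/tokenizer flags omitted.
torchrun --nproc_per_node=8 pretrain_gpt.py \
  --tensor-model-parallel-size 4 \
  --pipeline-model-parallel-size 2 \
  --sequence-parallel \
  --context-parallel-size 1 \
  --expert-model-parallel-size 1 \
  --num-layers 32 --hidden-size 4096 --num-attention-heads 32 \
  --seq-length 4096 --max-position-embeddings 4096 \
  --micro-batch-size 1 --global-batch-size 256 \
  --train-iters 500000 --lr 3.0e-4 --bf16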

Use Cases

  • Training models larger than 1B parameters
  • Needing maximum GPU efficiency (target >40% MFU; a worked estimate follows this list)
  • Using NVIDIA GPUs (A100, H100)
  • Implementing fine-grained parallelism control for large-scale training
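
For reference, MFU (model FLOPs utilization) is the fraction of the hardware's peak throughput spent on useful model math. A rough back-of-envelope, using the standard ~6 FLOPs per parameter per token estimate and assuming an H100 dense BF16 peak of roughly 989 TFLOPS:

MFU ≈ (6 × N_params × tokens/sec per GPU) / (peak FLOPS per GPU)
e.g. 47% MFU on an H100 ⇒ roughly 0.47 × 989 ≈ 465 TFLOPS of useful model compute per GPU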

Non-Goals

  • Training models smaller than 1B parameters
  • Using non-NVIDIA GPUs
  • Prototyping or educational use with very small models
  • Simple model fine-tuning without distributed strategies

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx; it requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.
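
For orientation, a skill in that format is essentially a directory containing a SKILL.md whose YAML frontmatter carries the name and the trigger description. A minimal sketch (the slug is assumed from the title above; the description is trimmed from the listing; exact layout per the agentskills.io spec):

---
name: training-llms-megatron
description: >
  Trains large language models (2B-462B parameters) using NVIDIA
  Megatron-Core with advanced parallelism strategies.
---

The Markdown body below the frontmatter holds the instructions the agent loads when the skill triggers.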

Quality Score

Verified · 97/100 · Analyzed 1 day ago

Trust Signals

  • Last commit: 1 day ago
  • Stars: 27.2k
  • License: MIT

Similar Extensions

PyTorch Lightning (99) · Skill by Orchestra-Research
High-level PyTorch framework with a Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), a callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with the same code. Use when you want clean training loops with built-in best practices.

Megatron Core LLM Training (95) · Skill by Orchestra-Research
Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, when you need maximum GPU efficiency (47% MFU on H100), or when you require tensor/pipeline/sequence/context/expert parallelism. A production-ready framework used for Nemotron, LLaMA, and DeepSeek.

Chat Format (100) · Skill by ruvnet
Format prompts for different LLM providers with chat templates and HNSW-powered context retrieval.

Oh My Claudecode (100) · Skill by Yeachan-Heo
Process-first advisor routing for Claude, Codex, or Gemini via `omc ask`, with artifact capture and no raw CLI assembly.

Wrap Up Ritual (100) · Skill by rohitg00
End-of-session ritual that audits changes, runs quality checks, captures learnings, and produces a session summary. Use when saying "wrap up", "done for the day", "finish coding", or ending a coding session.

Project Development (100) · Skill by muratcankoylan
Use when the user asks to "start an LLM project", "design batch pipeline", "evaluate task-model fit", "structure agent project", or mentions pipeline architecture, agent-assisted development, cost estimation, or choosing between LLM and traditional approaches.
