
NanoGPT

Skill Active

Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for learning transformers. By Andrej Karpathy. Perfect for understanding GPT architecture from scratch. Train on Shakespeare (CPU) or OpenWebText (multi-GPU).

Purpose

To provide a clear, concise, and hackable implementation of the GPT-2 architecture for educational purposes, enabling users to understand transformer models from scratch.

Features

  • Minimalist GPT-2 (124M) implementation
  • Reproduces GPT-2 on OpenWebText
  • Clean, hackable code for learning transformers
  • Supports training on CPU (Shakespeare) or multi-GPU (OpenWebText)
  • Includes example configurations and data preparation scripts

Use Cases

  • Learning transformer architecture from scratch
  • Experimenting with GPT model components
  • Teaching or understanding deep learning models
  • Prototyping new transformer ideas
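For the first use case, the core mechanism a minimal GPT implements — causal self-attention — can be sketched in plain NumPy. This is a simplified single-head illustration under assumed shapes, not nanoGPT's actual PyTorch code; all names here are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a (T, C) sequence."""
    T, C = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    att = (q @ k.T) / np.sqrt(k.shape[-1])          # scaled dot-product scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    att[mask] = -np.inf                              # causal mask: no attending to future tokens
    att = softmax(att, axis=-1)
    return att @ v, att

rng = np.random.default_rng(0)
T, C = 4, 8
x = rng.standard_normal((T, C))
Wq, Wk, Wv = (rng.standard_normal((C, C)) for _ in range(3))
out, att = causal_self_attention(x, Wq, Wk, Wv)
```

Each row of `att` is a probability distribution over past positions only, which is the property that lets a GPT be trained autoregressively.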

Non-Goals

  • Production-ready deployment of LLMs
  • State-of-the-art performance benchmarks
  • Large-scale distributed training beyond 8 GPUs
  • Complex model tuning for specific applications

Workflow

  1. Prepare data (e.g., Shakespeare or OpenWebText)
  2. Configure training parameters
  3. Train the model
  4. Generate text from the trained model
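Concretely, the steps above map onto the character-level Shakespeare run from the nanoGPT README (a sketch assuming a local clone of karpathy/nanoGPT; the config filename follows the repo's conventions):

```shell
# 1. Prepare data: tokenize tiny Shakespeare into train.bin / val.bin
python data/shakespeare_char/prepare.py

# 2-3. Configure and train: the config file sets model size, batch size, iterations
python train.py config/train_shakespeare_char.py

# 4. Generate text from the trained checkpoint
python sample.py --out_dir=out-shakespeare-char
```

The same three scripts drive the OpenWebText runs; only the data preparation script and config file change.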

Practices

  • Model Architecture
  • Transformer Implementation
  • Educational Code

Prerequisites

  • Python 3.8+
  • PyTorch
  • Python packages: numpy, transformers, datasets, tiktoken, wandb, tqdm
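Per the nanoGPT README, the dependencies install in one line (choose the PyTorch build appropriate for your platform/CUDA version):

```shell
pip install torch numpy transformers datasets tiktoken wandb tqdm
```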

Practical Utility

  • Production readiness (info): While the code is clean and well documented, it is an educational tool rather than a production-ready system. Training models at GPT-2 scale also requires significant computational resources.

Trust

  • Issues attention (warning): In the last 90 days, 17 issues were opened and 4 were closed, indicating a low closure rate and potentially slow maintainer response.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx — needs Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality Score

87/100
Analyzed 1 day ago

Trust Signals

Last commit: 1 day ago
Stars: 27.2k
License: MIT

Similar Extensions

Nanogpt

95

Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for learning transformers. By Andrej Karpathy. Perfect for understanding GPT architecture from scratch. Train on Shakespeare (CPU) or OpenWebText (multi-GPU).

Skill
Orchestra-Research

PyTorch Lightning

100

Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, implement data pipelines, callbacks, logging (W&B, TensorBoard), distributed training (DDP, FSDP, DeepSpeed), for scalable neural network training.

Skill
K-Dense-AI

PyTorch Lightning

99

High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with same code. Use when you want clean training loops with built-in best practices.

Skill
Orchestra-Research

Nnsight Remote Interpretability

99

Provides guidance for interpreting and manipulating neural network internals using nnsight with optional NDIF remote execution. Use when needing to run interpretability experiments on massive models (70B+) without local GPU resources, or when working with any PyTorch architecture.

Skill
davila7

Huggingface Accelerate

99

Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard.

Skill
davila7

TorchTitan Distributed LLM Pretraining

99

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.

Skill
Orchestra-Research

© 2025 SkillRepo · Find the right skill, skip the noise.