nanoGPT
Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for learning transformers. By Andrej Karpathy. Perfect for understanding GPT architecture from scratch. Train on Shakespeare (CPU) or OpenWebText (multi-GPU).
Its purpose is to provide a clean, hackable, and educational implementation of GPT-2 for users who want to understand transformer models from scratch and experiment with training.
Features
- Reproduces GPT-2 (124M) on OpenWebText
- Clean, ~300-line Python code for learning
- Supports training on Shakespeare (CPU) and OpenWebText (GPU); see the quick-start sketch after this list
- Includes fine-tuning and custom dataset training workflows
- Detailed explanations of architecture, training, and data preparation
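As an illustration of the workflow the skill wraps, here is a minimal quick-start sketch based on nanoGPT's documented Shakespeare example (script names and flags are taken from the upstream README and may differ between repo versions):
# tokenize the tiny Shakespeare dataset at character level
python data/shakespeare_char/prepare.py
# train a small model on CPU (reduced model size and iteration count)
python train.py config/train_shakespeare_char.py --device=cpu --compile=False --eval_iters=20 --log_interval=1 --block_size=64 --batch_size=12 --n_layer=4 --n_head=4 --n_embd=128 --max_iters=2000 --lr_decay_iters=2000 --dropout=0.0
# generate text from the trained checkpoint
python sample.py --out_dir=out-shakespeare-char --device=cpu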
Use Cases
- Learning the fundamentals of transformer architectures
- Experimenting with GPT model training from scratch
- Understanding data preparation and tokenization for LLMs (see the OpenWebText sketch after this list)
- Prototyping small-scale LLM training runs
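The OpenWebText / GPT-2 (124M) reproduction follows the same pattern; a hedged sketch of the documented flow (the data script tokenizes with tiktoken's GPT-2 BPE, and the torchrun invocation below assumes a single 8-GPU node):
# download OpenWebText and tokenize it into train.bin / val.bin
python data/openwebtext/prepare.py
# launch distributed training for the GPT-2 (124M) config
torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py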
Non-Goals
- Production-ready deployment of large-scale models
- Advanced optimization techniques beyond basic configurations
- Integration with complex MLOps pipelines
Practical Utility
- Production readiness: While excellent for learning and experimentation, the skill's focus on a ~300-line implementation and reproduction of GPT-2 suggests it is not intended for production use cases requiring robustness and advanced features.
Execution
- Validation: While the scripts handle input data and configuration, formal schema validation libraries are not explicitly used for all parameters.
- Pinned dependencies: Dependencies are listed, but specific versions are not pinned via lockfiles; installation relies on a standard pip install (see the example below).
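For reference, the upstream nanoGPT README installs its dependencies with a single unpinned pip command along these lines (the exact package list may vary by version):
pip install torch numpy transformers datasets tiktoken wandb tqdm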
Installation
First, add the marketplace:
/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
Then install the plugin:
/plugin install AI-Research-SKILLs@ai-research-skills
Similar Extensions
NanoGPT (quality score 87)
Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for learning transformers. By Andrej Karpathy. Perfect for understanding GPT architecture from scratch. Train on Shakespeare (CPU) or OpenWebText (multi-GPU).
TorchTitan Distributed LLM Pretraining (quality score 99)
Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.
ML Training Recipes (quality score 99)
Battle-tested PyTorch training recipes for all domains: LLMs, vision, diffusion, medical imaging, protein/drug discovery, spatial omics, genomics. Covers training loops, optimizer selection (AdamW, Muon), LR scheduling, mixed precision, debugging, and systematic experimentation. Use when training or fine-tuning neural networks, debugging loss spikes or OOM, choosing architectures, or optimizing GPU throughput.
PyTorch Lightning (quality score 99)
High-level PyTorch framework with a Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), a callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with the same code. Use when you want clean training loops with built-in best practices.
Distributed LLM Pretraining Torchtitan (quality score 98)
Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.
Baoyu Comic (quality score 100)
Knowledge comic creator supporting multiple art styles and tones. Creates original educational comics with detailed panel layouts and sequential image generation. Use when the user asks to create "知识漫画" (knowledge comic), "教育漫画" (educational comic), "biography comic", "tutorial comic", or "Logicomix-style comic".