nanoGPT
Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for learning transformers. By Andrej Karpathy. Perfect for understanding GPT architecture from scratch. Train on Shakespeare (CPU) or OpenWebText (multi-GPU).
Its purpose is to provide a clean, hackable, and educational implementation of GPT-2 for users who want to understand transformer models from scratch and experiment with training.
Features
- Reproduces GPT-2 (124M) on OpenWebText
- Clean, ~300-line Python code for learning
- Supports training on Shakespeare (CPU) and OpenWebText (GPU); see the quick-start sketch after this list
- Includes fine-tuning and custom dataset training workflows
- Detailed explanations of architecture, training, and data preparation
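As an illustration of the workflow the skill wraps, here is a minimal quick-start sketch based on nanoGPT's documented Shakespeare example (script names and flags are taken from the upstream README and may differ between repo versions):
# tokenize the tiny Shakespeare dataset at character level
python data/shakespeare_char/prepare.py
# train a small model on CPU (reduced model size and iteration count)
python train.py config/train_shakespeare_char.py --device=cpu --compile=False --eval_iters=20 --log_interval=1 --block_size=64 --batch_size=12 --n_layer=4 --n_head=4 --n_embd=128 --max_iters=2000 --lr_decay_iters=2000 --dropout=0.0
# generate text from the trained checkpoint
python sample.py --out_dir=out-shakespeare-char --device=cpu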
Use Cases
- Learning the fundamentals of transformer architectures
- Experimenting with GPT model training from scratch
- Understanding data preparation and tokenization for LLMs (see the OpenWebText sketch after this list)
- Prototyping small-scale LLM training runs
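The OpenWebText / GPT-2 (124M) reproduction follows the same pattern; a hedged sketch of the documented flow (the data script tokenizes with tiktoken's GPT-2 BPE, and the torchrun invocation below assumes a single 8-GPU node):
# download OpenWebText and tokenize it into train.bin / val.bin
python data/openwebtext/prepare.py
# launch distributed training for the GPT-2 (124M) config
torchrun --standalone --nproc_per_node=8 train.py config/train_gpt2.py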
Non-Goals
- Production-ready deployment of large-scale models
- Advanced optimization techniques beyond basic configurations
- Integration with complex MLOps pipelines
Practical Utility
- Production readiness: While excellent for learning and experimentation, the skill's focus on a ~300-line implementation and reproduction of GPT-2 suggests it is not intended for production use cases requiring robustness and advanced features.
Execution
- Validation: While the scripts handle input data and configuration, formal schema validation libraries are not explicitly used for all parameters.
- Pinned dependencies: Dependencies are listed, but specific versions are not pinned via lockfiles; installation relies on a standard pip install (see the example below).
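For reference, the upstream nanoGPT README installs its dependencies with a single unpinned pip command along these lines (the exact package list may vary by version):
pip install torch numpy transformers datasets tiktoken wandb tqdm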
Installation
First, add the marketplace:
/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
Then install the plugin:
/plugin install AI-Research-SKILLs@ai-research-skills
Similar Extensions
NanoGPT (quality score 87)
Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for learning transformers. By Andrej Karpathy. Perfect for understanding GPT architecture from scratch. Train on Shakespeare (CPU) or OpenWebText (multi-GPU).
TorchTitan Distributed LLM Pretraining (quality score 99)
Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.
ML Training Recipes (quality score 99)
Battle-tested PyTorch training recipes for all domains: LLMs, vision, diffusion, medical imaging, protein/drug discovery, spatial omics, genomics. Covers training loops, optimizer selection (AdamW, Muon), LR scheduling, mixed precision, debugging, and systematic experimentation. Use when training or fine-tuning neural networks, debugging loss spikes or OOM, choosing architectures, or optimizing GPU throughput.
PyTorch Lightning (quality score 99)
High-level PyTorch framework with a Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), a callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with the same code. Use when you want clean training loops with built-in best practices.
Distributed LLM Pretraining Torchtitan (quality score 98)
Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.
Baoyu Comic (quality score 100)
Knowledge comic creator supporting multiple art styles and tones. Creates original educational comics with detailed panel layouts and sequential image generation. Use when the user asks to create "知识漫画" (knowledge comic), "教育漫画" (educational comic), "biography comic", "tutorial comic", or "Logicomix-style comic".