ML Training Recipes
Battle-tested PyTorch training recipes for all domains: LLMs, vision, diffusion, medical imaging, protein/drug discovery, spatial omics, and genomics. Covers training loops, optimizer selection (AdamW, Muon), LR scheduling, mixed precision, debugging, and systematic experimentation. Use when training or fine-tuning neural networks, debugging loss spikes or out-of-memory errors, choosing architectures, or optimizing GPU throughput.
Offers expert-level, production-ready PyTorch training patterns and debugging strategies so users can train and fine-tune neural networks efficiently.
Features
- PyTorch training recipes for LLMs, vision, diffusion, and biomedical domains
- Covers training loops, optimizer selection (AdamW, Muon), and LR scheduling (see the configuration sketch after this list)
- Includes mixed precision, debugging techniques, and systematic experimentation patterns
- Provides reference files for detailed architecture, scaling laws, and optimizer configurations
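The optimizer and LR-scheduling features above typically reduce to a configuration like the minimal sketch below. The learning rate, betas, weight-decay split, warmup/total step counts, and the toy Linear module are illustrative assumptions, not values prescribed by this skill's reference files.

```python
# Illustrative AdamW + warmup-then-cosine schedule; all hyperparameters here
# are placeholder assumptions, not values from the skill's reference files.
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(512, 512)  # stand-in for your network

# Common practice: apply weight decay to weight matrices only,
# not to biases or normalization parameters.
decay, no_decay = [], []
for name, p in model.named_parameters():
    (no_decay if p.ndim < 2 else decay).append(p)

optimizer = AdamW(
    [{"params": decay, "weight_decay": 0.1},
     {"params": no_decay, "weight_decay": 0.0}],
    lr=3e-4, betas=(0.9, 0.95),
)

# Linear warmup followed by cosine decay to zero.
warmup_steps, total_steps = 1_000, 100_000

def lr_lambda(step):
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)  # call scheduler.step() once per optimizer step
```

Decoupled weight decay on matrices only, plus a warmup-then-cosine schedule, is a common starting point for both LLM and vision training; other optimizers mentioned by the skill (such as Muon) typically slot into the same per-parameter-group structure.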
Use Cases
- Training or fine-tuning neural networks with PyTorch
- Debugging common training issues like loss spikes or out-of-memory errors (see the debugging sketch after this list)
- Selecting appropriate model architectures and optimizers for specific data types and scales
- Optimizing GPU throughput and resource utilization during training
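For the loss-spike and out-of-memory use cases above, the sketch below shows the kind of debugging hooks such recipes commonly rely on. The spike threshold, clipping norm, and print-based logging are illustrative assumptions rather than the skill's actual checklist.

```python
# Illustrative debugging hooks: gradient clipping, loss-spike step skipping,
# and peak-GPU-memory logging. Thresholds are placeholder assumptions.
import torch

def debug_step(model, loss, optimizer, running_loss=None,
               max_grad_norm=1.0, spike_factor=10.0):
    """Backprop with clipping; skip the update if the loss spikes; log memory."""
    loss.backward()
    grad_norm = float(torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm))

    # Skip the optimizer step when the loss jumps far above its running average.
    if running_loss is not None and loss.item() > spike_factor * running_loss:
        optimizer.zero_grad(set_to_none=True)
        print(f"loss spike: {loss.item():.3f} vs running avg {running_loss:.3f}; step skipped")
        return grad_norm

    optimizer.step()
    optimizer.zero_grad(set_to_none=True)

    if torch.cuda.is_available():
        # Peak allocation is the number to watch when chasing OOM errors.
        peak_gb = torch.cuda.max_memory_allocated() / 1e9
        print(f"grad_norm={grad_norm:.2f} peak_mem={peak_gb:.2f} GB")
    return grad_norm
```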
Non-Goals
- Providing pre-trained models
- Handling deployment or inference-specific optimizations
- Offering recipes for frameworks other than PyTorch
Workflow
- Understand data type and scale
- Select appropriate architecture based on decision trees
- Configure optimizer and LR schedule
- Implement the training loop with mixed precision and EMA (sketched after this list)
- Debug issues using provided checklists and patterns
- Track experiments systematically for comparison
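The training-loop step referenced in the workflow might look like the sketch below. The model, data shapes, EMA decay, and other hyperparameters are placeholder assumptions; the skill's actual recipes may differ.

```python
# Illustrative training-loop step with automatic mixed precision (AMP) and an
# EMA copy of the weights. Model and hyperparameters are placeholders.
import copy
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = torch.nn.Linear(128, 10).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

# EMA copy of the weights, updated after every optimizer step and typically
# used for evaluation and checkpointing.
ema_model = copy.deepcopy(model)
for p in ema_model.parameters():
    p.requires_grad_(False)

def train_step(inputs, targets, ema_decay=0.999):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, dtype=amp_dtype):
        loss = F.cross_entropy(model(inputs), targets)
    scaler.scale(loss).backward()  # scaled backward for fp16 stability
    scaler.step(optimizer)
    scaler.update()
    with torch.no_grad():
        for ema_p, p in zip(ema_model.parameters(), model.parameters()):
            ema_p.lerp_(p, 1.0 - ema_decay)  # slow-moving average of the online weights
    return loss.item()
```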
Practices
- Code Quality
- Reproducibility
- Best Practices
Prerequisites
- PyTorch (>=2.0.0)
- Python environment with necessary libraries (e.g., transformers, torchvision, monai, etc.)
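A quick way to confirm these prerequisites is a small check script like the one below; the optional packages are the domain libraries mentioned above, and nothing here is specific to this skill.

```python
# Sanity-check the prerequisites: PyTorch >= 2.0.0 plus optional domain libraries.
import importlib.util
import torch

major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert (major, minor) >= (2, 0), f"PyTorch >= 2.0.0 required, found {torch.__version__}"
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

for pkg in ("transformers", "torchvision", "monai"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'installed' if found else 'missing (optional)'}")
```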
Installation
First, add the marketplace:
/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
Then install the plugin:
/plugin install AI-Research-SKILLs@ai-research-skills
Similar Extensions
Arize Prompt Optimization
Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.
Implementing Llms Litgpt
Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.
Unsloth
Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization.
Prompt Optimization
Applies prompt repetition to improve accuracy for non-reasoning LLMs.
Pytorch Lightning
High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with the same code. Use when you want clean training loops with built-in best practices.
Open Targets Platform Query Skill
Query Open Targets Platform for target-disease associations, drug target discovery, tractability/safety data, genetics/omics evidence, and known drugs, for therapeutic target identification. Part of the AlterLab Academic Skills suite.