Knowledge Distillation
Status: Verified · Active

Compress large language models using knowledge distillation from teacher to student models. Use when deploying smaller models with retained performance, transferring GPT-4 capabilities to open-source models, or reducing inference costs. Covers temperature scaling, soft targets, reverse KLD, logit distillation, and MiniLLM training strategies.
Purpose
Enable users to compress large language models effectively, retaining performance while reducing size and inference costs, through practical guidance and code examples for knowledge distillation.
Features
- Compress LLMs using knowledge distillation
- Transfer capabilities from large to smaller models
- Reduce inference costs
- Implement temperature scaling, soft targets, and reverse KLD
- Provide training scripts and evaluation methods
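The temperature scaling and soft targets listed above can be sketched in a few lines. This is a minimal NumPy illustration of the underlying math, not the skill's actual training script; the function names and the epsilon constant are our own choices:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the "dark knowledge"
    # carried by the teacher's non-argmax logits.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward_kld(teacher_logits, student_logits, temperature=2.0):
    # Soft-target loss: KL(p_teacher || p_student) on temperature-scaled
    # probabilities. The T^2 factor keeps gradient magnitudes comparable
    # across temperatures, as in Hinton et al.'s original formulation.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (temperature ** 2) * kl.mean()
```

When student and teacher logits agree, the loss is zero; any mismatch in the soft distributions produces a positive penalty, which is what the student's optimizer minimizes during logit distillation.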
Use Cases
- Compressing models from 70B to 7B parameters while retaining performance
- Transferring capabilities from proprietary models (like GPT-4) to open-source alternatives
- Reducing inference costs by deploying smaller, distilled models
- Creating specialized models by distilling domain-specific knowledge
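The reverse KLD objective mentioned in the skill description (popularized by MiniLLM) swaps the arguments of the divergence: minimizing KL(q_student || p_teacher) is mode-seeking, so the student concentrates probability on the teacher's high-probability outputs rather than smearing mass over every mode. A hedged sketch of the token-level quantity, assuming NumPy; the actual MiniLLM method optimizes this at the sequence level with policy-gradient techniques:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def reverse_kld(teacher_logits, student_logits, temperature=1.0):
    # KL(q_student || p_teacher): heavily penalizes the student wherever it
    # places mass the teacher does not, discouraging low-teacher-probability
    # (hallucinated) outputs. Compare forward KLD, which is mean-covering.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(q * (np.log(q + 1e-12) - np.log(p + 1e-12)), axis=-1)
    return kl.mean()
```

The asymmetry matters for generative students: forward KLD pushes the student to cover all teacher modes even when its capacity cannot, while reverse KLD lets a small student commit to the modes it can actually model well.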
Non-Goals
- Training models from scratch without a teacher model
- Performing inference on distilled models (focus is on training)
- Covering advanced MLOps deployment strategies beyond the training script
Execution
- Pinned dependencies: Dependencies are listed but not strictly pinned with lockfiles in SKILL.md; standard package managers would typically resolve exact versions during installation.
Practical Utility
- Edge cases: The skill touches on hyperparameters and model-size ratios relevant to effective application, but does not explicitly document failure modes or recovery steps for specific scenarios.
Installation
npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, ...). Assumes the repo follows the agentskills.io format.
Similar Extensions
PyTorch Lightning (100)
Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, and implement data pipelines, callbacks, logging (W&B, TensorBoard), and distributed training (DDP, FSDP, DeepSpeed) for scalable neural network training.
TimesFM Forecasting (100)
Zero-shot time series forecasting with Google's TimesFM foundation model. Use for any univariate time series (sales, sensors, energy, vitals, weather) without training a custom model. Supports CSV/DataFrame/array inputs with point forecasts and prediction intervals. Includes a preflight system checker script to verify RAM/GPU before first use.
Nnsight Remote Interpretability (99)
Provides guidance for interpreting and manipulating neural network internals using nnsight with optional NDIF remote execution. Use when running interpretability experiments on massive models (70B+) without local GPU resources, or when working with any PyTorch architecture.
Chat Format (100)
Format prompts for different LLM providers with chat templates and HNSW-powered context retrieval.
Oh My Claudecode (100)
Process-first advisor routing for Claude, Codex, or Gemini via `omc ask`, with artifact capture and no raw CLI assembly.