
Model Pruning

Skill · Verified · Active

Reduce LLM size and accelerate inference using pruning techniques like Wanda and SparseGPT. Use when compressing models without retraining, achieving 50% sparsity with minimal accuracy loss, or enabling faster inference on hardware accelerators. Covers unstructured pruning, structured pruning, N:M sparsity, magnitude pruning, and one-shot methods.

Purpose

To reduce LLM size and accelerate inference using techniques like Wanda and SparseGPT, enabling deployment on constrained hardware and efficient serving.
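
As a baseline, one-shot magnitude pruning simply zeroes the smallest-magnitude weights and keeps the rest, with no retraining. A minimal sketch, assuming PyTorch and a single nn.Linear layer (an illustration of the technique, not code shipped with this skill):

    # One-shot magnitude pruning: zero the smallest |w| until the target
    # sparsity is reached. Assumes PyTorch; illustrative only.
    import torch
    import torch.nn as nn

    @torch.no_grad()
    def magnitude_prune_(linear: nn.Linear, sparsity: float = 0.5) -> None:
        w = linear.weight.data
        k = int(w.numel() * sparsity)                      # weights to zero
        if k == 0:
            return
        threshold = w.abs().flatten().kthvalue(k).values   # k-th smallest magnitude
        w.mul_((w.abs() > threshold).to(w.dtype))          # apply binary mask in place

    layer = nn.Linear(1024, 1024)
    magnitude_prune_(layer, sparsity=0.5)
    print(f"sparsity: {(layer.weight == 0).float().mean().item():.2%}")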

Features

  • Reduce model size by 40-60%
  • Accelerate inference with hardware-friendly sparsity
  • Deploy on constrained hardware
  • Compress models without retraining (one-shot)
  • Implement Wanda, SparseGPT, and N:M structured pruning (a Wanda-style sketch follows this list)
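
Wanda scores each weight as its magnitude times the L2 norm of the matching input activation feature, then removes the lowest-scoring weights within each output row. A hedged sketch, assuming PyTorch, with a single toy calibration batch standing in for the activation norms Wanda gathers via forward hooks over a real calibration set:

    # Wanda-style metric: score = |weight| * ||input activation||_2.
    # Assumes PyTorch; calib_x is a stand-in for hook-collected statistics.
    import torch
    import torch.nn as nn

    @torch.no_grad()
    def wanda_prune_(linear: nn.Linear, calib_x: torch.Tensor, sparsity: float = 0.5) -> None:
        # calib_x: (num_tokens, in_features) activations feeding this layer
        act_norm = calib_x.norm(p=2, dim=0)           # ||X_j||_2 per input feature
        score = linear.weight.abs() * act_norm        # broadcasts across output rows
        k = int(linear.in_features * sparsity)        # weights to drop per output row
        idx = score.topk(k, dim=1, largest=False).indices
        mask = torch.ones_like(linear.weight)
        mask.scatter_(1, idx, 0.0)                    # 0 marks pruned positions
        linear.weight.data.mul_(mask)

    layer = nn.Linear(1024, 1024)
    wanda_prune_(layer, torch.randn(512, 1024), sparsity=0.5)

SparseGPT differs in that it also uses approximate second-order information to update the surviving weights after removal; that reconstruction step is beyond this sketch.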

Use Cases

  • Compressing LLMs for deployment on edge devices
  • Achieving faster inference speeds on hardware accelerators (see the 2:4 sparsity sketch after this list)
  • Reducing memory footprint for efficient LLM serving
  • Exploring state-of-the-art model pruning techniques
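
For speedups on accelerators, the 2:4 (N:M) pattern supported by NVIDIA's sparse tensor cores keeps 2 of every 4 consecutive weights. A minimal magnitude-based sketch, assuming PyTorch; a real deployment would additionally export the result to the accelerator's compressed sparse format:

    # 2:4 structured magnitude pruning: in every contiguous group of m=4
    # weights along the input dimension, keep the n=2 largest magnitudes.
    import torch
    import torch.nn as nn

    @torch.no_grad()
    def nm_prune_(linear: nn.Linear, n: int = 2, m: int = 4) -> None:
        w = linear.weight.data                  # (out_features, in_features)
        assert w.shape[1] % m == 0, "in_features must be divisible by m"
        groups = w.view(-1, m)                  # view shares storage with w
        idx = groups.abs().topk(m - n, dim=1, largest=False).indices
        groups.scatter_(1, idx, 0.0)            # zeroing the view zeroes w in place

    layer = nn.Linear(1024, 1024)
    nm_prune_(layer)                            # 2:4 pattern, 50% of weights zeroed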

Non-Goals

  • Retraining models after pruning
  • Providing a general-purpose model optimization suite
  • Achieving inference speedups from unstructured sparsity on hardware without sparse-kernel support

Practical Utility

  • Edge cases: The SKILL.md names limitations such as 'no retraining' and 'activation dependency', but does not detail specific failure modes with symptoms and recovery steps.

Execution

  • Validation: The code uses standard Python libraries, but explicit schema validation for all inputs and outputs is not detailed in the documentation.
  • Pinned dependencies: Dependencies are listed, but version pinning or lockfiles are not shown for the provided examples.

Code Execution

  • Error handling: The Python scripts generally handle errors, but structured error reporting and fail-closed behavior for the pruning functions are not explicitly documented; a hedged sketch of a fail-closed wrapper follows.
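
For illustration only, a fail-closed wrapper could validate the sparsity target up front and restore the original weights if pruning raises partway; safe_prune and its prune_fn argument are hypothetical names, not part of this skill:

    # Hypothetical fail-closed wrapper: validate inputs, roll back on failure,
    # so an error never leaves a half-pruned model behind.
    import copy
    import torch.nn as nn

    def safe_prune(model: nn.Module, prune_fn, sparsity: float) -> nn.Module:
        if not 0.0 < sparsity < 1.0:
            raise ValueError(f"sparsity must be in (0, 1), got {sparsity}")
        backup = copy.deepcopy(model.state_dict())
        try:
            prune_fn(model, sparsity)           # any in-place pruning routine
        except Exception:
            model.load_state_dict(backup)       # restore dense weights, then re-raise
            raise
        return model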

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality Score

Verified: 95/100 (analyzed 1 day ago)

Trust Signals

Last commit: 1 day ago
Stars: 27.2k
License: MIT
View Source

Similar Extensions

Model Pruning (98) · Skill by Orchestra-Research

Reduce LLM size and accelerate inference using pruning techniques like Wanda and SparseGPT. Use when compressing models without retraining, achieving 50% sparsity with minimal accuracy loss, or enabling faster inference on hardware accelerators. Covers unstructured pruning, structured pruning, N:M sparsity, magnitude pruning, and one-shot methods.

PyTorch Lightning (100) · Skill by K-Dense-AI

Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, implement data pipelines, callbacks, logging (W&B, TensorBoard), and distributed training (DDP, FSDP, DeepSpeed) for scalable neural network training.

Implementing Llms Litgpt (100) · Skill by davila7

Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.

ML Training Recipes (99) · Skill by Orchestra-Research

Battle-tested PyTorch training recipes for all domains: LLMs, vision, diffusion, medical imaging, protein/drug discovery, spatial omics, genomics. Covers training loops, optimizer selection (AdamW, Muon), LR scheduling, mixed precision, debugging, and systematic experimentation. Use when training or fine-tuning neural networks, debugging loss spikes or OOM, choosing architectures, or optimizing GPU throughput.

Ray Train (99) · Skill by Orchestra-Research

Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from a laptop to thousands of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, and elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.

Pytorch Lightning (99) · Skill by Orchestra-Research

High-level PyTorch framework with a Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), a callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with the same code. Use when you want clean training loops with built-in best practices.

© 2025 SkillRepo · Find the right skill, skip the noise.