Model Pruning
Reduce LLM size and accelerate inference using pruning techniques like Wanda and SparseGPT. Use when compressing models without retraining, achieving 50% sparsity with minimal accuracy loss, or enabling faster inference on hardware accelerators. Covers unstructured pruning, structured pruning, N:M sparsity, magnitude pruning, and one-shot methods.
To reduce LLM size and accelerate inference using techniques like Wanda and SparseGPT, enabling deployment on constrained hardware and efficient serving.
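The simplest of these techniques, one-shot magnitude pruning, makes the approach concrete. Below is a minimal sketch assuming a PyTorch model; the function name and the per-layer thresholding strategy are illustrative, not this skill's actual API.

```python
# Minimal sketch: one-shot magnitude pruning at 50% unstructured sparsity.
# Illustrative only; not this skill's actual API.
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.5) -> None:
    """Zero out the smallest-magnitude weights in every Linear layer."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight.data
            k = int(w.numel() * sparsity)              # weights to drop
            if k == 0:
                continue
            # The k-th smallest |w| is the per-layer pruning threshold.
            threshold = w.abs().flatten().kthvalue(k).values
            module.weight.data *= (w.abs() > threshold)  # one-shot, no retraining
```

At the same 50% sparsity, activation-aware methods such as Wanda and SparseGPT typically preserve accuracy noticeably better than this magnitude-only baseline, which is why the skill centers on them.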
Features
- Reduce model size by 40-60%
- Accelerate inference with hardware-friendly sparsity
- Deploy on constrained hardware
- Compress models without retraining (one-shot)
- Implement Wanda, SparseGPT, and N:M structured pruning (a Wanda-style scoring sketch follows this list)
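As a hedged illustration of the Wanda feature above, this sketch shows its scoring rule, weight magnitude scaled by the input activation norm, applied to one Linear layer. The signature and `calib_acts` are assumptions, not the skill's real interface.

```python
# Hedged sketch of Wanda-style scoring: |weight| times the L2 norm of the
# matching input activation, pruned per output row (Sun et al., 2023).
# `calib_acts` is an assumed calibration batch of shape [num_tokens, in_features].
import torch
import torch.nn as nn

def wanda_prune_layer(layer: nn.Linear, calib_acts: torch.Tensor,
                      sparsity: float = 0.5) -> None:
    act_norm = calib_acts.norm(p=2, dim=0)          # (in_features,)
    score = layer.weight.data.abs() * act_norm      # (out_features, in_features)
    k = int(score.shape[1] * sparsity)              # weights to drop per row
    # Drop the k lowest-scoring weights within each output row.
    _, idx = torch.topk(score, k, dim=1, largest=False)
    mask = torch.ones_like(score, dtype=torch.bool)
    mask.scatter_(1, idx, False)
    layer.weight.data *= mask
```

The appeal of this rule is that it needs only a small calibration batch: no gradients and no weight updates, hence one-shot.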
Use Cases
- Compressing LLMs for deployment on edge devices
- Achieving faster inference speeds on hardware accelerators
- Reducing memory footprint for efficient LLM serving
- Exploring state-of-the-art model pruning techniques
Non-Goals
- Retraining models after pruning
- Providing a general-purpose model optimization suite
- Handling unstructured sparsity without hardware support for speedup (a hardware-friendly 2:4 pattern is sketched after this list)
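For the last non-goal above: the hardware-friendly alternative to unstructured sparsity is N:M (e.g. 2:4) sparsity, which Ampere-class sparse tensor cores can accelerate. A minimal sketch, with illustrative names:

```python
# Minimal sketch of 2:4 (N:M) structured pruning: within every contiguous
# group of 4 weights along the input dimension, keep the 2 largest
# magnitudes. Names are illustrative, not this skill's API.
import torch
import torch.nn as nn

def nm_prune_layer(layer: nn.Linear, n: int = 2, m: int = 4) -> None:
    w = layer.weight.data                           # (out_features, in_features)
    assert w.shape[1] % m == 0, "in_features must be divisible by m"
    groups = w.abs().reshape(w.shape[0], -1, m)     # (out, in // m, m)
    # Keep the n largest-magnitude weights in each group of m.
    _, idx = torch.topk(groups, n, dim=2)
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(2, idx, True)
    layer.weight.data *= mask.reshape(w.shape)
```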
Practical Utility
- Edge cases: The SKILL.md names limitations like 'no retraining' and 'activation dependency' but does not detail specific failure modes with symptoms and recovery steps.
Execution
- Validation: While the code uses standard Python libraries, explicit schema validation for all inputs and outputs is not detailed in the documentation.
- Pinned dependencies: Dependencies are listed, but specific version pinning or lockfiles are not explicitly shown in the documentation for the provided examples.
Code Execution
- Error handling: Python scripts generally handle errors, but structured error reporting and fail-closed behavior for the pruning functions are not explicitly documented (an illustrative sketch follows).
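As the note above says, fail-closed behavior is not documented by this skill. Purely as an illustration of what such a wrapper could look like (`prune_fn`, the sparsity check, and `tol` are all hypothetical):

```python
# Illustrative only: nothing like this is documented by the skill. A
# fail-closed wrapper could snapshot the dense weights, validate the achieved
# sparsity, and restore the snapshot on any failure.
import copy
import torch.nn as nn

def prune_fail_closed(model: nn.Module, prune_fn, target_sparsity: float,
                      tol: float = 0.05) -> nn.Module:
    backup = copy.deepcopy(model.state_dict())      # dense snapshot
    try:
        prune_fn(model)
        zeros = sum((p == 0).sum().item() for p in model.parameters())
        total = sum(p.numel() for p in model.parameters())
        achieved = zeros / total
        if abs(achieved - target_sparsity) > tol:
            raise ValueError(f"sparsity {achieved:.2%} misses target "
                             f"{target_sparsity:.2%}")
        return model
    except Exception:
        model.load_state_dict(backup)               # fail closed: keep dense model
        raise
```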
Installation
npx skills add davila7/claude-code-templates
Runs the Vercel skills CLI (skills.sh) via npx; requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repository follows the agentskills.io format.
Similar Extensions
PyTorch Lightning (score 100)
Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, and implement data pipelines, callbacks, logging (W&B, TensorBoard), and distributed training (DDP, FSDP, DeepSpeed) for scalable neural network training.
Implementing LLMs with LitGPT (score 100)
Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.
ML Training Recipes (score 99)
Battle-tested PyTorch training recipes for all domains: LLMs, vision, diffusion, medical imaging, protein/drug discovery, spatial omics, genomics. Covers training loops, optimizer selection (AdamW, Muon), LR scheduling, mixed precision, debugging, and systematic experimentation. Use when training or fine-tuning neural networks, debugging loss spikes or OOM, choosing architectures, or optimizing GPU throughput.
Ray Train (score 99)
Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.
PyTorch Lightning (score 99)
High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with the same code. Use when you want clean training loops with built-in best practices.