AWQ Quantization
Activation-aware weight quantization for 4-bit LLM compression with ~3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster inference than GPTQ with better accuracy preservation, or for instruction-tuned and multimodal models. MLSys 2024 Best Paper Award winner.
To enable efficient deployment of large language models on resource-constrained hardware by compressing model weights with minimal performance degradation.
Features
- Activation-aware weight quantization for 4-bit LLMs
- Minimal accuracy loss (<5%)
- Significant inference speedup (~2.5-3x)
- Support for various kernel backends (GEMM, GEMV, Marlin, ExLlama, IPEX)
- Integration with HuggingFace Transformers and vLLM (see the serving sketch after this list)
- Custom calibration data for domain-specific models
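The Transformers and vLLM integrations consume the quantized checkpoint directly. Below is a minimal serving sketch, not the skill's own code: it assumes an AWQ-quantized model saved at a hypothetical local path ./mistral-7b-awq (produced by the workflow further down) and the autoawq package installed alongside vLLM. Both loading paths are shown side by side for illustration only.

```python
# Minimal serving sketch for an AWQ-quantized checkpoint (path is hypothetical).

# Option 1: HuggingFace Transformers loads AWQ weights directly when autoawq is installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

quant_path = "./mistral-7b-awq"  # assumed output of the quantization workflow below
model = AutoModelForCausalLM.from_pretrained(quant_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(quant_path)

inputs = tokenizer("What is activation-aware quantization?", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

# Option 2: vLLM for production serving with its AWQ kernel path.
# (In practice you would use one of the two options, not both in the same process.)
from vllm import LLM, SamplingParams

llm = LLM(model=quant_path, quantization="awq")
outputs = llm.generate(
    ["What is activation-aware quantization?"],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```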
Use Cases
- Deploying large models (7B-70B) on limited GPU memory
- Achieving faster inference than GPTQ with better accuracy preservation
- Quantizing instruction-tuned and multimodal models
- Optimizing LLM serving for production environments
Non-Goals
- Providing a general-purpose LLM training framework
- Replacing fine-tuning or other model adaptation techniques
- Supporting quantization methods other than 4-bit AWQ
Workflow
- Load model and tokenizer
- Define quantization configuration (bits, group size, kernel version)
- Quantize the model using calibration data
- Save the quantized model and tokenizer
- Load and use the quantized model for inference (a sketch of the full workflow follows)
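A minimal sketch of these steps, assuming the skill wraps the AutoAWQ library (the awq Python package); the base model name, output directory, and quant_config values are illustrative placeholders, and the calib_data keyword mentioned in the comment is assumed to be AutoAWQ's hook for custom calibration sets.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative base model
quant_path = "./mistral-7b-awq"                    # illustrative output directory

# 1. Load model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# 2. Define quantization configuration (bits, group size, kernel version)
quant_config = {"w_bit": 4, "q_group_size": 128, "zero_point": True, "version": "GEMM"}

# 3. Quantize the model using calibration data
#    (a built-in calibration set is used by default; a calib_data argument
#     can be passed for domain-specific calibration)
model.quantize(tokenizer, quant_config=quant_config)

# 4. Save the quantized model and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# 5. The saved checkpoint can then be loaded for inference
#    (see the serving sketch under Features above).
```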
Practices
- Model Optimization
- Quantization Techniques
- LLM Deployment
Prerequisites
- Python 3.8+
- CUDA 11.8+ (for NVIDIA GPUs)
- Compute Capability 7.5+ GPU (NVIDIA Turing or newer)
- transformers>=4.45.0
- torch>=2.0.0
Installation
First, add the marketplace:
/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
Then install the plugin:
/plugin install AI-Research-SKILLs@ai-research-skills
Similar Extensions
Wrap Up Ritual (score: 100)
End-of-session ritual that audits changes, runs quality checks, captures learnings, and produces a session summary. Use when saying "wrap up", "done for the day", "finish coding", or ending a coding session.
TradeMemory Protocol (score: 100)
Domain knowledge for the Evolution Engine: LLM-powered autonomous strategy discovery from raw OHLCV data. Covers the generate-backtest-select-evolve loop, vectorized backtesting, out-of-sample validation, and strategy graduation. Use when discovering trading patterns, running backtests, evolving strategies, or reviewing evolution logs. Triggers on "evolve", "discover patterns", "backtest", "evolution", "strategy generation", "candidate strategy".
Arize Prompt Optimization (score: 100)
Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.
Unsloth (score: 100)
Expert guidance for fast fine-tuning with Unsloth: 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization.
Prompt Optimization (score: 100)
Applies prompt repetition to improve accuracy for LLMs without reasoning capability.
Vector Index Tuning (score: 99)
Optimize vector index performance for latency, recall, and memory. Use when tuning HNSW parameters, selecting quantization strategies, or scaling vector search infrastructure.