Quantizing Models Bitsandbytes

Skill · Verified · Active

Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, you need to fit larger models, or you want faster inference. Supports INT8, NF4, and FP4 formats, QLoRA training, and 8-bit optimizers. Works with HuggingFace Transformers.

Purpose

Quantize LLMs to reduce memory usage by 50-75% with minimal accuracy loss, enabling larger models on limited hardware and faster inference.
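
The sketch below shows how this typically looks with HuggingFace Transformers: loading a causal LM in 4-bit NF4 via BitsAndBytesConfig. This is a minimal sketch, assuming recent transformers and bitsandbytes versions; the model id is a placeholder.

  # Minimal sketch: load a causal LM in 4-bit NF4 (placeholder model id).
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

  bnb_config = BitsAndBytesConfig(
      load_in_4bit=True,
      bnb_4bit_quant_type="nf4",              # NF4 weight format (alternative: "fp4")
      bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls at runtime
      bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
  )

  model = AutoModelForCausalLM.from_pretrained(
      "meta-llama/Llama-2-7b-hf",             # placeholder: any causal LM on the Hub
      quantization_config=bnb_config,
      device_map="auto",
  )
  tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")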

Features

  • Quantizes LLMs to 8-bit or 4-bit
  • Supports INT8, NF4, and FP4 formats (8-bit loading is sketched after this list)
  • Enables QLoRA training
  • Integrates with HuggingFace Transformers
  • Reduces memory by 50-75%
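
For the INT8 path referenced above, 8-bit loading is the simpler variant of the 4-bit example in the Purpose section. A minimal sketch, again with a placeholder model id:

  from transformers import AutoModelForCausalLM, BitsAndBytesConfig

  # 8-bit (LLM.int8) loading roughly halves memory relative to fp16; 4-bit roughly quarters it.
  model_8bit = AutoModelForCausalLM.from_pretrained(
      "facebook/opt-1.3b",                                    # placeholder model id
      quantization_config=BitsAndBytesConfig(load_in_8bit=True),
      device_map="auto",
  )
  print(model_8bit.get_memory_footprint())                    # footprint in bytes, for comparison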

Use Cases

  • Fitting larger models into limited GPU memory
  • Achieving faster LLM inference speeds
  • Fine-tuning large models on consumer GPUs with QLoRA (sketched together with 8-bit optimizers after this list)
  • Reducing optimizer memory during training with 8-bit optimizers
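
The last two use cases combine naturally: QLoRA trains LoRA adapters on top of a 4-bit base model, and an 8-bit optimizer shrinks the optimizer state. A minimal sketch, assuming peft is installed, that `model` was loaded in 4-bit as in the Purpose sketch, and that the target module names match a Llama-style architecture:

  import bitsandbytes as bnb
  from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

  model = prepare_model_for_kbit_training(model)   # cast layer norms, enable input grads
  model = get_peft_model(model, LoraConfig(
      r=16,
      lora_alpha=32,
      target_modules=["q_proj", "v_proj"],         # assumption: Llama-style attention names
      lora_dropout=0.05,
      task_type="CAUSAL_LM",
  ))

  # 8-bit AdamW stores optimizer state in int8, cutting optimizer memory roughly 4x.
  optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-4)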

Non-Goals

  • Replacing advanced inference optimization frameworks like GPTQ or AWQ
  • Providing CPU-only inference solutions like GGUF
  • Supporting hardware without tensor core acceleration

Trust

  • Issues attention: 17 issues opened and 4 closed in the last 90 days, indicating a closure rate below 50% with a moderate number of open issues.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx; requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality Score

Verified
95/100
Analyzed 1 day ago

Trust Signals

Last commit: 1 day ago
Stars: 27.2k
License: MIT
Status
View source code

Similar Extensions

Quantizing Models Bitsandbytes

97

Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, you need to fit larger models, or you want faster inference. Supports INT8, NF4, and FP4 formats, QLoRA training, and 8-bit optimizers. Works with HuggingFace Transformers.

Skill
Orchestra-Research

Arize Prompt Optimization

100

Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.

Skill
github

Unsloth

100

Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization

Skill
davila7

Prompt Optimization

100

Applies prompt repetition to improve accuracy for LLMs without reasoning capability

Skill
asklokesh

Vector Index Tuning

99

Optimize vector index performance for latency, recall, and memory. Use when tuning HNSW parameters, selecting quantization strategies, or scaling vector search infrastructure.

Skill
wshobson

Transformers

98

This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.

Skill
K-Dense-AI