Hqq Quantization
Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets, for fast quantization workflows, or when deploying with vLLM or HuggingFace Transformers.
To enable efficient LLM deployment by quantizing models to lower bit precision without calibration data, facilitating faster inference and reduced memory footprint.
Features
- Calibration-free LLM quantization (4/3/2-bit)
- Multiple optimized inference backends (Marlin, TorchAO, ATen, etc.)
- Seamless integration with HuggingFace Transformers and vLLM
- Support for fine-tuning quantized models with PEFT/LoRA
- Fast quantization workflows (minutes vs. hours)
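To make the low-bit memory claims concrete: group-wise quantization stores the packed low-bit weights plus a scale and zero-point per group, so the effective bits per weight are slightly above the nominal bit width. A minimal sketch of that arithmetic, assuming fp16 scale/zero-point metadata and a group size of 64 (illustrative defaults, not HQQ's exact storage layout):

```python
def effective_bits_per_weight(nbits: int, group_size: int,
                              meta_bits: int = 16) -> float:
    """Packed weight bits plus per-group scale and zero-point overhead."""
    # Each group of `group_size` weights stores one scale and one
    # zero-point, assumed here to be kept as 16-bit floats.
    return nbits + 2 * meta_bits / group_size

def model_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-storage size in GiB (ignores activations/KV cache)."""
    return n_params * bits_per_weight / 8 / 2**30

n = 7e9  # a 7B-parameter model, for scale
for nbits in (4, 3, 2):
    bpw = effective_bits_per_weight(nbits, group_size=64)
    print(f"{nbits}-bit, group_size=64: {bpw:.2f} bits/weight, "
          f"~{model_gib(n, bpw):.1f} GiB (fp16: ~{model_gib(n, 16):.1f} GiB)")
```

Smaller groups track outliers better but raise the metadata overhead, which is why group size is a standard accuracy/size knob in group-wise schemes.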
Use Cases
- Quantizing LLMs for faster inference without needing calibration datasets.
- Reducing memory footprint of LLMs for deployment on resource-constrained environments.
- Integrating quantized models into vLLM or HuggingFace Transformers pipelines.
- Experimenting with extreme quantization levels (2-bit, 1-bit) for LLMs.
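"Calibration-free" means the quantizer looks only at the weight values themselves, with no input activations or sample data required. HQQ improves on plain round-to-nearest by solving a robust half-quadratic optimization for the zero-point, but the group-wise round trip it builds on can be sketched in plain Python (function names below are illustrative, not HQQ's API):

```python
def quantize_group(weights, nbits=4):
    """Asymmetric round-to-nearest quantization of one group of weights.

    Only the weights themselves are needed -- no calibration inputs.
    HQQ replaces this naive min/max zero-point with one found by a
    half-quadratic solver that is robust to outlier weights.
    """
    qmax = 2**nbits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = lo
    q = [min(qmax, max(0, round((w - zero) / scale))) for w in weights]
    return q, scale, zero

def dequantize_group(q, scale, zero):
    """Reconstruct approximate weights from integer codes."""
    return [v * scale + zero for v in q]

group = [0.12, -0.40, 0.33, 0.05, -0.21, 0.48, -0.07, 0.29]
q, scale, zero = quantize_group(group, nbits=4)
recon = dequantize_group(q, scale, zero)
max_err = max(abs(a - b) for a, b in zip(group, recon))
assert max_err <= scale / 2 + 1e-9  # RTN error is bounded by half a step
```

At 2-bit there are only four codes per group, so the zero-point choice dominates reconstruction error, which is where HQQ's solver matters most.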
Non-Goals
- Performing calibration-based quantization (e.g., AWQ, GPTQ).
- Providing CPU-focused quantization (refer to llama.cpp/GGUF).
- Replacing simple 8-bit/4-bit quantization tools like bitsandbytes for basic use cases.
Installation
npx skills add davila7/claude-code-templates
Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js installed locally and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.
Quality Score
Verified (score: 98)
Similar Extensions
Implementing Llms Litgpt (score: 100)
Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.
Ray Train (score: 99)
Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from a laptop to thousands of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, and elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.
Huggingface Accelerate (score: 99)
Simplest distributed training API: four lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement and mixed precision (FP16/BF16/FP8). Interactive config, single launch command. The HuggingFace ecosystem standard.
Openrlhf Training (score: 99)
High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, and DPO training of large models (7B-70B+). Built on Ray, vLLM, and ZeRO-3. 2× faster than DeepSpeedChat thanks to its distributed architecture and GPU resource sharing.
VLLM High Performance LLM Serving (score: 97)
Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.