Dieser Inhalt ist noch nicht in Ihrer Sprache verfügbar und wird auf Englisch angezeigt.

Llama Cpp

Skill Aktiv

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.

Zweck

To enable efficient and accessible LLM inference on hardware lacking NVIDIA GPUs, making local and edge LLM deployments feasible.

Funktionen

CPU-only inference
Apple Silicon (M1/M2/M3) optimization
AMD/Intel GPU support (non-CUDA)
GGUF quantization (1.5-8 bit)
OpenAI-compatible API server mode

Anwendungsfälle

Running LLMs on personal Macs or Linux machines
Edge deployments on resource-constrained devices
Local LLM development and testing without GPU hardware
Utilizing models when CUDA is unavailable

Nicht-Ziele

Maximizing throughput on high-end NVIDIA GPUs
Providing a Python-first API like vLLM or TensorRT-LLM
Managing cloud infrastructure for LLM serving

Trust

warning:Issues AttentionIn the last 90 days, 17 issues were opened and 4 were closed, indicating a low closure rate (approx. 23.5%) and potentially slow maintainer response.

Installation

npx skills add davila7/claude-code-templates

Führt das Vercel skills CLI (skills.sh) via npx aus — benötigt Node.js lokal und mindestens einen installierten skills-kompatiblen Agent (Claude Code, Cursor, Codex, …). Setzt voraus, dass das Repo dem agentskills.io-Format folgt.

Qualitätspunktzahl

85 /100

Analysiert about 23 hours ago

Vertrauenssignale

Letzter Commit1 day ago

GitHub-Inhaber davila7

Sterne27.2k

Downloads 23k

LizenzMIT

Websiteaitmpl.com

Status

Quellcode ansehen

Llama Cpp

Funktionen

Anwendungsfälle

Nicht-Ziele

Trust

Qualitätspunktzahl

Vertrauenssignale

Ähnliche Erweiterungen

Llama Cpp

GGUF Quantization

GGUF Quantization

VLLM High Performance LLM Serving

Hugging Face Local Models

Cli Anything Quietshrink