Llama Cpp
Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.
To enable cost-effective and accessible LLM inference on diverse consumer hardware, including edge devices and Macs, where high-end GPUs are unavailable or undesirable.
Features
- LLM inference on CPU, Apple Silicon, and consumer GPUs
- Support for GGUF quantization (1.5-8 bit)
- 4-10x speedup vs PyTorch on CPU
- OpenAI-compatible server mode (see the sketch after this list)
- Hardware acceleration (Metal, CUDA, ROCm)
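For concreteness, here is a minimal sketch of the server mode, assuming a built llama.cpp checkout; the model path is a placeholder, and flag spellings may vary between releases:

```bash
# Start an OpenAI-compatible server (model path is a placeholder)
./build/bin/llama-server -m ./models/model.Q4_K_M.gguf --port 8080

# Query the OpenAI-style chat completions endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}], "temperature": 0.7}'
```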
Use Cases
- Edge device LLM deployment
- Running LLMs on M1/M2/M3 Macs (see the build sketch after this list)
- Inference on AMD or Intel GPUs
- Development environments where CUDA is unavailable
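As a build sketch for the Mac use case, assuming a current llama.cpp checkout (the Metal backend is enabled by default on Apple Silicon; model path and prompt are placeholders):

```bash
# Clone and build llama.cpp; Metal acceleration is on by default on macOS
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Run a quick prompt, offloading all layers to the GPU via -ngl
./build/bin/llama-cli -m ./models/model.Q4_K_M.gguf -ngl 99 -p "Hello"
```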
Non-Goals
- Training LLMs
- Maximizing performance on NVIDIA GPUs with CUDA (use TensorRT-LLM instead)
- Providing a Python-first API for NVIDIA GPUs (use vLLM instead)
Installation
Add the marketplace first:
```
/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills
```
Similar Extensions
GGUF Quantization (quality score 95)
GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when you need flexible 2-8 bit quantization without GPU requirements.
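For orientation, a typical quantization round-trip with the tools shipped in the llama.cpp repository looks roughly like this (paths and the source checkpoint are placeholders):

```bash
# Convert a Hugging Face checkpoint to a 16-bit GGUF
python convert_hf_to_gguf.py ./models/my-model --outtype f16 --outfile my-model-f16.gguf

# Quantize to 4-bit; Q4_K_M is a common size/quality trade-off
./build/bin/llama-quantize my-model-f16.gguf my-model-Q4_K_M.gguf Q4_K_M
```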
VLLM High Performance LLM Serving (quality score 97)
Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.
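As a rough sketch of that serving path (the model name is illustrative, and the `vllm serve` CLI applies to recent vLLM versions; older ones use `python -m vllm.entrypoints.openai.api_server`):

```bash
# Launch an OpenAI-compatible vLLM server (default port 8000)
vllm serve meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192

# Query it like any OpenAI endpoint
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "prompt": "Hello", "max_tokens": 32}'
```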
Hugging Face Local Models (quality score 95)
Use to select models to run locally with llama.cpp and GGUF on CPU, Mac Metal, CUDA, or ROCm. Covers finding GGUFs, quant selection, running servers, exact GGUF file lookup, conversion, and OpenAI-compatible local serving.
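A workflow in that spirit might look like the following, assuming huggingface_hub is installed; the repository and file names are real examples but otherwise arbitrary:

```bash
# Download one specific GGUF file from a Hugging Face repo
huggingface-cli download TheBloke/Llama-2-7B-GGUF llama-2-7b.Q4_K_M.gguf --local-dir ./models

# Serve it locally via llama.cpp's OpenAI-compatible server
./build/bin/llama-server -m ./models/llama-2-7b.Q4_K_M.gguf --port 8080
```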
Cli Anything Quietshrink (quality score 99)
Compresses macOS screen recordings with zero CPU stress using Apple Silicon's hardware HEVC encoder. Typically reduces file size by 70-90% while staying visually lossless; the computer stays silent during encoding.