Hugging Face Local Models
Use to select models to run locally with llama.cpp and GGUF on CPU, Mac Metal, CUDA, or ROCm. Covers finding GGUFs, quant selection, running servers, exact GGUF file lookup, conversion, and OpenAI-compatible local serving.
This skill helps users select and run local language models with llama.cpp and the GGUF format, covering model discovery, quantization, and serving.
Features
- Find GGUF models on Hugging Face Hub
- Select optimal quantization levels
- Run models with llama-cli and llama-server
- Convert models from Transformers to GGUF
- Support for CPU, Metal, CUDA, and ROCm
Use Cases
- Selecting the best GGUF model for your hardware
- Running LLMs locally for privacy or cost savings
- Experimenting with different model quantizations
- Setting up an OpenAI-compatible local inference server (see the sketch below)
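Once `llama-server` is running (see Workflow below), it exposes an OpenAI-compatible HTTP API. A minimal sketch, assuming a server already listening on port 8080; the `model` field is a placeholder, since the server hosts whichever model it was launched with:

```bash
# Query a running llama-server through its OpenAI-compatible
# /v1/chat/completions endpoint.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "local",
        "messages": [{"role": "user", "content": "Say hi in one word."}]
      }'
```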
Non-Goals
- Training or fine-tuning models
- Managing Hugging Face Hub repositories directly (beyond downloading)
- Providing a full GUI for model management
Workflow
- Search the Hugging Face Hub for llama.cpp-compatible GGUF models (first sketch below).
- Identify the recommended quant and exact file from the model's page or the Hub tree API (second sketch below).
- Install `llama.cpp`, or confirm it is already available (third sketch below).
- Launch the model with `llama-cli` or `llama-server` and the appropriate flags (fourth sketch below).
- If no pre-quantized GGUF is available, convert the model from Transformers format to GGUF (fifth sketch below).
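A minimal sketch of the first step, using the public Hub models API (it accepts `search`, `filter`, `sort`, and `limit` query parameters); the search term and `jq` filter here are illustrative:

```bash
# List popular GGUF repos matching a search term, sorted by downloads.
curl -s "https://huggingface.co/api/models?search=qwen&filter=gguf&sort=downloads&limit=5" \
  | jq -r '.[].id'
```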
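For the second step, the Hub tree API lists every file in a repo with its size, which is enough to pick a quant; the repo id below is illustrative, and Q4_K_M is a common default trade-off between quality and size:

```bash
# Show the .gguf files in a repo together with their sizes in bytes.
REPO="bartowski/Llama-3.2-1B-Instruct-GGUF"   # illustrative repo id
curl -s "https://huggingface.co/api/models/$REPO/tree/main" \
  | jq -r '.[] | select(.path | endswith(".gguf")) | "\(.path)\t\(.size)"'
```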
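For the third step, a sketch of the two usual install paths; build flag names have changed across llama.cpp versions, so check the repo's build docs for your checkout:

```bash
# macOS: Homebrew ships llama-cli and llama-server.
brew install llama.cpp

# Elsewhere (or for CUDA/ROCm builds): build from source.
# Metal is enabled by default on Apple Silicon; recent checkouts
# take -DGGML_CUDA=ON for NVIDIA GPUs.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
```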
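For the fourth step, a sketch of both entry points; the model ids and file name are illustrative, and the `-hf` flag (which pulls a GGUF straight from the Hub) requires a recent llama.cpp build:

```bash
# One-off prompt, fetching the GGUF directly from the Hub.
llama-cli -hf bartowski/Llama-3.2-1B-Instruct-GGUF -p "Hello"

# Serve a local file on port 8080; -c sets the context size and
# -ngl 99 offloads all layers to the GPU (Metal/CUDA/ROCm).
llama-server -m ./Llama-3.2-1B-Instruct-Q4_K_M.gguf \
  --port 8080 -c 4096 -ngl 99
```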
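And for the fifth step, a sketch of the Transformers-to-GGUF path using the conversion script shipped in the llama.cpp repo; the model id is illustrative, and gated models additionally require `hf auth login`:

```bash
# 1. Download the original Transformers weights.
hf download Qwen/Qwen2.5-0.5B-Instruct --local-dir ./src-model

# 2. Convert to an f16 GGUF with llama.cpp's conversion script.
python convert_hf_to_gguf.py ./src-model --outfile model-f16.gguf --outtype f16

# 3. Quantize to a smaller variant (Q4_K_M is a common default).
llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M
```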
Prerequisites
- llama.cpp installed
- Python 3
- Hugging Face Hub CLI (optional, for authentication; see the sketch below)
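A sketch of the optional authentication step, using the `hf` CLI that replaces the deprecated `huggingface-cli`; this is only needed for gated or private repos:

```bash
# Opens a token prompt; the token is stored locally for later downloads.
hf auth login
```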
Documentation
- Configuration & parameter reference: the skill details model selection and launch commands, but does not document `llama-cli` or `llama-server` configuration parameters beyond basic flags.
Versioning
- Release management: the skill has no explicit versioning (e.g., semver in frontmatter, a CHANGELOG, or release tags); installation instructions reference the `main` branch.
Practical Utility
- Edge cases: general guidance on quant choice and troubleshooting is provided, but specific failure modes with symptoms and recovery steps are not covered in detail.
Installation
`/plugin install skills@huggingface-skills`
Similar Extensions
GGUF Quantization
GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when needing flexible quantization from 2-8 bit without GPU requirements.
Hugging Science
Use when the user is doing AI/ML work in a scientific domain — biology, chemistry, physics, astronomy, climate, genomics, materials science, medicine, ecology, energy, conservation, engineering, mathematics, scientific reasoning, drug discovery, protein design, weather modeling, theorem proving, single-cell, PDE solving, or anything similar. Hugging Science (huggingscience.co) is a curated catalog of scientific datasets, models, blog posts, and interactive Spaces; the `hugging-science` org on Hugging Face hosts community datasets, models, and demo Spaces. This skill helps you discover the right resource AND actually use it — loading datasets via `datasets`, running models via `transformers` or the HF Inference API, calling Spaces like BoltzGen via `gradio_client`, and citing blog posts for methodology. Trigger this skill whenever a user mentions a scientific ML task, asks for "a dataset/model for X" where X is a scientific topic, wants to fine-tune on scientific data, asks about protein / molecule / genome / climate / materials / astronomy / pathology / weather ML, or needs AI tools for research — even if they never say "Hugging Science" explicitly. The catalog is purpose-built for LLM agents (it ships an `llms-full.txt`); prefer it over generic web search for these tasks.
Llama Cpp
Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization (1.5-8 bit) for reduced memory and 4-10× speedup vs PyTorch on CPU.
Hf Cli
Hugging Face Hub CLI (`hf`) for downloading, uploading, and managing models, datasets, spaces, buckets, repos, papers, jobs, and more on the Hugging Face Hub. Use when: handling authentication; managing local cache; managing Hugging Face Buckets; running or scheduling jobs on Hugging Face infrastructure; managing Hugging Face repos; discussions and pull requests; browsing models, datasets and spaces; reading, searching, or browsing academic papers; managing collections; querying datasets; configuring spaces; setting up webhooks; or deploying and managing HF Inference Endpoints. Make sure to use this skill whenever the user mentions 'hf', 'huggingface', 'Hugging Face', 'huggingface-cli', or 'hugging face cli', or wants to do anything related to the Hugging Face ecosystem and to AI and ML in general. Also use for cloud storage needs like training checkpoints, data pipelines, or agent traces. Use even if the user doesn't explicitly ask for a CLI command. Replaces the deprecated `huggingface-cli`.