
Hugging Face LLM Trainer


Train or fine-tune language and vision models using TRL (Transformer Reinforcement Learning) or Unsloth on Hugging Face Jobs infrastructure. Covers the SFT, DPO, GRPO, and reward-modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts in PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, model selection and leaderboards, and model persistence. Use for tasks involving cloud GPU training or GGUF conversion, or when users mention training on Hugging Face Jobs without a local GPU setup.
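Training scripts submitted to Jobs are typically self-contained UV scripts that declare their dependencies inline via PEP 723 metadata, so the runner can install them before execution. A minimal sketch of such a script, assuming a TRL SFT run; the model id, dataset, and `SFTConfig` fields are illustrative choices, not values prescribed by the skill:

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "trl",
#     "datasets",
# ]
# ///
"""Hypothetical SFT training script: `uv run train.py` reads the inline
metadata block above and installs trl and datasets before running."""


def main():
    # Imports live inside main() so the file parses without the deps installed.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    dataset = load_dataset("trl-lib/Capybara", split="train")  # example dataset
    trainer = SFTTrainer(
        model="Qwen/Qwen2.5-0.5B",  # any Hub model id
        train_dataset=dataset,
        args=SFTConfig(output_dir="qwen-sft", push_to_hub=True),
    )
    trainer.train()


if __name__ == "__main__":
    main()
```

The `# /// script` block is the PEP 723 part: it travels with the file, so the Jobs runner needs no separate requirements file.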

Purpose

Streamline the training and conversion of LLMs on cloud infrastructure, making advanced ML workflows accessible.

Features

  • Fine-tune LLMs using TRL or Unsloth
  • Leverage Hugging Face Jobs infrastructure
  • Support SFT, DPO, GRPO, and reward modeling
  • Convert models to GGUF format for local deployment
  • Estimate costs and monitor training with Trackio
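Since Jobs bills by runtime, a rough cost estimate is just the hardware flavor's hourly rate times the expected training hours. A minimal sketch of that arithmetic; the flavor names and rates below are illustrative placeholders, not current Hugging Face pricing:

```python
# Hypothetical per-hour rates by hardware flavor (placeholder values only;
# check current Hugging Face Jobs pricing before budgeting a run).
HOURLY_RATE_USD = {
    "t4-small": 0.50,
    "a10g-large": 3.00,
    "a100-large": 4.50,
}


def estimate_cost(flavor: str, hours: float) -> float:
    """Return the estimated job cost in USD: hourly rate x expected hours."""
    return round(HOURLY_RATE_USD[flavor] * hours, 2)
```

For example, under these placeholder rates, two hours on a T4 would come to about a dollar, while ten hours on an A100 would be closer to fifty.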

Use Cases

  • Fine-tune language models on cloud GPUs without local setup
  • Align models with human preferences using DPO
  • Convert trained models to GGUF for Ollama or LM Studio
  • Optimize training for limited GPU memory with Unsloth
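For the DPO use case above, each training example must pair a prompt with a preferred and a dispreferred response. A minimal validation sketch, assuming the common `prompt`/`chosen`/`rejected` column convention used by TRL preference datasets; the function name is our own, not part of any library:

```python
# Keys expected in each preference record under the assumed DPO convention.
REQUIRED_KEYS = {"prompt", "chosen", "rejected"}


def validate_dpo_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is usable."""
    # Flag keys that are absent entirely.
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - record.keys())]
    # Flag keys that are present but blank.
    problems += [
        f"empty value: {k}"
        for k in sorted(REQUIRED_KEYS & record.keys())
        if not str(record[k]).strip()
    ]
    return problems
```

Running a check like this over the whole dataset before submitting a job catches schema mistakes locally, instead of after paid GPU time has been spent.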

Non-Goals

  • Directly managing Hugging Face infrastructure (handled by `hf-cli`)
  • Advanced distributed training setup beyond TRL's automatic handling
  • Modifying the core TRL or Unsloth libraries

Installation

/plugin install skills@huggingface-skills

Quality Score

Verified: 99/100
Analyzed about 16 hours ago

Trust Signals

Last commit: 2 days ago
Stars: 10.5k
License: Apache-2.0


© 2025 SkillRepo · Find the right skill, skip the noise.