Hugging Face LLM Trainer
Train or fine-tune language and vision models using TRL (Transformer Reinforcement Learning) or Unsloth on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO, and reward-modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts in PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, model selection/leaderboards, and model persistence. Use for tasks involving cloud GPU training, GGUF conversion, or whenever users mention training on Hugging Face Jobs without a local GPU setup.
Streamline and simplify the process of training and converting LLMs on cloud infrastructure, making advanced ML workflows accessible.
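The UV-script workflow the skill references can be sketched as a single self-contained file with PEP 723 inline metadata. This is an illustrative sketch, not the skill's own script: the dependency list, model name, dataset name, and output directory below are all placeholder assumptions.

```python
# /// script
# dependencies = [
#     "trl",
#     "transformers",
#     "datasets",
# ]
# ///
# Minimal SFT sketch using TRL. The model and dataset below are
# placeholders chosen for illustration, not values the skill prescribes.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Example public chat dataset (assumption for demonstration).
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # small model so the demo fits modest GPUs
    train_dataset=dataset,
    args=SFTConfig(output_dir="qwen-sft", push_to_hub=True),
)
trainer.train()
```

The inline `# /// script` block is what lets `uv run` resolve dependencies on the fly, so the same single file can be submitted to a Jobs runner without a separate requirements file.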
Features
- Fine-tune LLMs using TRL or Unsloth
- Leverage Hugging Face Jobs infrastructure
- Supports SFT, DPO, GRPO, and Reward Modeling
- Convert models to GGUF format for local deployment
- Includes cost estimation and Trackio monitoring
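The dataset-preparation step above can be illustrated with a small validator for the conversational `messages` format that chat-template-based SFT expects. The function name, role set, and error messages here are our own illustrative choices, not part of the skill:

```python
# Hypothetical helper: check that a dataset record matches the chat
# schema {"messages": [{"role": ..., "content": ...}, ...]} commonly
# used for SFT with chat templates.
VALID_ROLES = {"system", "user", "assistant"}

def validate_sft_record(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty list = OK)."""
    problems = []
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            problems.append(f"message {i}: unknown role {role!r}")
        if not isinstance(msg.get("content"), str) or not msg["content"].strip():
            problems.append(f"message {i}: empty or non-string content")
    if messages[-1].get("role") != "assistant":
        problems.append("conversation should end with an assistant turn")
    return problems

good = {"messages": [{"role": "user", "content": "Hi"},
                     {"role": "assistant", "content": "Hello!"}]}
bad = {"messages": [{"role": "user", "content": ""}]}
print(validate_sft_record(good))  # []
print(validate_sft_record(bad))   # two problems: empty content, no assistant turn
```

Running a check like this before submitting a job catches schema mistakes locally instead of after paid GPU time has been spent.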
Use Cases
- Fine-tune language models on cloud GPUs without local setup
- Align models with human preferences using DPO
- Convert trained models to GGUF for Ollama or LM Studio
- Optimize training for limited GPU memory with Unsloth
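For the cloud-GPU use cases above, a back-of-envelope cost estimate is just hourly rate times hours times GPU count. The hourly rates below are illustrative placeholders, not actual Hugging Face Jobs pricing; check current pricing before budgeting a run.

```python
# Assumed per-GPU hourly rates in USD -- placeholders for illustration,
# NOT real Hugging Face Jobs pricing.
ASSUMED_HOURLY_RATE_USD = {
    "t4": 0.60,
    "a10g": 1.30,
    "a100": 4.00,
}

def estimate_job_cost(gpu: str, hours: float, num_gpus: int = 1) -> float:
    """Estimated cost in USD = hourly rate * hours * number of GPUs."""
    rate = ASSUMED_HOURLY_RATE_USD[gpu.lower()]
    return round(rate * hours * num_gpus, 2)

print(estimate_job_cost("a10g", hours=2.5))            # 3.25
print(estimate_job_cost("a100", hours=4, num_gpus=2))  # 32.0
```

Even a rough estimate like this makes it obvious when a smaller GPU with Unsloth's memory savings is the cheaper option for a given job.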
Non-Goals
- Directly managing Hugging Face infrastructure (handled by `hf-cli`)
- Advanced distributed training setup beyond TRL's automatic handling
- Modifying the core TRL or Unsloth libraries
Installation
/plugin install skills@huggingface-skills
Similar Extensions
Unsloth (score: 100)
Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization.
Implementing LLMs (LitGPT) (score: 100)
Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.
TimesFM Forecasting (score: 100)
Zero-shot time series forecasting with Google's TimesFM foundation model. Use for any univariate time series (sales, sensors, energy, vitals, weather) without training a custom model. Supports CSV/DataFrame/array inputs with point forecasts and prediction intervals. Includes a preflight system checker script to verify RAM/GPU before first use.
Fine-Tuning with TRL (score: 96)
Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward-model training. Use when you need RLHF, want to align a model with preferences, or train from human feedback. Works with Hugging Face Transformers.
Chat Format (score: 100)
Format prompts for different LLM providers with chat templates and HNSW-powered context retrieval.