
Hugging Face LLM Trainer

Skill · Verified · Active

Train or fine-tune language and vision models using TRL (Transformer Reinforcement Learning) or Unsloth with Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, model selection/leaderboards and model persistence. Use for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.
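
Jobs are typically submitted as self-contained UV scripts that declare their dependencies inline using PEP 723 metadata, so the runner can resolve and install packages before training starts. A minimal sketch of such a header (the package list is illustrative, not the skill's exact manifest):

  # /// script
  # requires-python = ">=3.10"
  # dependencies = [
  #     "trl",         # SFT/DPO/GRPO/reward-model trainers
  #     "datasets",    # load training data from the Hugging Face Hub
  #     "trackio",     # optional run monitoring
  # ]
  # ///
  # ...training code follows; UV installs the dependencies above before executing it.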

Purpose

Streamline training and converting LLMs on cloud infrastructure, making advanced ML workflows accessible without a local GPU setup.

Features

  • Fine-tune LLMs using TRL or Unsloth (see the sketch after this list)
  • Leverage Hugging Face Jobs infrastructure
  • Support SFT, DPO, GRPO, and reward modeling
  • Convert models to GGUF format for local deployment
  • Estimate training costs and monitor runs with Trackio
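
As a sketch of what a TRL-based fine-tuning script might look like (model and dataset names are illustrative placeholders; the call signature targets recent TRL releases):

  from datasets import load_dataset
  from trl import SFTConfig, SFTTrainer

  # Illustrative model and dataset; substitute your own.
  dataset = load_dataset("trl-lib/Capybara", split="train")

  trainer = SFTTrainer(
      model="Qwen/Qwen2.5-0.5B",
      train_dataset=dataset,
      args=SFTConfig(
          output_dir="qwen2.5-0.5b-sft",
          push_to_hub=True,  # persist the trained model to the Hub when the job ends
      ),
  )
  trainer.train()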

Use Cases

  • Fine-tune language models on cloud GPUs without local setup
  • Align models with human preferences using DPO (see the sketch after this list)
  • Convert trained models to GGUF for Ollama or LM Studio
  • Optimize training for limited GPU memory with Unsloth
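
A minimal DPO sketch, assuming a preference dataset with prompt/chosen/rejected columns (identifiers are placeholders; recent TRL versions take the tokenizer via processing_class):

  from datasets import load_dataset
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from trl import DPOConfig, DPOTrainer

  # Placeholder model and preference dataset.
  model_id = "Qwen/Qwen2.5-0.5B-Instruct"
  model = AutoModelForCausalLM.from_pretrained(model_id)
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

  trainer = DPOTrainer(
      model=model,
      args=DPOConfig(output_dir="qwen2.5-0.5b-dpo", push_to_hub=True),
      train_dataset=dataset,
      processing_class=tokenizer,
  )
  trainer.train()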

Non-Goals

  • Directly managing Hugging Face infrastructure (handled by `hf-cli`)
  • Advanced distributed training setup beyond TRL's automatic handling
  • Modifying the core TRL or Unsloth libraries

Installation

/plugin install skills@huggingface-skills

Quality Score

Verified
Score: 99/100
Analyzed 1 day ago

Trust Signals

Last commit: 2 days ago
Stars: 10.5k
License: Apache-2.0
