
Fine Tuning with TRL

Skill Verified Active

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

Purpose

To enable users to fine-tune LLMs using advanced reinforcement learning techniques with TRL for instruction tuning, preference alignment, and reward optimization.

Features

  • Supervised Fine-Tuning (SFT) for instruction tuning (see the sketch after this list)
  • Direct Preference Optimization (DPO) for preference alignment
  • Proximal Policy Optimization (PPO) for reward optimization
  • Group Relative Policy Optimization (GRPO) for memory-efficient RL
  • Reward model training for RLHF pipelines
  • Code examples for various fine-tuning workflows
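
A minimal SFT sketch under stated assumptions: the model and dataset names below are illustrative placeholders, and recent TRL releases accept a model name string directly (older ones expect a loaded model object):

    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    # Illustrative instruction-tuning dataset; substitute your own.
    dataset = load_dataset("trl-lib/Capybara", split="train")

    trainer = SFTTrainer(
        model="Qwen/Qwen2.5-0.5B",               # illustrative base model
        args=SFTConfig(output_dir="sft-output"),
        train_dataset=dataset,
    )
    trainer.train()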

Use Cases

  • Aligning LLMs with human preferences using preference data (see the DPO sketch after this list)
  • Training LLMs to follow instructions effectively
  • Optimizing LLM policies with reinforcement learning
  • Building full RLHF pipelines
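
A minimal DPO sketch, assuming a preference dataset with chosen/rejected pairs; the model and dataset names are illustrative, and older TRL releases pass the tokenizer as tokenizer= rather than processing_class=:

    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    model_id = "Qwen/Qwen2.5-0.5B-Instruct"    # illustrative
    model = AutoModelForCausalLM.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Preference dataset with "chosen" and "rejected" columns.
    dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

    trainer = DPOTrainer(
        model=model,
        args=DPOConfig(output_dir="dpo-output", beta=0.1),  # beta scales the implicit KL penalty
        train_dataset=dataset,
        processing_class=tokenizer,
    )
    trainer.train()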

Non-Goals

  • Basic fine-tuning without RL (use HuggingFace Trainer)
  • YAML-based training configuration (use Axolotl)
  • Educational, minimal fine-tuning (use LitGPT)
  • Fast LoRA training (use Unsloth)

Prerequisites

  • Python environment
  • HuggingFace Transformers and TRL libraries
  • PyTorch (or other compatible deep learning framework)
  • GPU with sufficient VRAM (depending on model size and method)

Documentation

  • info: Configuration & parameter reference. While the SKILL.md provides configuration examples for various TRL methods (e.g., SFTConfig, DPOConfig), it does not document every parameter or its default value for each configuration object; the specific hyperparameter tables offer guidance but are not comprehensive.
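
For orientation only, a sketch of the kind of knobs SFTConfig exposes; the values are illustrative rather than defaults, and some parameter names have shifted between TRL releases, so check the reference for your installed version:

    from trl import SFTConfig

    config = SFTConfig(
        output_dir="sft-output",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-5,              # illustrative, not a TRL default
        num_train_epochs=1,
        logging_steps=10,
        bf16=True,                       # assumes an Ampere-or-newer GPU
    )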

Code Execution

  • info: Error handling. The code examples demonstrate standard Python library usage; while the underlying TRL library likely has robust error handling, the examples themselves do not showcase custom error handling or structured error reporting for user-facing errors.
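
As an assumed pattern (not something the skill itself ships), user-facing OOM handling can be wrapped around any of the trainers above; torch.cuda.OutOfMemoryError is available in PyTorch 1.13+:

    import torch

    # `trainer` is any TRL trainer built as in the sketches above.
    try:
        trainer.train()
    except torch.cuda.OutOfMemoryError as err:
        # Typical recovery: lower per_device_train_batch_size, raise
        # gradient_accumulation_steps, or enable gradient_checkpointing,
        # then rerun with resume_from_checkpoint=True.
        print(f"CUDA out of memory: {err}")
        raise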

Execution

  • info: Pinned dependencies. The SKILL.md lists dependencies but does not pin versions or provide a lockfile. The README suggests installation via `pip`, which installs the latest versions by default.
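
For reproducible installs, pin the versions explicitly; the version numbers below are purely illustrative, so substitute the combination you have actually tested:

    pip install "trl==0.12.1" "transformers==4.46.0" "torch==2.5.1"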

Practical Utility

  • info: Edge cases. The skill addresses common issues such as OOM errors and poor alignment quality in DPO training, offering specific parameter adjustments and recovery steps, but it does not systematically list failure modes and recovery steps for every scenario.
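
As one concrete instance of the parameter adjustments mentioned above (values are illustrative, not recommendations), the memory-related knobs live on the config object:

    from trl import DPOConfig

    training_args = DPOConfig(
        output_dir="dpo-output",
        per_device_train_batch_size=1,   # shrink the per-device batch...
        gradient_accumulation_steps=16,  # ...while keeping the effective batch size
        gradient_checkpointing=True,     # trade compute for activation memory
        bf16=True,                       # assumes an Ampere-or-newer GPU
    )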

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …), and assumes the repo follows the agentskills.io format.

Quality Score

Verified
95/100
Analyzed about 18 hours ago

Trust Signals

Last commit: about 20 hours ago
Stars: 27.2k
License: MIT
Status
View source code

Similar Extensions

Huggingface Llm Trainer (Score: 99)

Train or fine-tune language and vision models using TRL (Transformer Reinforcement Learning) or Unsloth with Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, model selection/leaderboards and model persistence. Use for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.

Skill by huggingface

Fine Tuning With Trl (Score: 96)

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

Skill by Orchestra-Research

Implementing Llms Litgpt (Score: 100)

Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.

Skill by davila7

Unsloth (Score: 100)

Expert guidance for fast fine-tuning with Unsloth: 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization.

Skill by davila7

TimesFM Forecasting (Score: 100)

Zero-shot time series forecasting with Google's TimesFM foundation model. Use for any univariate time series (sales, sensors, energy, vitals, weather) without training a custom model. Supports CSV/DataFrame/array inputs with point forecasts and prediction intervals. Includes a preflight system checker script to verify RAM/GPU before first use.

Skill by K-Dense-AI

Agentdb Learning (Score: 99)

Create and train AI learning plugins with AgentDB's 9 reinforcement learning algorithms. Includes Decision Transformer, Q-Learning, SARSA, Actor-Critic, and more. Use when building self-learning agents, implementing RL, or optimizing agent behavior through experience.

Skill by ruvnet