
Fine-Tuning with TRL

Skill · Verified · Active

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with human preferences, or want to train from human feedback. Works with Hugging Face Transformers.

Purpose

Enables users to fine-tune LLMs with a range of reinforcement learning methods and align them with human preferences or specific tasks.

Features

  • Supervised Fine-Tuning (SFT) for instruction tuning
  • Direct Preference Optimization (DPO) for preference alignment
  • Proximal Policy Optimization (PPO) for reward optimization
  • Group Relative Policy Optimization (GRPO) for memory-efficient RL
  • Reward model training for RLHF pipelines
  • Detailed workflows and code examples for each method
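To make the DPO feature above concrete, the per-pair objective it optimizes can be sketched in plain Python. This is a minimal illustration of the standard DPO loss, not TRL's actual implementation, and the log-probability values in the example are hypothetical:

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair.

    Pushes the policy to raise the likelihood of the chosen response
    relative to the rejected one, measured against a frozen reference
    model; beta controls how far the policy may drift from the reference.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # Numerically stable -log(sigmoid(logits))
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# Policy identical to the reference: logits = 0, loss = log(2) ~ 0.693,
# i.e. the model has not yet learned any preference.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 3))
```

Once the policy assigns the chosen response a higher log-ratio than the rejected one, the loss drops below log(2), which is what the gradient descends toward.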

Use Cases

  • Aligning LLMs with human preferences using preference data
  • Training instruction-following models
  • Performing full RLHF pipelines
  • Optimizing LLMs with minimal memory using GRPO
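GRPO's memory savings come from dropping PPO's learned value network: it samples several completions per prompt, scores them with a reward model, and uses each completion's deviation from the group mean as its advantage. A minimal sketch of that normalization step in plain Python (not TRL's implementation; the reward scores are hypothetical):

```python
import math

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize one group of rewards to zero mean and unit std.

    Each advantage is the completion's reward relative to the other
    completions sampled for the same prompt, so no separate value
    network is needed to estimate a baseline.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Four sampled completions for one prompt, scored by a reward model:
print(group_relative_advantages([0.2, 0.8, 0.5, 0.5]))
```

Advantages always sum to zero within a group, so completions are only ever rewarded for beating their siblings, not for absolute score.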

Non-Goals

  • Basic fine-tuning without RL methods
  • Providing a GUI for training configuration
  • Hyperparameter optimization beyond standard guidance

Execution

  • Pinned dependencies: dependencies are listed in SKILL.md but are not pinned to specific versions or accompanied by a lockfile, which could lead to compatibility issues.
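One way to mitigate the unpinned-dependency issue is to pin versions at install time. The version numbers below are hypothetical placeholders, not the ones SKILL.md lists; check SKILL.md and PyPI for the versions you actually need:

```shell
# Pin exact versions (placeholder versions -- verify against SKILL.md / PyPI)
pip install "trl==0.12.1" "transformers==4.46.0" "datasets==3.1.0"

# Or freeze a known-good environment into a lockfile-style requirements file
pip freeze > requirements.txt
pip install -r requirements.txt
```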

Installation

First, add the marketplace:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

Verified
96/100
Analyzed 1 day ago

Trust Signals

Last commit: 17 days ago
Stars: 8.3k
License: MIT

Similar Extensions

GRPO RL Training

95

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training

Skill
Orchestra-Research

verl RL Training

95

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill
davila7

GRPO RL Training

76

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training

Skill
davila7

Hugging Face LLM Trainer

99

Train or fine-tune language and vision models using TRL (Transformer Reinforcement Learning) or Unsloth with Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, model selection/leaderboards and model persistence. Use for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.

Skill
huggingface

OpenRLHF Training

99

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

Skill
Orchestra-Research

verl RL Training

99

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill
Orchestra-Research