
GRPO RL Training

Skill · Active

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training

Purpose

To empower users to fine-tune language models for specific tasks like enforcing output formats, teaching verifiable tasks, improving reasoning, and aligning models to domain-specific behaviors, particularly when custom reward signals are needed.

Features

  • Expert guidance on GRPO algorithm fundamentals
  • Detailed implementation workflow (dataset, rewards, training, deployment)
  • Production-ready training templates and code examples
  • Extensive library of customizable reward functions
  • Hyperparameter tuning advice and troubleshooting guide

Use Cases

  • Enforcing specific output formats (XML, JSON)
  • Teaching verifiable tasks with objective correctness metrics
  • Improving reasoning capabilities through chain-of-thought rewards
  • Aligning models to domain-specific behaviors without preference data
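The first use case above, enforcing an output format, is the classic entry point for GRPO because correctness is cheap to verify. As a minimal sketch (the function name and reward values are illustrative, not taken from this skill), a format reward in the shape TRL's GRPOTrainer expects is just a callable that scores each completion in a batch:

```python
import json

def json_format_reward(completions, **kwargs):
    """Score each completion 1.0 if it parses as JSON, else 0.0.

    Shape follows TRL's GRPOTrainer reward-function interface:
    take the batch of completions, return one float per completion.
    Illustrative example; a real reward would likely also check schema.
    """
    rewards = []
    for completion in completions:
        try:
            json.loads(completion)       # valid JSON -> full reward
            rewards.append(1.0)
        except (json.JSONDecodeError, TypeError):
            rewards.append(0.0)          # unparseable -> no reward
    return rewards
```

Such a function would be passed to the trainer via its `reward_funcs` argument; because GRPO normalizes rewards within each group of sampled completions, even a coarse 0/1 signal like this can produce a useful gradient.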

Non-Goals

  • Simple supervised fine-tuning tasks
  • Tasks without clear reward signals
  • Scenarios where high-quality preference pairs are already available (DPO/PPO are better)

Trust

  • Warning (Issues Attention): In the last 90 days, 17 issues were opened and only 4 were closed, indicating a low closure rate and potentially slow maintainer response.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality Score

76/100
Analyzed about 18 hours ago

Trust Signals

Last commit: about 20 hours ago
Stars: 27.2k
License: MIT

Similar Extensions

Fine-Tuning with TRL

96

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or train from human feedback. Works with Hugging Face Transformers.

Skill
Orchestra-Research

GRPO RL Training

95

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training

Skill
Orchestra-Research

Implementing LLMs with LitGPT

100

Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.

Skill
davila7

Hugging Face LLM Trainer

99

Train or fine-tune language and vision models using TRL (Transformer Reinforcement Learning) or Unsloth with Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, model selection/leaderboards and model persistence. Use for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.

Skill
huggingface

verl RL Training

95

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill
davila7

Fine-Tuning with TRL

95

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or train from human feedback. Works with Hugging Face Transformers.

Skill
davila7