
GRPO RL Training

Skill · Active

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training

Purpose

To empower users to fine-tune language models for specific tasks like enforcing output formats, teaching verifiable tasks, improving reasoning, and aligning models to domain-specific behaviors, particularly when custom reward signals are needed.

Features

Expert guidance on GRPO algorithm fundamentals
Detailed implementation workflow (dataset, rewards, training, deployment)
Production-ready training templates and code examples
Extensive library of customizable reward functions
Hyperparameter tuning advice and troubleshooting guide
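
As a rough illustration of the GRPO fundamentals mentioned above: the core idea is to score each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below is a minimal, framework-free version. The function name, the `eps` value, and the use of population standard deviation are assumptions made for illustration; real implementations (including TRL's) may differ in these details.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group of completions for the same prompt.

    Hypothetical helper for illustration only, not a TRL API.
    """
    mean = statistics.fmean(rewards)
    # Population std is an assumption; some implementations use sample std.
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every completion gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions sampled for one prompt, scored by some reward function.
group = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(group)
```

Completions that beat the group average get a positive advantage and are reinforced; those below it are pushed down. When every completion in a group earns the same reward, all advantages are zero and the group contributes no learning signal, which is one reason reward functions need to discriminate between samples.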

Use cases

Enforcing specific output formats (XML, JSON)
Teaching verifiable tasks with objective correctness metrics
Improving reasoning capabilities through chain-of-thought rewards
Aligning models to domain-specific behaviors without preference data
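
The first two use cases can be sketched as a pair of plain-Python reward functions. They follow the `completions, **kwargs` calling convention that TRL documents for custom GRPO reward functions, returning one float per completion. Everything else here is an assumption for illustration: the `<reasoning>`/`<answer>` tag names, the `answer` dataset column, the reward magnitudes, and the premise that completions arrive as plain strings (conversational datasets pass message lists instead).

```python
import re

# Assumed target format: <reasoning>...</reasoning> followed by <answer>...</answer>.
FORMAT_RE = re.compile(
    r"^<reasoning>.*?</reasoning>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)


def format_reward(completions, **kwargs):
    """Return 1.0 for completions that follow the assumed XML layout, else 0.0."""
    return [1.0 if FORMAT_RE.match(c.strip()) else 0.0 for c in completions]


def correctness_reward(completions, answer, **kwargs):
    """Return 2.0 when the extracted answer equals the reference, else 0.0.

    `answer` is a hypothetical dataset column holding the expected result.
    """
    rewards = []
    for completion, expected in zip(completions, answer):
        match = FORMAT_RE.match(completion.strip())
        predicted = match.group(1).strip() if match else None
        rewards.append(2.0 if predicted == str(expected).strip() else 0.0)
    return rewards


good = "<reasoning>2 + 2 is 4</reasoning>\n<answer>4</answer>"
bad = "The answer is 4"
format_scores = format_reward([good, bad])
correct_scores = correctness_reward([good, bad], answer=["4", "4"])
```

Keeping format and correctness as separate functions lets each signal be weighted and debugged independently; a completion with the right answer but the wrong layout scores zero on both here, which is a deliberately strict choice.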

Non-goals

Simple supervised fine-tuning tasks
Tasks without clear reward signals
Scenarios where high-quality preference pairs are already available (DPO/PPO are better)

Trust

Warning: Issues need attention. In the last 90 days, 17 issues were opened and 4 were closed, indicating a low closure rate and potentially slow maintainer response.

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js installed locally and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.

Quality score

76/100

Analyzed about 18 hours ago

Trust signals

Last commit: about 20 hours ago

GitHub owner: davila7

Stars: 27.2k

Downloads: 23k

License: MIT

Website: aitmpl.com


Similar extensions

Fine Tuning With TRL

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with Hugging Face Transformers.

Skill

Orchestra-Research

GRPO RL Training

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training

Skill

Orchestra-Research

Implementing LLMs LitGPT

100

Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.

Skill

davila7

Hugging Face LLM Trainer

Train or fine-tune language and vision models using TRL (Transformer Reinforcement Learning) or Unsloth with Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO, and reward-model training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts in PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, model selection and leaderboards, and model persistence. Use for tasks involving cloud GPU training or GGUF conversion, or when users mention training on Hugging Face Jobs without a local GPU setup.

Skill

huggingface

verl RL Training

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.

Skill

davila7

Fine Tuning with TRL

Skill

davila7