
Simpo Training

Skill Active

Simple Preference Optimization for LLM alignment. A reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model is needed, making it more efficient than DPO. Use for preference alignment when you want simpler, faster training than DPO/PPO.

Purpose

To provide an efficient, reference-free method for preference alignment of LLMs when simpler, faster training than DPO/PPO is desired.
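
For intuition, SimPO scores each response with the length-normalized average log probability under the policy itself, so no reference model is needed, and trains on a Bradley-Terry objective with a target reward margin. A minimal PyTorch sketch of the loss, assuming summed per-token log probabilities are precomputed; the beta and gamma values are illustrative defaults, not this skill's recommended settings:

import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_len, rejected_len,
               beta=2.0, gamma=0.5):
    # Length-normalized implicit rewards; the policy's own average
    # log-probability replaces DPO's reference-model log-ratio.
    r_chosen = beta * chosen_logps / chosen_len
    r_rejected = beta * rejected_logps / rejected_len
    # Bradley-Terry preference objective with target margin gamma
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()

# Example: a batch of two preference pairs
chosen_logps = torch.tensor([-42.0, -55.0])    # sum of token log-probs
rejected_logps = torch.tensor([-60.0, -58.0])
loss = simpo_loss(chosen_logps, rejected_logps,
                  torch.tensor([20.0, 25.0]), torch.tensor([22.0, 24.0]))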

Features

  • Reference-free preference optimization (SimPO)
  • Outperforms DPO on benchmark evaluations
  • More efficient training than DPO/PPO
  • Detailed configurations for multiple LLM architectures
  • Troubleshooting and hyperparameter tuning guidance

Use Cases

  • Fine-tuning LLMs with preference data for alignment
  • Training models when a reference model is unavailable or undesirable
  • Achieving simpler and faster preference alignment compared to DPO/PPO
  • Optimizing LLMs for specific task domains with preference feedback

Non-Goals

  • Performing standard supervised fine-tuning (SFT)
  • Implementing DPO or PPO directly
  • Training LLM architectures that do not support preference data formats
  • Providing pre-trained models (focus is on the training methodology)

Code Execution

  • info: Validation. While the configuration is provided in YAML, explicit schema validation libraries like Zod or Pydantic are not evident for input arguments or structured output handling.
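
If stricter input checking is desired, one lightweight option is to validate the YAML config with Pydantic before launching training. This is a hedged sketch: the field names and file name are hypothetical examples, not the skill's actual schema.

from pydantic import BaseModel, Field
import yaml

class SimPOConfig(BaseModel):
    # Hypothetical keys for illustration only; align these with the
    # skill's real YAML before relying on them.
    model_name_or_path: str
    learning_rate: float = Field(gt=0)
    beta: float = Field(gt=0)    # SimPO reward scaling
    gamma: float = Field(ge=0)   # target reward margin

with open("simpo_config.yaml") as f:           # hypothetical file name
    config = SimPOConfig(**yaml.safe_load(f))  # raises ValidationError on bad input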

Execution

  • warning: Pinned dependencies. Dependencies are listed but not explicitly pinned with version numbers or lockfiles in SKILL.md, which could lead to compatibility issues.
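
A common mitigation is to pin exact versions in a requirements file or lockfile next to SKILL.md. The packages and versions below are placeholders for illustration, not the skill's tested set:

# requirements.txt (illustrative pins; verify against your environment)
torch==2.3.1
transformers==4.43.0
trl==0.9.6
datasets==2.20.0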

Installation

First, add the marketplace

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

95/100
Analyzed 1 day ago

Trust Signals

Last commit: 17 days ago
Stars: 8.3k
License: MIT
View source code

Similar Extensions

Unsloth

100

Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization

Skill
davila7

Implementing Llms Litgpt

100

Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.

Skill
davila7

Arize Prompt Optimization

100

Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.

Skill
github

Prompt Optimization

100

Applies prompt repetition to improve accuracy for LLMs without reasoning capability

Skill
asklokesh

Fine Tuning With Trl

96

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

Skill
Orchestra-Research

Chat Format

100

Format prompts for different LLM providers with chat templates and HNSW-powered context retrieval

Skill
ruvnet