
Simpo Training

Skill Active

Simple Preference Optimization for LLM alignment. A reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model is needed, making it more efficient than DPO. Use for preference alignment when you want simpler, faster training than DPO/PPO.

Purpose

To provide an efficient, reference-free method for preference alignment of LLMs when simpler, faster training than DPO/PPO is desired.
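
For intuition, SimPO scores each response with the length-normalized average log probability under the policy itself, so no reference model is needed, and trains on a Bradley-Terry objective with a target reward margin. A minimal PyTorch sketch of the loss, assuming summed per-token log probabilities are precomputed; the beta and gamma values are illustrative defaults, not this skill's recommended settings:

import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_len, rejected_len,
               beta=2.0, gamma=0.5):
    # Length-normalized implicit rewards; the policy's own average
    # log-probability replaces DPO's reference-model log-ratio.
    r_chosen = beta * chosen_logps / chosen_len
    r_rejected = beta * rejected_logps / rejected_len
    # Bradley-Terry preference objective with target margin gamma
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()

# Example: a batch of two preference pairs
chosen_logps = torch.tensor([-42.0, -55.0])    # sum of token log-probs
rejected_logps = torch.tensor([-60.0, -58.0])
loss = simpo_loss(chosen_logps, rejected_logps,
                  torch.tensor([20.0, 25.0]), torch.tensor([22.0, 24.0]))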

Features

  • Reference-free preference optimization (SimPO)
  • Outperforms DPO on benchmark evaluations
  • More efficient training than DPO/PPO
  • Detailed configurations for multiple LLM architectures
  • Troubleshooting and hyperparameter tuning guidance

Use Cases

  • Fine-tuning LLMs with preference data for alignment
  • Training models when a reference model is unavailable or undesirable
  • Achieving simpler and faster preference alignment compared to DPO/PPO
  • Optimizing LLMs for specific task domains with preference feedback

Non-Goals

  • Performing standard supervised fine-tuning (SFT)
  • Implementing DPO or PPO directly
  • Training LLM architectures that do not support preference data formats
  • Providing pre-trained models (focus is on the training methodology)

Code Execution

  • info: Validation. While the configuration is provided in YAML, explicit schema validation libraries like Zod or Pydantic are not evident for input arguments or structured output handling.
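
If stricter input checking is desired, one lightweight option is to validate the YAML config with Pydantic before launching training. This is a hedged sketch: the field names and file name are hypothetical examples, not the skill's actual schema.

from pydantic import BaseModel, Field
import yaml

class SimPOConfig(BaseModel):
    # Hypothetical keys for illustration only; align these with the
    # skill's real YAML before relying on them.
    model_name_or_path: str
    learning_rate: float = Field(gt=0)
    beta: float = Field(gt=0)    # SimPO reward scaling
    gamma: float = Field(ge=0)   # target reward margin

with open("simpo_config.yaml") as f:           # hypothetical file name
    config = SimPOConfig(**yaml.safe_load(f))  # raises ValidationError on bad input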

Execution

  • warning: Pinned dependencies. Dependencies are listed but not explicitly pinned with version numbers or lockfiles in SKILL.md, which could lead to compatibility issues.
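
A common mitigation is to pin exact versions in a requirements file or lockfile next to SKILL.md. The packages and versions below are placeholders for illustration, not the skill's tested set:

# requirements.txt (illustrative pins; verify against your environment)
torch==2.3.1
transformers==4.43.0
trl==0.9.6
datasets==2.20.0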

Installation

First, add the marketplace

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

95/100
Analyzed 1 day ago

Trust Signals

Last commit: 17 days ago
Stars: 8.3k
License: MIT
View source code

Similar Extensions

Unsloth

100

Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization

Skill
davila7

Implementing Llms Litgpt

100

Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.

Skill
davila7

Arize Prompt Optimization

100

Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.

Skill
github

Prompt Optimization

100

Applies prompt repetition to improve accuracy for LLMs without reasoning capability

Skill
asklokesh

Fine Tuning With Trl

96

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with HuggingFace Transformers.

Skill
Orchestra-Research

Chat Format

100

Format prompts for different LLM providers with chat templates and HNSW-powered context retrieval

Skill
ruvnet