
Simpo Training


Simple Preference Optimization (SimPO) for LLM alignment: a reference-free alternative to DPO with better benchmark performance (+6.4 points on AlpacaEval 2.0). Because no reference model is needed, training is more efficient than DPO. Use it for preference alignment when you want simpler, faster training than DPO or PPO.

Purpose

To provide an efficient, reference-free method for preference alignment of LLMs when simpler, faster training than DPO/PPO is desired.
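For context, SimPO replaces DPO's reference-model log-ratio with the length-normalized log-probability of the policy itself as the implicit reward, plus a target reward margin. A minimal sketch in plain Python (the hyperparameter values are illustrative, not the skill's defaults):

```python
import math

def simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected,
               beta=2.0, gamma=1.0):
    """SimPO loss for one preference pair.

    The implicit reward is the length-normalized sequence log-probability,
    so no reference model is required (unlike DPO). beta scales the reward
    difference; gamma is the target reward margin.
    """
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    margin = reward_chosen - reward_rejected - gamma
    # -log(sigmoid(margin)), computed in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# The loss shrinks as the chosen response's average log-prob pulls ahead:
small_margin = simpo_loss(-40.0, -60.0, 20, 20)
wide_margin = simpo_loss(-20.0, -80.0, 20, 20)
assert wide_margin < small_margin
```

Length normalization is the key design choice: it removes the length bias that a raw sequence log-probability reward would otherwise introduce.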

Features

  • Reference-free preference optimization (SimPO)
  • Outperforms DPO on benchmark evaluations
  • More efficient training than DPO/PPO
  • Detailed configurations for multiple LLM architectures
  • Troubleshooting and hyperparameter tuning guidance

Use Cases

  • Fine-tuning LLMs with preference data for alignment
  • Training models when a reference model is unavailable or undesirable
  • Achieving simpler and faster preference alignment compared to DPO/PPO
  • Optimizing LLMs for specific task domains with preference feedback

Non-Goals

  • Performing standard supervised fine-tuning (SFT)
  • Implementing DPO or PPO directly
  • Training LLM architectures that do not support preference data formats
  • Providing pre-trained models (focus is on the training methodology)

Code Execution

  • Info (Validation): While the configuration is provided in YAML, explicit schema-validation libraries such as Zod or Pydantic are not evident for input arguments or structured output handling.
  • Warning (Pinned dependencies): Dependencies are listed but not pinned to version numbers or lockfiles in SKILL.md, which could lead to compatibility issues.
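To avoid the compatibility issues this warning describes, pin exact versions in a requirements file or lockfile. A sketch of the idea; the package list and version numbers below are placeholders, not the skill's tested versions:

```text
# requirements.txt — pin exact, tested versions
# (numbers here are placeholders; record the versions you actually validated)
torch==2.3.1
transformers==4.43.2
datasets==2.20.0
```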

Installation

First, add the marketplace, then install the skill:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

95/100

Trust Signals

  • Last commit: 16 days ago
  • Stars: 8.3k
  • License: MIT
