SimPO Training
Simple Preference Optimization (SimPO) for LLM alignment. A reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model is needed, making training more efficient than DPO. Use for preference alignment when you want simpler, faster training than DPO/PPO.
To provide an efficient, reference-free method for preference alignment of LLMs when simpler, faster training than DPO/PPO is desired.
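In brief, SimPO scores each response by its average per-token log-probability (scaled by β) and requires the chosen response to beat the rejected one by a target margin γ, with no reference-model term in the loss. A minimal PyTorch sketch of the objective follows; the function and argument names are illustrative, not the skill's actual API:

```python
# Minimal sketch of the SimPO objective, assuming summed log-probs and
# token counts for chosen/rejected responses are already computed.
# Unlike DPO, no reference-model log-probs are required.
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,    # summed log-probs of chosen responses, shape (B,)
    rejected_logps: torch.Tensor,  # summed log-probs of rejected responses, shape (B,)
    chosen_lens: torch.Tensor,     # token counts of chosen responses, shape (B,)
    rejected_lens: torch.Tensor,   # token counts of rejected responses, shape (B,)
    beta: float = 2.0,             # reward scaling (SimPO typically uses larger beta than DPO)
    gamma: float = 1.0,            # target reward margin
) -> torch.Tensor:
    # Length-normalized implicit rewards: average log-prob per token, scaled by beta.
    chosen_rewards = beta * chosen_logps / chosen_lens
    rejected_rewards = beta * rejected_logps / rejected_lens
    # Bradley-Terry loss with target margin gamma; no reference-model term.
    return -F.logsigmoid(chosen_rewards - rejected_rewards - gamma).mean()
```

The length normalization is what lets SimPO drop the reference model: it removes the length bias that raw summed log-probs would otherwise introduce.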
Features
- Reference-free preference optimization (SimPO)
- Outperforms DPO on benchmark evaluations
- More efficient training than DPO/PPO
- Detailed configurations for multiple LLM architectures
- Troubleshooting and hyperparameter tuning guidance
Use Cases
- Fine-tuning LLMs with preference data for alignment
- Training models when a reference model is unavailable or undesirable
- Achieving simpler and faster preference alignment compared to DPO/PPO
- Optimizing LLMs for specific task domains with preference feedback
Non-Goals
- Performing standard supervised fine-tuning (SFT)
- Implementing DPO or PPO directly
- Training LLM architectures that do not support preference data formats
- Providing pre-trained models (focus is on the training methodology)
Code Execution
- info: Validation. While the configuration is provided in YAML, explicit schema validation libraries like Zod or Pydantic are not evident for input arguments or structured output handling.
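A hypothetical sketch of what such validation could look like with Pydantic; the schema fields (model_name, beta, gamma, learning_rate) are assumptions for illustration, not the skill's actual config keys:

```python
# Illustrative Pydantic schema for validating a SimPO training YAML
# before a run starts. Field names here are placeholder assumptions.
from pydantic import BaseModel, Field
import yaml

class SimPOConfig(BaseModel):
    model_name: str
    beta: float = Field(default=2.0, gt=0)
    gamma: float = Field(default=1.0, ge=0)
    learning_rate: float = Field(default=6e-7, gt=0)

def load_config(path: str) -> SimPOConfig:
    # Raises a ValidationError with a clear message if the YAML is malformed.
    with open(path) as f:
        return SimPOConfig(**yaml.safe_load(f))
```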
Execution
- warning: Pinned dependencies. Dependencies are listed but not explicitly pinned with version numbers or lockfiles in SKILL.md, which could lead to compatibility issues.
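As a hedged illustration of why pinning matters, a run-time guard can fail fast when installed versions drift from the ones a training run was tested with; the package names and version strings below are placeholder assumptions:

```python
# Fail fast if installed dependency versions drift from tested ones.
# EXPECTED contents are illustrative, not the skill's actual requirements.
from importlib.metadata import version

EXPECTED = {"torch": "2.1.0", "transformers": "4.36.0"}

for pkg, want in EXPECTED.items():
    got = version(pkg)
    assert got == want, f"{pkg}=={got}, expected {want}; pin your dependencies"
```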
Installation
First add the Marketplace, then install the plugin:
/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills
Similar Extensions
Unsloth
Score: 100. Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization.
Implementing Llms Litgpt
Score: 100. Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, an educational understanding of architectures, or production fine-tuning with LoRA/QLoRA. Single-file implementations, no abstraction layers.
Arize Prompt Optimization
Score: 100. Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.
Prompt Optimization
Score: 100. Applies prompt repetition to improve the accuracy of non-reasoning LLMs.
Fine Tuning With Trl
Score: 96. Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or train from human feedback. Works with HuggingFace Transformers.
Chat Format
Score: 100. Formats prompts for different LLM providers with chat templates and HNSW-powered context retrieval.