
SimPO Training

Skill · Active

Simple Preference Optimization (SimPO) for LLM alignment: a reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). Because no reference model is needed, training is more efficient than DPO. Use it for preference alignment when you want simpler, faster training than DPO/PPO.

Purpose

To provide an efficient, reference-free method for preference alignment of LLMs when simpler, faster training than DPO/PPO is desired.
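For orientation, the SimPO objective replaces DPO's reference-model log-ratio with the policy's length-normalized log-probability as an implicit reward, plus a target reward margin gamma, so no reference model appears in the loss. A minimal PyTorch sketch (the function name and default hyperparameter values are illustrative, not this skill's actual code):

import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
               beta=2.0, gamma=1.0):
    """SimPO objective: note that no reference-model term appears.
    chosen_logps / rejected_logps: summed log-probs of each response;
    chosen_lens / rejected_lens: response lengths in tokens."""
    # Implicit reward = beta * average log-probability per token.
    r_chosen = beta * chosen_logps / chosen_lens
    r_rejected = beta * rejected_logps / rejected_lens
    # Bradley-Terry preference loss with a target reward margin gamma.
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()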

Features

  • Reference-free preference optimization (SimPO)
  • Outperforms DPO on benchmark evaluations (e.g., +6.4 points on AlpacaEval 2.0)
  • More efficient training than DPO/PPO, since no reference-model forward pass is needed (see the sketch after this list)
  • Detailed configurations for multiple LLM architectures
  • Troubleshooting and hyperparameter tuning guidance
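Where the efficiency gain comes from: DPO scores each preference pair under both the policy and a frozen reference model, while SimPO needs only the policy's length-normalized log-probabilities. A sketch of that single helper, assuming a Hugging Face-style causal LM that returns .logits and labels that mark non-response tokens with -100 (all names are illustrative):

import torch

def sequence_logps(model, input_ids, attention_mask, labels):
    # Summed per-token log-probs and response length for each sequence.
    logits = model(input_ids=input_ids, attention_mask=attention_mask).logits
    logps = torch.log_softmax(logits[:, :-1], dim=-1)
    shift_labels = labels[:, 1:]
    mask = shift_labels != -100  # ignore prompt/padding positions
    token_logps = logps.gather(-1, shift_labels.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    return (token_logps * mask).sum(-1), mask.sum(-1)

# DPO would run this same computation a second time per step, under
# torch.no_grad() with a frozen reference model; SimPO skips that pass.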

Use Cases

  • Fine-tuning LLMs with preference data for alignment (record format sketched after this list)
  • Training models when a reference model is unavailable or undesirable
  • Achieving simpler and faster preference alignment compared to DPO/PPO
  • Optimizing LLMs for specific task domains with preference feedback
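For illustration, the preference data referred to above is conventionally a set of pairwise records; the chosen/rejected field names below follow a common community format and are an assumption, not this skill's documented schema:

# One pairwise preference record (illustrative field names and content).
example = {
    "prompt": "Explain what a lockfile does.",
    "chosen": "A lockfile records the exact resolved version of every dependency ...",
    "rejected": "Lockfiles are optional and can safely be deleted ...",
}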

Non-Goals

  • Performing standard supervised fine-tuning (SFT)
  • Implementing DPO or PPO directly
  • Training LLM architectures that do not support preference data formats
  • Providing pre-trained models (focus is on the training methodology)

Code Execution

  • info: Validation: while the configuration is provided in YAML, explicit schema validation libraries such as Zod or Pydantic are not evident for input arguments or structured output handling.
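To make the note concrete, schema validation of the YAML config could look roughly like this; the field names and the simpo.yaml path are hypothetical, not the skill's actual configuration keys:

import yaml
from pydantic import BaseModel, Field

class SimPOConfig(BaseModel):
    # Hypothetical fields; the skill's real schema is not shown here.
    model_name: str
    beta: float = Field(2.0, gt=0)
    gamma: float = Field(1.0, ge=0)
    learning_rate: float = Field(1e-6, gt=0)

with open("simpo.yaml") as f:
    # Raises a ValidationError on missing or out-of-range values.
    cfg = SimPOConfig(**yaml.safe_load(f))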

Execution

  • warning: Pinned dependencies: dependencies are listed in SKILL.md but not pinned to version numbers or a lockfile, which could lead to compatibility issues.

Installation

Add the Marketplace first:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

95/100
Analyzed 1 day ago

Trust Signals

Last commit: 17 days ago
Stars: 8.3k
License: MIT