
Autoresearch Agent

Skill · Verified · Active

Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed evaluation, keeps improvements (git commit), discards failures (git reset), and loops indefinitely. Use when: user wants to optimize code speed, reduce bundle/image size, improve test pass rate, optimize prompts, improve content quality (headlines, copy, CTR), or run any measurable improvement loop. Requires: a target file, an evaluation command that outputs a metric, and a git repo.

Purpose

To automate and measure the optimization of any file based on a defined metric, enabling users to systematically improve code performance, content quality, or other measurable outcomes.

Features

  • Autonomous experiment loop for file optimization
  • Automated editing, evaluation, committing, and reverting
  • Support for various domains (engineering, marketing, content, prompts)
  • Configurable evaluation commands, metrics, and directions
  • Git integration for versioning and rollback
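
The configurable parameters listed above might be expressed as follows. This is a hypothetical sketch with illustrative field names; the actual config files written by `/ar:setup` may use a different format and keys.

```python
# Hypothetical experiment config (illustrative names only; the real
# files created by /ar:setup may differ in format and field names).
experiment_config = {
    "target_file": "src/hot_path.py",   # the one file the agent may edit
    "eval_command": "python bench.py",  # fixed command that prints the metric
    "metric": "runtime_seconds",        # name of the metric being parsed
    "direction": "minimize",            # "minimize" or "maximize"
    "results_log": "results.tsv",       # where keep/discard/crash is logged
}
```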

Use cases

  • Use when you want to optimize code speed, reduce bundle/image size, or improve test pass rates.
  • Use when you need to improve content quality such as headlines, copy, or click-through rates.
  • Use when optimizing prompts for LLM interactions or agent instructions.
  • Use when running any measurable improvement loop that can be automated.

Non-goals

  • Modifying the evaluation script or external dependencies.
  • Performing optimization without a clear, measurable metric.
  • Handling the initial setup of the target project's build or testing environment.

Workflow

  1. User runs `/ar:setup` to configure experiment parameters (target file, evaluation command, metric, direction).
  2. Script creates experiment directory, config files, and a git branch.
  3. User (or AI agent) calls `python scripts/run_experiment.py --single`.
  4. Script edits the target file based on AI's instruction (or a generated change).
  5. Script runs the evaluation command and parses the metric.
  6. Script compares the new metric to the best metric found so far.
  7. If improved, the change is committed; otherwise, the repo is reset.
  8. Result (keep/discard/crash) is logged to `results.tsv`.
  9. This loop continues until interrupted or a goal is met.
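
The keep/discard decision in the steps above can be sketched as follows. This is a minimal illustration, not the actual `run_experiment.py` implementation; the function names and metric-parsing convention (metric on the last line of stdout) are assumptions.

```python
# Hypothetical sketch of one experiment iteration (illustrative names,
# not the real run_experiment.py API).
import shlex
import subprocess

def is_improvement(new: float, best: float, direction: str) -> bool:
    """direction is 'maximize' or 'minimize'."""
    return new > best if direction == "maximize" else new < best

def run_iteration(eval_cmd: str, best: float, direction: str):
    """Run the evaluation command, parse the last stdout line as the
    metric, and decide keep/discard/crash. Returns (decision, best)."""
    result = subprocess.run(shlex.split(eval_cmd),
                            capture_output=True, text=True)
    if result.returncode != 0:
        return ("crash", best)                      # logged, repo untouched
    metric = float(result.stdout.strip().splitlines()[-1])
    if is_improvement(metric, best, direction):
        subprocess.run(["git", "commit", "-am", f"keep: {metric}"])
        return ("keep", metric)                     # improvement committed
    subprocess.run(["git", "reset", "--hard"])      # failure reverted
    return ("discard", best)
```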

Installation

Add the marketplace first:

/plugin marketplace add alirezarezvani/claude-skills
/plugin install autoresearch-agent@claude-code-skills

Quality score

Verified
99/100
Analyzed 1 day ago

Trust signals

  • Last commit: 1 day ago
  • Stars: 14.6k
  • License: MIT

Similar extensions

Project Session Manager

100

Worktree-first dev environment manager for issues, PRs, and features with optional tmux sessions

Skill
Yeachan-Heo

Oh My Claudecode

100

Process-first advisor routing for Claude, Codex, or Gemini via `omc ask`, with artifact capture and no raw CLI assembly

Skill
Yeachan-Heo

Using Git Worktrees

100

Use when starting feature work that needs isolation from the current workspace, or before executing an implementation plan: ensures an isolated workspace exists via native tooling or a git worktree fallback.

Skill
obra

CE Optimize

100

Run metric-driven iterative optimization loops: define a measurable goal, run parallel experiments, measure each against hard gates or LLM-as-judge scores, keep improvements, and converge on the best solution. Use when optimizing clustering quality, search relevance, build performance, prompt quality, or any measurable outcome that benefits from systematic experimentation.

Skill
EveryInc

Rule Effectiveness Analysis

100

Analyze which rules are actively used vs inert. Detect coverage gaps. Recommend pruning to reduce token consumption.

Skill
luiseiman

Arize Prompt Optimization

100

Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.

Skill
github