Autoresearch Agent
Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed evaluation, keeps improvements (git commit), discards failures (git reset), and loops indefinitely. Use when: user wants to optimize code speed, reduce bundle/image size, improve test pass rate, optimize prompts, improve content quality (headlines, copy, CTR), or run any measurable improvement loop. Requires: a target file, an evaluation command that outputs a metric, and a git repo.
Purpose
Automates optimization of any file against a defined metric, letting users systematically improve code performance, content quality, or other measurable outcomes.
Features
- Autonomous experiment loop for file optimization
- Automated editing, evaluation, committing, and reverting
- Support for various domains (engineering, marketing, content, prompts)
- Configurable evaluation commands, metrics, and directions
- Git integration for versioning and rollback
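As an illustration of the configurable parameters above, the four values `/ar:setup` collects might look like the following; the field names, file layout, and example values here are hypothetical, not the skill's actual config schema:

```python
import json

# Hypothetical experiment config illustrating the four setup parameters;
# the real skill's file name and schema may differ.
config = {
    "target_file": "src/render.py",     # the one file the agent may edit
    "eval_command": "python bench.py",  # must print the metric to stdout
    "metric": "runtime_seconds",        # what the evaluation measures
    "direction": "min",                 # "min" = lower is better, or "max"
}

# A setup script would serialize this next to the experiment directory.
print(json.dumps(config, indent=2))
```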
Use cases
- Use when you want to optimize code speed, reduce bundle/image size, or improve test pass rates.
- Use when you need to improve content quality such as headlines, copy, or click-through rates.
- Use when optimizing prompts for LLM interactions or agent instructions.
- Use when running any measurable improvement loop that can be automated.
Non-goals
- Modifying the evaluation script or external dependencies.
- Performing optimization without a clear, measurable metric.
- Handling the initial setup of the target project's build or testing environment.
Workflow
- User runs `/ar:setup` to configure experiment parameters (target file, evaluation command, metric, direction).
- Script creates experiment directory, config files, and a git branch.
- User (or AI agent) calls `python scripts/run_experiment.py --single`.
- Script edits the target file based on the AI's instruction (or a generated change).
- Script runs the evaluation command and parses the metric.
- Script compares the new metric to the best metric found so far.
- If improved, the change is committed; otherwise, the repo is reset.
- Result (keep/discard/crash) is logged to `results.tsv`.
- This loop continues until interrupted or a goal is met.
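The evaluate/compare/keep-or-discard steps above can be sketched in Python. This is a minimal sketch under assumptions: the command, helper names, and `results.tsv` columns are illustrative, not the actual contents of `scripts/run_experiment.py`:

```python
import subprocess

EVAL_CMD = ["python", "bench.py"]  # hypothetical eval command: prints one number
DIRECTION = "min"                  # "min" = lower metric is better
RESULTS = "results.tsv"

def is_improvement(metric: float, best: float, direction: str = DIRECTION) -> bool:
    """Compare a new metric to the best so far in the configured direction."""
    return metric < best if direction == "min" else metric > best

def run_eval() -> float:
    """Run the fixed evaluation command; parse its last stdout line as the metric."""
    out = subprocess.run(EVAL_CMD, capture_output=True, text=True, check=True)
    return float(out.stdout.strip().splitlines()[-1])

def single_iteration(best: float) -> float:
    """One step, assuming the target file was just edited: keep a win, revert a loss."""
    try:
        metric = run_eval()
    except Exception:
        subprocess.run(["git", "reset", "--hard"], check=True)  # crash: revert edit
        log("crash", None, best)
        return best
    if is_improvement(metric, best):
        subprocess.run(["git", "commit", "-am", f"experiment: metric={metric}"], check=True)
        log("keep", metric, metric)
        return metric
    subprocess.run(["git", "reset", "--hard"], check=True)      # worse: discard edit
    log("discard", metric, best)
    return best

def log(outcome, metric, best):
    """Append the result of this iteration to the TSV log."""
    with open(RESULTS, "a") as f:
        f.write(f"{outcome}\t{metric}\t{best}\n")
```

A driver would call `single_iteration(best)` after each AI edit, carrying the returned best metric into the next loop, until interrupted or a goal is met.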
Installation
Add the marketplace first:
`/plugin marketplace add alirezarezvani/claude-skills`
`/plugin install autoresearch-agent@claude-code-skills`
Similar extensions
- Project Session Manager (score 100): Worktree-first dev environment manager for issues, PRs, and features with optional tmux sessions.
- Oh My Claudecode (score 100): Process-first advisor routing for Claude, Codex, or Gemini via `omc ask`, with artifact capture and no raw CLI assembly.
- Using Git Worktrees (score 100): Use when starting feature work that must be isolated from the current workspace, or before executing an implementation plan; ensures an isolated workspace exists via native tools or a git worktree fallback.
- CE Optimize (score 100): Run metric-driven iterative optimization loops: define a measurable goal, run parallel experiments, measure each against hard gates or LLM-as-judge scores, keep improvements, and converge on the best solution. Use when optimizing clustering quality, search relevance, build performance, prompt quality, or any measurable outcome that benefits from systematic experimentation.
- Rule Effectiveness Analysis (score 100): Analyze which rules are actively used vs. inert, detect coverage gaps, and recommend pruning to reduce token consumption.
- Arize Prompt Optimization (score 100): Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.