Autoresearch Agent

Skill · Verified · Active

Autonomous experiment loop that optimizes any file against a measurable metric, inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed evaluation, keeps improvements (git commit), discards failures (git reset), and loops indefinitely. Use it to optimize code speed, reduce bundle or image size, improve test pass rates, optimize prompts, improve content quality (headlines, copy, CTR), or run any other measurable improvement loop. Requires a target file, an evaluation command that outputs a metric, and a git repo.

Purpose

Automates metric-driven optimization of any file, letting users systematically improve code performance, content quality, or other measurable outcomes.

Features

  • Autonomous experiment loop for file optimization
  • Automated editing, evaluation, committing, and reverting
  • Support for various domains (engineering, marketing, content, prompts)
  • Configurable evaluation commands, metrics, and directions
  • Git integration for versioning and rollback
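As a rough illustration of the configurable pieces listed above, a setup might produce something like the following config. This is a hypothetical sketch: the key names, file name, and `metric:` output format are assumptions, not the skill's documented schema.

```python
# Hypothetical experiment config -- key names and file layout are
# assumptions for illustration, not the skill's actual schema.
import json

config = {
    "target_file": "src/hot_loop.py",       # file the agent is allowed to edit
    "eval_command": "python bench.py",      # fixed command that prints the metric
    "metric_regex": r"metric:\s*([\d.]+)",  # how to parse the metric from stdout
    "direction": "minimize",                # "minimize" (e.g. runtime) or "maximize"
}

# Sanity-check before starting the loop.
assert config["direction"] in ("minimize", "maximize")
print(json.dumps(config, indent=2))
```

The `direction` field is what lets the same loop serve both "make this faster" (minimize seconds) and "pass more tests" (maximize pass rate).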

Use Cases

  • Use when you want to optimize code speed, reduce bundle/image size, or improve test pass rates.
  • Use when you need to improve content quality such as headlines, copy, or click-through rates.
  • Use when optimizing prompts for LLM interactions or agent instructions.
  • Use when running any measurable improvement loop that can be automated.

Non-Goals

  • Modifying the evaluation script or external dependencies.
  • Performing optimization without a clear, measurable metric.
  • Handling the initial setup of the target project's build or testing environment.

Workflow

  1. User runs `/ar:setup` to configure experiment parameters (target file, evaluation command, metric, direction).
  2. Script creates experiment directory, config files, and a git branch.
  3. User (or AI agent) calls `python scripts/run_experiment.py --single`.
  4. Script edits the target file based on AI's instruction (or a generated change).
  5. Script runs the evaluation command and parses the metric.
  6. Script compares the new metric to the best metric found so far.
  7. If improved, the change is committed; otherwise, the repo is reset.
  8. Result (keep/discard/crash) is logged to `results.tsv`.
  9. This loop continues until interrupted or a goal is met.
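The steps above can be sketched roughly as follows. Everything here is a hedged approximation: the evaluation command, the `metric: <float>` stdout format, and the exact git invocations are assumptions, since the real `scripts/run_experiment.py` is not shown on this page.

```python
"""Minimal sketch of the keep/discard loop described above (assumptions only)."""
import re
import subprocess

DIRECTION = "maximize"               # assumed config value
EVAL_CMD = ["python", "bench.py"]    # hypothetical evaluation command


def apply_edit():
    """Placeholder: in the real loop, the AI rewrites the target file here."""


def run_eval():
    """Run the fixed evaluation and parse a 'metric: <float>' line from stdout."""
    out = subprocess.run(EVAL_CMD, capture_output=True, text=True)
    match = re.search(r"metric:\s*([\d.]+)", out.stdout)
    return float(match.group(1)) if match else None


def improved(new, best):
    """Compare against the best metric so far, honoring the direction."""
    if best is None:
        return True
    return new > best if DIRECTION == "maximize" else new < best


def log(outcome, metric):
    """Append keep/discard/crash outcomes to results.tsv."""
    with open("results.tsv", "a") as f:
        f.write(f"{outcome}\t{metric}\n")


def single_iteration(best):
    """One edit-evaluate-decide cycle; returns the (possibly updated) best metric."""
    apply_edit()
    metric = run_eval()
    if metric is not None and improved(metric, best):
        subprocess.run(["git", "commit", "-am", f"keep: metric={metric}"])
        log("keep", metric)
        return metric
    subprocess.run(["git", "reset", "--hard"])  # discard the failed edit
    log("discard" if metric is not None else "crash", metric)
    return best
```

Running this under `--single` would execute one iteration; the indefinite loop is just `single_iteration` called repeatedly until interrupted.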

Installation

First, add the marketplace, then install the plugin:

```shell
/plugin marketplace add alirezarezvani/claude-skills
/plugin install autoresearch-agent@claude-code-skills
```
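The workflow above assumes an evaluation command that prints a metric the loop can parse. As a hedged illustration (the file name, output format, and workload are all invented for this sketch), such a script could be as simple as:

```python
# Hypothetical evaluation script (bench.py) -- the real eval command is
# whatever the user configures; it only needs to print one parseable number.
import timeit


def workload():
    # Stand-in for the code under optimization.
    return sum(i * i for i in range(10_000))


# Time the workload and emit the metric line the loop would parse.
elapsed = timeit.timeit(workload, number=50)
print(f"metric: {elapsed:.6f}")
```

Lower is better here, so this pairs with `direction = minimize`.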

Quality Score

Verified: 99/100 (analyzed about 18 hours ago)

Trust Signals

  • Last commit: about 21 hours ago
  • Stars: 14.6k
  • License: MIT
  • Status: View Source

