Autoresearch Agent

Skill · Verified · Active

Autonomous experiment loop that optimizes any file against a measurable metric, inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed evaluation, keeps improvements (git commit), discards failures (git reset), and loops indefinitely. Use it to optimize code speed, reduce bundle or image size, improve test pass rates, optimize prompts, improve content quality (headlines, copy, CTR), or run any other measurable improvement loop. Requires a target file, an evaluation command that outputs a metric, and a git repo.

Purpose

Automates metric-driven optimization of any file, letting users systematically improve code performance, content quality, or other measurable outcomes.

Features

  • Autonomous experiment loop for file optimization
  • Automated editing, evaluation, committing, and reverting
  • Support for various domains (engineering, marketing, content, prompts)
  • Configurable evaluation commands, metrics, and directions
  • Git integration for versioning and rollback
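As a rough illustration of the configurable pieces listed above, a setup might produce something like the following config. This is a hypothetical sketch: the key names, file name, and `metric:` output format are assumptions, not the skill's documented schema.

```python
# Hypothetical experiment config -- key names and file layout are
# assumptions for illustration, not the skill's actual schema.
import json

config = {
    "target_file": "src/hot_loop.py",       # file the agent is allowed to edit
    "eval_command": "python bench.py",      # fixed command that prints the metric
    "metric_regex": r"metric:\s*([\d.]+)",  # how to parse the metric from stdout
    "direction": "minimize",                # "minimize" (e.g. runtime) or "maximize"
}

# Sanity-check before starting the loop.
assert config["direction"] in ("minimize", "maximize")
print(json.dumps(config, indent=2))
```

The `direction` field is what lets the same loop serve both "make this faster" (minimize seconds) and "pass more tests" (maximize pass rate).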

Use Cases

  • Use when you want to optimize code speed, reduce bundle/image size, or improve test pass rates.
  • Use when you need to improve content quality such as headlines, copy, or click-through rates.
  • Use when optimizing prompts for LLM interactions or agent instructions.
  • Use when running any measurable improvement loop that can be automated.

Non-Goals

  • Modifying the evaluation script or external dependencies.
  • Performing optimization without a clear, measurable metric.
  • Handling the initial setup of the target project's build or testing environment.

Workflow

  1. User runs `/ar:setup` to configure experiment parameters (target file, evaluation command, metric, direction).
  2. Script creates experiment directory, config files, and a git branch.
  3. User (or AI agent) calls `python scripts/run_experiment.py --single`.
  4. Script edits the target file based on AI's instruction (or a generated change).
  5. Script runs the evaluation command and parses the metric.
  6. Script compares the new metric to the best metric found so far.
  7. If improved, the change is committed; otherwise, the repo is reset.
  8. Result (keep/discard/crash) is logged to `results.tsv`.
  9. This loop continues until interrupted or a goal is met.
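The steps above can be sketched roughly as follows. Everything here is a hedged approximation: the evaluation command, the `metric: <float>` stdout format, and the exact git invocations are assumptions, since the real `scripts/run_experiment.py` is not shown on this page.

```python
"""Minimal sketch of the keep/discard loop described above (assumptions only)."""
import re
import subprocess

DIRECTION = "maximize"               # assumed config value
EVAL_CMD = ["python", "bench.py"]    # hypothetical evaluation command


def apply_edit():
    """Placeholder: in the real loop, the AI rewrites the target file here."""


def run_eval():
    """Run the fixed evaluation and parse a 'metric: <float>' line from stdout."""
    out = subprocess.run(EVAL_CMD, capture_output=True, text=True)
    match = re.search(r"metric:\s*([\d.]+)", out.stdout)
    return float(match.group(1)) if match else None


def improved(new, best):
    """Compare against the best metric so far, honoring the direction."""
    if best is None:
        return True
    return new > best if DIRECTION == "maximize" else new < best


def log(outcome, metric):
    """Append keep/discard/crash outcomes to results.tsv."""
    with open("results.tsv", "a") as f:
        f.write(f"{outcome}\t{metric}\n")


def single_iteration(best):
    """One edit-evaluate-decide cycle; returns the (possibly updated) best metric."""
    apply_edit()
    metric = run_eval()
    if metric is not None and improved(metric, best):
        subprocess.run(["git", "commit", "-am", f"keep: metric={metric}"])
        log("keep", metric)
        return metric
    subprocess.run(["git", "reset", "--hard"])  # discard the failed edit
    log("discard" if metric is not None else "crash", metric)
    return best
```

Running this under `--single` would execute one iteration; the indefinite loop is just `single_iteration` called repeatedly until interrupted.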

Installation

First, add the marketplace, then install the plugin:

```shell
/plugin marketplace add alirezarezvani/claude-skills
/plugin install autoresearch-agent@claude-code-skills
```
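The workflow above assumes an evaluation command that prints a metric the loop can parse. As a hedged illustration (the file name, output format, and workload are all invented for this sketch), such a script could be as simple as:

```python
# Hypothetical evaluation script (bench.py) -- the real eval command is
# whatever the user configures; it only needs to print one parseable number.
import timeit


def workload():
    # Stand-in for the code under optimization.
    return sum(i * i for i in range(10_000))


# Time the workload and emit the metric line the loop would parse.
elapsed = timeit.timeit(workload, number=50)
print(f"metric: {elapsed:.6f}")
```

Lower is better here, so this pairs with `direction = minimize`.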

Quality Score

Verified: 99/100 (analyzed about 18 hours ago)

Trust Signals

  • Last commit: about 21 hours ago
  • Stars: 14.6k
  • License: MIT
  • Status: View Source

