
Autoresearch Agent

Skill · Verified · Active

Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed evaluation, keeps improvements (git commit), discards failures (git reset), and loops indefinitely. Use when: user wants to optimize code speed, reduce bundle/image size, improve test pass rate, optimize prompts, improve content quality (headlines, copy, CTR), or run any measurable improvement loop. Requires: a target file, an evaluation command that outputs a metric, and a git repo.

Purpose

To automate and measure the optimization of any file based on a defined metric, enabling users to systematically improve code performance, content quality, or other measurable outcomes.

Features

  • Autonomous experiment loop for file optimization
  • Automated editing, evaluation, committing, and reverting
  • Support for various domains (engineering, marketing, content, prompts)
  • Configurable evaluation commands, metrics, and directions
  • Git integration for versioning and rollback
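
An evaluation command here is any program that prints a measurable number to stdout for the loop to parse. A minimal sketch of such a script, assuming a hypothetical `evaluate.py` that benchmarks a stand-in function (the function body and the printed format are illustrative assumptions, not part of the skill):

```python
# Hypothetical evaluation script (evaluate.py): times a hot function in the
# target file and prints one number, which the experiment loop parses as the
# metric (direction: minimize). The function `work` is a stand-in.
import timeit

def work() -> int:
    # stand-in for the code under optimization
    return sum(i * i for i in range(10_000))

elapsed = timeit.timeit(work, number=100)
print(f"{elapsed:.4f}")  # the loop reads this number as the metric
```

Any command works the same way, e.g. a test runner printing a pass rate (maximize) or `wc -c` on a bundle (minimize), as long as a single number can be parsed from its output.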

Use Cases

  • Use when you want to optimize code speed, reduce bundle/image size, or improve test pass rates.
  • Use when you need to improve content quality such as headlines, copy, or click-through rates.
  • Use when optimizing prompts for LLM interactions or agent instructions.
  • Use when running any measurable improvement loop that can be automated.

Non-Goals

  • Modifying the evaluation script or external dependencies.
  • Performing optimization without a clear, measurable metric.
  • Handling the initial setup of the target project's build or testing environment.

Workflow

  1. User runs `/ar:setup` to configure experiment parameters (target file, evaluation command, metric, direction).
  2. Script creates experiment directory, config files, and a git branch.
  3. User (or AI agent) calls `python scripts/run_experiment.py --single`.
  4. Script edits the target file based on AI's instruction (or a generated change).
  5. Script runs the evaluation command and parses the metric.
  6. Script compares the new metric to the best metric found so far.
  7. If improved, the change is committed; otherwise, the repo is reset.
  8. Result (keep/discard/crash) is logged to `results.tsv`.
  9. This loop continues until interrupted or a goal is met.
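
Steps 4–9 above can be sketched as a single iteration. This is a minimal sketch, not the skill's actual code: the helper names, the metric-parsing heuristic, and the commit message format are assumptions.

```python
import re
import subprocess

def parse_metric(eval_output: str) -> float:
    """Extract the last number printed by the evaluation command (assumed format)."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", eval_output)
    if not matches:
        raise ValueError("evaluation output contained no metric")
    return float(matches[-1])

def is_improvement(new: float, best: float, direction: str) -> bool:
    """direction is 'minimize' (e.g. runtime, size) or 'maximize' (e.g. pass rate)."""
    return new < best if direction == "minimize" else new > best

def run_single_experiment(eval_cmd: str, best: float, direction: str) -> tuple[bool, float]:
    """Run the fixed evaluation; commit the edit on improvement, hard-reset otherwise."""
    out = subprocess.run(eval_cmd, shell=True, capture_output=True, text=True)
    metric = parse_metric(out.stdout)
    if is_improvement(metric, best, direction):
        subprocess.run(["git", "commit", "-am", f"keep: metric={metric}"], check=True)
        return True, metric
    subprocess.run(["git", "reset", "--hard"], check=True)
    return False, best
```

The driving loop would call `run_single_experiment` repeatedly, carrying the best metric forward and appending each keep/discard outcome to `results.tsv`.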

Installation

First, add the marketplace:

/plugin marketplace add alirezarezvani/claude-skills
/plugin install autoresearch-agent@claude-code-skills

Quality Score

Verified
99/100
Analyzed about 21 hours ago

Trust Signals

Last commit: 1 day ago
Stars: 14.6k
License: MIT
Status
View source code

Similar Extensions

Project Session Manager

100

Worktree-first dev environment manager for issues, PRs, and features with optional tmux sessions

Skill
Yeachan-Heo

Oh My Claudecode

100

Process-first advisor routing for Claude, Codex, or Gemini via `omc ask`, with artifact capture and no raw CLI assembly

Skill
Yeachan-Heo

Using Git Worktrees

100

Use this when starting feature work that requires isolation from the current workspace, or before executing implementation plans. Ensures an isolated workspace exists via native tools or a git-worktree fallback.

Skill
obra

CE Optimize

100

Run metric-driven iterative optimization loops -- define a measurable goal, run parallel experiments, measure each against hard gates or LLM-as-judge scores, keep improvements, and converge on the best solution. Use when optimizing clustering quality, search relevance, build performance, prompt quality, or any measurable outcome that benefits from systematic experimentation.

Skill
EveryInc

Rule Effectiveness Analysis

100

Analyze which rules are actively used vs inert. Detect coverage gaps. Recommend pruning to reduce token consumption.

Skill
luiseiman

Arize Prompt Optimization

100

Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.

Skill
github