Agent Benchmark Suite
Skill · Verified · Active
Agent skill for benchmark-suite - invoke with $agent-benchmark-suite
This skill automates and enhances the performance optimization lifecycle for software systems by providing comprehensive benchmarking, regression detection, and validation capabilities.
Features
- Comprehensive benchmarking framework
- Automated performance regression detection
- Automated performance testing and validation
- Integration with MCP for advanced analysis
- CLI commands for operational control
Use Cases
- Running performance benchmarks for new features or infrastructure changes
- Detecting performance regressions before they impact users
- Validating performance against Service Level Agreements (SLAs)
- Automating performance testing as part of CI/CD pipelines
Non-Goals
- Functional testing of application logic
- Security vulnerability scanning beyond performance-related aspects
- End-user application support or bug fixing
Workflow
- Configure benchmark parameters (duration, iterations, baseline)
- Execute comprehensive benchmark suite or specific benchmarks
- Analyze benchmark results for performance metrics and trends
- Detect performance regressions by comparing current results with historical data
- Validate performance against predefined criteria (SLAs, scalability)
- Generate summary reports and recommendations
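The workflow steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the skill's actual implementation or CLI: the function names (`run_benchmark`, `summarize`, `detect_regression`, `meets_sla`), the 10% tolerance, and the choice of median and p95 as metrics are all assumptions made for the example.

```python
import statistics
import time
from typing import Callable, Dict, List


def run_benchmark(fn: Callable[[], object], iterations: int = 20) -> List[float]:
    """Time `fn` for a fixed number of iterations; returns seconds per call."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Reduce raw samples to the metrics compared in later steps."""
    ordered = sorted(samples)
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    p95_index = max(0, (95 * len(ordered) + 99) // 100 - 1)
    return {"median": statistics.median(ordered), "p95": ordered[p95_index]}


def detect_regression(
    current: Dict[str, float],
    baseline: Dict[str, float],
    tolerance: float = 0.10,  # hypothetical threshold: 10% slower than baseline
) -> List[str]:
    """Return the metrics that are slower than the baseline by more than `tolerance`."""
    return [
        name
        for name, base in baseline.items()
        if current[name] > base * (1 + tolerance)
    ]


def meets_sla(current: Dict[str, float], sla_p95_seconds: float) -> bool:
    """Validate the p95 latency against a predefined SLA limit."""
    return current["p95"] <= sla_p95_seconds


if __name__ == "__main__":
    # Configure, execute, analyze, compare, validate, report.
    stats = summarize(run_benchmark(lambda: sum(range(10_000)), iterations=30))
    baseline = {"median": stats["median"], "p95": stats["p95"]}  # stand-in for stored history
    print("metrics:", stats)
    print("regressions:", detect_regression(stats, baseline))
    print("SLA met:", meets_sla(stats, sla_p95_seconds=0.5))
```

In a CI pipeline, the baseline would come from stored historical results rather than from the current run, and a non-empty regression list or a failed SLA check would typically fail the job.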
Practices
- Performance Optimization
- Automated Testing
- Regression Prevention
- Continuous Integration
Prerequisites
- Claude Code environment
- Access to MCP server (for full functionality)
Installation
npx skills add ruvnet/ruflo
Runs the Vercel skills CLI (skills.sh) via npx. Needs Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.