Experiment Design

Skill Verified Active

A discipline for designing experiments (A/B tests, multivariate tests, holdouts) so the results actually answer the question you asked. Covers hypothesis writing, sample size, duration, segment analysis, interpretation, decision-making, and the common failure modes that produce confidently wrong shipping decisions.

Purpose

To give users a framework for running experiments that yield trustworthy, actionable results, and to prevent costly shipping decisions based on flawed data.

Features

  • Detailed guidance on hypothesis formulation (cause, effect, magnitude, mechanism)
  • Framework for sample size and minimum detectable effect calculation
  • Best practices for test duration and handling novelty/primacy effects
  • Clear delineation of what NOT to A/B test
  • Guidance on segment analysis, interaction effects, and ratio metrics
  • Methodology for sequential testing and avoiding p-hacking
  • Decision framework for interpreting results (win, loss, inconclusive)
  • Catalog of common experimental failures and their fixes
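
The sample size and minimum detectable effect item above can be illustrated with a rough calculation. This is a minimal sketch, not the skill's own method: it assumes a conversion-rate metric, a two-sided two-proportion z-test, equal traffic per arm, and an MDE expressed as an absolute lift. The function name `sample_size_per_arm` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per arm for a two-sided two-proportion z-test.

    baseline_rate: control conversion rate, e.g. 0.10
    mde_abs: minimum detectable effect as an absolute lift, e.g. 0.01
    Assumption: normal approximation with unpooled variance.
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie strictly between 0 and 1")
    if mde_abs == 0:
        raise ValueError("mde_abs must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


if __name__ == "__main__":
    # Halving the MDE roughly quadruples the required sample.
    print(sample_size_per_arm(0.10, 0.01))
    print(sample_size_per_arm(0.10, 0.005))
```

The inverse-square dependence on the MDE is the main takeaway: a small target effect can make an experiment impractical at a given traffic level, which is worth knowing before launch.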

Use Cases

  • Designing A/B tests for new product features
  • Interpreting the results of multivariate experiments
  • Determining appropriate sample size and test duration for a given MDE
  • Avoiding common pitfalls that lead to confidently wrong shipping decisions
  • Establishing a disciplined process for product experimentation
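
Turning a required sample size into a test duration is mostly arithmetic. The sketch below assumes steady daily eligible traffic split evenly across arms, and it rounds up to whole weeks so that every day of the week is represented equally. That rounding is a common convention assumed here, not something this listing specifies. The function name `duration_days` is hypothetical.

```python
import math


def duration_days(n_per_arm, arms, daily_eligible_users):
    """Estimate run time in days, rounded up to full weeks.

    Assumption: traffic is constant and every eligible user is enrolled.
    """
    if daily_eligible_users <= 0:
        raise ValueError("daily_eligible_users must be positive")
    total_needed = n_per_arm * arms
    raw_days = math.ceil(total_needed / daily_eligible_users)
    return math.ceil(raw_days / 7) * 7  # whole weeks to cover weekly cycles


if __name__ == "__main__":
    # Two arms of 15,000 users each at 3,000 eligible users per day.
    print(duration_days(15000, 2, 3000))
```

Novelty and primacy effects can justify running longer than this estimate; the calculation only gives a floor based on traffic.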

Non-Goals

  • Implementing or running the experimentation platform itself
  • Deep statistical analysis beyond standard practice (e.g., advanced Bayesian methods)
  • Feature flag operational mechanics
  • Platform-specific tooling configuration

Practices

  • Experimental Design
  • Statistical Rigor
  • Product Analytics
  • Decision Science

Installation

npx skills add rampstackco/claude-skills

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality Score

Verified
99/100
Analyzed about 13 hours ago

Trust Signals

Last commit: 3 days ago
Stars: 168
License: MIT
Status

Similar Extensions

Measure Experiment Design

100

Designs an A/B test or experiment with clear hypothesis, variants, success metrics, sample size, and duration. Use when planning experiments to validate product changes or test hypotheses.

Skill
product-on-purpose

Acquisition Channel Advisor

100

Evaluate acquisition channels using unit economics, customer quality, and scalability. Use when deciding whether to scale, test, or kill a growth channel.

Skill
deanpeters

Brainstorm Experiments New

100

Design lean startup experiments (pretotypes) for a new product. Creates XYZ hypotheses and suggests low-effort validation methods like landing pages, explainer videos, and pre-orders. Use when validating a new product idea, creating pretotypes, or testing market demand.

Skill
phuryn

Ads Performance Analytics

99

How to read paid media dashboards without fooling yourself. Attribution models, platform reporting quirks, multi-platform reconciliation, ROAS vs LTV horizon traps, statistical noise in performance metrics, incrementality testing, and the failure modes that produce expensive lessons. Triggers on read paid media dashboard, attribution analysis, ROAS vs LTV, multi-platform reconciliation, ad incrementality, geo holdout, conversion lift study, ghost bidding, paid media reporting, board-deck paid media metrics, blended CAC, MMM, MTA, last-click attribution. Also triggers when a marketer is about to scale, kill, or rebudget a campaign based on platform metrics, or when reconciling platform reports against warehouse revenue.

Skill
rampstackco

Experimentation Platform Orchestrator

98

A platform decision framework for experimentation. When to use Statsig vs PostHog vs GrowthBook vs Optimizely vs Amplitude vs Eppo vs Kameleoon. How to migrate between them. How to coordinate when multi-platform is genuinely warranted. The decisions that compound for years and the ones you can defer. Triggers on which experimentation platform, choose Statsig vs PostHog, evaluate experimentation tools, switch experimentation platform, migrate from Optimizely, consolidate experimentation tools, multi-platform experimentation, experimentation platform decision, ab test platform selection, feature flag platform vs experiment platform, warehouse-native experiments, vendor lock-in experimentation. Also triggers when a team is asking about cost, governance, or migration cost across experimentation tools, or when an evaluation is starting.

Skill
rampstackco

Ab Test Setup

98

When the user wants to plan, design, or implement an A/B test or experiment, or build a growth experimentation program. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "should I test this," "which version is better," "test two versions," "statistical significance," "how long should I run this test," "growth experiments," "experiment velocity," "experiment backlog," "ICE score," "experimentation program," or "experiment playbook." Use this whenever someone is comparing two approaches and wants to measure which performs better, or when they want to build a systematic experimentation practice. For tracking implementation, see analytics-tracking. For page-level conversion optimization, see page-cro.

Skill
coreyhaines31

© 2025 SkillRepo · Find the right skill, skip the noise.