Run A/B Test Models
Skill · Verified · Active
Design and execute A/B tests for ML models in production using traffic splitting, statistical significance testing, and canary/shadow deployment strategies. Measure performance differences and make data-driven decisions about model rollout. Use when validating a new model version before full rollout, comparing candidate models trained with different algorithms, measuring the business-metric impact of model changes, or when regulatory requirements mandate a gradual rollout.
To enable data-driven decisions about ML model rollouts by designing and executing controlled A/B tests in production environments.
Features
- A/B test design with statistical significance testing
- Traffic splitting and user assignment (see the assignment sketch after this list)
- Canary and shadow deployment strategies
- Performance metric collection and analysis
- Guardrail monitoring for safety thresholds (see the significance and guardrail sketch after this list)
- Automated rollout decision support
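A minimal sketch of deterministic traffic splitting, assuming users are bucketed by hashing a stable user ID with an experiment-specific salt; the names `assign_variant` and `SALT` are illustrative, not the skill's actual API:

```python
import hashlib

SALT = "ab-test-recsys-v2"  # experiment-specific salt keeps assignments stable

def assign_variant(user_id: str, treatment_share: float = 0.1) -> str:
    """Deterministically map a user ID to 'control' or 'treatment'.

    The same user always lands in the same bucket, so exposure stays
    consistent across sessions without any shared assignment state.
    """
    digest = hashlib.sha256(f"{SALT}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "treatment" if bucket < treatment_share else "control"

print(assign_variant("user-42"))  # e.g. a 10% canary split
```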
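The significance test and guardrail check could look like the sketch below, assuming conversion-style binary metrics and a standard two-proportion z-test; the counts, thresholds, and helper names are assumptions for illustration:

```python
from math import sqrt
from scipy.stats import norm

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in conversion rates."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - norm.cdf(abs(z)))

def guardrail_ok(error_rate: float, ceiling: float = 0.02) -> bool:
    """Hard safety gate: hold the rollout if the treatment's error rate exceeds the ceiling."""
    return error_rate <= ceiling

p = two_proportion_p_value(conv_a=480, n_a=10_000, conv_b=545, n_b=10_000)
if guardrail_ok(error_rate=0.011) and p < 0.05:
    print(f"p={p:.4f}: promote treatment")
else:
    print(f"p={p:.4f}: hold rollout")
```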
Use Cases
- Validating new model versions before full rollout
- Comparing candidate models trained with different algorithms
- Measuring business metric impact of model changes
- Meeting regulatory requirements for gradual rollout
Non-Goals
- Deploying ML models to production infrastructure
- Training ML models
- Managing real-time model serving infrastructure
Documentation
- info: Configuration & parameter reference. The SKILL.md outlines the required and optional inputs for the A/B test experiment but does not provide a detailed reference for all parameters or their defaults; the referenced examples.md likely contains this detail.
Execution
- info: Validation. The Python examples demonstrate basic data handling and analysis, but explicit schema-validation libraries such as Pydantic or Zod are not shown for all inputs and outputs in the main SKILL.md. A hypothetical sketch follows.
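As one hedged way to close that gap, experiment inputs could be validated with Pydantic before any traffic is split; the field names and constraints below are assumptions about the config shape, not the skill's documented schema:

```python
from pydantic import BaseModel, Field, field_validator

class ExperimentConfig(BaseModel):
    model_a: str                                   # baseline model identifier
    model_b: str                                   # candidate model identifier
    traffic_split: float = Field(gt=0.0, lt=1.0)   # share of traffic routed to model_b
    min_sample_size: int = Field(ge=100)           # per-variant floor before testing
    significance_level: float = Field(default=0.05, gt=0.0, lt=1.0)

    @field_validator("model_b")
    @classmethod
    def models_differ(cls, v, info):
        # Reject configs that would A/B test a model against itself.
        if v == info.data.get("model_a"):
            raise ValueError("candidate must differ from baseline")
        return v

# Raises pydantic.ValidationError on bad input instead of failing mid-experiment.
cfg = ExperimentConfig(model_a="recsys-v1", model_b="recsys-v2",
                       traffic_split=0.1, min_sample_size=5_000)
```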
Installation
/plugin install agent-almanac@pjt222-agent-almanac
Similar Extensions
Measure Experiment Design (100)
Designs an A/B test or experiment with clear hypothesis, variants, success metrics, sample size, and duration. Use when planning experiments to validate product changes or test hypotheses.
Arize Experiment (100)
Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy.
CE Optimize (100)
Run metric-driven iterative optimization loops: define a measurable goal, run parallel experiments, measure each against hard gates or LLM-as-judge scores, keep improvements, and converge on the best solution. Use when optimizing clustering quality, search relevance, build performance, prompt quality, or any measurable outcome that benefits from systematic experimentation.
OraClaw Bandit (99)
A/B testing and feature optimization for AI agents. Automatically picks the best option using multi-armed bandits and contextual bandits (LinUCB). No data warehouse required; it works from the first request.
Experiment Designer (99)
Use when planning product experiments, writing testable hypotheses, estimating sample size, prioritizing tests, or interpreting A/B outcomes with practical statistical rigor.
A/B Test Setup (98)
When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "conversion experiment," "statistical significance," or "test this." For tracking implementation, see analytics-tracking.