
OraClaw Bandit

Skill · Verified · Active

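A/B testing and feature optimization for AI agents. Uses multi-armed bandits and contextual bandits (LinUCB) to automatically select the best option. No data warehouse is required: it runs directly from the request.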

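Purpose

Gives AI agents exact, deterministic optimization algorithms for decision-making, so they can select the best option, run valid A/B tests, and optimize features without relying on potentially unstable LLM heuristics.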

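Features

  • Automatically select the best variant using bandits
  • Context-aware optimization using LinUCB
  • Low latency (<25 ms) with no token consumption
  • Multiple integration methods (MCP server, REST API, SDK)
  • Support for a range of optimization algorithms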
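
To illustrate the context-aware selection listed above, here is a minimal sketch of disjoint LinUCB in plain Python. It is not OraClaw's implementation or API: the class `LinUCBArm`, the function `select_arm`, the `alpha` exploration weight, and the two-feature context are all hypothetical names chosen for this example. Each arm keeps the inverse of its design matrix (updated with the Sherman-Morrison formula) and is scored by its estimated reward plus an uncertainty bonus.

```python
import math


class LinUCBArm:
    """Per-arm state for disjoint LinUCB: A^-1 (d x d) and b (length d)."""

    def __init__(self, dim):
        # A starts as the identity matrix, so A^-1 is the identity as well.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.a_inv]

    def ucb(self, x, alpha):
        a_inv_x = self._a_inv_times(x)
        # theta = A^-1 b, and theta . x = b . (A^-1 x) because A^-1 is symmetric.
        mean = sum(bi * v for bi, v in zip(self.b, a_inv_x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison update of A^-1 after A <- A + x x^T.
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        dim = len(x)
        for i in range(dim):
            for j in range(dim):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        for i in range(dim):
            self.b[i] += reward * x[i]


def select_arm(arms, x, alpha=1.0):
    """Return the name of the arm with the highest upper confidence bound."""
    scores = {name: arm.ucb(x, alpha) for name, arm in arms.items()}
    return max(scores, key=scores.get)


# Toy environment (an assumption for this sketch): variant "A" pays off only
# for context [1, 0], variant "B" only for context [0, 1].
best_for = {(1.0, 0.0): "A", (0.0, 1.0): "B"}
arms = {"A": LinUCBArm(2), "B": LinUCBArm(2)}
for _ in range(50):
    for context, winner in best_for.items():
        x = list(context)
        chosen = select_arm(arms, x)
        arms[chosen].update(x, 1.0 if chosen == winner else 0.0)
```

After the loop, `select_arm(arms, [1.0, 0.0])` and `select_arm(arms, [0.0, 1.0])` return different variants, which is the point of a contextual bandit: the best option depends on the features of the request.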

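Use cases

  • Choose the best variant among multiple options in an A/B test
  • Optimize feature flags, prompts, email subject lines, or any other choice
  • Make context-aware choices based on user, time, or situation
  • Run adaptive experiments without a predetermined sample size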
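
One common way to run an adaptive experiment without fixing a sample size up front is Thompson sampling over Beta posteriors. The listing does not say which bandit algorithm OraClaw uses for this, so the sketch below is a generic illustration only; the function names, the variant labels `subject_a` and `subject_b`, and the simulated conversion rates are invented for the example.

```python
import random


def thompson_pick(stats, rng):
    """Draw a plausible conversion rate per arm from its Beta posterior; pick the highest."""
    draws = {
        arm: rng.betavariate(1 + s["wins"], 1 + s["losses"])  # Beta(1, 1) prior
        for arm, s in stats.items()
    }
    return max(draws, key=draws.get)


def run_experiment(true_rates, rounds, seed=0):
    """Simulate an experiment; true_rates is only used to generate fake outcomes."""
    rng = random.Random(seed)
    stats = {arm: {"wins": 0, "losses": 0} for arm in true_rates}
    for _ in range(rounds):
        arm = thompson_pick(stats, rng)
        converted = rng.random() < true_rates[arm]
        stats[arm]["wins" if converted else "losses"] += 1
    return stats


# Two hypothetical email subject lines with made-up conversion rates.
stats = run_experiment({"subject_a": 0.2, "subject_b": 0.8}, rounds=2000)
pulls = {arm: s["wins"] + s["losses"] for arm, s in stats.items()}
```

Traffic shifts toward the better-performing variant as evidence accumulates, so `pulls` ends up heavily skewed toward it instead of being split evenly as in a fixed-allocation A/B test.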

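Non-goals

  • Performing arbitrary mathematical computation outside the scope of optimization
  • Serving as a general-purpose data analysis or data warehouse tool
  • Replacing LLM reasoning for tasks that do not need a deterministic mathematical solution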

Practices

  • Optimization
  • Experiment design
  • MLOps

Prerequisites

  • ORACLAW_API_KEY environment variable, for advanced features
  • Node.js/npm, for local MCP server setup
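
A small check like the following can confirm whether the ORACLAW_API_KEY variable is set before relying on advanced features. The helper `load_api_key` is a hypothetical name for this sketch; how the key is actually passed to OraClaw is not described in this listing.

```python
import os


def load_api_key(env=None):
    """Return the ORACLAW_API_KEY value, or None if it is unset or blank."""
    env = os.environ if env is None else env
    key = env.get("ORACLAW_API_KEY", "").strip()
    return key or None


if load_api_key() is None:
    print("ORACLAW_API_KEY is not set; advanced features may be unavailable.")
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.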

Installation

npx skills add Whatsonyourmind/oraclaw

Runs the Vercel skills CLI (skills.sh) through npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.

Quality score

Verified
99/100
Analyzed 1 day ago

Trust signals

Last commit: 12 days ago
Stars: 8
License: MIT

Similar extensions

Measure Experiment Design

100

Designs an A/B test or experiment with clear hypothesis, variants, success metrics, sample size, and duration. Use when planning experiments to validate product changes or test hypotheses.

Skill
product-on-purpose

CE Optimize

100

Run metric-driven iterative optimization loops -- define a measurable goal, run parallel experiments, measure each against hard gates or LLM-as-judge scores, keep improvements, and converge on the best solution. Use when optimizing clustering quality, search relevance, build performance, prompt quality, or any measurable outcome that benefits from systematic experimentation.

Skill
EveryInc

Experiment Designer

99

Use when planning product experiments, writing testable hypotheses, estimating sample size, prioritizing tests, or interpreting A/B outcomes with practical statistical rigor.

Skill
alirezarezvani

Ab Test Setup

98

When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," "hypothesis," "conversion experiment," "statistical significance," or "test this." For tracking implementation, see analytics-tracking.

Skill
alirezarezvani

Run Ab Test Models

95

Design and execute A/B tests for ML models in production using traffic splitting, statistical significance testing, and canary/shadow deployment strategies. Measure performance differences and make data-driven decisions about model rollout. Use when validating a new model version before full rollout, comparing candidate models trained with different algorithms, measuring business metric impact of model changes, or when regulatory requirements mandate gradual rollout.

Skill
pjt222

Creating Experiments

79

Guides agents through the 3-step experiment creation flow: defining the hypothesis, configuring rollout, and setting up analytics. Delegates rollout decisions to configuring-experiment-rollout and metric setup to configuring-experiment-analytics. TRIGGER when: user asks to create a new experiment or A/B test, OR when you are about to call experiment-create. DO NOT TRIGGER when: user is updating an existing experiment, managing lifecycle, or only browsing experiments.

Skill
PostHog