Slo Architect
Skill · Verified · Active
Use when defining, reviewing, or operating SLOs/SLIs/error budgets. Triggers on "define an SLO", "what should our SLO be", "error budget", "burn rate", "SLI", "service level objective", "Google SRE workbook", "multi-window burn-rate alert", or any reliability-target question. Ships an SLO designer, an error-budget calculator with multi-window burn-rate thresholds, and an SLO reviewer that catches the common bugs (target too aggressive, window too short, conflicting SLOs, no SLI definition). 4 references on SLO principles + SLI design + error budget math + composition with feature-flags-architect/chaos-engineering/kubernetes-operator. NOT a generic observability skill: it covers specifically the SLO discipline.
Helps teams define, review, and operate meaningful SLOs, error budgets, and burn-rate alerts that directly reflect user experience and drive engineering action.
Features
- SLO designer with predefined SLI types
- Error budget calculator with multi-window burn-rate thresholds
- SLO reviewer for common bugs (target, window, SLI definition)
- Generates PromQL-shaped alert rules
- Outputs markdown or JSON for documentation and integration
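To make the burn-rate and PromQL items above concrete, here is a minimal Python sketch. It is not the skill's own calculator. It assumes the window pairs and budget fractions recommended in the Google SRE Workbook for a 30-day SLO period (2% of budget in 1h, 5% in 6h, 10% in 3d). The recording-rule name `sli:error_ratio` and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertWindow:
    long_window: str          # PromQL-style duration, e.g. "1h"
    long_window_hours: float  # the same duration expressed in hours
    short_window: str         # shorter confirmation window, e.g. "5m"
    budget_fraction: float    # share of the error budget burned in the long window


# Window pairs and budget fractions as recommended in the Google SRE Workbook
# for a 30-day SLO period (an assumption; the skill's defaults may differ).
WORKBOOK_WINDOWS = (
    AlertWindow("1h", 1, "5m", 0.02),
    AlertWindow("6h", 6, "30m", 0.05),
    AlertWindow("3d", 72, "6h", 0.10),
)


def burn_rate_threshold(budget_fraction, long_window_hours, slo_period_days=30):
    """Burn rate that consumes `budget_fraction` of the budget in the long window."""
    slo_period_hours = slo_period_days * 24
    return budget_fraction * slo_period_hours / long_window_hours


def alert_conditions(slo_target, metric="sli:error_ratio", windows=WORKBOOK_WINDOWS):
    """Build one PromQL-shaped condition per window pair.

    `metric` is a hypothetical recording-rule prefix; rules such as
    `sli:error_ratio:rate1h` are assumed to exist already.
    """
    error_budget = 1 - slo_target
    conditions = []
    for w in windows:
        # Alert when the observed error ratio exceeds burn rate x error budget
        # in both the long window and the short confirmation window.
        limit = burn_rate_threshold(w.budget_fraction, w.long_window_hours) * error_budget
        conditions.append(
            f"{metric}:rate{w.long_window} > {limit:g} "
            f"and {metric}:rate{w.short_window} > {limit:g}"
        )
    return conditions


if __name__ == "__main__":
    for condition in alert_conditions(0.999):
        print(condition)
```

Requiring both windows to exceed the limit is what lets the alert reset quickly once the burn stops, while the long window keeps short blips from paging anyone.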
Use cases
- Defining a new SLO for a service or feature
- Reviewing existing SLOs for common bugs and policy adherence
- Computing error budgets and burn-rate alert thresholds
- Integrating SLOs with feature flags, chaos engineering, and Kubernetes operators
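As an illustration of the error-budget computation in the use cases above, the following Python sketch converts an SLO target into allowed downtime and tracks how much budget remains. The function names are hypothetical and the arithmetic is the standard `1 - target` definition, not necessarily how the skill's calculator is implemented.

```python
def error_budget_minutes(slo_target, window_days=30):
    """Allowed 'bad' minutes in the window for a time-based SLO."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return (1 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent for a request-based SLO.

    Returns 1.0 when nothing has failed and a negative value once the
    budget is overspent.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


if __name__ == "__main__":
    # A 99.9% target over 30 days leaves roughly 43 minutes of budget.
    print(f"{error_budget_minutes(0.999):.1f} minutes")
    print(f"{budget_remaining(0.999, 1_000_000, 250):.0%} of budget left")
```

A negative return value from `budget_remaining` is deliberate: it shows by how much the budget has been overspent rather than clamping at zero.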
Non-goals
- General observability strategy (metrics, logs, traces)
- Customer-facing SLAs with legal implications
- Performance load testing or capacity planning
- Active incident response
Installation
Add the Marketplace first
/plugin marketplace add alirezarezvani/claude-skills
/plugin install engineering@claude-code-skills
Quality score
Verified
Similar extensions
Define SLO/SLI/SLA
(99) Establish Service Level Objectives (SLO), Service Level Indicators (SLI), and Service Level Agreements (SLA) with error budget tracking, burn rate alerts, and automated reporting using Prometheus and tools like Sloth or Pyrra. Use when defining reliability targets for customer-facing services, balancing feature velocity against system reliability through error budgets, migrating from arbitrary uptime goals to data-driven metrics, or implementing Site Reliability Engineering practices.
SRE Engineer
(98) Defines service level objectives, creates error budget policies, designs incident response procedures, develops capacity models, and produces monitoring configurations and automation scripts for production systems. Use when defining SLIs/SLOs, managing error budgets, building reliable systems at scale, incident management, chaos engineering, toil reduction, or capacity planning.
Observability Designer
(100) Observability Designer (POWERFUL)
Chaos Engineering
(99) Use when planning, running, or learning from chaos engineering experiments. Triggers on "chaos experiment", "fault injection", "gameday", "resilience test", "blast radius", "steady state", "abort criteria", "Chaos Toolkit", "Chaos Mesh", "Litmus", "Gremlin", "AWS FIS", or any deliberate failure-injection question. Ships experiment designer, blast-radius calculator, and postmortem generator (all stdlib Python), 4 references on chaos principles + experiment design + attack taxonomy + tooling landscape, and a /chaos-experiment slash command. Composes with feature-flags-architect (kill switches as abort triggers) and kubernetes-operator (common chaos targets).
Slo Implementation
(97) Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with error budgets and alerting. Use when establishing reliability targets, implementing SRE practices, or measuring service performance.
Project Session Manager
(100) Worktree-first dev environment manager for issues, PRs, and features with optional tmux sessions