Observability Designer

Skill · Verified · Active

Observability Designer (POWERFUL)

Purpose

To help engineers and SREs design robust, scalable, and cost-effective observability strategies for their production systems.

Features

Generates SLI/SLO frameworks with error budgets and burn rate alerts
Analyzes and optimizes existing alert configurations
Creates dashboard specifications tailored to specific roles and service types
Follows observability best practices for metrics, logs, and traces
Provides recommendations for monitoring integration and implementation
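
The error budget and burn rate arithmetic behind the first feature can be sketched as follows. This is a minimal illustration, not the output or internals of `slo_designer.py`: the `Slo` class and the function names are hypothetical, and the 14.4 threshold is the commonly cited paging value for a 30-day window (2% of the budget spent in one hour), assumed here rather than taken from this skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """An availability-style SLO over a rolling window (hypothetical type)."""
    target: float          # e.g. 0.999 means 99.9% of requests should succeed
    window_days: int = 30

    @property
    def error_budget(self) -> float:
        """Fraction of requests allowed to fail within the window."""
        return 1.0 - self.target


def burn_rate(slo: Slo, failed: int, total: int) -> float:
    """Observed error ratio divided by the budget; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0
    return (failed / total) / slo.error_budget


def budget_consumed(slo: Slo, rate: float, hours: float) -> float:
    """Fraction of the whole window's budget spent at `rate` for `hours`."""
    return rate * hours / (slo.window_days * 24)


def should_page(slo: Slo, long_window, short_window, threshold: float = 14.4) -> bool:
    """Multi-window check: page only if both windows burn above the threshold.

    Each window is a (failed, total) pair of request counts.
    """
    return (burn_rate(slo, *long_window) >= threshold
            and burn_rate(slo, *short_window) >= threshold)
```

Requiring both a long and a short window to exceed the threshold is what keeps the alert from firing on a brief spike and from staying on long after the problem has stopped.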

When to Use

When designing observability for a new service
When optimizing existing alerts to reduce noise
When creating comprehensive monitoring dashboards for different roles
When establishing or refining SLOs and error budget policies

Non-Goals

Implementing or deploying monitoring infrastructure
Directly integrating with specific cloud provider monitoring services
Writing custom metric exporters or agents

Workflow

Define service characteristics (type, criticality, dependencies).
Use `slo_designer.py` to generate SLIs, SLOs, error budgets, and alerts.
Use `alert_optimizer.py` to analyze and improve existing alerts.
Use `dashboard_generator.py` to create monitoring dashboards tailored to roles and services.
Integrate generated configurations into the monitoring stack (Prometheus, Grafana, Alertmanager).
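
As an illustration of what the alert analysis step might look for, the sketch below flags alerts that fire often and resolve quickly, a common sign of noise. It is an assumption-laden example and not the logic of `alert_optimizer.py`: the `Firing` record, the `find_noisy_alerts` function, and the default thresholds are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Firing:
    """One alert firing, from start to resolution (hypothetical record)."""
    alert: str
    started: datetime
    resolved: datetime


def find_noisy_alerts(firings, min_count=5, max_duration=timedelta(minutes=5)):
    """Return sorted names of alerts that fired at least `min_count` times
    and for which more than half of the firings lasted `max_duration` or less."""
    durations_by_alert = defaultdict(list)
    for firing in firings:
        durations_by_alert[firing.alert].append(firing.resolved - firing.started)

    noisy = []
    for name, durations in durations_by_alert.items():
        short = sum(1 for d in durations if d <= max_duration)
        if len(durations) >= min_count and short / len(durations) > 0.5:
            noisy.append(name)
    return sorted(noisy)
```

Alerts flagged this way are candidates for a longer evaluation period before firing or for a higher threshold; whether either change is appropriate depends on the service.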

Practices

SLO Design
Alert Optimization
Dashboard Design
Monitoring Best Practices

Prerequisites

Python 3.7+

Installation

Add the marketplace first:

/plugin marketplace add alirezarezvani/claude-skills

/plugin install engineering@claude-code-skills

Quality Score

Verified

100/100

Analyzed 1 day ago

Trust Signals

Last commit: 1 day ago

GitHub owner: alirezarezvani

Stars: 14.6k

License: MIT

Website: alirezarezvani.medium.com

Similar Extensions

Define SLO/SLI/SLA

Establish Service Level Objectives (SLO), Service Level Indicators (SLI), and Service Level Agreements (SLA) with error budget tracking, burn rate alerts, and automated reporting using Prometheus and tools like Sloth or Pyrra. Use when defining reliability targets for customer-facing services, balancing feature velocity against system reliability through error budgets, migrating from arbitrary uptime goals to data-driven metrics, or implementing Site Reliability Engineering practices.

Skill

pjt222

Slo Architect

Use when defining, reviewing, or operating SLOs/SLIs/error budgets. Triggers on "define an SLO", "what should our SLO be", "error budget", "burn rate", "SLI", "service level objective", "Google SRE workbook", "multi-window burn-rate alert", or any reliability-target question. Ships SLO designer, error-budget calculator with multi-window burn-rate thresholds, and SLO reviewer that catches the common bugs (target too aggressive, window too short, conflicting SLOs, no SLI definition). 4 references on SLO principles + SLI design + error budget math + composition with feature-flags-architect/chaos-engineering/kubernetes-operator. NOT a generic observability skill — specifically the SLO discipline.

Skill

alirezarezvani

Azure Monitor Query Py

100

Azure Monitor Query SDK for Python. Use for querying Log Analytics workspaces and Azure Monitor metrics. Triggers: "azure-monitor-query", "LogsQueryClient", "MetricsQueryClient", "Log Analytics", "Kusto queries", "Azure metrics".

Skill

microsoft

Grafana Dashboards

Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.

Skill

wshobson

Monitor Stream

Stream live swarm events using the Monitor tool for real-time observability

Skill

ruvnet

LangSmith Observability

LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.

Skill

Orchestra-Research