
Observability Designer

Skill · Verified · Active

Observability Designer (POWERFUL)

Purpose

To help engineers and SREs design robust, scalable, and cost-effective observability strategies for their production systems.

Features

  • Generates SLI/SLO frameworks with error budgets and burn rate alerts
  • Analyzes and optimizes existing alert configurations
  • Creates role-specific and service-type optimized dashboard specifications
  • Follows observability best practices for metrics, logs, and traces
  • Provides recommendations for monitoring integration and implementation
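The error-budget and burn-rate concepts behind the first feature can be sketched in a few lines. This is a minimal illustration of the underlying math, not code from the skill's own scripts; the function names are assumptions.

```python
# Hypothetical sketch of the error budget / burn rate arithmetic the skill's
# SLI/SLO framework is built on (function names are illustrative, not the
# skill's actual API).

def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail: a 99.9% SLO leaves 0.1%."""
    return 1.0 - slo_target

def burn_rate(observed_error_rate: float, slo_target: float) -> float:
    """How fast the budget is being spent relative to plan.

    A burn rate of 1.0 exhausts the budget exactly at the end of the
    SLO window; 5.0 exhausts it in one fifth of the window.
    """
    return observed_error_rate / error_budget(slo_target)

# Example: a 99.9% availability SLO, currently failing 0.5% of requests.
budget = error_budget(0.999)    # ~0.001, i.e. 0.1% of requests may fail
rate = burn_rate(0.005, 0.999)  # ~5.0: budget gone in a fifth of the window
```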

Use Cases

  • When designing observability for a new service
  • When optimizing existing alerting to reduce noise
  • When creating comprehensive monitoring dashboards for different roles
  • When establishing or refining SLOs and error budget policies

Non-Goals

  • Implementing or deploying monitoring infrastructure
  • Directly integrating with specific cloud provider monitoring services
  • Writing custom metric exporters or agents

Workflow

  1. Define service characteristics (type, criticality, dependencies).
  2. Use `slo_designer.py` to generate SLIs, SLOs, error budgets, and alerts.
  3. Use `alert_optimizer.py` to analyze and improve existing alerts.
  4. Use `dashboard_generator.py` to create monitoring dashboards tailored to roles and services.
  5. Integrate generated configurations into the monitoring stack (Prometheus, Grafana, Alertmanager).
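Step 2 generates burn-rate alerts; the thresholds such a generator typically emits follow the standard multi-window pattern from the Google SRE Workbook. The sketch below shows that calculation under the assumption of a 30-day SLO window; the internals of `slo_designer.py` are not documented here, so this is illustrative only.

```python
# Hedged sketch: the multi-window burn-rate thresholds a generator like
# `slo_designer.py` might emit (its actual internals are an assumption).

WINDOW_DAYS = 30  # assumed SLO window

def burn_rate_threshold(budget_fraction: float, window_hours: float) -> float:
    """Burn rate that consumes `budget_fraction` of the error budget
    within `window_hours` of a WINDOW_DAYS-day SLO window."""
    return budget_fraction * (WINDOW_DAYS * 24) / window_hours

# Classic SRE-workbook tiers: page on steep burns, ticket on slow ones.
alerts = [
    ("page",   burn_rate_threshold(0.02, 1)),   # 2% of budget in 1h  -> 14.4
    ("page",   burn_rate_threshold(0.05, 6)),   # 5% of budget in 6h  -> 6.0
    ("ticket", burn_rate_threshold(0.10, 72)),  # 10% of budget in 3d -> 1.0
]
```

In step 5, thresholds like these would land in Alertmanager-routed Prometheus alerting rules, with the paging tiers routed to on-call and the ticket tier to a queue.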

Practices

  • SLO Design
  • Alert Optimization
  • Dashboard Design
  • Monitoring Best Practices
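One core idea behind the Alert Optimization practice is flagging alerts that fire often but are rarely acted on. The sketch below shows that kind of actionability scoring; the data shape and threshold are assumptions for illustration, not the skill's actual `alert_optimizer.py` logic.

```python
# Hypothetical sketch of alert-noise analysis: flag alerts whose firing
# history suggests they are noise (data shape and 50% threshold assumed).

from dataclasses import dataclass
from typing import List

@dataclass
class AlertStats:
    name: str
    fired: int     # times the alert fired in the review period
    acted_on: int  # times a human actually intervened

def noisy_alerts(stats: List[AlertStats],
                 min_actionability: float = 0.5) -> List[str]:
    """Return alerts acted on less than `min_actionability` of the time."""
    return [s.name for s in stats
            if s.fired > 0 and s.acted_on / s.fired < min_actionability]

history = [
    AlertStats("HighCPU", fired=40, acted_on=2),        # 5% actionable
    AlertStats("ErrorBudgetBurn", fired=5, acted_on=5),  # 100% actionable
]
noisy_alerts(history)  # ["HighCPU"]
```

Candidates flagged this way are the usual targets for raising thresholds, adding `for:` durations, or demoting from page to ticket.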

Prerequisites

  • Python 3.7+

Installation

First, add the marketplace:

/plugin marketplace add alirezarezvani/claude-skills
/plugin install engineering@claude-code-skills

Quality Score

Verified
100 /100
Analyzed about 22 hours ago

Trust Signals

Last commit: 1 day ago
Stars: 14.6k
License: MIT

Similar Extensions

Define SLO/SLI/SLA

99

Establish Service Level Objectives (SLO), Service Level Indicators (SLI), and Service Level Agreements (SLA) with error budget tracking, burn rate alerts, and automated reporting using Prometheus and tools like Sloth or Pyrra. Use when defining reliability targets for customer-facing services, balancing feature velocity against system reliability through error budgets, migrating from arbitrary uptime goals to data-driven metrics, or implementing Site Reliability Engineering practices.

Skill
pjt222

Slo Architect

99

Use when defining, reviewing, or operating SLOs/SLIs/error budgets. Triggers on "define an SLO", "what should our SLO be", "error budget", "burn rate", "SLI", "service level objective", "Google SRE workbook", "multi-window burn-rate alert", or any reliability-target question. Ships SLO designer, error-budget calculator with multi-window burn-rate thresholds, and SLO reviewer that catches the common bugs (target too aggressive, window too short, conflicting SLOs, no SLI definition). 4 references on SLO principles + SLI design + error budget math + composition with feature-flags-architect/chaos-engineering/kubernetes-operator. NOT a generic observability skill — specifically the SLO discipline.

Skill
alirezarezvani

Azure Monitor Query Py

100

Azure Monitor Query SDK for Python. Use for querying Log Analytics workspaces and Azure Monitor metrics. Triggers: "azure-monitor-query", "LogsQueryClient", "MetricsQueryClient", "Log Analytics", "Kusto queries", "Azure metrics".

Skill
microsoft

Grafana Dashboards

99

Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.

Skill
wshobson

Monitor Stream

99

Stream live swarm events using the Monitor tool for real-time observability

Skill
ruvnet

LangSmith Observability

99

LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.

Skill
Orchestra-Research