
Observability Designer

Skill · Verified · Active

Observability Designer (POWERFUL)

Purpose

To help engineers and SREs design robust, scalable, and cost-effective observability strategies for their production systems.

Features

  • Generates SLI/SLO frameworks with error budgets and burn rate alerts
  • Analyzes and optimizes existing alert configurations
  • Creates role-specific and service-type optimized dashboard specifications
  • Follows observability best practices for metrics, logs, and traces
  • Provides recommendations for monitoring integration and implementation
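The error-budget and burn-rate concepts behind the first feature can be sketched in a few lines. This is a minimal illustration of the underlying math, not code from the skill's own scripts; the function names are assumptions.

```python
# Hypothetical sketch of the error budget / burn rate arithmetic the skill's
# SLI/SLO framework is built on (function names are illustrative, not the
# skill's actual API).

def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail: a 99.9% SLO leaves 0.1%."""
    return 1.0 - slo_target

def burn_rate(observed_error_rate: float, slo_target: float) -> float:
    """How fast the budget is being spent relative to plan.

    A burn rate of 1.0 exhausts the budget exactly at the end of the
    SLO window; 5.0 exhausts it in one fifth of the window.
    """
    return observed_error_rate / error_budget(slo_target)

# Example: a 99.9% availability SLO, currently failing 0.5% of requests.
budget = error_budget(0.999)    # ~0.001, i.e. 0.1% of requests may fail
rate = burn_rate(0.005, 0.999)  # ~5.0: budget gone in a fifth of the window
```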

Use Cases

  • When designing observability for a new service
  • When optimizing existing alerting to reduce noise
  • When creating comprehensive monitoring dashboards for different roles
  • When establishing or refining SLOs and error budget policies

Non-Goals

  • Implementing or deploying monitoring infrastructure
  • Directly integrating with specific cloud provider monitoring services
  • Writing custom metric exporters or agents

Workflow

  1. Define service characteristics (type, criticality, dependencies).
  2. Use `slo_designer.py` to generate SLIs, SLOs, error budgets, and alerts.
  3. Use `alert_optimizer.py` to analyze and improve existing alerts.
  4. Use `dashboard_generator.py` to create monitoring dashboards tailored to roles and services.
  5. Integrate generated configurations into the monitoring stack (Prometheus, Grafana, Alertmanager).
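Step 2 generates burn-rate alerts; the thresholds such a generator typically emits follow the standard multi-window pattern from the Google SRE Workbook. The sketch below shows that calculation under the assumption of a 30-day SLO window; the internals of `slo_designer.py` are not documented here, so this is illustrative only.

```python
# Hedged sketch: the multi-window burn-rate thresholds a generator like
# `slo_designer.py` might emit (its actual internals are an assumption).

WINDOW_DAYS = 30  # assumed SLO window

def burn_rate_threshold(budget_fraction: float, window_hours: float) -> float:
    """Burn rate that consumes `budget_fraction` of the error budget
    within `window_hours` of a WINDOW_DAYS-day SLO window."""
    return budget_fraction * (WINDOW_DAYS * 24) / window_hours

# Classic SRE-workbook tiers: page on steep burns, ticket on slow ones.
alerts = [
    ("page",   burn_rate_threshold(0.02, 1)),   # 2% of budget in 1h  -> 14.4
    ("page",   burn_rate_threshold(0.05, 6)),   # 5% of budget in 6h  -> 6.0
    ("ticket", burn_rate_threshold(0.10, 72)),  # 10% of budget in 3d -> 1.0
]
```

In step 5, thresholds like these would land in Alertmanager-routed Prometheus alerting rules, with the paging tiers routed to on-call and the ticket tier to a queue.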

Practices

  • SLO Design
  • Alert Optimization
  • Dashboard Design
  • Monitoring Best Practices
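One core idea behind the Alert Optimization practice is flagging alerts that fire often but are rarely acted on. The sketch below shows that kind of actionability scoring; the data shape and threshold are assumptions for illustration, not the skill's actual `alert_optimizer.py` logic.

```python
# Hypothetical sketch of alert-noise analysis: flag alerts whose firing
# history suggests they are noise (data shape and 50% threshold assumed).

from dataclasses import dataclass
from typing import List

@dataclass
class AlertStats:
    name: str
    fired: int     # times the alert fired in the review period
    acted_on: int  # times a human actually intervened

def noisy_alerts(stats: List[AlertStats],
                 min_actionability: float = 0.5) -> List[str]:
    """Return alerts acted on less than `min_actionability` of the time."""
    return [s.name for s in stats
            if s.fired > 0 and s.acted_on / s.fired < min_actionability]

history = [
    AlertStats("HighCPU", fired=40, acted_on=2),        # 5% actionable
    AlertStats("ErrorBudgetBurn", fired=5, acted_on=5),  # 100% actionable
]
noisy_alerts(history)  # ["HighCPU"]
```

Candidates flagged this way are the usual targets for raising thresholds, adding `for:` durations, or demoting from page to ticket.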

Prerequisites

  • Python 3.7+

Installation

First, add the marketplace:

/plugin marketplace add alirezarezvani/claude-skills
/plugin install engineering@claude-code-skills

Quality Score

Verified
100 /100
Analyzed about 22 hours ago

Trust Signals

Last commit: 1 day ago
Stars: 14.6k
License: MIT

Similar Extensions

Define SLO/SLI/SLA

99

Establish Service Level Objectives (SLO), Service Level Indicators (SLI), and Service Level Agreements (SLA) with error budget tracking, burn rate alerts, and automated reporting using Prometheus and tools like Sloth or Pyrra. Use when defining reliability targets for customer-facing services, balancing feature velocity against system reliability through error budgets, migrating from arbitrary uptime goals to data-driven metrics, or implementing Site Reliability Engineering practices.

Skill
pjt222

Slo Architect

99

Use when defining, reviewing, or operating SLOs/SLIs/error budgets. Triggers on "define an SLO", "what should our SLO be", "error budget", "burn rate", "SLI", "service level objective", "Google SRE workbook", "multi-window burn-rate alert", or any reliability-target question. Ships SLO designer, error-budget calculator with multi-window burn-rate thresholds, and SLO reviewer that catches the common bugs (target too aggressive, window too short, conflicting SLOs, no SLI definition). 4 references on SLO principles + SLI design + error budget math + composition with feature-flags-architect/chaos-engineering/kubernetes-operator. NOT a generic observability skill — specifically the SLO discipline.

Skill
alirezarezvani

Azure Monitor Query Py

100

Azure Monitor Query SDK for Python. Use for querying Log Analytics workspaces and Azure Monitor metrics. Triggers: "azure-monitor-query", "LogsQueryClient", "MetricsQueryClient", "Log Analytics", "Kusto queries", "Azure metrics".

Skill
microsoft

Grafana Dashboards

99

Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.

Skill
wshobson

Monitor Stream

99

Stream live swarm events using the Monitor tool for real-time observability

Skill
ruvnet

LangSmith Observability

99

LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.

Skill
Orchestra-Research