Configure Alerting Rules


Configure Prometheus Alertmanager with routing trees, receivers (Slack, PagerDuty, email), inhibition rules, silences, and notification templates for actionable incident alerting. Use when implementing proactive monitoring with automated incident detection, routing alerts to the appropriate team by severity, reducing alert fatigue through grouping and deduplication, integrating with on-call systems like PagerDuty, or migrating from legacy alerting to Prometheus-based alerting.

Purpose

Set up actionable incident alerting with Prometheus Alertmanager, reduce alert fatigue, and make sure notifications reach the appropriate team in time.

Features

Configure Alertmanager deployment and Prometheus integration
Define Prometheus alerting rules with best practices
Create notification templates for Slack, PagerDuty, and email
Implement advanced routing, grouping, and inhibition rules
Manage silences for planned maintenance and integrate with external systems
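
As an illustration of the routing, receiver, and inhibition features listed above, here is a minimal sketch that builds an Alertmanager-style configuration as a Python dictionary and prints it as JSON. The keys (`route`, `receivers`, `inhibit_rules`, `matchers`, `group_by`, and so on) follow Alertmanager's configuration schema, but the receiver names, channel, email address, and routing key are hypothetical placeholders, and global settings such as SMTP or the Slack API URL are omitted, so this is a fragment rather than a complete config. The `pick_receiver` helper is an assumption added for illustration: it handles only flat equality matchers and is not how Alertmanager itself resolves routes.

```python
import json

# Hypothetical receiver names, channel, address, and key; substitute your own.
config = {
    "route": {
        "receiver": "team-email",  # default receiver for alerts no child route matches
        "group_by": ["alertname", "cluster"],
        "group_wait": "30s",
        "group_interval": "5m",
        "repeat_interval": "4h",
        "routes": [
            {"matchers": ['severity="critical"'], "receiver": "oncall-pagerduty"},
            {"matchers": ['severity="warning"'], "receiver": "team-slack"},
        ],
    },
    "receivers": [
        {"name": "team-email", "email_configs": [{"to": "team@example.com"}]},
        {"name": "team-slack", "slack_configs": [{"channel": "#alerts"}]},
        {
            "name": "oncall-pagerduty",
            "pagerduty_configs": [{"routing_key": "<placeholder>"}],
        },
    ],
    "inhibit_rules": [
        {
            # While a critical alert fires, mute the matching warning alert.
            "source_matchers": ['severity="critical"'],
            "target_matchers": ['severity="warning"'],
            "equal": ["alertname", "cluster"],
        }
    ],
}


def pick_receiver(labels, route):
    """Simplified routing: first child whose equality matchers all hold wins."""
    for child in route.get("routes", []):
        wanted = dict(m.replace('"', "").split("=", 1) for m in child["matchers"])
        if all(labels.get(key) == value for key, value in wanted.items()):
            return child["receiver"]
    return route["receiver"]


print(json.dumps(config, indent=2))
print(pick_receiver({"alertname": "DiskFull", "severity": "critical"}, config["route"]))
```

Real routing trees can nest, use regex matchers, and continue past a match; the helper only shows the basic idea of sending alerts to a team by severity.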

Use Cases

Implementing proactive monitoring with automated incident detection
Routing alerts to appropriate teams based on severity
Reducing alert fatigue through grouping and deduplication
Integrating monitoring with on-call systems like PagerDuty
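
To make the grouping and deduplication use case concrete, the following sketch shows the general idea in plain Python: alerts with identical label sets are collapsed, and the remainder are bucketed by the labels named in `group_by`, so one notification can cover a whole bucket. The function name `group_alerts` and the sample alerts are hypothetical, and this is a conceptual model, not Alertmanager's actual implementation.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Drop duplicate alerts, then bucket the rest by their group_by label values."""
    seen = set()
    groups = defaultdict(list)
    for labels in alerts:
        # Alerts with an identical label set are treated as duplicates.
        fingerprint = tuple(sorted(labels.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


# Hypothetical sample alerts, each represented by its label set.
alerts = [
    {"alertname": "HighLatency", "cluster": "eu", "instance": "a"},
    {"alertname": "HighLatency", "cluster": "eu", "instance": "b"},
    {"alertname": "HighLatency", "cluster": "eu", "instance": "a"},  # duplicate
    {"alertname": "DiskFull", "cluster": "us", "instance": "c"},
]

groups = group_alerts(alerts, ["alertname", "cluster"])
for key, members in groups.items():
    print(key, len(members))
```

Four incoming alerts become two notification groups here, which is the effect that `group_by` is meant to achieve in a routing tree.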

Non-Goals

Setting up Prometheus metrics collection
Writing custom alert queries
Managing on-call rotations directly (only integrating with systems that do)
Writing incident response runbooks (alerts only link to them)

Installation

/plugin install agent-almanac@pjt222-agent-almanac

Quality Score

Verified

98/100

Analyzed about 21 hours ago

Trust Signals

Last commit: 1 day ago

GitHub owner: pjt222

Stars: 14

Downloads: 308

License: MIT

Website: pjt222.github.io
