Configure Alerting Rules
Configure Prometheus Alertmanager with routing trees, receivers (Slack, PagerDuty, email), inhibition rules, silences, and notification templates for actionable incident alerting. Use when implementing proactive monitoring with automated incident detection, routing alerts to the appropriate team by severity, reducing alert fatigue through grouping and deduplication, integrating with on-call systems like PagerDuty, or migrating from legacy alerting to Prometheus-based alerting.
This skill helps users set up robust, actionable incident alerting with Prometheus Alertmanager, reduce alert fatigue, and ensure that notifications reach the appropriate teams promptly.
Features
- Configure Alertmanager deployment and Prometheus integration
- Define Prometheus alerting rules with best practices
- Create notification templates for Slack, PagerDuty, and email
- Implement advanced routing, grouping, and inhibition rules
- Manage silences for planned maintenance and integrate with external systems
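As a rough illustration of the routing, grouping, and inhibition features listed above, the sketch below builds a small Alertmanager-style configuration as a Python dictionary. The field names (`route`, `group_by`, `matchers`, `receivers`, `inhibit_rules`, `source_matchers`, `target_matchers`, `equal`) follow Alertmanager's configuration schema; the receiver names, Slack channel, webhook URL, and PagerDuty key are hypothetical placeholders, and the timing values are illustrative rather than recommended. Email receivers are left out for brevity. The JSON rendering is only for inspection; writing the YAML file that Alertmanager loads is not shown.

```python
import json

# All names, URLs, and keys below are hypothetical placeholders.
config = {
    "route": {
        # Default receiver for alerts that match no child route.
        "receiver": "slack-default",
        # Alerts sharing these label values are batched into one notification.
        "group_by": ["alertname", "cluster"],
        "group_wait": "30s",
        "group_interval": "5m",
        "repeat_interval": "4h",
        "routes": [
            # Critical alerts page the on-call engineer instead of going to Slack.
            {"matchers": ['severity="critical"'], "receiver": "pagerduty-oncall"},
        ],
    },
    "receivers": [
        {
            "name": "slack-default",
            "slack_configs": [
                {
                    "channel": "#alerts",
                    "api_url": "https://hooks.slack.com/services/PLACEHOLDER",
                }
            ],
        },
        {
            "name": "pagerduty-oncall",
            "pagerduty_configs": [{"routing_key": "PLACEHOLDER_ROUTING_KEY"}],
        },
    ],
    "inhibit_rules": [
        # While a critical alert fires, mute warnings that share the same
        # alertname and cluster so that only the critical one notifies.
        {
            "source_matchers": ['severity="critical"'],
            "target_matchers": ['severity="warning"'],
            "equal": ["alertname", "cluster"],
        }
    ],
}

# Render the structure for review.
config_text = json.dumps(config, indent=2)
```

Keeping the default route on a low-urgency channel and reserving paging for an explicit `severity="critical"` match is one common arrangement; other routing layouts are equally valid.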
Use Cases
- Implementing proactive monitoring with automated incident detection
- Routing alerts to appropriate teams based on severity
- Reducing alert fatigue through grouping and deduplication
- Integrating monitoring with on-call systems like PagerDuty
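To make the grouping and deduplication use case above more concrete, here is a simplified model of the idea, not Alertmanager's actual implementation. It assumes an alert is identified by its full label set, so repeated firings with identical labels collapse into one entry, and that alerts are batched by the values of the `group_by` labels. The function names and sample alerts are hypothetical.

```python
from collections import defaultdict


def fingerprint(labels):
    """Identify an alert by its full label set (order-independent)."""
    return tuple(sorted(labels.items()))


def group_alerts(alerts, group_by):
    """Batch alerts by the values of the group_by labels, dropping duplicates."""
    groups = defaultdict(dict)
    for labels in alerts:
        # Alerts with the same values for the group_by labels share a group.
        key = tuple(labels.get(name, "") for name in group_by)
        # Identical label sets overwrite each other, which deduplicates them.
        groups[key][fingerprint(labels)] = labels
    return {key: list(members.values()) for key, members in groups.items()}


# Hypothetical sample alerts; the third repeats the first exactly.
alerts = [
    {"alertname": "HighLatency", "cluster": "eu-1", "instance": "a"},
    {"alertname": "HighLatency", "cluster": "eu-1", "instance": "b"},
    {"alertname": "HighLatency", "cluster": "eu-1", "instance": "a"},
    {"alertname": "DiskFull", "cluster": "eu-1", "instance": "a"},
]

grouped = group_alerts(alerts, ["alertname", "cluster"])
```

In this model the four incoming alerts would produce two notifications, one per group, rather than four separate messages.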
Non-Goals
- Setting up Prometheus metrics collection
- Writing custom alert queries
- Managing on-call rotations directly (only integrating with systems that do)
- Writing incident response runbooks (though the skill links to them)
Installation
/plugin install agent-almanac@pjt222-agent-almanac