Configure Alerting Rules

Skill · Verified · Active

Configure Prometheus Alertmanager with routing trees, receivers (Slack, PagerDuty, email), inhibition rules, silences, and notification templates for actionable incident alerting. Use when implementing proactive monitoring with automated incident detection, routing alerts to the appropriate team by severity, reducing alert fatigue through grouping and deduplication, integrating with on-call systems like PagerDuty, or migrating from legacy alerting to Prometheus-based alerting.

Purpose

Helps users configure Prometheus Alertmanager for robust, actionable incident alerting that reduces alert fatigue and delivers timely notifications to the appropriate teams.

Features

  • Configure Alertmanager deployment and Prometheus integration
  • Define Prometheus alerting rules with best practices
  • Create notification templates for Slack, PagerDuty, and email
  • Implement advanced routing, grouping, and inhibition rules
  • Manage silences for planned maintenance and integrate with external systems
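
The routing behaviour listed above can be illustrated with a small model. The following is a minimal sketch, not Alertmanager's implementation: it assumes equality-only label matchers, and the `Route` class, the `resolve_receivers` function, and all receiver names are hypothetical. It mirrors the documented idea that an alert descends into the first matching child route, moves on to later siblings only when `continue` is set, and falls back to the parent's receiver when no child matches.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """Simplified routing-tree node (hypothetical, equality matchers only)."""
    receiver: str
    match: dict[str, str] = field(default_factory=dict)
    routes: list["Route"] = field(default_factory=list)
    continue_: bool = False  # stands in for the `continue` setting

    def matches(self, labels: dict[str, str]) -> bool:
        # An empty matcher set matches every alert, like a root route.
        return all(labels.get(k) == v for k, v in self.match.items())


def resolve_receivers(route: Route, labels: dict[str, str]) -> list[str]:
    """Return the receivers an alert with these labels would reach."""
    if not route.matches(labels):
        return []
    found: list[str] = []
    for child in route.routes:
        hits = resolve_receivers(child, labels)
        if hits:
            found.extend(hits)
            if not child.continue_:
                break  # stop at the first matching child unless it continues
    # No child matched: the current node handles the alert itself.
    return found or [route.receiver]


# Hypothetical tree: critical alerts go to Slack and PagerDuty,
# warnings go to Slack only, everything else falls back to email.
tree = Route(
    receiver="email-default",
    routes=[
        Route(receiver="slack-critical", match={"severity": "critical"}, continue_=True),
        Route(receiver="pagerduty-oncall", match={"severity": "critical"}),
        Route(receiver="slack-warnings", match={"severity": "warning"}),
    ],
)

print(resolve_receivers(tree, {"severity": "critical"}))
print(resolve_receivers(tree, {"severity": "info"}))
```

In this sketch a critical alert reaches both `slack-critical` and `pagerduty-oncall`, while an alert with an unmatched severity is handled by the root receiver.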

Use cases

  • Implementing proactive monitoring with automated incident detection
  • Routing alerts to appropriate teams based on severity
  • Reducing alert fatigue through grouping and deduplication
  • Integrating monitoring with on-call systems like PagerDuty
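
Grouping and deduplication, mentioned in the list above, can be sketched in a few lines. This is an illustrative model under stated assumptions, not Alertmanager code: alerts are represented as plain label dictionaries, two alerts with identical label sets are treated as the same alert, and the `group_alerts` function and the sample labels are hypothetical. Alerts that share the same values for the chosen grouping labels end up in one bucket, which would correspond to a single notification.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the `group_by` labels, dropping duplicates."""
    groups = defaultdict(list)
    for labels in alerts:
        # Missing labels are grouped under an empty string in this sketch.
        key = tuple(labels.get(name, "") for name in group_by)
        if labels not in groups[key]:  # identical label set: same alert
            groups[key].append(labels)
    return dict(groups)


# Hypothetical sample data: the third alert repeats the first.
alerts = [
    {"alertname": "HighLatency", "cluster": "eu", "instance": "a"},
    {"alertname": "HighLatency", "cluster": "eu", "instance": "b"},
    {"alertname": "HighLatency", "cluster": "eu", "instance": "a"},
    {"alertname": "DiskFull", "cluster": "us", "instance": "c"},
]

for key, members in group_alerts(alerts, ["alertname", "cluster"]).items():
    print(key, len(members))
```

Four incoming alerts collapse into two groups here, so a team would receive two notifications instead of four.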

Non-goals

  • Setting up Prometheus metrics collection
  • Writing custom alert queries
  • Managing on-call rotations directly (only integrating with systems that do)
  • Writing incident response runbooks (though it links to them)

Installation

/plugin install agent-almanac@pjt222-agent-almanac

Quality score

Verified
98/100
Analyzed about 24 hours ago

Trust signals

Last commit: 2 days ago
Stars: 14
License: MIT

Similar extensions

Observability Designer

100

Observability Designer (POWERFUL)

Skill
alirezarezvani

Grafana Dashboards

99

Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.

Skill
wshobson

Monitor Stream

99

Stream live swarm events using the Monitor tool for real-time observability

Skill
ruvnet

LangSmith Observability

99

LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.

Skill
Orchestra-Research

Plan Capacity

99

Perform capacity planning using historical metrics and growth models. Use predict_linear for forecasting, identify resource constraints, calculate headroom, and recommend scaling actions before saturation. Use before seasonal traffic spikes or product launches, during quarterly capacity reviews, when resource utilization trends upward, or before budget planning cycles.

Skill
pjt222

Define SLO/SLI/SLA

99

Establish Service Level Objectives (SLO), Service Level Indicators (SLI), and Service Level Agreements (SLA) with error budget tracking, burn rate alerts, and automated reporting using Prometheus and tools like Sloth or Pyrra. Use when defining reliability targets for customer-facing services, balancing feature velocity against system reliability through error budgets, migrating from arbitrary uptime goals to data-driven metrics, or implementing Site Reliability Engineering practices.

Skill
pjt222