
Configure Alerting Rules

Skill · Verified · Active

Configure Prometheus Alertmanager with routing trees, receivers (Slack, PagerDuty, email), inhibition rules, silences, and notification templates for actionable incident alerting. Use when implementing proactive monitoring with automated incident detection, routing alerts to the appropriate team by severity, reducing alert fatigue through grouping and deduplication, integrating with on-call systems like PagerDuty, or migrating from legacy alerting to Prometheus-based alerting.

Purpose

To enable users to set up robust and actionable incident alerting by configuring Prometheus Alertmanager, reducing alert fatigue, and ensuring timely notifications to the appropriate teams.

Features

Configure Alertmanager deployment and Prometheus integration
Define Prometheus alerting rules with best practices
Create notification templates for Slack, PagerDuty, and email
Implement advanced routing, grouping, and inhibition rules
Manage silences for planned maintenance and integrate with external systems
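Alertmanager evaluates its routing tree depth-first. An alert descends into the first child route whose matchers it satisfies. It keeps checking later siblings only when that child sets `continue: true`. If no child matches, the current node's receiver handles it. Inhibition rules mute a target alert while a matching source alert is firing and the labels listed in `equal` agree. The Python sketch below models both behaviors under simplifying assumptions. Matchers are exact label-equality dicts, with no regex or negative matchers. The names `Route`, `route_alert`, `is_inhibited`, and the receiver names are hypothetical and are not part of Alertmanager's API. In a real deployment these rules live in `alertmanager.yml` under `route` and `inhibit_rules`.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """Simplified routing-tree node (hypothetical model, not Alertmanager's code)."""
    receiver: str
    matchers: dict[str, str] = field(default_factory=dict)  # exact label equality only
    continue_: bool = False  # stands in for Alertmanager's `continue` option
    routes: list["Route"] = field(default_factory=list)

    def matches(self, labels: dict[str, str]) -> bool:
        return all(labels.get(name) == value for name, value in self.matchers.items())


def route_alert(node: Route, labels: dict[str, str]) -> list[str]:
    """Return the receivers that would be notified for an alert with these labels."""
    receivers: list[str] = []
    for child in node.routes:
        if child.matches(labels):
            receivers.extend(route_alert(child, labels))
            if not child.continue_:
                break  # first matching child wins unless it sets continue
    # No child matched: the current node handles the alert itself.
    return receivers or [node.receiver]


def is_inhibited(
    target: dict[str, str],
    firing: list[dict[str, str]],
    source_matchers: dict[str, str],
    target_matchers: dict[str, str],
    equal: list[str],
) -> bool:
    """True if some firing source alert mutes `target` under one inhibition rule."""
    if not all(target.get(k) == v for k, v in target_matchers.items()):
        return False
    for source in firing:
        if source == target:
            continue  # in this sketch an alert never inhibits itself
        if all(source.get(k) == v for k, v in source_matchers.items()) and all(
            source.get(name) == target.get(name) for name in equal
        ):
            return True
    return False


# Example tree: critical alerts page on-call and keep matching; platform alerts go to Slack.
root = Route(
    receiver="default-email",
    routes=[
        Route("pagerduty-oncall", {"severity": "critical"}, continue_=True),
        Route("slack-platform", {"team": "platform"}),
    ],
)
print(route_alert(root, {"severity": "critical", "team": "platform"}))
# prints ['pagerduty-oncall', 'slack-platform']
```

Setting `continue_=True` on the critical route lets the same alert also reach the platform Slack route. An alert that matches neither child falls through to `default-email`.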

Use Cases

Implementing proactive monitoring with automated incident detection
Routing alerts to the appropriate teams based on severity
Reducing alert fatigue through grouping and deduplication
Integrating monitoring with on-call systems like PagerDuty
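Grouping turns a burst of related alerts into a single notification. Alertmanager batches alerts that share the values of the route's `group_by` labels, and it treats alerts with an identical label set as one alert. The minimal Python sketch below shows that bucketing and assumes alerts are plain label dicts. `group_alerts` is a hypothetical helper. The sketch deliberately leaves out the timing settings that control when a group is actually flushed (`group_wait`, `group_interval`, `repeat_interval`).

```python
from collections import defaultdict


def group_alerts(
    alerts: list[dict[str, str]], group_by: list[str]
) -> dict[tuple[tuple[str, str], ...], list[dict[str, str]]]:
    """Bucket alerts the way a route's group_by would (simplified sketch)."""
    # Deduplicate: alerts with an identical label set count as one alert.
    unique = {frozenset(alert.items()): alert for alert in alerts}
    groups = defaultdict(list)
    for labels in unique.values():
        # The group key is the (label, value) pairs for the group_by labels;
        # a missing label is treated as an empty value in this sketch.
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "HighCPU", "cluster": "eu1", "instance": "a"},
    {"alertname": "HighCPU", "cluster": "eu1", "instance": "b"},
    {"alertname": "HighCPU", "cluster": "eu1", "instance": "a"},  # duplicate
    {"alertname": "HighCPU", "cluster": "us1", "instance": "c"},
]
groups = group_alerts(alerts, ["alertname", "cluster"])
print(len(groups))  # 2 notifications instead of 4 raw alerts
```

Grouping by `alertname` and `cluster` collapses the two distinct `eu1` instances into one notification. The duplicate alert disappears before grouping.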

Non-Goals

Setting up Prometheus metrics collection
Writing custom alert queries
Managing on-call rotations directly (only integrating with systems that do)
Writing incident response runbooks (though it links to them)

Installation

/plugin install agent-almanac@pjt222-agent-almanac

Quality Score

Verified

98/100

Analyzed about 24 hours ago

Trust Signals

Last commit: 2 days ago

GitHub owner: pjt222

Stars: 14

Downloads: 308

License: MIT

Website: pjt222.github.io

Similar Extensions

Observability Designer

100

Observability Designer (POWERFUL)

Skill

alirezarezvani

Grafana Dashboards

Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.

Skill

wshobson

Monitor Stream

Stream live swarm events using the Monitor tool for real-time observability

Skill

ruvnet

LangSmith Observability

LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.

Skill

Orchestra-Research

Plan Capacity

Perform capacity planning using historical metrics and growth models. Use predict_linear for forecasting, identify resource constraints, calculate headroom, and recommend scaling actions before saturation. Use before seasonal traffic spikes or product launches, during quarterly capacity reviews, when resource utilization trends upward, or before budget planning cycles.

Skill

pjt222

Define SLO/SLI/SLA

Establish Service Level Objectives (SLO), Service Level Indicators (SLI), and Service Level Agreements (SLA) with error budget tracking, burn rate alerts, and automated reporting using Prometheus and tools like Sloth or Pyrra. Use when defining reliability targets for customer-facing services, balancing feature velocity against system reliability through error budgets, migrating from arbitrary uptime goals to data-driven metrics, or implementing Site Reliability Engineering practices.

Skill

pjt222