
Design On Call Rotation

Skill Verified Active

Design sustainable on-call rotations with balanced schedules, clear escalation policies, fatigue management, and handoff procedures. Minimize burnout while maintaining incident response coverage. Use when setting up on-call for the first time, scaling a team from 2-3 to 5+ engineers, addressing on-call burnout or alert fatigue, improving incident response times, or after a post-mortem identifies handoff issues.

Purpose

Design balanced on-call schedules that minimize burnout and ensure effective incident response coverage.

Capabilities

  • Define rotation schedule models (weekly, split, follow-the-sun)
  • Configure tiered escalation policies with delays
  • Implement structured handoff procedures and reminders
  • Establish fatigue management rules and track metrics
  • Document runbooks and essential access information
  • Schedule regular on-call retrospectives for continuous improvement
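The first two capabilities above (a weekly rotation model and a tiered escalation policy with delays) can be sketched as follows. This is only an illustrative sketch, not the skill's actual output: the engineer names, the tier names, the 15- and 30-minute delays, and the helper names `weekly_rotation` and `tiers_paged` are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Shift:
    engineer: str
    start: date  # first day of the shift (inclusive)
    end: date    # handoff day (exclusive)


def weekly_rotation(engineers, first_day, weeks):
    """Assign one engineer per week in round-robin order."""
    if not engineers:
        raise ValueError("need at least one engineer")
    shifts = []
    for week in range(weeks):
        start = first_day + timedelta(weeks=week)
        engineer = engineers[week % len(engineers)]
        shifts.append(Shift(engineer, start, start + timedelta(weeks=1)))
    return shifts


# Hypothetical escalation tiers: (tier name, minutes an alert may stay
# unacknowledged before this tier is paged). Values are assumptions.
ESCALATION_TIERS = [
    ("primary", 0),
    ("secondary", 15),
    ("engineering-manager", 30),
]


def tiers_paged(minutes_unacknowledged, tiers=ESCALATION_TIERS):
    """Return every tier that should have been paged by this point."""
    return [name for name, delay in tiers if minutes_unacknowledged >= delay]


if __name__ == "__main__":
    for shift in weekly_rotation(["alice", "bob", "carol"], date(2024, 1, 1), 4):
        print(shift.engineer, shift.start, shift.end)
    print(tiers_paged(20))
```

Because each shift ends on the day the next one starts, the handoff date is unambiguous. A split or follow-the-sun model would change only how `start` and `end` are computed.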

Use cases

  • Setting up on-call rotations for the first time
  • Scaling an on-call system for growing teams
  • Addressing engineer burnout and alert fatigue
  • Improving incident response times and handoff clarity

Non-goals

  • Automating the alert tuning process itself
  • Directly integrating with specific alerting tools beyond configuration examples
  • Replacing the need for human judgment in incident response

Installation

/plugin install agent-almanac@pjt222-agent-almanac

Quality score

Verified
100/100
Analyzed about 22 hours ago

Trust signals

Last commit: 2 days ago
Stars: 14
License: MIT

Similar extensions

On Call Handoff Patterns

95

Master on-call shift handoffs with context transfer, escalation procedures, and documentation. Use this skill when transitioning on-call responsibilities between engineers and ensuring the incoming responder has full situational awareness, when writing a shift summary that captures active incidents, ongoing investigations, and recent changes, when handing off mid-incident so a fresh engineer can take over the incident commander role without losing context, when onboarding a new engineer to the on-call rotation for the first time, or when auditing and improving the quality of existing handoff processes across teams.

Skill
wshobson

Observability Designer

100

Observability Designer (POWERFUL)

Skill
alirezarezvani

Incident Response

100

Manage active production incidents through detection, triage, mitigation, communication, and resolution with structured roles and decision-making. Use this skill whenever the user has an active incident, a production issue, a service outage, a security incident, or needs to plan incident response procedures. Triggers on incident response, production incident, outage, service down, site down, P0, P1, severity, downtime, on-call, incident commander, status page, postmortem prep. Also triggers when something is actively broken in production and the user is figuring out what to do.

Skill
rampstackco

Grafana Dashboards

99

Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.

Skill
wshobson

Monitor Stream

99

Stream live swarm events using the Monitor tool for real-time observability

Skill
ruvnet

Plan Capacity

99

Perform capacity planning using historical metrics and growth models. Use predict_linear for forecasting, identify resource constraints, calculate headroom, and recommend scaling actions before saturation. Use before seasonal traffic spikes or product launches, during quarterly capacity reviews, when resource utilization trends upward, or before budget planning cycles.

Skill
pjt222