Design On Call Rotation

Skill · Verified · Active
Part of: Agent Almanac

Design sustainable on-call rotations with balanced schedules, clear escalation policies, fatigue management, and handoff procedures. Minimize burnout while maintaining incident response coverage. Use when setting up on-call for the first time, scaling a team from 2-3 to 5+ engineers, addressing on-call burnout or alert fatigue, improving incident response times, or after a post-mortem identifies handoff issues.

Purpose

Design balanced on-call schedules that minimize burnout and ensure effective incident response coverage.

Features

  • Define rotation schedule models (weekly, split, follow-the-sun)
  • Configure tiered escalation policies with delays
  • Implement structured handoff procedures and reminders
  • Establish fatigue management rules and track metrics
  • Document runbooks and essential access information
  • Schedule regular on-call retrospectives for continuous improvement
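The first two features, a weekly rotation and a tiered escalation policy with delays, can be sketched as follows. This is a minimal illustration rather than the skill's own implementation: the engineer names, the tier names, and the 15- and 30-minute delays are hypothetical values, and the helper functions `weekly_rotation` and `tiers_to_page` are invented for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class EscalationTier:
    name: str
    delay_minutes: int  # minutes after the alert fires before this tier is paged


def weekly_rotation(engineers, start, weeks):
    """Assign a primary and secondary per week, round-robin, beginning at `start`."""
    schedule = []
    for week in range(weeks):
        shift_start = start + timedelta(weeks=week)
        primary = engineers[week % len(engineers)]
        # The secondary is the next engineer in the list, so with two or more
        # engineers nobody is their own backup.
        secondary = engineers[(week + 1) % len(engineers)]
        schedule.append((shift_start, primary, secondary))
    return schedule


def tiers_to_page(tiers, minutes_unacknowledged):
    """Return the names of all tiers whose delay has elapsed for an unacknowledged alert."""
    return [t.name for t in tiers if minutes_unacknowledged >= t.delay_minutes]


# Hypothetical team and policy values, chosen only for illustration.
ENGINEERS = ["alice", "bob", "carol", "dave", "erin"]
POLICY = [
    EscalationTier("primary", 0),
    EscalationTier("secondary", 15),
    EscalationTier("engineering-manager", 30),
]

if __name__ == "__main__":
    for shift_start, primary, secondary in weekly_rotation(ENGINEERS, date(2024, 1, 1), 6):
        print(shift_start.isoformat(), primary, secondary)
    print(tiers_to_page(POLICY, 20))
```

With five engineers, each person is primary one week in five and secondary the week before. Split or follow-the-sun models would replace the one-week step with shorter shifts or per-region lists.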

Use Cases

  • Setting up on-call rotations for the first time
  • Scaling an on-call system for growing teams
  • Addressing engineer burnout and alert fatigue
  • Improving incident response times and handoff clarity
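For the burnout and alert-fatigue use case, one way to make the problem visible is to count pages per engineer and flag off-hours interruptions. The sketch below assumes a simple page log of `(engineer, timestamp)` pairs with timestamps already in the engineer's local time. The threshold of 10 pages per shift and the 22:00 to 06:00 off-hours window are hypothetical values, not figures taken from the skill.

```python
from collections import Counter
from datetime import datetime

# Hypothetical thresholds; a real team would agree on its own values.
MAX_PAGES_PER_SHIFT = 10
OFF_HOURS_START = 22  # pages at or after 22:00 count as off-hours
OFF_HOURS_END = 6     # pages before 06:00 count as off-hours


def is_off_hours(ts):
    """True if the timestamp falls inside the overnight off-hours window."""
    return ts.hour >= OFF_HOURS_START or ts.hour < OFF_HOURS_END


def fatigue_report(pages):
    """Summarize a page log given as an iterable of (engineer, datetime) pairs."""
    total = Counter()
    off_hours = Counter()
    for engineer, ts in pages:
        total[engineer] += 1
        if is_off_hours(ts):
            off_hours[engineer] += 1
    return {
        engineer: {
            "pages": total[engineer],
            "off_hours": off_hours[engineer],
            "over_limit": total[engineer] > MAX_PAGES_PER_SHIFT,
        }
        for engineer in total
    }


if __name__ == "__main__":
    sample = [
        ("alice", datetime(2024, 1, 2, 3, 15)),
        ("alice", datetime(2024, 1, 2, 14, 0)),
        ("bob", datetime(2024, 1, 3, 23, 30)),
    ]
    for engineer, stats in fatigue_report(sample).items():
        print(engineer, stats)
```

A report like this is a natural input to an on-call retrospective: engineers flagged as over the limit, or with many off-hours pages, are candidates for a lighter next shift or for alert tuning.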

Non-Goals

  • Automating the alert tuning process itself
  • Directly integrating with specific alerting tools beyond configuration examples
  • Replacing the need for human judgment in incident response

Installation

/plugin install agent-almanac@pjt222-agent-almanac

Quality Score

Verified
100/100
Analyzed about 21 hours ago

Trust Signals

Last commit: 1 day ago
Stars: 14
License: MIT

Similar Extensions

On Call Handoff Patterns

95

Master on-call shift handoffs with context transfer, escalation procedures, and documentation. Use this skill when transitioning on-call responsibilities between engineers and ensuring the incoming responder has full situational awareness, when writing a shift summary that captures active incidents, ongoing investigations, and recent changes, when handing off mid-incident so a fresh engineer can take over the incident commander role without losing context, when onboarding a new engineer to the on-call rotation for the first time, or when auditing and improving the quality of existing handoff processes across teams.

Skill
wshobson

Observability Designer

100

Observability Designer (POWERFUL)

Skill
alirezarezvani

Incident Response

100

Manage active production incidents through detection, triage, mitigation, communication, and resolution with structured roles and decision-making. Use this skill whenever the user has an active incident, a production issue, a service outage, a security incident, or needs to plan incident response procedures. Triggers on incident response, production incident, outage, service down, site down, P0, P1, severity, downtime, on-call, incident commander, status page, postmortem prep. Also triggers when something is actively broken in production and the user is figuring out what to do.

Skill
rampstackco

Grafana Dashboards

99

Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.

Skill
wshobson

Monitor Stream

99

Stream live swarm events using the Monitor tool for real-time observability

Skill
ruvnet

Plan Capacity

99

Perform capacity planning using historical metrics and growth models. Use predict_linear for forecasting, identify resource constraints, calculate headroom, and recommend scaling actions before saturation. Use before seasonal traffic spikes or product launches, during quarterly capacity reviews, when resource utilization trends upward, or before budget planning cycles.

Skill
pjt222