Incident Response
Plugin | Verified | Active
Production incident management, triage workflows, and automated incident resolution
Provides an automated system for managing production incidents and resolving software issues.
Features
- Automated incident response orchestration
- Multi-agent debugging and root cause analysis
- Structured workflows with user approval checkpoints
- Comprehensive test and verification phases
- Blameless postmortem and prevention planning
Use Cases
- Responding to critical production outages
- Automating the debugging of complex software bugs
- Creating structured runbooks for incident management
- Ensuring consistent quality and reliability through automated fixes
Non-Goals
- Replacing human decision-making entirely
- Handling hardware failures directly
- Providing a general-purpose code generation tool
Installation
Add the marketplace first:
/plugin marketplace add wshobson/agents
/plugin install incident-response@claude-code-workflows
Includes 3 extensions
Skill (3)
Create structured incident response runbooks with step-by-step procedures, escalation paths, and recovery actions. Use this skill when building a service outage runbook for a payment processing system; creating database incident procedures covering connection pool exhaustion, replication lag, and disk space alerts; onboarding new on-call engineers who need step-by-step recovery guides written for a 3 AM brain; or standardizing escalation matrices across multiple engineering teams.
Master on-call shift handoffs with context transfer, escalation procedures, and documentation. Use this skill when transitioning on-call responsibilities between engineers so the incoming responder has full situational awareness; writing a shift summary that captures active incidents, ongoing investigations, and recent changes; handing off mid-incident so a fresh engineer can take over the incident commander role without losing context; onboarding a new engineer to the on-call rotation for the first time; or auditing and improving the quality of existing handoff processes across teams.
Write effective blameless postmortems with root cause analysis, timelines, and action items. Use this skill when conducting incident reviews, writing postmortem documents, or improving incident response processes.
Quality Score
Verified
Similar Extensions
Dotforge Stack Python Fastapi
Score: 100. Python 3.12+ with FastAPI, async/await, type hints, and Ruff linting rules for Claude Code.
Deployhq
Score: 100. Deploy code, manage servers, and automate infrastructure with the DeployHQ CLI.
Ag2 Agent Builder
Score: 100. Build AG2 (AutoGen) multi-agent systems with slash commands: scaffold agents, wire workflows, create tools, and review code.
Slo Architect
Score: 99. End-to-end SLO/SLI/error-budget discipline per Google SRE Workbook. Ships SLO designer (refuses to render without required fields), error-budget calculator with multi-window burn-rate alert thresholds (PromQL-shaped), and SLO reviewer that catches the 7 common bugs (target too high, window too short, no SLI definition, CPU-as-SLI, etc.). 4 references on principles + SLI design + error budget math + composition with feature-flags-architect/chaos-engineering/kubernetes-operator. Asset templates for SLO YAML and error budget policy. /slo-design slash command. NOT a generic observability skill.
Chaos Engineering
Score: 99. End-to-end chaos engineering discipline: design experiments with hypothesis + steady-state metric + blast radius + abort criteria, calculate risk score against error budget, and generate blameless postmortems. 3 stdlib Python tools (experiment_designer, blast_radius_calculator, experiment_postmortem), 4 references on chaos principles + experiment design + 7-attack taxonomy + tooling landscape (Chaos Toolkit/Mesh/Litmus/Gremlin/AWS FIS/DIY), templates for plans + postmortems, and a /chaos-experiment slash command. Composes with feature-flags-architect (kill switches as abort triggers) and kubernetes-operator (chaos targets).
GDPR Breach Sentinel
Score: 97. Incident response and legal compliance guidance for data breaches under GDPR Articles 33 & 34.