Run Chaos Experiment

Skill, Verified, Active

Design and execute chaos engineering experiments using Litmus or Chaos Mesh. Test system resilience through controlled fault injection, validate hypothesis-driven tests, and improve failure recovery. Use before major product launches, after architecture changes to validate resilience, during GameDays or disaster recovery drills, to validate assumptions about failure modes, or as part of an SRE maturity program.

Purpose

To provide a structured, repeatable, and safe process for conducting chaos engineering experiments, thereby improving system resilience and validating failure recovery strategies.

Features

Design and execute chaos engineering experiments
Test system resilience via fault injection
Validate hypothesis-driven tests
Improve failure recovery
Support for Litmus and Chaos Mesh
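
As a rough illustration of the fault-injection step, the sketch below builds a Chaos Mesh `PodChaos` manifest as a plain Python dict and prints it as JSON, which `kubectl apply -f -` accepts. The helper name `build_pod_kill_experiment`, the experiment name, the namespaces, and the `app` label are all hypothetical; the skill's own manifests may differ.

```python
import json


def build_pod_kill_experiment(name, namespace, target_namespace, app_label):
    """Return a Chaos Mesh PodChaos manifest that kills one matching pod.

    All argument values are illustrative assumptions, not taken from the skill.
    """
    return {
        "apiVersion": "chaos-mesh.org/v1alpha1",
        "kind": "PodChaos",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "action": "pod-kill",   # fault type: kill the selected pod
            "mode": "one",          # limit blast radius to a single pod
            "selector": {
                "namespaces": [target_namespace],
                "labelSelectors": {"app": app_label},
            },
        },
    }


manifest = build_pod_kill_experiment(
    name="checkout-pod-kill",
    namespace="chaos-testing",
    target_namespace="staging",
    app_label="checkout",
)
print(json.dumps(manifest, indent=2))
```

Keeping `mode` at `one` and pointing the selector at a staging namespace is one simple way to keep the blast radius small while the hypothesis is still unproven.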

Use cases

Before major product launches to test stability
After architecture changes to validate resilience
During GameDays or disaster recovery drills
To validate assumptions about failure modes
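
Validating an assumption about a failure mode usually means stating a steady-state hypothesis up front and checking it against what was observed during the experiment. The following is a minimal sketch of that idea; the `Hypothesis` class, the metric name, and the numbers are hypothetical and not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A steady-state hypothesis: the metric stays near its baseline."""
    metric: str
    baseline: float
    tolerance: float  # allowed relative deviation, e.g. 0.25 means 25%

    def holds(self, observed: float) -> bool:
        # The hypothesis holds if the observed value is within the
        # tolerated band around the baseline.
        return abs(observed - self.baseline) <= self.tolerance * abs(self.baseline)


# Illustrative numbers only: p99 latency may drift up to 25% during the fault.
h = Hypothesis(metric="p99_latency_ms", baseline=200.0, tolerance=0.25)
print(h.holds(240.0))  # True: 40 ms deviation is inside the 50 ms band
print(h.holds(300.0))  # False: 100 ms deviation exceeds the band
```

Writing the tolerance down before the run keeps the result from being rationalized afterwards.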

Non-goals

Performing general Kubernetes administration beyond tool installation and experiment execution
Automated incident response without human oversight
Continuous chaos engineering without scheduled execution or defined triggers

Prerequisites

Kubernetes cluster
kubectl and Helm installed
Necessary RBAC permissions for Chaos Mesh/Litmus installation and experiment execution
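
A small preflight check can confirm the command-line prerequisites before any experiment is attempted. This sketch only verifies that `kubectl` and `helm` are on the `PATH`; the function name `missing_tools` is hypothetical, and cluster reachability and RBAC are deliberately left out.

```python
import shutil

REQUIRED_TOOLS = ("kubectl", "helm")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the required tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without the
    tools actually being installed.
    """
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("kubectl and helm found on PATH")
```

RBAC permissions are not covered here; `kubectl auth can-i` is the usual way to check those against the target cluster.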

Scope

Dry-run preview: The skill does not offer a `--dry-run` flag for the whole workflow. Instead, SKILL.md emphasizes staging environments and defines rollback plans and abort conditions, which serve a similar purpose of previewing and mitigating risk.
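
One way to picture abort conditions is as a set of metric limits that are checked while the fault is active. The sketch below is an assumption about how such a check could look, not the skill's implementation; the function name `breached_conditions`, the metric names, and the limits are hypothetical.

```python
def breached_conditions(metrics, limits):
    """Return the names of abort conditions that are currently breached.

    A metric that is missing from `metrics` counts as breached, so a
    broken monitoring pipeline stops the experiment instead of hiding harm.
    """
    return [
        name
        for name, limit in limits.items()
        if name not in metrics or metrics[name] > limit
    ]


# Illustrative limits: abort above 5% errors or 1000 ms p99 latency.
limits = {"error_rate": 0.05, "p99_latency_ms": 1000.0}

breached = breached_conditions({"error_rate": 0.08, "p99_latency_ms": 450.0}, limits)
print(breached)  # ['error_rate']
```

If the returned list is non-empty, the rollback plan would be triggered, for example by deleting the chaos resource.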

Installation

/plugin install agent-almanac@pjt222-agent-almanac

Quality score

Verified

95/100

Analyzed about 23 hours ago

Trust signals

Last commit: 2 days ago

GitHub owner: pjt222

Stars: 14

Downloads: 308

License: MIT

Website: pjt222.github.io

Similar extensions

Chaos Engineering

Use when planning, running, or learning from chaos engineering experiments. Triggers on "chaos experiment", "fault injection", "gameday", "resilience test", "blast radius", "steady state", "abort criteria", "Chaos Toolkit", "Chaos Mesh", "Litmus", "Gremlin", "AWS FIS", or any deliberate failure-injection question. Ships experiment designer, blast-radius calculator, and postmortem generator (all stdlib Python), 4 references on chaos principles + experiment design + attack taxonomy + tooling landscape, and a /chaos-experiment slash command. Composes with feature-flags-architect (kill switches as abort triggers) and kubernetes-operator (common chaos targets).

Skill

alirezarezvani

Chaos Engineer

Designs chaos experiments, creates failure injection frameworks, and facilitates game day exercises for distributed systems — producing runbooks, experiment manifests, rollback procedures, and post-mortem templates. Use when designing chaos experiments, implementing failure injection frameworks, or conducting game day exercises. Invoke for chaos experiments, resilience testing, blast radius control, game days, antifragile systems, fault injection, Chaos Monkey, Litmus Chaos.

Skill

jeffallan

Release It!

Build production-ready systems with stability patterns: circuit breakers, bulkheads, timeouts, and retry logic. Use when the user mentions "production outage", "circuit breaker", "timeout strategy", "deployment pipeline", "chaos engineering", "bulkhead pattern", "retry with backoff", or "health checks". Also trigger when designing resilient microservices, planning zero-downtime deployments, or investigating cascading failure scenarios. Covers capacity planning, health checks, and anti-fragility patterns. For data systems, see ddia-systems. For system architecture, see system-design.

Skill

wondelai

K8s Manifest Generator

100

Create production-ready Kubernetes manifests for Deployments, Services, ConfigMaps, and Secrets following best practices and security standards. Use when generating Kubernetes YAML manifests, creating K8s resources, or implementing production-grade Kubernetes configurations.

Skill

wshobson

Circuit Breaker Pattern

100

Implement circuit breaker logic for agentic tool calls — tracking tool health, transitioning between closed/open/half-open states, reducing task scope when tools fail, routing to alternatives via capability maps, and enforcing failure budgets to prevent error accumulation. Separates orchestration (deciding what to attempt) from execution (calling tools), following the expeditor pattern. Use when building agents that depend on multiple tools with varying reliability, designing fault-tolerant agentic workflows, recovering gracefully from tool outages mid-task, or hardening existing agents against cascading tool failures.

Skill

pjt222

Observability Designer

100

Observability Designer (POWERFUL)

Skill

alirezarezvani