Monitoring Expert
Skill · Verified · Active
Configures monitoring systems, implements structured logging pipelines, builds Prometheus/Grafana stacks and dashboards, defines alerting rules, and instruments distributed tracing. Also conducts load testing, performs application profiling, and plans infrastructure capacity. Use when setting up application monitoring, adding observability to services, debugging production issues with logs/metrics/traces, running load tests with k6 or Artillery, profiling CPU/memory bottlenecks, or forecasting capacity needs.
Purpose: help developers implement effective application monitoring, observability, and performance testing through practical code examples and configuration guidance.
Features
- Configures structured logging pipelines
- Implements Prometheus/Grafana dashboards and metrics
- Instruments distributed tracing
- Defines alerting rules
- Conducts load testing and performance profiling
Use cases
- Setting up application monitoring and observability
- Debugging production issues with logs, metrics, and traces
- Running load tests with tools like k6 or Artillery
- Profiling CPU/memory bottlenecks and forecasting capacity needs
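For the profiling use case, a minimal sketch using only the Python standard library (`cProfile`, `pstats`, `tracemalloc`) might look like the following. The workload `build_squares` and the helper names `profile_cpu` and `profile_memory` are hypothetical, chosen for illustration; the skill itself may recommend different tooling.

```python
import cProfile
import io
import pstats
import tracemalloc


def build_squares(n: int) -> list[int]:
    """Hypothetical workload: allocates a list of n squares."""
    return [i * i for i in range(n)]


def profile_cpu(func, *args) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return buffer.getvalue()


def profile_memory(func, *args) -> int:
    """Return the peak number of bytes traced while func runs."""
    tracemalloc.start()
    try:
        func(*args)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print(profile_cpu(build_squares, 100_000))
    print("peak bytes:", profile_memory(build_squares, 100_000))
```

The CPU report points at the functions with the highest cumulative time, while the peak memory figure gives a rough number to track between runs when looking for regressions.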
Non-goals
- Logging sensitive data
- Alerting on every error (alert fatigue)
- Using string interpolation in logs
- Skipping correlation IDs in distributed systems
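The points above can be illustrated with a small structured-logging sketch built on the standard `logging`, `json`, and `contextvars` modules. It is one possible approach, not the skill's own implementation: the names `correlation_id`, `SENSITIVE_KEYS`, `JsonFormatter`, and `build_logger`, and the convention of passing fields through `extra={"fields": ...}`, are assumptions made for this example.

```python
import contextvars
import json
import logging
import uuid

# Hypothetical names for this sketch; adjust to the conventions of your service.
correlation_id = contextvars.ContextVar("correlation_id", default="-")
SENSITIVE_KEYS = {"password", "token", "authorization"}


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per record, with redaction and a correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        # Fields arrive as data, not interpolated into the message string.
        for key, value in getattr(record, "fields", {}).items():
            redact = key.lower() in SENSITIVE_KEYS
            payload[key] = "[REDACTED]" if redact else value
        return json.dumps(payload, default=str)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to stream (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")
    correlation_id.set(uuid.uuid4().hex)  # typically set once per request
    log.info("payment_failed", extra={"fields": {"order_id": 42, "token": "abc"}})
```

Keeping the event name constant and moving variable data into fields makes log lines easy to query, and the context variable lets every line written during one request share the same correlation ID.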
Workflow
- Assess monitoring needs
- Instrument application with logging, metrics, traces
- Configure data collection and storage
- Build dashboards using RED/USE methods
- Define and validate alerts on critical paths
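To make the RED method (Rate, Errors, Duration) and the alerting step concrete, here is a rough sketch that summarizes one window of request records and checks it against thresholds. In a Prometheus setup these values would normally come from counters and histograms, with alerts written as PromQL rules; the `Request` type, the function names, and the threshold defaults below are illustrative assumptions only.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Request:
    duration_ms: float
    status: int


def red_summary(requests: list[Request], window_seconds: float) -> dict[str, float]:
    """Compute Rate, Errors, and Duration (p95) for one window of requests."""
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)
    durations = [r.duration_ms for r in requests]
    if total >= 2:
        # n=100 yields 99 cut points; index 94 is the 95th percentile.
        p95 = quantiles(durations, n=100)[94]
    else:
        p95 = durations[0] if durations else 0.0
    return {
        "rate_per_s": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "p95_ms": p95,
    }


def should_alert(
    summary: dict[str, float],
    max_error_ratio: float = 0.05,  # assumed threshold
    max_p95_ms: float = 500.0,  # assumed threshold
) -> bool:
    """Alert on symptoms of a critical path, not on every individual error."""
    too_many_errors = summary["error_ratio"] > max_error_ratio
    too_slow = summary["p95_ms"] > max_p95_ms
    return too_many_errors or too_slow
```

Alerting on a ratio over a window, rather than on each failed request, is one way to avoid the alert fatigue listed under the non-goals.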
Practices
- Structured Logging
- Metrics Implementation
- Distributed Tracing
- Alerting Strategy
- Performance Testing
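As an illustration of the distributed tracing practice, the sketch below shows how trace context can be carried between services using the W3C `traceparent` header format (`version-traceid-parentid-flags`). A tracing SDK such as OpenTelemetry would normally handle this for you; the helper names `new_traceparent` and `child_traceparent` are hypothetical and exist only to show the mechanism.

```python
import re
import secrets
from typing import Optional

# Version 00: 32 hex chars of trace id, 16 hex chars of parent id, 2 hex chars of flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace with random trace and span identifiers."""
    trace_id = secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def child_traceparent(incoming: Optional[str]) -> str:
    """Continue an incoming trace with a new span id, or start a fresh trace."""
    match = TRACEPARENT_RE.match(incoming or "")
    if match is None:
        return new_traceparent()
    trace_id, parent_id, flags = match.groups()
    # All-zero identifiers are invalid, so treat them like a missing header.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return new_traceparent()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    inbound = new_traceparent()
    outbound = child_traceparent(inbound)
    print("inbound: ", inbound)
    print("outbound:", outbound)  # same trace id, different span id
```

Because the trace ID is preserved across hops while each service generates its own span ID, logs and spans from different services can be joined into a single request timeline.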
Installation
First add the marketplace:
/plugin marketplace add jeffallan/claude-skills
Then install the plugin:
/plugin install claude-skills@fullstack-dev-skills
Quality score
Verified
Similar extensions
Observability Designer
Score: 100. Observability Designer (POWERFUL)
LangSmith Observability
Score: 99. LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.
Service Mesh Observability
Score: 98. Implement comprehensive observability for service meshes including distributed tracing, metrics, and visualization. Use when setting up mesh monitoring, debugging latency issues, or implementing SLOs for service communication.
Ops Fires
Score: 100. Production incidents dashboard. Reads ECS health, Sentry errors, CI failures. Offers to dispatch fix agents for active fires.
Meta Observer
Score: 100. Track skill performance and emerging patterns
Sentry Python SDK
Score: 100. Full Sentry SDK setup for Python. Use when asked to "add Sentry to Python", "install sentry-sdk", "setup Sentry in Python", or configure error monitoring, tracing, profiling, logging, metrics, crons, or AI monitoring for Python applications. Supports Django, Flask, FastAPI, Celery, Starlette, AIOHTTP, Tornado, and more.