Service Mesh Observability
Implement comprehensive observability for service meshes, including distributed tracing, metrics, and visualization. Use when setting up mesh monitoring, debugging latency issues, or implementing SLOs for service communication.
This skill helps users set up robust observability for their service meshes, making it easier to debug performance issues and implement service-level objectives.
Features
- Implement distributed tracing for service meshes
- Set up service mesh metrics and dashboards
- Provide templates for Istio, Linkerd, Prometheus, Grafana, and Jaeger
- Guide debugging of latency and error issues
- Assist in defining SLOs for service communication
- Visualize service dependencies and topology
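As a rough illustration of the metrics and latency-debugging features listed above, the sketch below builds PromQL strings against Istio's standard metrics (`istio_requests_total` and `istio_request_duration_milliseconds_bucket`). The helper names, the 5-minute default window, and the example service name are assumptions made for illustration; they are not part of the skill itself, and the sketch assumes an Istio mesh scraped by Prometheus.

```python
# Sketch only: builds PromQL strings for Istio's standard metrics.
# The function names and the default "5m" window are illustrative assumptions.


def error_rate_query(service: str, window: str = "5m") -> str:
    """Return a PromQL expression for the fraction of 5xx responses to `service`."""
    # reporter="destination" uses the server-side proxy's view of each request.
    selector = f'reporter="destination",destination_service="{service}"'
    errors = f'sum(rate(istio_requests_total{{{selector},response_code=~"5.."}}[{window}]))'
    total = f'sum(rate(istio_requests_total{{{selector}}}[{window}]))'
    return f"{errors} / {total}"


def latency_quantile_query(service: str, quantile: float = 0.99, window: str = "5m") -> str:
    """Return a PromQL expression approximating a latency quantile in milliseconds."""
    selector = f'reporter="destination",destination_service="{service}"'
    # Aggregate histogram buckets by `le` so histogram_quantile can interpolate.
    buckets = (
        f"sum(rate(istio_request_duration_milliseconds_bucket{{{selector}}}[{window}])) by (le)"
    )
    return f"histogram_quantile({quantile}, {buckets})"


if __name__ == "__main__":
    # Hypothetical service name, used only to show the generated queries.
    svc = "reviews.default.svc.cluster.local"
    print(error_rate_query(svc))
    print(latency_quantile_query(svc))
```

The resulting strings could be pasted into a Grafana panel or sent to the Prometheus HTTP API; a Linkerd mesh exposes different metric names, so the selectors would need adjusting there.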
Use Cases
- When setting up mesh monitoring and dashboards
- When debugging latency or error issues within a service mesh
- When defining and implementing SLOs for inter-service communication
- When visualizing service dependencies and network topology
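For the SLO use case above, the underlying arithmetic can be sketched as follows. This is a minimal example, assuming an availability-style SLO measured as the ratio of successful requests; the function names and the sample numbers are hypothetical and only show how an error budget and burn rate are usually derived.

```python
# Sketch only: error budget and burn rate arithmetic for a request-based SLO.
# Function names and example values are illustrative assumptions.


def error_budget(slo_target: float) -> float:
    """Allowed fraction of failed requests, e.g. about 0.001 for a 99.9% SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """How fast the error budget is being consumed over the observed window.

    A value of 1.0 means the budget would be used up exactly at the end of
    the SLO period; higher values mean it runs out sooner.
    """
    if total == 0:
        return 0.0  # no traffic observed, so no budget was consumed
    observed_error_ratio = failed / total
    return observed_error_ratio / error_budget(slo_target)


if __name__ == "__main__":
    # Hypothetical window: 5 failures out of 1000 requests against a 99.9% SLO.
    print(round(burn_rate(failed=5, total=1000, slo_target=0.999), 2))
```

In a mesh, the `failed` and `total` counts would typically come from the proxy-reported request metrics rather than from application code.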
Non-Goals
- Implementing the actual observability backend infrastructure (focus is on configuration and integration)
- General-purpose monitoring outside of service meshes
- Deep dives into specific tool internals beyond their integration with service meshes
Installation
First, add the marketplace:
/plugin marketplace add wshobson/agents
Then install the plugin:
/plugin install cloud-infrastructure@claude-code-workflows
Similar Extensions
Setup Service Mesh
Score: 98. Deploy and configure a service mesh (Istio or Linkerd) to enable secure service-to-service communication, traffic management, observability, and policy enforcement in Kubernetes clusters. Covers installation, mTLS configuration, traffic routing, circuit breaking, and integration with monitoring tools. Use when microservices need encrypted service-to-service communication, fine-grained traffic control for canary or A/B deployments, observability across all service interactions without application changes, or consistent circuit breaking and retry policies.
Grafana Dashboards
Score: 99. Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.
Instrument Distributed Tracing
Score: 99. Instrument applications with OpenTelemetry for distributed tracing, including auto and manual instrumentation, context propagation, sampling strategies, and integration with Jaeger or Tempo. Use when debugging latency issues in distributed systems, understanding request flow across microservices, correlating traces with logs and metrics for root cause analysis, measuring end-to-end latency, or migrating from legacy tracing systems to OpenTelemetry.
Plan Capacity
Score: 99. Perform capacity planning using historical metrics and growth models. Use predict_linear for forecasting, identify resource constraints, calculate headroom, and recommend scaling actions before saturation. Use before seasonal traffic spikes or product launches, during quarterly capacity reviews, when resource utilization trends upward, or before budget planning cycles.
Define SLO/SLI/SLA
Score: 99. Establish Service Level Objectives (SLO), Service Level Indicators (SLI), and Service Level Agreements (SLA) with error budget tracking, burn rate alerts, and automated reporting using Prometheus and tools like Sloth or Pyrra. Use when defining reliability targets for customer-facing services, balancing feature velocity against system reliability through error budgets, migrating from arbitrary uptime goals to data-driven metrics, or implementing Site Reliability Engineering practices.
LangSmith Observability
Score: 99. LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.