
Service Mesh Observability

Skill · Verified · Active

Implement comprehensive observability for service meshes including distributed tracing, metrics, and visualization. Use when setting up mesh monitoring, debugging latency issues, or implementing SLOs for service communication.

Purpose

Help users set up observability for their service meshes so they can debug performance issues and implement service-level objectives (SLOs).

Features

  • Implement distributed tracing for service meshes
  • Set up service mesh metrics and dashboards
  • Provide templates for Istio, Linkerd, Prometheus, Grafana, Jaeger
  • Provide guidance on debugging latency and error issues
  • Assist in defining SLOs for service communication
  • Visualize service dependencies and topology
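As a rough illustration of how tracing data can feed a dependency view, the sketch below derives caller-to-callee edges from a flat list of spans. It is not taken from the skill itself: the span fields (`span_id`, `parent_id`, `service`) and the function name `build_dependency_edges` are hypothetical, and a real backend such as Jaeger exposes richer span data in its own schema.

```python
from collections import defaultdict


def build_dependency_edges(spans):
    """Count caller -> callee edges from a flat list of span dicts.

    Each span is assumed (hypothetically) to carry 'span_id',
    'parent_id', and 'service' keys.
    """
    by_id = {span["span_id"]: span for span in spans}
    edges = defaultdict(int)
    for span in spans:
        parent = by_id.get(span.get("parent_id"))
        # Only a parent/child pair that crosses a service boundary
        # represents a call between services.
        if parent is not None and parent["service"] != span["service"]:
            edges[(parent["service"], span["service"])] += 1
    return dict(edges)


# Illustrative spans for two requests that pass through the same services.
spans = [
    {"span_id": "a", "parent_id": None, "service": "frontend"},
    {"span_id": "b", "parent_id": "a", "service": "cart"},
    {"span_id": "c", "parent_id": "b", "service": "db-proxy"},
    {"span_id": "d", "parent_id": "a", "service": "cart"},
    {"span_id": "e", "parent_id": "d", "service": "cart"},  # in-process child, ignored
]

edges = build_dependency_edges(spans)
for (caller, callee), calls in sorted(edges.items()):
    print(f"{caller} -> {callee}: {calls} call(s)")
```

The resulting edge counts could be handed to any graph renderer; the sketch stops at the data structure because the rendering tool depends on the mesh and dashboard stack in use.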

Use Cases

  • When setting up mesh monitoring and dashboards
  • When debugging latency or error issues within a service mesh
  • When defining and implementing SLOs for inter-service communication
  • When visualizing service dependencies and network topology
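To make the SLO use case concrete, here is a minimal sketch of the arithmetic behind a success-rate SLO for traffic between two services. The function name `slo_report`, its parameters, and the request counts are assumptions for illustration; in practice the counts would come from mesh metrics over the SLO window rather than hard-coded values.

```python
def slo_report(total_requests, failed_requests, slo_target):
    """Summarize a success-rate SLO over one measurement window.

    slo_target is a fraction strictly between 0 and 1, e.g. 0.999.
    """
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")

    error_rate = failed_requests / total_requests
    allowed_error_rate = 1 - slo_target
    # Burn rate of 1.0 means errors arrive exactly as fast as the SLO allows.
    burn_rate = error_rate / allowed_error_rate
    return {
        "sli": 1 - error_rate,
        "burn_rate": burn_rate,
        # Fraction of the error budget left if this window is the full SLO period.
        "budget_remaining": 1 - burn_rate,
        "slo_met": error_rate <= allowed_error_rate,
    }


# Hypothetical counts: 100,000 requests, 50 failures, 99.9% target.
report = slo_report(total_requests=100_000, failed_requests=50, slo_target=0.999)
print(report)
```

A burn rate above 1.0 means the service would exhaust its error budget before the window ends, which is the usual trigger for burn-rate alerts.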

Non-Goals

  • Implementing the actual observability backend infrastructure (focus is on configuration and integration)
  • General-purpose monitoring outside of service meshes
  • Deep dives into specific tool internals beyond their integration with service meshes

Installation

First, add the marketplace, then install the plugin:

/plugin marketplace add wshobson/agents
/plugin install cloud-infrastructure@claude-code-workflows

Quality Score

Verified
98/100
Analyzed about 11 hours ago

Trust Signals

Last commit: 2 days ago
Stars: 35.3k
License: MIT

Similar Extensions

Setup Service Mesh

98

Deploy and configure a service mesh (Istio or Linkerd) to enable secure service-to-service communication, traffic management, observability, and policy enforcement in Kubernetes clusters. Covers installation, mTLS configuration, traffic routing, circuit breaking, and integration with monitoring tools. Use when microservices need encrypted service-to-service communication, fine-grained traffic control for canary or A/B deployments, observability across all service interactions without application changes, or consistent circuit breaking and retry policies.

Skill
pjt222

Grafana Dashboards

99

Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.

Skill
wshobson

Instrument Distributed Tracing

99

Instrument applications with OpenTelemetry for distributed tracing, including auto and manual instrumentation, context propagation, sampling strategies, and integration with Jaeger or Tempo. Use when debugging latency issues in distributed systems, understanding request flow across microservices, correlating traces with logs and metrics for root cause analysis, measuring end-to-end latency, or migrating from legacy tracing systems to OpenTelemetry.

Skill
pjt222

Plan Capacity

99

Perform capacity planning using historical metrics and growth models. Use predict_linear for forecasting, identify resource constraints, calculate headroom, and recommend scaling actions before saturation. Use before seasonal traffic spikes or product launches, during quarterly capacity reviews, when resource utilization trends upward, or before budget planning cycles.

Skill
pjt222

Define SLO/SLI/SLA

99

Establish Service Level Objectives (SLO), Service Level Indicators (SLI), and Service Level Agreements (SLA) with error budget tracking, burn rate alerts, and automated reporting using Prometheus and tools like Sloth or Pyrra. Use when defining reliability targets for customer-facing services, balancing feature velocity against system reliability through error budgets, migrating from arbitrary uptime goals to data-driven metrics, or implementing Site Reliability Engineering practices.

Skill
pjt222

LangSmith Observability

99

LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.

Skill
Orchestra-Research

© 2025 SkillRepo · Find the right skill, skip the noise.