Correlate Observability Signals

Skill · Verified · Active

Unify metrics, logs, and traces for cohesive debugging. Implement exemplars for metric-to-trace linking, build unified dashboards using the RED and USE methods, and enable rapid root cause analysis across observability signals. Use when investigating complex incidents that span multiple systems, reducing mean time to resolution (MTTR), implementing distributed tracing, or moving from siloed tools to a unified observability platform.

Purpose

To enable cohesive debugging and rapid root cause analysis by unifying metrics, logs, and traces into a single observability view.

Features

  • Implement trace context propagation in logs and metrics
  • Configure Prometheus exemplars for metric-to-trace linking
  • Build unified dashboards using RED and USE methods
  • Link logs to traces in Loki for cohesive debugging
  • Provide step-by-step guidance for incident investigation workflows
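
As an illustration of the first feature, trace context propagation in logs, here is a minimal sketch that uses only the Python standard library. It parses a W3C `traceparent` header and stamps the resulting trace and span IDs onto every log record. The names `current_trace`, `parse_traceparent`, `TraceContextFilter`, and `JsonFormatter` are hypothetical and not part of the skill; a real deployment would more likely rely on OpenTelemetry instrumentation to manage the context.

```python
import json
import logging
import re
from contextvars import ContextVar

# W3C Trace Context header layout: version-traceid-parentid-flags (lowercase hex).
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)

# Hypothetical per-request storage for the active trace context.
current_trace = ContextVar("current_trace", default=None)


def parse_traceparent(header):
    """Return trace_id, span_id, and sampled flag, or None if the header is malformed."""
    match = _TRACEPARENT.match(header.strip().lower())
    if match is None:
        return None
    _version, trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid according to the W3C specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 1),
    }


class TraceContextFilter(logging.Filter):
    """Copy the active trace context onto each log record."""

    def filter(self, record):
        ctx = current_trace.get() or {}
        record.trace_id = ctx.get("trace_id", "")
        record.span_id = ctx.get("span_id", "")
        return True


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so a log backend can extract trace_id."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", ""),
            "span_id": getattr(record, "span_id", ""),
        })


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(TraceContextFilter())
    handler.setFormatter(JsonFormatter())
    log = logging.getLogger("checkout")  # hypothetical service name
    log.addHandler(handler)
    log.setLevel(logging.INFO)

    # In a real service this header would come from the incoming request.
    header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    current_trace.set(parse_traceparent(header))
    log.info("payment authorized")
```

With the trace ID present as a structured field, a log backend such as Loki can be set up to turn that field into a link to the matching trace in the tracing backend; the exact configuration depends on the tools in use.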

Use Cases

  • Investigating complex incidents spanning multiple systems
  • Reducing mean time to resolution (MTTR)
  • Building unified observability dashboards
  • Implementing distributed tracing across services

Non-Goals

  • Configuring the underlying observability backends (Prometheus, Loki, Tempo)
  • Writing application code beyond instrumentation for trace propagation
  • Replacing existing monitoring and alerting tools

Workflow

  1. Implement Trace Context Propagation
  2. Configure Exemplars in Prometheus
  3. Build Unified Dashboard with RED Method
  4. Implement USE Method for Resources
  5. Link Logs to Traces in Loki
  6. Create Unified Incident View
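
To make step 3 more concrete, the sketch below builds the three RED queries (Rate, Errors, Duration) as PromQL strings for one service. The metric names `http_requests_total` and `http_request_duration_seconds`, the `service` and `status` labels, and the helper `red_queries` are assumptions chosen for illustration; substitute whatever your instrumentation actually exports.

```python
def red_queries(
    service,
    window="5m",
    quantile=0.95,
    requests_metric="http_requests_total",            # assumed counter name
    duration_metric="http_request_duration_seconds",  # assumed histogram name
):
    """Return PromQL expressions for the RED method, keyed by signal."""
    selector = f'service="{service}"'  # assumed label name
    total = f"sum(rate({requests_metric}{{{selector}}}[{window}]))"
    failed = (
        f'sum(rate({requests_metric}{{{selector},status=~"5.."}}[{window}]))'
    )
    return {
        # Rate: requests per second over the window.
        "rate": total,
        # Errors: fraction of requests that returned a 5xx status.
        "errors": f"{failed} / {total}",
        # Duration: latency quantile estimated from histogram buckets.
        "duration": (
            f"histogram_quantile({quantile}, "
            f"sum by (le) (rate({duration_metric}_bucket{{{selector}}}[{window}])))"
        ),
    }


if __name__ == "__main__":
    for signal, expr in red_queries("checkout").items():
        print(f"{signal}: {expr}")
```

Each expression can be pasted into a dashboard panel. The USE method in step 4 follows the same pattern, with utilization, saturation, and error queries per resource instead of per service.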

Practices

  • Observability
  • Distributed Tracing
  • Debugging
  • Incident Response

Prerequisites

  • Prometheus (metrics)
  • Log aggregation system (Loki, Elasticsearch, CloudWatch)
  • Distributed tracing backend (Tempo, Jaeger, Zipkin)
  • Optional: Grafana for unified visualization
  • Optional: OpenTelemetry instrumentation

Installation

/plugin install agent-almanac@pjt222-agent-almanac

Quality Score

Verified
97/100
Analyzed about 22 hours ago

Trust Signals

Last commit: 1 day ago
Stars: 14
License: MIT

Similar Extensions

Service Mesh Observability

98

Implement comprehensive observability for service meshes including distributed tracing, metrics, and visualization. Use when setting up mesh monitoring, debugging latency issues, or implementing SLOs for service communication.

Skill
wshobson

Grafana Dashboards

99

Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.

Skill
wshobson

Observability Gap Hunt

98

Inspects services, jobs, and code paths for missing or weak logs, metrics, traces, alerts, dashboards, or deployment-linked telemetry, then returns a tightly scoped backlog of observability gaps. Use when a user says `find observability gaps`, `audit telemetry coverage`, `what logs or metrics are missing`, `check alerting coverage`, or asks for a recurring telemetry review. Do NOT use for live incident response, root-cause analysis, generic performance tuning, or a broad code review.

Skill
ckorhonen

Azure Monitor Query Py

100

Azure Monitor Query SDK for Python. Use for querying Log Analytics workspaces and Azure Monitor metrics. Triggers: "azure-monitor-query", "LogsQueryClient", "MetricsQueryClient", "Log Analytics", "Kusto queries", "Azure metrics".

Skill
microsoft

Query Netdata Cloud

100

Query Netdata Cloud via its REST API -- metrics, logs (systemd-journal / windows-events / otel-logs), topology graphs (topology:snmp), network flows (flows:netflow), alerts, dynamic configuration (DynCfg), and generic Functions on a node. Use when the user asks about querying Netdata Cloud, fetching metrics from the cloud, querying logs / topology / netflow / sflow / ipfix through Cloud, listing or modifying configurations via DynCfg, calling agent Functions through Cloud, listing spaces/rooms/nodes, or building a curl command against `app.netdata.cloud`. Pairs with the `query-netdata-agents` skill when direct-agent access is needed.

Skill
netdata

LangSmith Observability

99

LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.

Skill
Orchestra-Research