Observability Gap Hunt

Skill · Verified · Active
Part of: Swe Skills

Inspects services, jobs, and code paths for missing or weak logs, metrics, traces, alerts, dashboards, or deployment-linked telemetry, then returns a tightly scoped backlog of observability gaps. Use when a user says `find observability gaps`, `audit telemetry coverage`, `what logs or metrics are missing`, `check alerting coverage`, or asks for a recurring telemetry review. Do NOT use for live incident response, root-cause analysis, generic performance tuning, or a broad code review.

Purpose

To help users proactively identify and address blind spots in their system's observability coverage, ensuring services can be operated and diagnosed confidently.

Features

  • Identifies missing logs, metrics, traces, and alerts
  • Audits dashboard and deployment-linked telemetry coverage
  • Returns a ranked backlog of observability gaps
  • Focuses on operational visibility rather than performance tuning
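To make the "ranked backlog" idea concrete, here is a minimal sketch of one way such a backlog could be built. It is not the skill's actual implementation: the `Gap` class, the `find_gaps` helper, the 1-3 criticality score, and the sample service names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Signal types taken from the feature list above.
SIGNALS = ("logs", "metrics", "traces", "alerts")


@dataclass(frozen=True)
class Gap:
    service: str
    signal: str
    criticality: int  # hypothetical 1-3 score; 3 = most critical service


def find_gaps(coverage, criticality):
    """Return every missing signal per service, most critical service first."""
    gaps = [
        Gap(service, signal, criticality.get(service, 1))
        for service, present in coverage.items()
        for signal in SIGNALS
        if signal not in present
    ]
    # Sort by descending criticality, then by name for a stable order.
    return sorted(gaps, key=lambda g: (-g.criticality, g.service, g.signal))


# Hypothetical inventory: which signals each service already emits.
coverage = {
    "checkout": {"logs", "metrics"},
    "batch-export": {"logs", "metrics", "traces", "alerts"},
    "search": {"logs"},
}
criticality = {"checkout": 3, "search": 2, "batch-export": 1}

backlog = find_gaps(coverage, criticality)
for gap in backlog:
    print(f"{gap.service}: missing {gap.signal} (criticality {gap.criticality})")
```

Ranking by service criticality is only one possible ordering. A real audit might also weigh how often a code path runs or whether an alert already covers the failure mode.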

Use cases

  • Audit telemetry coverage for a specific service or package
  • Find missing logs or metrics for critical workflows
  • Check alerting coverage before deploying new features
  • Perform recurring observability reviews over time

Non-goals

  • Live incident response or root-cause analysis
  • Generic performance tuning or optimization
  • Broad application code review without an observability goal
  • Redesigning entire monitoring stacks

Installation

/plugin install swe-skills@ckorhonen-swe-skills

Quality score

Verified
98/100
Analyzed 1 day ago

Trust signals

Last commit: 5 days ago
Stars: 1
License: MIT

Similar extensions

Azure Monitor Query Py

100

Azure Monitor Query SDK for Python. Use for querying Log Analytics workspaces and Azure Monitor metrics. Triggers: "azure-monitor-query", "LogsQueryClient", "MetricsQueryClient", "Log Analytics", "Kusto queries", "Azure metrics".

Skill
microsoft

Query Netdata Cloud

100

Query Netdata Cloud via its REST API -- metrics, logs (systemd-journal / windows-events / otel-logs), topology graphs (topology:snmp), network flows (flows:netflow), alerts, dynamic configuration (DynCfg), and generic Functions on a node. Use when the user asks about querying Netdata Cloud, fetching metrics from the cloud, querying logs / topology / netflow / sflow / ipfix through Cloud, listing or modifying configurations via DynCfg, calling agent Functions through Cloud, listing spaces/rooms/nodes, or building a curl command against `app.netdata.cloud`. Pairs with the `query-netdata-agents` skill when direct-agent access is needed.

Skill
netdata

Query Netdata Agents

99

Query Netdata Agents (parents and children) directly via their HTTP API on port 19999. Includes a bearer-token helper that mints, caches, and transparently refreshes a per-agent bearer from a long-lived Netdata Cloud token, and auto-detects bearer-protected agents. Use when the user asks how to call an agent's REST API or Function directly, query an agent's logs/metrics/alerts directly, mint a bearer token from a cloud token, or work around bearer protection.

Skill
netdata

Conduct Empirical Wire Capture

99

Capture outbound HTTP and telemetry from a CLI harness at runtime. Covers capture-channel selection (transcript file vs verbose-fetch stderr vs outbound proxy vs on-disk state), hook-driven per-event capture vs long-running session capture, JSONL output format for diff-friendly artifacts, and the observability table that maps each target to the cheapest channel that captures it. Use when a static finding needs runtime confirmation, when a payload shape is needed for a client re-implementation, or when dark-vs-live disambiguation requires watching what the binary actually sends.

Skill
pjt222

Arize Trace Skill

99

Downloads, exports, and inspects existing Arize traces and spans to understand what an LLM app is doing or debug runtime issues. Covers exporting traces by ID, spans by ID, sessions by ID, and root-cause investigation using the ax CLI. Use when the user wants to look at existing trace data, see what their LLM app is doing, export traces, download spans, investigate errors, or analyze behavior regressions.

Skill
github

Observe Metrics

98

Aggregate and display system metrics, with anomaly detection, for a given time period.

Skill
ruvnet