
LangSmith Observability

Skill · Verified · Active

LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.

Purpose

To support debugging, evaluating, and monitoring LLM applications by leveraging LangSmith's tracing, dataset, and monitoring features.

Features

  • LLM tracing for inputs, outputs, and latency
  • Systematic model evaluation against datasets
  • Production system monitoring for metrics and errors
  • Integration with OpenAI, Anthropic, LangChain, LlamaIndex
  • Client API for programmatic interaction with LangSmith

Use Cases

  • Debugging LLM application issues
  • Evaluating model outputs against datasets
  • Monitoring production LLM systems
  • Building regression testing pipelines for AI applications

Non-Goals

  • General deep learning experiment tracking (use Weights & Biases)
  • General ML lifecycle management (use MLflow)
  • ML monitoring focused on data drift (use Arize/WhyLabs)

Practices

  • LLM Observability
  • LLM Evaluation
  • LLM Monitoring
  • LLM Tracing
  • LLMOps

Prerequisites

  • Python 3.7+
  • LangSmith account and API key
  • Set LANGSMITH_API_KEY and LANGSMITH_TRACING environment variables
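The environment variables above can be set in code as well as in the shell; a minimal sketch (the key value is a placeholder, not a real key):

```python
# Minimal environment setup sketch for the prerequisites above.
# The API key value below is a placeholder; substitute your own.
import os

os.environ["LANGSMITH_TRACING"] = "true"            # turn tracing on
os.environ["LANGSMITH_API_KEY"] = "<your-api-key>"  # placeholder
```

In practice these are usually exported in the shell or loaded from a `.env` file rather than hard-coded.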

Execution

  • Info — Pinned dependencies: dependencies are listed in SKILL.md but are not pinned to explicit versions in a lockfile, which could lead to compatibility issues.

Installation

First add the marketplace, then install the skill:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

Verified
99/100
Analyzed about 19 hours ago

Trust Signals

Last commit: 16 days ago
Stars: 8.3k
License: MIT
View Source

Similar Extensions

Playwright Best Practices

100

Use when writing Playwright tests, fixing flaky tests, debugging failures, implementing Page Object Model, configuring CI/CD, optimizing performance, mocking APIs, handling authentication or OAuth, testing accessibility (axe-core), file uploads/downloads, date/time mocking, WebSockets, geolocation, permissions, multi-tab/popup flows, mobile/responsive layouts, touch gestures, GraphQL, error handling, offline mode, multi-user collaboration, third-party services (payments, email verification), console error monitoring, global setup/teardown, test annotations (skip, fixme, slow), test tags (@smoke, @fast, @critical, filtering with --grep), project dependencies, security testing (XSS, CSRF, auth), performance budgets (Web Vitals, Lighthouse), iframes, component testing, canvas/WebGL, service workers/PWA, test coverage, i18n/localization, Electron apps, or browser extension testing. Covers E2E, component, API, visual, accessibility, security, Electron, and extension testing.

Skill
currents-dev

Status

100

Show DAG state, agent progress, and branch status for an AgentHub session.

Skill
alirezarezvani

Observability Designer

100

Observability Designer (POWERFUL)

Skill
alirezarezvani

Grafana Dashboards

99

Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.

Skill
wshobson

Monitor Stream

99

Stream live swarm events using the Monitor tool for real-time observability

Skill
ruvnet

Instrument Distributed Tracing

99

Instrument applications with OpenTelemetry for distributed tracing, including auto and manual instrumentation, context propagation, sampling strategies, and integration with Jaeger or Tempo. Use when debugging latency issues in distributed systems, understanding request flow across microservices, correlating traces with logs and metrics for root cause analysis, measuring end-to-end latency, or migrating from legacy tracing systems to OpenTelemetry.

Skill
pjt222
