
LLM Cost Optimizer

Skill · Verified · Active

Use proactively whenever LLM API costs come up -- or should. Triggers include: 'my AI costs are too high', 'optimize token usage', 'which model should I use', 'LLM spend is out of control', 'implement prompt caching', 'we're about to launch an AI feature', 'build me an AI endpoint'. Don't wait for an explicit cost complaint -- if someone is building an AI feature, designing an LLM endpoint, or choosing between models, cost architecture belongs in the conversation. Apply immediately when any of these are true: a system prompt appears that exceeds a few hundred tokens, all requests are hitting the same model, max_tokens is not set, or no per-feature cost logging exists. NOT for RAG pipeline design (use rag-architect). NOT for improving prompt quality or effectiveness (use senior-prompt-engineer).

Purpose

To help users proactively manage and significantly reduce LLM API costs by providing expert-level strategies for auditing, optimizing, and architecting cost-efficient AI systems.

Features

  • Cost auditing and analysis frameworks
  • Model routing strategies based on task complexity
  • Prompt caching implementation guidance
  • Output length control techniques
  • Prompt compression and semantic caching
  • Cost-efficient AI architecture design patterns
  • Proactive identification of cost optimization opportunities
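The model-routing feature above can be sketched in a few lines. This is an illustrative sketch only: the model names, word-count threshold, and complexity heuristic are assumptions for the example, not part of the skill itself.

```python
# Hypothetical model tiers -- substitute real provider model IDs.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "large-model"

def route_model(prompt: str, needs_reasoning: bool = False) -> str:
    """Pick a model tier from a crude complexity signal.

    Long prompts or explicit reasoning needs go to the stronger model;
    everything else stays on the cheap default tier.
    """
    if needs_reasoning or len(prompt.split()) > 500:
        return STRONG_MODEL
    return CHEAP_MODEL
```

In practice the routing signal would be richer (task type, past failure rate, user tier), but the shape is the same: default cheap, escalate only when the task demands it.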

Use Cases

  • When LLM API costs are too high or expected to increase
  • When designing new AI features or endpoints
  • When choosing between different LLM models for a task
  • When needing to implement prompt caching or optimize token usage
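For the prompt-caching use case, one common shape is to mark a large static system prefix as cacheable so only the varying user turn is billed at full input price on repeat requests. The sketch below follows the Anthropic Messages API's `cache_control` convention; the model name, token cap, and system text are placeholders.

```python
# A large static prefix that repeats across requests (placeholder text).
LONG_SYSTEM_PROMPT = "You are a support assistant. " * 50

def build_cached_request(user_message: str) -> dict:
    """Build a request body whose static system prefix is marked cacheable."""
    return {
        "model": "example-model",   # placeholder model ID
        "max_tokens": 512,          # always cap output length
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Lets the provider reuse this prefix across requests
                # instead of re-billing it as fresh input each time.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
```

Caching pays off when the static prefix is large relative to the varying part and requests arrive within the cache's lifetime.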

Non-Goals

  • RAG pipeline design (use rag-architect)
  • Improving prompt quality or effectiveness (use senior-prompt-engineer)
  • General LLM performance tuning beyond cost implications

Workflow

  1. Classify the applicable cost optimization mode (Audit, Optimize Existing, Design New).
  2. Gather necessary context on current state, goals, and workload profile.
  3. Execute mode-specific steps: Instrument requests, identify cost drivers, or implement architectural controls.
  4. Apply techniques such as model routing, prompt caching, output length control, prompt compression, or semantic caching.
  5. Design cost-efficient architecture with budget envelopes, routing layers, and observability.
  6. Surface proactive flags for cost leaks and cost anomalies.
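Step 1 of the workflow (instrumenting requests for per-feature cost visibility) can be sketched as a small accumulator. The per-million-token prices below are illustrative stand-ins, not real provider pricing.

```python
from collections import defaultdict

# Illustrative USD prices per million tokens -- real prices vary by provider.
PRICES = {
    "small-model": {"input": 0.25, "output": 1.25},
    "large-model": {"input": 3.00, "output": 15.00},
}

_feature_spend: dict[str, float] = defaultdict(float)

def log_request_cost(feature: str, model: str,
                     input_tokens: int, output_tokens: int) -> float:
    """Attribute one request's dollar cost to a per-feature bucket."""
    p = PRICES[model]
    cost = (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000
    _feature_spend[feature] += cost
    return cost

def spend_by_feature() -> dict[str, float]:
    """Snapshot of accumulated spend, keyed by feature name."""
    return dict(_feature_spend)
```

With this in place, the audit mode has the data it needs: which features drive spend, which models they hit, and how input and output tokens split the bill.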

Installation

/plugin install llm-cost-optimizer@alirezarezvani-claude-skills

Quality Score

Verified
98/100
Analyzed 1 day ago

Trust Signals

Last commit: 1 day ago
Stars: 14.6k
License: MIT

Similar Extensions

Arize Prompt Optimization

100

Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.

Skill
github

CE Optimize

100

Run metric-driven iterative optimization loops -- define a measurable goal, run parallel experiments, measure each against hard gates or LLM-as-judge scores, keep improvements, and converge on the best solution. Use when optimizing clustering quality, search relevance, build performance, prompt quality, or any measurable outcome that benefits from systematic experimentation.

Skill
EveryInc

Prompt Optimization

100

Applies prompt repetition to improve accuracy for LLMs without reasoning capability.

Skill
asklokesh

Design On Call Rotation

100

Design sustainable on-call rotations with balanced schedules, clear escalation policies, fatigue management, and handoff procedures. Minimize burnout while maintaining incident response coverage. Use when setting up on-call for the first time, scaling a team from 2-3 to 5+ engineers, addressing on-call burnout or alert fatigue, improving incident response times, or after a post-mortem identifies handoff issues.

Skill
pjt222

Observability Designer

100

Observability Designer (POWERFUL)

Skill
alirezarezvani

Performance Analysis

100

Comprehensive performance analysis, bottleneck detection, and optimization recommendations for Claude Flow swarms

Skill
ruvnet