LLM Cost Optimizer
Use proactively whenever LLM API costs come up -- or should. Triggers include: 'my AI costs are too high', 'optimize token usage', 'which model should I use', 'LLM spend is out of control', 'implement prompt caching', 'we're about to launch an AI feature', 'build me an AI endpoint'. Don't wait for an explicit cost complaint -- if someone is building an AI feature, designing an LLM endpoint, or choosing between models, cost architecture belongs in the conversation. Apply immediately when any of these are true: a system prompt appears that exceeds a few hundred tokens, all requests are hitting the same model, max_tokens is not set, or no per-feature cost logging exists. NOT for RAG pipeline design (use rag-architect). NOT for improving prompt quality or effectiveness (use senior-prompt-engineer).
Helps users proactively manage and significantly reduce LLM API costs by providing expert-level strategies for auditing, optimizing, and architecting cost-efficient AI systems.
Features
- Cost auditing and analysis frameworks
- Model routing strategies based on task complexity (see the sketch after this list)
- Prompt caching implementation guidance
- Output length control techniques
- Prompt compression and semantic caching
- Cost-efficient AI architecture design patterns
- Proactive identification of cost optimization opportunities
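
As a concrete illustration of complexity-based routing, here is a minimal Python sketch. The model names, length threshold, and marker words are illustrative assumptions; a production router might use a small classifier model or task metadata instead.

```python
# Minimal model-routing sketch. Model names, the length threshold,
# and the marker words below are illustrative assumptions.

CHEAP_MODEL = "claude-3-5-haiku-latest"  # assumed cheap tier
STRONG_MODEL = "claude-sonnet-4-5"       # assumed strong tier

def classify_complexity(prompt: str) -> str:
    """Crude placeholder heuristic; a real router might use a small
    classifier model or per-task metadata instead."""
    hard_markers = ("analyze", "multi-step", "prove", "refactor")
    if len(prompt) > 2000 or any(m in prompt.lower() for m in hard_markers):
        return "hard"
    return "easy"

def route_model(prompt: str) -> str:
    """Send easy requests to the cheap model, hard ones to the strong model."""
    return STRONG_MODEL if classify_complexity(prompt) == "hard" else CHEAP_MODEL
```

The design choice here is to default to the cheap model and escalate only on evidence of difficulty, so a routing mistake costs some quality on a few requests rather than money on all of them.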
Use Cases
- When LLM API costs are too high or expected to increase
- When designing new AI features or endpoints
- When choosing between different LLM models for a task
- When implementing prompt caching or optimizing token usage (see the caching sketch after this list)
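
For the prompt-caching use case, here is a minimal sketch assuming Anthropic's Python SDK and its prompt-caching API; the model name and system prompt are placeholders. Marking a large, stable system prompt with cache_control lets repeated requests reuse the cached prefix at a reduced input-token rate.

```python
# Prompt-caching sketch using Anthropic's cache_control blocks.
# LONG_SYSTEM_PROMPT and the model name are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
LONG_SYSTEM_PROMPT = "..."      # imagine several thousand tokens of instructions

def ask(question: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model name
        max_tokens=512,             # always cap output length
        system=[{
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # mark the prefix as cacheable
        }],
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
```

Caching pays off only when the cached prefix exceeds the provider's minimum cacheable size and is reused frequently within the cache lifetime.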
Non-Goals
- RAG pipeline design (use rag-architect)
- Improving prompt quality or effectiveness (use senior-prompt-engineer)
- General LLM performance tuning beyond cost implications
Workflow
- Classify the applicable cost optimization mode (Audit, Optimize Existing, Design New).
- Gather necessary context on current state, goals, and workload profile.
- Execute mode-specific steps: instrument requests, identify cost drivers, or implement architectural controls (a cost-logging sketch follows this list).
- Apply techniques such as model routing, prompt caching, output length control, prompt compression, or semantic caching.
- Design a cost-efficient architecture with budget envelopes, routing layers, and observability.
- Surface proactive flags for cost leaks and anomalies.
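
A minimal sketch of the instrumentation and budget-envelope steps, assuming per-token prices are known per model; the price figures and feature names below are illustrative, not authoritative.

```python
# Per-feature cost-logging and budget-envelope sketch. The per-token
# prices and the feature tags are illustrative assumptions.
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class CostTracker:
    prices: dict  # model -> (USD per input token, USD per output token)
    spend_by_feature: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, feature: str, model: str, in_tokens: int, out_tokens: int) -> float:
        """Log one request's cost against its owning feature."""
        p_in, p_out = self.prices[model]
        cost = in_tokens * p_in + out_tokens * p_out
        self.spend_by_feature[feature] += cost
        return cost

    def over_budget(self, feature: str, envelope_usd: float) -> bool:
        """Budget-envelope check: flag a feature whose cumulative
        spend exceeds its allotted envelope."""
        return self.spend_by_feature[feature] > envelope_usd

# Usage with assumed prices ($3 / $15 per million tokens):
tracker = CostTracker(prices={"claude-sonnet-4-5": (3e-6, 15e-6)})
tracker.record("summarize", "claude-sonnet-4-5", in_tokens=1200, out_tokens=300)
if tracker.over_budget("summarize", envelope_usd=50.0):
    print("summarize feature exceeded its budget envelope")
```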
Installation
/plugin install llm-cost-optimizer@alirezarezvani-claude-skills
Similar Extensions
Arize Prompt Optimization
Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signals, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.
CE Optimize
Run metric-driven iterative optimization loops -- define a measurable goal, run parallel experiments, measure each against hard gates or LLM-as-judge scores, keep improvements, and converge on the best solution. Use when optimizing clustering quality, search relevance, build performance, prompt quality, or any measurable outcome that benefits from systematic experimentation.
Prompt Optimization
Applies prompt repetition to improve accuracy for non-reasoning LLMs
Design On Call Rotation
Design sustainable on-call rotations with balanced schedules, clear escalation policies, fatigue management, and handoff procedures. Minimize burnout while maintaining incident response coverage. Use when setting up on-call for the first time, scaling a team from 2-3 to 5+ engineers, addressing on-call burnout or alert fatigue, improving incident response times, or after a post-mortem identifies handoff issues.
Observability Designer
Observability Designer (POWERFUL)
Performance Analysis
Comprehensive performance analysis, bottleneck detection, and optimization recommendations for Claude Flow swarms