LLM Cost Optimizer
Use proactively whenever LLM API costs come up -- or should. Triggers include: 'my AI costs are too high', 'optimize token usage', 'which model should I use', 'LLM spend is out of control', 'implement prompt caching', 'we're about to launch an AI feature', 'build me an AI endpoint'. Don't wait for an explicit cost complaint -- if someone is building an AI feature, designing an LLM endpoint, or choosing between models, cost architecture belongs in the conversation. Apply immediately when any of these are true: a system prompt appears that exceeds a few hundred tokens, all requests are hitting the same model, max_tokens is not set, or no per-feature cost logging exists. NOT for RAG pipeline design (use rag-architect). NOT for improving prompt quality or effectiveness (use senior-prompt-engineer).
Helps users proactively manage and significantly reduce LLM API costs by providing expert-level strategies for auditing, optimizing, and architecting cost-efficient AI systems.
Features
- Cost auditing and analysis frameworks
- Model routing strategies based on task complexity (see the sketch after this list)
- Prompt caching implementation guidance
- Output length control techniques
- Prompt compression and semantic caching
- Cost-efficient AI architecture design patterns
- Proactive identification of cost optimization opportunities
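
As a concrete illustration of complexity-based routing, here is a minimal Python sketch. The model names, length threshold, and marker words are illustrative assumptions; a production router might use a small classifier model or task metadata instead.

```python
# Minimal model-routing sketch. Model names, the length threshold,
# and the marker words below are illustrative assumptions.

CHEAP_MODEL = "claude-3-5-haiku-latest"  # assumed cheap tier
STRONG_MODEL = "claude-sonnet-4-5"       # assumed strong tier

def classify_complexity(prompt: str) -> str:
    """Crude placeholder heuristic; a real router might use a small
    classifier model or per-task metadata instead."""
    hard_markers = ("analyze", "multi-step", "prove", "refactor")
    if len(prompt) > 2000 or any(m in prompt.lower() for m in hard_markers):
        return "hard"
    return "easy"

def route_model(prompt: str) -> str:
    """Send easy requests to the cheap model, hard ones to the strong model."""
    return STRONG_MODEL if classify_complexity(prompt) == "hard" else CHEAP_MODEL
```

The design choice here is to default to the cheap model and escalate only on evidence of difficulty, so a routing mistake costs some quality on a few requests rather than money on all of them.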
Use Cases
- When LLM API costs are too high or expected to increase
- When designing new AI features or endpoints
- When choosing between different LLM models for a task
- When implementing prompt caching or optimizing token usage (see the caching sketch after this list)
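
For the prompt-caching use case, here is a minimal sketch assuming Anthropic's Python SDK and its prompt-caching API; the model name and system prompt are placeholders. Marking a large, stable system prompt with cache_control lets repeated requests reuse the cached prefix at a reduced input-token rate.

```python
# Prompt-caching sketch using Anthropic's cache_control blocks.
# LONG_SYSTEM_PROMPT and the model name are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
LONG_SYSTEM_PROMPT = "..."      # imagine several thousand tokens of instructions

def ask(question: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-5",  # assumed model name
        max_tokens=512,             # always cap output length
        system=[{
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # mark the prefix as cacheable
        }],
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
```

Caching pays off only when the cached prefix exceeds the provider's minimum cacheable size and is reused frequently within the cache lifetime.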
Non-Goals
- RAG pipeline design (use rag-architect)
- Improving prompt quality or effectiveness (use senior-prompt-engineer)
- General LLM performance tuning beyond cost implications
Workflow
- Classify the applicable cost optimization mode (Audit, Optimize Existing, Design New).
- Gather necessary context on current state, goals, and workload profile.
- Execute mode-specific steps: instrument requests, identify cost drivers, or implement architectural controls (a cost-logging sketch follows this list).
- Apply techniques such as model routing, prompt caching, output length control, prompt compression, or semantic caching.
- Design a cost-efficient architecture with budget envelopes, routing layers, and observability.
- Surface proactive flags for cost leaks and anomalies.
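
A minimal sketch of the instrumentation and budget-envelope steps, assuming per-token prices are known per model; the price figures and feature names below are illustrative, not authoritative.

```python
# Per-feature cost-logging and budget-envelope sketch. The per-token
# prices and the feature tags are illustrative assumptions.
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class CostTracker:
    prices: dict  # model -> (USD per input token, USD per output token)
    spend_by_feature: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, feature: str, model: str, in_tokens: int, out_tokens: int) -> float:
        """Log one request's cost against its owning feature."""
        p_in, p_out = self.prices[model]
        cost = in_tokens * p_in + out_tokens * p_out
        self.spend_by_feature[feature] += cost
        return cost

    def over_budget(self, feature: str, envelope_usd: float) -> bool:
        """Budget-envelope check: flag a feature whose cumulative
        spend exceeds its allotted envelope."""
        return self.spend_by_feature[feature] > envelope_usd

# Usage with assumed prices ($3 / $15 per million tokens):
tracker = CostTracker(prices={"claude-sonnet-4-5": (3e-6, 15e-6)})
tracker.record("summarize", "claude-sonnet-4-5", in_tokens=1200, out_tokens=300)
if tracker.over_budget("summarize", envelope_usd=50.0):
    print("summarize feature exceeded its budget envelope")
```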
Installation
/plugin install llm-cost-optimizer@alirezarezvani-claude-skills
Similar Extensions
Arize Prompt Optimization
Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signals, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.
CE Optimize
Run metric-driven iterative optimization loops -- define a measurable goal, run parallel experiments, measure each against hard gates or LLM-as-judge scores, keep improvements, and converge on the best solution. Use when optimizing clustering quality, search relevance, build performance, prompt quality, or any measurable outcome that benefits from systematic experimentation.
Prompt Optimization
Applies prompt repetition to improve accuracy for non-reasoning LLMs
Design On Call Rotation
Design sustainable on-call rotations with balanced schedules, clear escalation policies, fatigue management, and handoff procedures. Minimize burnout while maintaining incident response coverage. Use when setting up on-call for the first time, scaling a team from 2-3 to 5+ engineers, addressing on-call burnout or alert fatigue, improving incident response times, or after a post-mortem identifies handoff issues.
Observability Designer
Observability Designer (POWERFUL)
Performance Analysis
Comprehensive performance analysis, bottleneck detection, and optimization recommendations for Claude Flow swarms