Speculative Decoding
Accelerate LLM inference using speculative decoding, Medusa multiple heads, and lookahead decoding techniques. Use when optimizing inference speed (1.5-3.6× speedup), reducing latency for real-time applications, or deploying models with limited compute. Covers draft models, tree-based attention, Jacobi iteration, parallel token generation, and production deployment strategies.
To enable users to significantly speed up LLM inference and reduce latency by leveraging advanced decoding techniques like speculative decoding, Medusa, and Lookahead decoding.
Features
- Accelerates LLM inference using speculative decoding
- Implements Medusa's multiple decoding heads for faster generation
- Utilizes Lookahead Decoding (Jacobi iteration) for parallel token generation
- Provides code examples for integration with Transformers and vLLM
- Details training methods and hyperparameter tuning for Medusa and Lookahead
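The draft-then-verify loop behind these speedups can be sketched without any real model. In the toy below, `target_next` and `draft_next` are illustrative stand-in functions (not LLMs, and not part of the skill's API): the cheap draft proposes k tokens, the target verifies them, and with greedy decoding the output is provably identical to running the target model alone.

```python
def target_next(ctx):
    """Toy deterministic 'target model': next token is (sum of context) % 10."""
    return sum(ctx) % 10

def draft_next(ctx):
    """Toy 'draft model': agrees with the target except right after token 7."""
    return 3 if (ctx and ctx[-1] == 7) else sum(ctx) % 10

def speculative_step(ctx, k=4):
    """One speculative round: draft k tokens, then verify with the target.

    With greedy decoding, a drafted token is kept only while it matches the
    target's own choice; the first mismatch is replaced by the target token
    and the round ends, so output always equals plain target decoding.
    """
    # 1. Draft k tokens autoregressively with the cheap model.
    drafted, tmp = [], list(ctx)
    for _ in range(k):
        t = draft_next(tmp)
        drafted.append(t)
        tmp.append(t)
    # 2. Verify: in a real system all k positions are scored by the target
    #    in ONE forward pass; here we replay its greedy choices sequentially.
    accepted, tmp = [], list(ctx)
    for t in drafted:
        expected = target_next(tmp)
        if t == expected:
            accepted.append(t)         # draft agreed: this token came for free
            tmp.append(t)
        else:
            accepted.append(expected)  # mismatch: take the target token, stop
            break
    return accepted

def generate(ctx, n_tokens, k=4):
    out = list(ctx)
    while len(out) < len(ctx) + n_tokens:
        out.extend(speculative_step(out, k))
    return out[:len(ctx) + n_tokens]
```

When the draft agrees for a whole round, k tokens are accepted for a single verification pass; the measured speedup depends on that acceptance rate, which is why draft-model choice matters in practice.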
Use cases
- Optimizing LLM inference speed (1.5-3.6x speedup)
- Reducing latency for real-time applications (chatbots, code generation)
- Deploying models efficiently on limited compute hardware
- Generating text faster without quality loss
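Lookahead decoding's use of Jacobi iteration can also be illustrated without a neural network: greedy decoding is a fixed-point problem, y_i = f(context, y_1..y_{i-1}), and Jacobi iteration updates all positions in parallel from the previous guess. A minimal sketch, assuming only a toy next-token function (`toy_next`, sum of tokens mod 10, purely illustrative):

```python
def jacobi_decode(next_token, ctx, n):
    """Greedy decoding as a fixed point solved by Jacobi iteration.

    Each iteration recomputes ALL n positions in parallel from the previous
    guess (in a real LLM this is one batched forward pass). Position i is
    guaranteed correct after i+1 iterations, so at most n iterations are
    needed; convergence is often faster, which is where the speedup comes from.
    """
    guess = [0] * n                                   # arbitrary initial guess
    for it in range(n):
        new = [next_token(ctx + guess[:i]) for i in range(n)]
        if new == guess:                              # fixed point: stop early
            return new, it + 1
        guess = new
    return guess, n

toy_next = lambda seq: sum(seq) % 10                  # stand-in "model"
tokens, iters = jacobi_decode(toy_next, [1, 2], 5)
# tokens == [3, 6, 2, 4, 8], identical to sequential greedy decoding
```

Real lookahead decoding adds n-gram caching and verification branches on top of this iteration so that several converged tokens are committed per step; the fixed-point view above is the core idea.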
Non-goals
- Model architecture design beyond adding decoding heads
- Training large language models from scratch
- Providing inference servers (focus is on decoding techniques)
- Handling tasks outside of LLM inference optimization
Practical Utility
- Edge cases: The SKILL.md discusses hyperparameter tuning and method selection, which touches on performance optimization, but it does not explicitly list failure modes with recovery steps.
Execution
- Pinned dependencies: Dependencies are listed but not pinned with lockfiles in the SKILL.md, which could lead to issues if newer versions break compatibility.
Installation
Add the marketplace first:
/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills