
Flash Attention

Skill · Verified · Active

Optimizes transformer attention with Flash Attention for 2-4x speedups and 10-20x memory reductions. Use when training or running transformers with long sequences (>512 tokens), when hitting GPU memory limits in attention, or when faster inference is needed. Supports PyTorch native SDPA, the flash-attn library, H100 FP8, and sliding window attention.

Purpose

To enable users to significantly accelerate transformer training and inference, and reduce GPU memory usage by leveraging Flash Attention, especially for long sequence lengths.
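The memory reduction comes from Flash Attention's tiled, online-softmax formulation: attention scores are processed block by block and never materialized for the whole sequence at once. A minimal pure-Python sketch for a single query row, shown next to the naive version for contrast (illustrative only, not the fused CUDA kernel):

```python
import math

def naive_attention_row(q, K, V):
    # Materializes the full score row: O(N) memory in sequence length N.
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    d = len(V[0])
    return [sum(e * v[j] for e, v in zip(exps, V)) / denom for j in range(d)]

def flash_attention_row(q, K, V, block=4):
    # Online softmax: process K/V in blocks, keeping only a running max,
    # a running denominator, and a running weighted sum -- O(block) memory.
    m = float("-inf")          # running max of scores seen so far
    denom = 0.0                # running softmax denominator
    acc = [0.0] * len(V[0])    # running (unnormalized) output
    for start in range(0, len(K), block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in Kb]
        m_new = max(m, max(scores))
        # Rescale previous partial results to the new running max.
        scale = math.exp(m - m_new) if m != float("-inf") else 0.0
        denom *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, Vb):
            w = math.exp(s - m_new)
            denom += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        m = m_new
    return [a / denom for a in acc]
```

Both functions produce the same output; the tiled version just never holds more than one block of scores in memory, which is the property the fused GPU kernel exploits at scale.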

Features

  • 2-4x speedup for transformer attention
  • 10-20x memory reduction for attention computations
  • Support for PyTorch native SDPA integration
  • Integration with flash-attn library for advanced features
  • Support for H100 FP8 optimization and sliding window attention
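With PyTorch's native SDPA integration, no extra dependency is needed: `scaled_dot_product_attention` dispatches to a fused Flash-Attention kernel when the inputs and hardware allow it, and falls back to the math backend otherwise. A minimal sketch (the shapes are illustrative):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim).
q = torch.randn(2, 8, 128, 64)
k = torch.randn(2, 8, 128, 64)
v = torch.randn(2, 8, 128, 64)

# Dispatches to a fused attention kernel when supported;
# is_causal=True applies the usual autoregressive mask.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```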

Use Cases

  • Training transformers with long sequences (>512 tokens)
  • Running inference with long context windows
  • Mitigating GPU memory issues during transformer training
  • Accelerating inference for transformer-based applications

Non-Goals

  • Providing a direct tool for agents to call
  • Replacing the need for GPU hardware
  • Optimizing attention mechanisms not based on transformers

Workflow

  1. Check PyTorch version (>=2.2) and GPU compatibility
  2. Install flash-attn library or ensure PyTorch has native support
  3. Integrate Flash Attention into model code using provided examples
  4. Verify speedup and accuracy using profiling and comparison scripts
  5. Optionally enable advanced features like sliding window or FP8 on H100
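Step 4 above can be sketched as a small comparison script that checks the fused output against an explicit softmax-attention reference (shapes are illustrative; this runs on CPU as well):

```python
import torch
import torch.nn.functional as F

def reference_attention(q, k, v):
    # Explicit attention: materializes the full (seq, seq) score matrix.
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale
    return torch.softmax(scores, dim=-1) @ v

torch.manual_seed(0)
q = torch.randn(1, 4, 64, 32)
k = torch.randn(1, 4, 64, 32)
v = torch.randn(1, 4, 64, 32)

fused = F.scaled_dot_product_attention(q, k, v)
ref = reference_attention(q, k, v)

# Flash Attention is numerically equivalent up to floating-point rounding.
print((fused - ref).abs().max().item())
```

For the speedup half of step 4, the same pair of functions can be timed with `torch.utils.benchmark.Timer` on a GPU; the gap grows with sequence length.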

Prerequisites

  • NVIDIA GPU (Ampere+ recommended)
  • CUDA 11.8+ / 12.0+
  • PyTorch 2.2+
  • Python 3.8+
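A quick check of these prerequisites might look like the following (the helper name and the simplistic version parsing are illustrative assumptions):

```python
import torch

def flash_attention_ready():
    # Report whether this environment meets the prerequisites above.
    # Assumes torch.__version__ starts with "MAJOR.MINOR".
    major, minor = (int(x) for x in torch.__version__.split(".")[:2])
    info = {
        "torch_ok": (major, minor) >= (2, 2),
        "cuda": torch.cuda.is_available(),
    }
    if info["cuda"]:
        cc = torch.cuda.get_device_capability()
        info["ampere_plus"] = cc >= (8, 0)  # Ampere starts at SM 8.0
    return info

print(flash_attention_ready())
```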

Installation

Add the Marketplace first:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

Verified
95/100
Analyzed 1 day ago

Trust Signals

Last commit: 17 days ago
Stars: 8.3k
License: MIT
