Flash Attention
Optimizes transformer attention with Flash Attention for a 2-4x speedup and a 10-20x memory reduction. Use when training or running transformers with long sequences (>512 tokens), when encountering GPU memory issues with attention, or when faster inference is needed. Supports PyTorch native SDPA, the flash-attn library, H100 FP8, and sliding window attention.
Enables users to significantly accelerate transformer training and inference and to reduce GPU memory usage by leveraging Flash Attention, especially at long sequence lengths.
Features
- 2-4x speedup for transformer attention
- 10-20x memory reduction for attention computations
- Support for PyTorch native SDPA integration
- Integration with flash-attn library for advanced features
- Support for H100 FP8 optimization and sliding window attention (see the sliding-window sketch after this list)
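As an illustration of the flash-attn library path, here is a minimal sliding-window sketch. It assumes flash-attn 2.3+ is installed on a CUDA machine; `flash_attn_func` and its `window_size` argument are part of the library's public API, but the shapes and window values below are arbitrary examples, not recommendations.

```python
# Minimal sketch: sliding-window attention via the flash-attn library.
# Assumes flash-attn >= 2.3 and a CUDA GPU; inputs must be fp16/bf16
# with shape (batch, seqlen, nheads, headdim).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 4096, 16, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.bfloat16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Causal attention restricted to the previous 1024 tokens.
# window_size is (left, right); -1 means unbounded on that side.
out = flash_attn_func(q, k, v, causal=True, window_size=(1024, 0))
print(out.shape)  # torch.Size([2, 4096, 16, 64])
```

A sliding window caps attention memory and compute on very long sequences, at the cost of limiting each token's receptive field to the window.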
Use Cases
- Training transformers with long sequences (>512 tokens)
- Running inference with long context windows
- Mitigating GPU memory issues during transformer training
- Accelerating inference for transformer-based applications
Non-Goals
- Providing a direct tool for agents to call
- Replacing the need for GPU hardware
- Optimizing attention mechanisms not based on transformers
Workflow
- Check PyTorch version (>=2.2) and GPU compatibility
- Install flash-attn library or ensure PyTorch has native support
- Integrate Flash Attention into model code using provided examples (see the SDPA sketch after this list)
- Verify speedup and accuracy using profiling and comparison scripts
- Optionally enable advanced features like sliding window or FP8 on H100
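For the native-PyTorch route, a minimal integration-and-verification sketch follows. `torch.nn.functional.scaled_dot_product_attention` is the public SDPA API; the `sdpa_kernel` context manager is the backend selector in recent PyTorch releases (older versions expose `torch.backends.cuda.sdp_kernel` instead), and the tensor shapes here are arbitrary examples.

```python
# Minimal sketch: Flash Attention via PyTorch's native SDPA, plus an
# accuracy check against the reference math backend.
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

batch, nheads, seqlen, headdim = 2, 16, 2048, 64
q = torch.randn(batch, nheads, seqlen, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Force the Flash Attention backend; this errors out if the
# GPU/dtype/shape combination is unsupported, instead of silently
# falling back to a slower kernel.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Compare against the unfused math backend to confirm accuracy.
with sdpa_kernel(SDPBackend.MATH):
    ref = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print("max abs diff:", (out - ref).abs().max().item())
```

Small differences on the order of fp16 rounding error are expected; anything larger suggests a shape or dtype problem.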
Prerequisites
- NVIDIA GPU (Ampere+ recommended)
- CUDA 11.8+ / 12.0+
- PyTorch 2.2+ (a compatibility check script follows this list)
- Python 3.8+
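A quick way to check these prerequisites from Python, using only standard torch calls; the Ampere and Hopper compute-capability thresholds (SM 8.0 and SM 9.0) are standard NVIDIA values.

```python
# Minimal sketch: verify PyTorch version and GPU capability before
# enabling Flash Attention.
import torch

version = tuple(int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
assert version >= (2, 2), f"PyTorch 2.2+ required, found {torch.__version__}"
assert torch.cuda.is_available(), "Flash Attention requires a CUDA GPU"

major, minor = torch.cuda.get_device_capability()
print(f"GPU: {torch.cuda.get_device_name()} (SM {major}.{minor})")
print("Ampere or newer:", (major, minor) >= (8, 0))       # recommended baseline
print("Hopper (FP8-capable):", (major, minor) >= (9, 0))  # needed for the FP8 path
```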
Installation
First, add the marketplace:
/plugin marketplace add Orchestra-Research/AI-Research-SKILLs

Then install the plugin:

/plugin install AI-Research-SKILLs@ai-research-skills
Similar Extensions
Performance Analysis (100)
Comprehensive performance analysis, bottleneck detection, and optimization recommendations for Claude Flow swarms
MongoDB Connection Optimizer (100)
Optimize MongoDB client connection configuration (pools, timeouts, patterns) for any supported driver language. Use this skill when working on, updating, or reviewing functions that instantiate or configure a MongoDB client (e.g., when calling `connect()`), configuring connection pools, troubleshooting connection errors (ECONNREFUSED, timeouts, pool exhaustion), or optimizing connection-related performance issues. This includes scenarios like building serverless functions with MongoDB, creating API endpoints that use MongoDB, optimizing high-traffic MongoDB applications, handling long-running tasks and concurrency, or debugging connection-related failures.
Sql Optimization (100)
Universal SQL performance optimization assistant for comprehensive query tuning, indexing strategies, and database performance analysis across all SQL databases (MySQL, PostgreSQL, SQL Server, Oracle). Provides execution plan analysis, pagination optimization, batch operations, and performance monitoring guidance.
Core Web Vitals (100)
Optimize Core Web Vitals (LCP, INP, CLS) for better page experience and search ranking. Use when asked to "improve Core Web Vitals", "fix LCP", "reduce CLS", "optimize INP", "page experience optimization", or "fix layout shifts".
Vector Index Tuning (99)
Optimize vector index performance for latency, recall, and memory. Use when tuning HNSW parameters, selecting quantization strategies, or scaling vector search infrastructure.
Oraclaw Solver (100)
Industrial-grade scheduling and resource optimization for AI agents. Solve task scheduling with energy matching, budget allocation, and any LP/MIP constraint problem in milliseconds.