
Analyze Kernel Bottleneck

Skill · Verified · Active

Systematically identify whether a GPU kernel is compute-bound, memory-bound, or latency-bound using roofline analysis, occupancy calculations, compute/load ratio per tile, and SASS instruction inspection. Produces a decision matrix for optimization strategy selection (cp.async, warp interleaving, tiling, double-buffering, or CuAssembler hand-tuning).
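The roofline part of this classification can be sketched in a few lines of Python. This is a minimal illustration, not the skill's implementation. The names `Machine`, `attainable_flops`, and `classify` are hypothetical. The peak numbers in the usage example are made-up round values rather than any specific GPU's specifications. The latency-bound rule (measured throughput below an assumed 50% of the attainable roof) is one common heuristic, not a fixed definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    peak_flops: float      # peak compute throughput, FLOP/s
    peak_bandwidth: float  # peak DRAM bandwidth, bytes/s

    @property
    def balance(self) -> float:
        """Machine balance (ridge point) in FLOP/byte."""
        return self.peak_flops / self.peak_bandwidth


def attainable_flops(machine: Machine, intensity: float) -> float:
    """Roofline: the lower of the compute roof and the bandwidth roof."""
    return min(machine.peak_flops, intensity * machine.peak_bandwidth)


def classify(machine: Machine, flops: float, bytes_moved: float,
             elapsed_s: float, latency_fraction: float = 0.5) -> str:
    intensity = flops / bytes_moved   # arithmetic intensity, FLOP/byte
    achieved = flops / elapsed_s      # measured throughput, FLOP/s
    roof = attainable_flops(machine, intensity)
    # Assumed heuristic: far below the roof means neither compute nor
    # bandwidth is saturated, so exposed latency is the likely limiter.
    if achieved < latency_fraction * roof:
        return "latency-bound"
    return "compute-bound" if intensity >= machine.balance else "memory-bound"


if __name__ == "__main__":
    gpu = Machine(peak_flops=100e12, peak_bandwidth=2e12)  # illustrative only
    n = 4096
    # FP16 GEMM, assuming each matrix crosses DRAM exactly once.
    print(classify(gpu, 2 * n**3, 3 * n * n * 2, elapsed_s=2e-3))  # compute-bound
    v = 1 << 24
    # FP32 vector add: 1 FLOP per 12 bytes (two loads, one store).
    print(classify(gpu, v, 12 * v, elapsed_s=1.2e-4))  # memory-bound
```

With these illustrative peaks the ridge point is 50 FLOP/byte. The GEMM (about 1365 FLOP/byte) therefore sits under the compute roof, and the vector add (1/12 FLOP/byte) sits under the bandwidth roof.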

Purpose

To systematically identify GPU kernel performance bottlenecks and provide actionable insights for optimization strategies, enabling developers to improve kernel efficiency.

Features

GPU kernel bottleneck classification (compute-bound, memory-bound, latency-bound)
Roofline analysis using arithmetic intensity and machine balance points
Occupancy calculation to determine active warps per SM
Compute/load ratio analysis from SASS instructions
SASS instruction mix and stall code inspection
Shared memory occupancy cliff analysis (the per-block shared memory sizes at which one fewer block fits on an SM)
Decision matrix for optimization strategy selection (cp.async, warp interleaving, etc.)
Structured bottleneck report generation
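Occupancy and the shared memory cliff come from the same arithmetic: resident blocks per SM are capped by whichever of threads, registers, shared memory, or the block limit runs out first. The sketch below uses assumed sm_80-class per-SM limits as defaults (replace them with your GPU's values). It deliberately ignores register and shared memory allocation granularity and per-block reserved shared memory, so treat its output as an estimate and cross-check it with `cudaOccupancyMaxActiveBlocksPerMultiprocessor` or Nsight Compute. The names `SMLimits`, `blocks_per_sm`, `occupancy`, and `smem_cliffs` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    # Assumed defaults, roughly sm_80-class; replace with your GPU's values.
    max_threads: int = 2048
    max_blocks: int = 32
    registers: int = 65536
    shared_mem_bytes: int = 164 * 1024
    warp_size: int = 32


def blocks_per_sm(threads_per_block: int, regs_per_thread: int,
                  smem_per_block: int, sm: SMLimits = SMLimits()) -> int:
    """Resident blocks per SM; ignores allocation granularity."""
    limits = [
        sm.max_blocks,
        sm.max_threads // threads_per_block,
        sm.registers // (regs_per_thread * threads_per_block),
    ]
    if smem_per_block > 0:
        limits.append(sm.shared_mem_bytes // smem_per_block)
    return min(limits)


def occupancy(threads_per_block: int, regs_per_thread: int,
              smem_per_block: int, sm: SMLimits = SMLimits()) -> tuple[int, float]:
    """Return (active warps per SM, fraction of the SM's maximum warps)."""
    blocks = blocks_per_sm(threads_per_block, regs_per_thread, smem_per_block, sm)
    warps_per_block = -(-threads_per_block // sm.warp_size)  # ceiling division
    active = blocks * warps_per_block
    return active, active / (sm.max_threads // sm.warp_size)


def smem_cliffs(threads_per_block: int, regs_per_thread: int,
                sm: SMLimits = SMLimits()) -> list[tuple[int, int]]:
    """(largest smem bytes per block, blocks/SM) pairs; one byte more loses a block."""
    cap = blocks_per_sm(threads_per_block, regs_per_thread, 0, sm)
    return [(sm.shared_mem_bytes // b, b) for b in range(cap, 0, -1)]


if __name__ == "__main__":
    # 256 threads/block, 64 registers/thread, 48 KB shared memory/block.
    print(occupancy(256, 64, 48 * 1024))  # (24, 0.375)
    print(smem_cliffs(256, 64))
```

In this example, registers alone cap the SM at four blocks. Shrinking the 48 KB tile to 41 KB (41984 bytes) or less restores the fourth block (32 warps), and going below that buys no further occupancy.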

Use Cases

Before optimizing any CUDA kernel, to establish a baseline and identify bottlenecks
After the initial kernel implementation, to pinpoint optimization paths
When a kernel's performance does not meet expectations
To choose among optimization techniques such as cp.async, tiling, or algorithmic changes
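The compute/load ratio and the decision matrix can be prototyped together. Count compute instructions against global-load instructions in the SASS of the kernel's main loop (for example, as printed by `cuobjdump -sass`), then map the bottleneck class, ratio, and occupancy to a strategy. Everything here is an illustrative assumption, not the skill's actual decision matrix: the opcode sets, the thresholds (ratio 4, occupancy 0.25), the strategy mapping, and the sample SASS excerpt are all made up. The names `opcode`, `compute_load_ratio`, and `choose_strategy` are hypothetical.

```python
import re
from collections import Counter
from typing import Optional

# Assumed opcode classes; extend for your architecture and data types.
COMPUTE_OPS = {"HMMA", "IMMA", "FFMA", "DFMA", "HFMA2"}
GLOBAL_LOAD_OPS = {"LDG", "LDGSTS"}  # LDGSTS is the SASS form of cp.async

_COMMENT = re.compile(r"/\*.*?\*/")  # addresses and encodings in cuobjdump output


def opcode(line: str) -> Optional[str]:
    """Base opcode of one SASS line, e.g. 'HMMA' from 'HMMA.16816.F32 ...'."""
    tokens = _COMMENT.sub("", line).split()
    if tokens and tokens[0].startswith("@"):  # drop a predicate guard like @P0
        tokens = tokens[1:]
    if not tokens or tokens[0].endswith(":"):  # blank, encoding-only, or label
        return None
    return tokens[0].split(".")[0].rstrip(";") or None


def compute_load_ratio(sass: str) -> float:
    """Compute instructions per global-load instruction in a SASS excerpt."""
    counts = Counter(op for op in map(opcode, sass.splitlines()) if op)
    compute = sum(counts[op] for op in COMPUTE_OPS)
    loads = sum(counts[op] for op in GLOBAL_LOAD_OPS)
    return compute / loads if loads else float("inf")


def choose_strategy(bound: str, ratio: float, occupancy: float,
                    min_ratio: float = 4.0, min_occupancy: float = 0.25) -> str:
    """Hypothetical decision matrix; thresholds are placeholders."""
    if bound == "compute-bound":
        return "CuAssembler hand-tuning"
    if bound == "memory-bound":
        return "tiling"  # raise data reuse per byte loaded
    # Latency-bound: too few warps to switch between, or loads not overlapped.
    if occupancy < min_occupancy:
        return "warp interleaving"
    if ratio < min_ratio:
        return "tiling"  # too little compute per load to hide loads behind
    return "cp.async double-buffering"


# Made-up main-loop excerpt in cuobjdump style, for illustration only.
SAMPLE_SASS = """
        /*0080*/                   LDGSTS.E.BYPASS.128 [R3], [R4.64] ;  /* 0x0000000004037fae */
                                                                         /* 0x000fe8000b901c44 */
        /*0090*/              @P0  LDG.E.128 R8, [R6.64] ;
.L_x_1:
        /*00a0*/                   HMMA.16816.F32 R12, R16, R20, R12 ;
        /*00b0*/                   HMMA.16816.F32 R24, R16, R22, R24 ;
        /*00c0*/                   HMMA.16816.F32 R28, R18, R20, R28 ;
        /*00d0*/                   HMMA.16816.F32 R32, R18, R22, R32 ;
        /*00e0*/                   LDSM.16.M88.4 R16, [R40] ;
"""

if __name__ == "__main__":
    ratio = compute_load_ratio(SAMPLE_SASS)
    print(ratio)  # 2.0
    print(choose_strategy("latency-bound", ratio, occupancy=0.375))  # tiling
```

Feeding only the main-loop body matters here, because prologue and epilogue loads would otherwise dilute the per-tile ratio.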

Non-Goals

Directly modifying CUDA source code
Automated kernel recompilation without user input
Real-time performance monitoring beyond discrete analysis runs
Analysis of CPU-bound aspects of host-device workflows

Installation

/plugin install agent-almanac@pjt222-agent-almanac

Quality Score

Verified

99/100

Analyzed about 22 hours ago

Trust Signals

Last commit: 1 day ago

GitHub owner: pjt222

Stars: 14

Downloads: 308

License: MIT

Website: pjt222.github.io



Similar Extensions

Optimize for GPU

GPU-accelerate Python code using CuPy, Numba CUDA, Warp, cuDF, cuML, cuGraph, KvikIO, cuCIM, cuxfilter, cuVS, cuSpatial, and RAFT. Use whenever the user mentions GPU/CUDA/NVIDIA acceleration, or wants to speed up NumPy, pandas, scikit-learn, scikit-image, NetworkX, GeoPandas, or Faiss workloads. Covers physics simulation, differentiable rendering, mesh ray casting, particle systems (DEM/SPH/fluids), vector/similarity search, GPUDirect Storage file IO, interactive dashboards, geospatial analysis, medical imaging, and sparse eigensolvers. Also use when you see CPU-bound Python code (loops, large arrays, ML pipelines, graph analytics, image processing) that would benefit from GPU acceleration, even if not explicitly requested.

Skill

K-Dense-AI

Pipeline Gpu Kernel

Apply software pipelining (double-buffering) to a tiled GPU kernel to overlap global memory loads with Tensor Core computation. Covers prologue/loop/epilogue restructuring, LDG-register vs cp.async (LDGSTS) variant selection based on compute/load ratio, shared memory budget verification against architecture-specific occupancy cliffs, and SASS-level verification of load/compute overlap.

Skill

pjt222

Performance Analysis

100

Comprehensive performance analysis, bottleneck detection, and optimization recommendations for Claude Flow swarms

Skill

ruvnet

Oraclaw Solver

100

Industrial-grade scheduling and resource optimization for AI agents. Solve task scheduling with energy matching, budget allocation, and arbitrary LP/MIP constraint problems in milliseconds.

Skill

Whatsonyourmind

Oraclaw Decide

100

Decision intelligence for AI agents. Analyze options, map decision dependencies with PageRank, detect conflicts between information sources, and find the most important decisions.

Skill

Whatsonyourmind

MongoDB Connection Optimizer

100

Optimize MongoDB client connection configuration (pools, timeouts, patterns) for any supported driver language. Use this skill when working on, updating, or reviewing functions that instantiate or configure a MongoDB client (e.g., when calling `connect()`), when configuring connection pools, when troubleshooting connection issues (ECONNREFUSED, timeouts, pool exhaustion), or when optimizing connection-related performance problems. This includes scenarios such as building serverless functions with MongoDB, creating API endpoints that use MongoDB, optimizing high-traffic MongoDB applications, building long-running tasks and concurrent workloads, or debugging connection-related errors.

Skill

mongodb