TensorRT-LLM

Skill Verified Active

Optimizes LLM inference with NVIDIA TensorRT-LLM for maximum throughput and lowest latency. Use it for production deployment on NVIDIA GPUs (A100/H100), when you need inference 10-100x faster than PyTorch, or when serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

Purpose

Enables users to achieve state-of-the-art LLM inference performance in production environments by using NVIDIA TensorRT-LLM's optimization and serving capabilities.

Features

  • Optimizes LLM inference with NVIDIA TensorRT-LLM
  • Achieves high throughput and low latency on NVIDIA GPUs
  • Supports production deployment scenarios
  • Demonstrates use of quantization (FP8, INT4)
  • Covers in-flight batching and multi-GPU scaling
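
To make the in-flight batching item concrete, here is a toy sketch in plain Python. It is not TensorRT-LLM's scheduler and uses none of its API; the function names and the request lengths are hypothetical, and the model ignores prefill, KV-cache limits, and per-step cost. It only counts decode steps when finished requests free their slot immediately versus when a batch waits for its longest request.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def inflight_batching_steps(lengths, batch_size):
    """Decode steps when a finished request's slot is refilled right away."""
    queue = deque(lengths)
    active = []  # tokens still to generate for each running request
    steps = 0
    while queue or active:
        # Fill free slots from the waiting queue before the next step.
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        # One decode step produces one token for every active request.
        steps += 1
        active = [r - 1 for r in active if r - 1 > 0]
    return steps


# Hypothetical output lengths (in tokens) for eight requests.
lengths = [2, 8, 3, 8, 2, 2, 8, 3]
print("static:", static_batching_steps(lengths, 4))
print("in-flight:", inflight_batching_steps(lengths, 4))
```

In this toy model the in-flight schedule never needs more steps than the static one, because short requests no longer hold a slot idle while the longest request in their batch finishes.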

Use cases

  • Deploying LLMs in production on NVIDIA A100/H100 GPUs
  • Serving models requiring maximum throughput (e.g., 24,000+ tokens/sec)
  • Reducing inference latency for real-time applications
  • Utilizing quantized models (FP8/INT4) for memory and speed gains
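
As a rough illustration of the memory side of the quantization item, the sketch below estimates weight storage at different precisions. It assumes 2 bytes per parameter for FP16, 1 for FP8, and 0.5 for INT4, and a hypothetical 70-billion-parameter model; it ignores quantization scales, the KV cache, and activations, so real footprints will be somewhat larger. The helper name is made up for this example.

```python
# Bytes needed to store one weight at each precision (weights only).
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "INT4": 0.5}


def weight_memory_gib(num_params, precision):
    """Approximate weight storage in GiB for a model with num_params weights."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


# Hypothetical model size, chosen only for illustration.
num_params = 70e9
for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gib(num_params, precision):.1f} GiB")
```

Halving the bytes per weight halves the weight footprint, which is why FP8 or INT4 can let a model fit on fewer GPUs than FP16 would require.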

Non-goals

  • Optimizing LLM inference on non-NVIDIA hardware
  • Providing a user-friendly Python-first API like vLLM
  • Edge deployment without NVIDIA GPUs
  • Using non-TensorRT quantization formats like GGUF

Installation

First, add the marketplace

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality score

Verified
98/100
Analyzed 2 days ago

Trust signals

Last commit: 18 days ago
Stars: 8.3k
License: MIT

Similar extensions

TensorRT LLM Inference Serving

99

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than PyTorch, or for serving models with quantization (FP8/INT4), in-flight batching, and multi-GPU scaling.

Skill
davila7

Miles RL Training

97

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Skill
Orchestra-Research

vLLM High Performance LLM Serving

97

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.

Skill
Orchestra-Research

Miles RL Training

92

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

Skill
davila7

Incident Response

100

Manage active production incidents through detection, triage, mitigation, communication, and resolution with structured roles and decision-making. Use this skill whenever the user has an active incident, a production issue, a service outage, a security incident, or needs to plan incident response procedures. Triggers on incident response, production incident, outage, service down, site down, P0, P1, severity, downtime, on-call, incident commander, status page, postmortem prep. Also triggers when something is actively broken in production and the user is figuring out what to do.

Skill
rampstackco

Video

100

When the user wants to create, generate, or produce video content using AI tools or programmatic frameworks. Also use when the user mentions 'video production,' 'AI video,' 'Remotion,' 'Hyperframes,' 'HeyGen,' 'Synthesia,' 'Veo,' 'Runway,' 'Kling,' 'Pika,' 'video generation,' 'AI avatar,' 'talking head video,' 'programmatic video,' 'video template,' 'explainer video,' 'product demo video,' 'video pipeline,' or 'make me a video.' Use this for video creation, generation, and production workflows. For video content strategy and what to post, see social-content. For paid video ad creative, see ad-creative.

Skill
coreyhaines31