Whisper
OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.
Purpose
To provide highly accurate and robust speech recognition across a wide range of languages and audio conditions, enabling automated transcription, translation, and audio processing tasks.
Features
- Speech-to-text transcription for 99 languages
- Translation of audio to English
- Language identification for audio input
- Support for multiple model sizes (tiny to large, turbo)
- GPU acceleration for faster processing
- Word-level timestamp generation
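Language identification from the feature list above can be sketched with the openai-whisper Python API. This follows the project's documented detection flow (only the first 30 seconds are scanned); the audio path is a placeholder:

```python
def most_likely_language(probs):
    """Pick the top language code from Whisper's detect_language probabilities (pure Python)."""
    return max(probs, key=probs.get)

def identify_language(path, model_name="base"):
    """Detect the spoken language of an audio file's first 30 seconds."""
    import whisper  # lazy import: the helper above works without the model installed
    model = whisper.load_model(model_name)
    audio = whisper.pad_or_trim(whisper.load_audio(path))      # trim/pad to 30 s
    mel = whisper.log_mel_spectrogram(audio).to(model.device)  # model input
    _, probs = model.detect_language(mel)                      # dict: lang code -> probability
    return most_likely_language(probs)

# Usage (downloads model weights on first run):
# print(identify_language("speech.mp3"))  # e.g. "en"
```

Word-level timestamps are similarly a single flag: passing `word_timestamps=True` to `model.transcribe` adds a `words` list (word, start, end) to each segment.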
Use Cases
- Automating podcast and video transcription
- Transcribing noisy or multilingual audio recordings
- Extracting text from meeting audio for notes
- Processing audio for multilingual content analysis
Non-Goals
- Real-time streaming transcription (consider faster-whisper for this)
- Speaker diarization (identifying different speakers)
- Advanced audio manipulation or editing
Workflow
- Load Whisper model
- Load audio file
- Transcribe audio (optionally specify language, task, prompt)
- Process transcription results (text, segments, timestamps)
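The workflow above can be sketched with the openai-whisper Python API (model name, file path, and the `transcribe_file` helper are illustrative, not part of the skill):

```python
def format_segment(segment):
    """Render one Whisper segment as '[start-end] text' (pure Python, no model needed)."""
    return f"[{segment['start']:.2f}-{segment['end']:.2f}] {segment['text'].strip()}"

def transcribe_file(path, model_name="turbo", language=None, task="transcribe"):
    """Load a Whisper model and transcribe (or translate) a single audio file."""
    import whisper  # lazy import so format_segment stays usable without the model
    model = whisper.load_model(model_name)  # downloads weights on first use
    return model.transcribe(
        path,
        language=language,  # None lets Whisper auto-detect the language
        task=task,          # "transcribe" or "translate" (into English)
    )

# Usage (placeholder path):
# result = transcribe_file("meeting.mp3")
# print("Detected language:", result["language"])
# for seg in result["segments"]:
#     print(format_segment(seg))
```

The result dict carries the full text under `"text"`, per-segment timings under `"segments"`, and the detected language under `"language"`.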
Practices
- Speech Recognition
- Multilingual Processing
- Audio Transcription
- Model Selection
Prerequisites
- Python 3.8-3.11
- pip install openai-whisper
- ffmpeg (for audio processing)
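A typical setup sketch for the prerequisites above (the package name and CLI follow the openai-whisper README; the ffmpeg command varies by platform, apt shown as one example):

```shell
# Install the Python package (pulls in torch)
pip install -U openai-whisper

# ffmpeg is a system dependency used to decode audio
sudo apt update && sudo apt install -y ffmpeg   # Debian/Ubuntu; use brew on macOS

# Quick smoke test from the CLI (audio.mp3 is a placeholder path)
whisper audio.mp3 --model turbo
```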
Code Execution
- Error handling: the SKILL.md and examples show basic error handling through Python try/except blocks for file operations and model loading, but structured error reporting is not explicitly shown.
Errors
- Actionable error messages: error handling covers basic operations such as file or model loading, but specific remediation steps for more complex issues are not detailed.
Practical Utility
- Edge cases: the documentation mentions limitations such as hallucinations, long-form accuracy degradation, and accent variance, but does not detail recovery steps for each.
Installation
First, add the marketplace, then install the plugin:
/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills
Similar Extensions
- Transcribe (97): Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.
- Openai Whisper Api (95): Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
- Baoyu Imagine (99): AI image generation with OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream and Replicate APIs. Supports text-to-image, reference images, aspect ratios, and batch generation from saved prompt files. Sequential by default; use batch parallel generation when the user already has multiple prompts or wants stable multi-image throughput. Use when user asks to generate, create, or draw images.
- Openai Whisper (99): Local speech-to-text with the Whisper CLI (no API key).
- Cli Anything Videocaptioner (99): AI-powered video captioning — transcribe speech, optimize/translate subtitles, and burn them into video via the stable VideoCaptioner backend. Free ASR and translation included.