Whisper
OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.
Purpose
To provide highly accurate and robust speech recognition across a wide range of languages and audio conditions, enabling automated transcription, translation, and audio processing tasks.
Features
- Speech-to-text transcription for 99 languages
- Translation of audio to English
- Language identification for audio input
- Support for multiple model sizes (tiny to large, turbo)
- GPU acceleration for faster processing
- Word-level timestamp generation
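Language identification from the feature list above can be sketched with the openai-whisper Python API. This follows the project's documented detection flow (only the first 30 seconds are scanned); the audio path is a placeholder:

```python
def most_likely_language(probs):
    """Pick the top language code from Whisper's detect_language probabilities (pure Python)."""
    return max(probs, key=probs.get)

def identify_language(path, model_name="base"):
    """Detect the spoken language of an audio file's first 30 seconds."""
    import whisper  # lazy import: the helper above works without the model installed
    model = whisper.load_model(model_name)
    audio = whisper.pad_or_trim(whisper.load_audio(path))      # trim/pad to 30 s
    mel = whisper.log_mel_spectrogram(audio).to(model.device)  # model input
    _, probs = model.detect_language(mel)                      # dict: lang code -> probability
    return most_likely_language(probs)

# Usage (downloads model weights on first run):
# print(identify_language("speech.mp3"))  # e.g. "en"
```

Word-level timestamps are similarly a single flag: passing `word_timestamps=True` to `model.transcribe` adds a `words` list (word, start, end) to each segment.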
Use Cases
- Automating podcast and video transcription
- Transcribing noisy or multilingual audio recordings
- Extracting text from meeting audio for notes
- Processing audio for multilingual content analysis
Non-Goals
- Real-time streaming transcription (consider faster-whisper for this)
- Speaker diarization (identifying different speakers)
- Advanced audio manipulation or editing
Workflow
- Load Whisper model
- Load audio file
- Transcribe audio (optionally specify language, task, prompt)
- Process transcription results (text, segments, timestamps)
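The workflow above can be sketched with the openai-whisper Python API (model name, file path, and the `transcribe_file` helper are illustrative, not part of the skill):

```python
def format_segment(segment):
    """Render one Whisper segment as '[start-end] text' (pure Python, no model needed)."""
    return f"[{segment['start']:.2f}-{segment['end']:.2f}] {segment['text'].strip()}"

def transcribe_file(path, model_name="turbo", language=None, task="transcribe"):
    """Load a Whisper model and transcribe (or translate) a single audio file."""
    import whisper  # lazy import so format_segment stays usable without the model
    model = whisper.load_model(model_name)  # downloads weights on first use
    return model.transcribe(
        path,
        language=language,  # None lets Whisper auto-detect the language
        task=task,          # "transcribe" or "translate" (into English)
    )

# Usage (placeholder path):
# result = transcribe_file("meeting.mp3")
# print("Detected language:", result["language"])
# for seg in result["segments"]:
#     print(format_segment(seg))
```

The result dict carries the full text under `"text"`, per-segment timings under `"segments"`, and the detected language under `"language"`.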
Practices
- Speech Recognition
- Multilingual Processing
- Audio Transcription
- Model Selection
Prerequisites
- Python 3.8-3.11
- pip install openai-whisper
- ffmpeg (for audio processing)
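A typical setup sketch for the prerequisites above (the package name and CLI follow the openai-whisper README; the ffmpeg command varies by platform, apt shown as one example):

```shell
# Install the Python package (pulls in torch)
pip install -U openai-whisper

# ffmpeg is a system dependency used to decode audio
sudo apt update && sudo apt install -y ffmpeg   # Debian/Ubuntu; use brew on macOS

# Quick smoke test from the CLI (audio.mp3 is a placeholder path)
whisper audio.mp3 --model turbo
```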
Code Execution
- Error handling: the SKILL.md and examples show basic error handling through Python try/except blocks for file operations and model loading, but structured error reporting is not explicitly shown.
Errors
- Actionable error messages: error handling covers basic operations such as file or model loading, but specific remediation steps for more complex issues are not detailed.
Practical Utility
- Edge cases: the documentation mentions limitations such as hallucinations, long-form accuracy degradation, and accent variance, but does not detail recovery steps for each.
Installation
First, add the marketplace, then install the plugin:
/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills
Similar Extensions
- Transcribe (97): Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.
- Openai Whisper Api (95): Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
- Baoyu Imagine (99): AI image generation with OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream and Replicate APIs. Supports text-to-image, reference images, aspect ratios, and batch generation from saved prompt files. Sequential by default; use batch parallel generation when the user already has multiple prompts or wants stable multi-image throughput. Use when user asks to generate, create, or draw images.
- Openai Whisper (99): Local speech-to-text with the Whisper CLI (no API key).
- Cli Anything Videocaptioner (99): AI-powered video captioning — transcribe speech, optimize/translate subtitles, and burn them into video via the stable VideoCaptioner backend. Free ASR and translation included.