Whisper

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

Purpose

To provide highly accurate and robust speech recognition across a wide range of languages and audio conditions, enabling automated transcription, translation, and audio processing tasks.

Features

  • Speech-to-text transcription for 99 languages
  • Translation of audio to English
  • Language identification for audio input
  • Support for multiple model sizes (tiny to large, turbo)
  • GPU acceleration for faster processing
  • Word-level timestamp generation
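
The language identification and word-level timestamp features can be sketched as follows. This is a minimal illustration, assuming the `openai-whisper` package and its documented `detect_language` and `word_timestamps=True` interfaces; `flatten_words` and `demo` are hypothetical helper names and the audio path is a placeholder.

```python
def detect_language(model, audio_path):
    """Return (language_code, probability) based on the first 30 seconds of audio."""
    import whisper  # imported lazily so flatten_words() works without the package

    # Whisper's language detector expects a 30-second log-Mel spectrogram.
    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    best = max(probs, key=probs.get)
    return best, probs[best]


def flatten_words(result):
    """Collect (word, start, end) tuples from a transcribe() result
    that was produced with word_timestamps=True."""
    words = []
    for segment in result.get("segments", []):
        # Segments carry a "words" list only when word timestamps were requested.
        for w in segment.get("words", []):
            words.append((w["word"].strip(), w["start"], w["end"]))
    return words


def demo(audio_path):
    """Hypothetical driver: detect the language, then print per-word timings."""
    import whisper

    model = whisper.load_model("base")
    print(detect_language(model, audio_path))
    result = model.transcribe(audio_path, word_timestamps=True)
    for word, start, end in flatten_words(result):
        print(f"{start:6.2f}-{end:6.2f}  {word}")
```

Calling `demo("audio.mp3")` would run both features on one file; the helpers are kept separate so the timestamp post-processing can be reused on stored results.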

Use Cases

  • Automating podcast and video transcription
  • Transcribing noisy or multilingual audio recordings
  • Extracting text from meeting audio for notes
  • Processing audio for multilingual content analysis

Non-Goals

  • Real-time streaming transcription (the skill suggests faster-whisper for this instead)
  • Speaker diarization (identifying different speakers)
  • Advanced audio manipulation or editing

Workflow

  1. Load Whisper model
  2. Load audio file
  3. Transcribe audio (optionally specify language, task, prompt)
  4. Process transcription results (text, segments, timestamps)
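
The four workflow steps above can be sketched in a few lines. This is a minimal example, assuming the `openai-whisper` package; `transcribe_file` and `format_segments` are hypothetical helper names, and the default model name `"base"` is an arbitrary choice.

```python
def transcribe_file(audio_path, model_name="base", language=None,
                    task="transcribe", prompt=None):
    """Steps 1-3: load the model, then load and transcribe the audio file.

    language=None lets Whisper auto-detect the language;
    task may be "transcribe" or "translate" (translation to English).
    """
    import whisper  # imported lazily so format_segments() works without the package

    model = whisper.load_model(model_name)
    # transcribe() accepts a file path and decodes the audio via ffmpeg.
    return model.transcribe(audio_path, language=language, task=task,
                            initial_prompt=prompt)


def format_segments(result):
    """Step 4: render each segment as '[start -> end] text'."""
    lines = []
    for seg in result["segments"]:
        lines.append(f"[{seg['start']:.2f} -> {seg['end']:.2f}] {seg['text'].strip()}")
    return lines
```

A typical call would be `result = transcribe_file("meeting.mp3", language="en")`, after which `result["text"]` holds the full transcript and `format_segments(result)` gives timestamped lines.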

Practices

  • Speech Recognition
  • Multilingual Processing
  • Audio Transcription
  • Model Selection
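
For model selection, one possible heuristic is to pick the largest model that fits in available GPU memory. The sketch below is an assumption, not part of the skill: the VRAM figures are the approximate values listed in the upstream Whisper README, and `pick_model` is a hypothetical helper. The turbo model (roughly 6 GB) is left out of the automatic choice because it is a speed-optimized variant rather than a step on the size ladder.

```python
# (model name, approximate VRAM in GB), ordered smallest to largest.
# Figures are approximate and taken as assumptions from the upstream README.
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_model(vram_gb):
    """Return the largest model whose approximate VRAM need fits in vram_gb.

    Falls back to "tiny" when nothing fits, since it is the cheapest option.
    """
    choice = "tiny"
    for name, needed in MODEL_VRAM_GB:
        if needed <= vram_gb:
            choice = name  # later entries are larger, so keep overwriting
    return choice
```

For example, `pick_model(4)` selects `"small"`, since `"medium"` needs about 5 GB.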

Prerequisites

  • Python 3.8-3.11
  • pip install openai-whisper
  • ffmpeg (for audio processing)
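
The prerequisites above can be checked before loading a model. This is a small sketch using only the standard library; `check_prerequisites` is a hypothetical helper name, and the Python range simply mirrors the 3.8-3.11 requirement stated here.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each prerequisite to True (found) or False (missing)."""
    return {
        # Python version within the range listed in the prerequisites.
        "python": (3, 8) <= sys.version_info[:2] <= (3, 11),
        # The openai-whisper package installs a module named "whisper".
        "openai-whisper": importlib.util.find_spec("whisper") is not None,
        # ffmpeg must be on PATH for audio decoding.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


def report(status):
    """Format the check results as one 'name: ok/MISSING' line per prerequisite."""
    return [f"{name}: {'ok' if ok else 'MISSING'}" for name, ok in status.items()]
```

Printing `"\n".join(report(check_prerequisites()))` gives a quick summary before a long transcription job starts.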

Code Execution

  • Error Handling (info): The SKILL.md and examples show basic error handling through Python's try-except blocks for file operations or model loading, but detailed structured error reporting is not explicitly shown.

Errors

  • Actionable error messages (info): Error handling is present for basic operations like file loading or model loading, but specific remediation steps for more complex issues are not detailed.

Practical Utility

  • Edge cases (info): The documentation mentions limitations like hallucinations, long-form accuracy degradation, and accent variance, but does not detail specific recovery steps for each.
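
One way to surface possible hallucinations for manual review is to flag segments using the per-segment statistics that `transcribe()` returns. The sketch below is an illustrative heuristic and not a recovery procedure from the skill: `flag_suspect_segments` is a hypothetical helper, and the thresholds mirror Whisper's default decoding thresholds only as a starting point.

```python
def flag_suspect_segments(result, no_speech=0.6, min_logprob=-1.0, max_compression=2.4):
    """Return (segment_id, reasons) pairs for segments that look unreliable.

    Uses the no_speech_prob, avg_logprob and compression_ratio fields that
    Whisper attaches to each segment. Thresholds are assumptions to tune.
    """
    suspects = []
    for seg in result.get("segments", []):
        reasons = []
        if seg.get("no_speech_prob", 0.0) > no_speech:
            reasons.append("likely silence")
        if seg.get("avg_logprob", 0.0) < min_logprob:
            reasons.append("low confidence")
        if seg.get("compression_ratio", 0.0) > max_compression:
            reasons.append("repetitive text")
        if reasons:
            suspects.append((seg.get("id"), reasons))
    return suspects
```

Flagged segments are candidates for re-listening or re-transcription with a larger model; the function only reports them and does not modify the transcript.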

Installation

First, add the marketplace, then install the plugin:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

Verified: 97/100
Analyzed about 19 hours ago

Trust Signals

Last commit: 16 days ago
Stars: 8.3k
License: MIT