Whisper
Skill · Active
OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.
Purpose: a flexible way to convert spoken audio into text, suited to applications ranging from podcast transcription to multilingual audio analysis.
Features
- Multilingual speech-to-text transcription
- Translation to English
- Language identification
- Support for multiple model sizes
- Configurable transcription options
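The features above can be sketched as follows. This assumes the skill wraps the open-source `openai-whisper` Python package (installed with `pip install -U openai-whisper`, with ffmpeg on the PATH). `whisper.load_model`, `model.transcribe`, and `model.detect_language` are calls from that package. The helpers `decode_options`, `transcribe_file`, and `identify_language` are hypothetical names for illustration only. Whisper is imported inside the functions, so the option-building helper works without the package installed.

```python
"""Sketch of Whisper's transcribe / translate / language-ID tasks.

Assumes the open-source `openai-whisper` package; helper names are hypothetical.
"""
from typing import Optional


def decode_options(translate: bool = False, language: Optional[str] = None) -> dict:
    """Build keyword options for model.transcribe().

    task="transcribe" keeps the spoken language; task="translate" outputs English.
    language=None lets Whisper auto-detect the spoken language.
    """
    options = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        options["language"] = language
    return options


def transcribe_file(path: str, model_name: str = "base",
                    translate: bool = False, language: Optional[str] = None) -> dict:
    import whisper  # lazy import: requires openai-whisper and ffmpeg

    # model_name is one of the published sizes, e.g. "tiny", "base", "small", "medium", "large"
    model = whisper.load_model(model_name)
    # Returns a dict with "text", "segments", and "language" keys
    return model.transcribe(path, **decode_options(translate, language))


def identify_language(path: str, model_name: str = "base") -> str:
    import whisper  # lazy import: requires openai-whisper and ffmpeg

    model = whisper.load_model(model_name)
    # Language ID inspects only the first 30 seconds of audio
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    return max(probs, key=probs.get)  # most probable language code, e.g. "en"
```

For example, `transcribe_file("episode.mp3", translate=True)["text"]` would return an English translation of a hypothetical `episode.mp3`. Smaller models load and run faster; larger ones are generally more accurate.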
Use cases
- Transcribing podcasts and videos
- Automating meeting notes
- Processing multilingual audio content
- Speech-to-text conversion in noisy environments
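For video work, transcription output usually has to become subtitles. This is a minimal sketch that converts Whisper-style segments into SubRip (`.srt`) text. It assumes each segment is a dict with float `start`/`end` seconds and a `text` string, which matches the shape of `result["segments"]` from `openai-whisper`'s `transcribe()`. The function names are hypothetical. The `openai-whisper` CLI can also write `.srt` files directly with `--output_format srt`; a helper like this is only useful inside a custom pipeline.

```python
"""Sketch: convert Whisper-style segments into SubRip (.srt) subtitle text."""


def srt_timestamp(seconds: float) -> str:
    # SRT timestamps are HH:MM:SS,mmm with a comma before the milliseconds
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    blocks = []
    # Each cue: 1-based index, time range, text; cues are separated by a blank line
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

Whisper segment texts typically begin with a space, so the sketch strips each one before writing the cue.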
Non-goals
- Real-time streaming transcription (faster-whisper is mentioned as an alternative)
- Speaker diarization (identifying different speakers)
- Managed API service (focus is on local execution)
Trust
- Warning: issues need attention. In the last 90 days, 17 issues were opened and 4 were closed, indicating a slow response rate to new issues.
Installation
npx skills add davila7/claude-code-templates
Runs the Vercel skills CLI (skills.sh) via npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.
Quality score
Similar extensions
Whisper
Score: 97. OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.
Openai Whisper Api
Score: 95. Transcribe audio via OpenAI Audio Transcriptions API (Whisper).
Cli Anything Videocaptioner
Score: 99. AI-powered video captioning: transcribe speech, optimize/translate subtitles, and burn them into video via the stable VideoCaptioner backend. Free ASR and translation included.
Transcribe
Score: 97. Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.
Video to Text Bcut
Score: 96. Transcribe video/audio URL to text + word-level timestamps using Bilibili Bcut ASR API (free, no API key). Preferred for Chinese content: Bcut gives character-level timestamps vs Whisper word-level. Returns text + segments [{start, end, text}]. Requires yt-dlp + ffmpeg.
Whisper Transcription
Score: 95. Transcribe audio and video files to text using OpenAI Whisper. Use when: converting podcasts to blog posts; creating video subtitles; extracting quotes from interviews; repurposing video content to text; building searchable audio archives.