Whisper

Skill · Verified · Active

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

Purpose

To provide highly accurate and robust speech recognition across a wide range of languages and audio conditions, enabling automated transcription, translation, and audio processing tasks.

Features

  • Speech-to-text transcription for 99 languages
  • Translation of audio to English
  • Language identification for audio input
  • Support for multiple model sizes (tiny to large, turbo)
  • GPU acceleration for faster processing
  • Word-level timestamp generation
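
As an illustration of the language identification, translation, and word-level timestamp features listed above, here is a minimal sketch using the `openai-whisper` Python package. The helper names (`most_likely_language`, `flatten_words`, `identify_language`, `translate_with_word_times`) and the default model size `"base"` are assumptions made for this example, not part of the skill itself. `whisper` is imported lazily so that the pure-Python helpers can be used without the package installed.

```python
from typing import Any, Dict, List, Tuple


def most_likely_language(probs: Dict[str, float]) -> Tuple[str, float]:
    """Return the (language code, probability) pair with the highest probability."""
    code = max(probs, key=probs.get)
    return code, probs[code]


def flatten_words(result: Dict[str, Any]) -> List[Tuple[float, float, str]]:
    """Collect (start, end, word) tuples from a result made with word_timestamps=True."""
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append((w["start"], w["end"], w["word"].strip()))
    return words


def identify_language(path: str, model_name: str = "base") -> Tuple[str, float]:
    """Detect the spoken language from the first 30 seconds of an audio file."""
    import whisper  # lazy import: requires `pip install openai-whisper` and ffmpeg

    model = whisper.load_model(model_name)
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    return most_likely_language(probs)


def translate_with_word_times(path: str, model_name: str = "base"):
    """Translate speech to English and return the text plus per-word timings."""
    import whisper  # lazy import, see above

    model = whisper.load_model(model_name)
    result = model.transcribe(path, task="translate", word_timestamps=True)
    return result["text"], flatten_words(result)
```

Calling `identify_language("interview.mp3")` (a hypothetical file name) would return a language code and its probability; `translate_with_word_times` returns the English text and a flat list of timed words.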

Use Cases

  • Automating podcast and video transcription
  • Transcribing noisy or multilingual audio recordings
  • Extracting text from meeting audio for notes
  • Processing audio for multilingual content analysis

Non-Goals

  • Real-time streaming transcription (faster-whisper is suggested for this instead)
  • Speaker diarization (identifying different speakers)
  • Advanced audio manipulation or editing

Workflow

  1. Load Whisper model
  2. Load audio file
  3. Transcribe audio (optionally specify language, task, prompt)
  4. Process transcription results (text, segments, timestamps)
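
The four steps above can be sketched roughly as follows with the `openai-whisper` package. The function names `transcribe_file`, `segments_to_lines`, and `format_timestamp` are hypothetical helpers introduced for this example, and `"base"` is just an assumed default model size. Passing a file path to `transcribe` lets Whisper decode the audio through ffmpeg, so loading the audio does not need a separate call here.

```python
from typing import Any, Dict, List, Optional


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_lines(result: Dict[str, Any]) -> List[str]:
    """Step 4: turn the segments of a transcription result into timestamped lines."""
    return [
        f"[{format_timestamp(s['start'])} --> {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in result.get("segments", [])
    ]


def transcribe_file(
    path: str,
    model_name: str = "base",
    language: Optional[str] = None,   # None lets Whisper detect the language
    task: str = "transcribe",         # or "translate" for English output
    prompt: Optional[str] = None,     # optional context, e.g. domain vocabulary
) -> Dict[str, Any]:
    """Steps 1-3: load the model, then load and transcribe the audio file."""
    import whisper  # lazy import: requires `pip install openai-whisper` and ffmpeg

    model = whisper.load_model(model_name)
    return model.transcribe(path, language=language, task=task, initial_prompt=prompt)
```

A typical call might be `result = transcribe_file("podcast.mp3", language="en")` (file name assumed), followed by `print(result["text"])` for the full text or `segments_to_lines(result)` for one line per segment.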

Practices

  • Speech Recognition
  • Multilingual Processing
  • Audio Transcription
  • Model Selection

Prerequisites

  • Python 3.8-3.11
  • pip install openai-whisper
  • ffmpeg (for audio processing)
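
Before loading a model, it can help to confirm that these prerequisites are met. The following is a small sketch using only the standard library; `python_version_supported` and `check_prerequisites` are hypothetical helper names, and the version bounds simply mirror the range listed above.

```python
import importlib.util
import shutil
import sys
from typing import List


def python_version_supported(version=None) -> bool:
    """True if the (major, minor) version falls in the 3.8-3.11 range listed above."""
    major_minor = tuple((version or sys.version_info)[:2])
    return (3, 8) <= major_minor <= (3, 11)


def check_prerequisites() -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not python_version_supported():
        problems.append(
            f"Python {sys.version_info[0]}.{sys.version_info[1]} is outside the listed 3.8-3.11 range"
        )
    if importlib.util.find_spec("whisper") is None:
        problems.append("openai-whisper is not installed; run: pip install openai-whisper")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH; install it with your system package manager")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing prerequisite:", problem)
```

Running the script prints one line per missing prerequisite and nothing when everything is in place.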

Code Execution

  • info: Error Handling: The SKILL.md and examples show basic error handling through Python's try-except blocks for file operations or model loading, but detailed structured error reporting is not explicitly shown.

Errors

  • info: Actionable error messages: Error handling is present for basic operations like file loading or model loading, but specific remediation steps for more complex issues are not detailed.
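
To make the error-handling notes more concrete, here is one possible sketch of a try-except wrapper that returns a structured outcome with a remediation hint. It is not taken from the skill's SKILL.md; `TranscriptionOutcome` and `safe_transcribe` are hypothetical names, and the mapping from exception types to hints reflects how `openai-whisper` commonly behaves (unknown model names and ffmpeg decode failures raise `RuntimeError`, and a missing ffmpeg binary surfaces as `FileNotFoundError`), which is worth verifying against the installed version.

```python
import os
from dataclasses import dataclass


@dataclass
class TranscriptionOutcome:
    ok: bool
    text: str = ""
    error: str = ""
    hint: str = ""


def safe_transcribe(path: str, model_name: str = "base") -> TranscriptionOutcome:
    """Transcribe a file, converting common failures into an error plus a hint."""
    if not os.path.isfile(path):
        return TranscriptionOutcome(
            False, error=f"audio file not found: {path}",
            hint="Check the path and that the file is readable.",
        )
    try:
        import whisper  # lazy import so a missing package is reported, not raised
    except ImportError:
        return TranscriptionOutcome(
            False, error="the whisper package could not be imported",
            hint="Install it with: pip install openai-whisper",
        )
    try:
        model = whisper.load_model(model_name)
        result = model.transcribe(path)
    except FileNotFoundError as exc:
        # Assumption: the file exists (checked above), so this is the ffmpeg binary.
        return TranscriptionOutcome(
            False, error=str(exc), hint="Install ffmpeg and make sure it is on PATH.",
        )
    except RuntimeError as exc:
        # Assumption: covers unknown model names, undecodable audio, and GPU memory errors.
        return TranscriptionOutcome(
            False, error=str(exc),
            hint="Check the model name and audio format, or try a smaller model.",
        )
    return TranscriptionOutcome(True, text=result["text"])
```

A caller can then branch on `outcome.ok` and show `outcome.hint` to the user instead of a raw traceback.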

Practical Utility

  • info: Edge cases: The documentation mentions limitations like hallucinations, long-form accuracy degradation, and accent variance, but does not detail specific recovery steps for each.
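
One hedged way to handle the hallucination case is to flag segments whose decoding statistics look suspicious and review or re-transcribe them. The sketch below relies on the `no_speech_prob`, `avg_logprob`, and `compression_ratio` fields that `openai-whisper` attaches to each segment. The function name `flag_suspect_segments` is hypothetical, and the default thresholds are borrowed from the defaults of `transcribe` as an assumption; they may need tuning per dataset.

```python
from typing import Any, Dict, List, Tuple


def flag_suspect_segments(
    result: Dict[str, Any],
    no_speech_threshold: float = 0.6,
    logprob_threshold: float = -1.0,
    compression_ratio_threshold: float = 2.4,
) -> List[Tuple[Any, List[str]]]:
    """Return (segment id, reasons) pairs for segments that may be unreliable."""
    flagged = []
    for seg in result.get("segments", []):
        reasons = []
        if seg.get("no_speech_prob", 0.0) > no_speech_threshold:
            reasons.append("likely silence or non-speech")
        if seg.get("avg_logprob", 0.0) < logprob_threshold:
            reasons.append("low average log-probability")
        if seg.get("compression_ratio", 0.0) > compression_ratio_threshold:
            reasons.append("highly repetitive text")
        if reasons:
            flagged.append((seg.get("id"), reasons))
    return flagged
```

Flagged segments are only candidates for review: a low-confidence segment can still be correct, and a hallucinated one can occasionally pass all three checks.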

Installation

Add the marketplace first:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

Verified
97/100
Analyzed 1 day ago

Trust Signals

Last commit: 17 days ago
Stars: 8.3k
License: MIT