
Whisper

Skill, verified, active

Part of: Agent Native Research Artifact (ARA) Tooling

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

Purpose

To provide highly accurate and robust speech recognition across a wide range of languages and audio conditions, enabling automated transcription, translation, and audio processing tasks.

Features

Speech-to-text transcription for 99 languages
Translation of audio to English
Language identification for audio input
Support for multiple model sizes (tiny to large, turbo)
GPU acceleration for faster processing
Word-level timestamp generation
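
As a rough sketch, the features above might map onto the openai-whisper Python package as follows. The `task="translate"` option, `word_timestamps=True`, and `model.detect_language` come from the upstream project; the helper names (`detect_language`, `translate_to_english`, `extract_words`) are illustrative assumptions, not part of the skill.

```python
from typing import Any, Dict, List, Tuple


def detect_language(model: Any, audio_path: str) -> str:
    """Return the most probable language code for the first 30 seconds of audio."""
    import whisper  # imported lazily; requires `pip install openai-whisper`

    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    return max(probs, key=probs.get)


def translate_to_english(model: Any, audio_path: str) -> str:
    """Translate speech in any supported language into English text."""
    result = model.transcribe(audio_path, task="translate")
    return result["text"].strip()


def extract_words(result: Dict[str, Any]) -> List[Tuple[str, float, float]]:
    """Flatten word-level timestamps into (word, start, end) tuples.

    Expects a result produced with `model.transcribe(path, word_timestamps=True)`;
    segments without a "words" key are skipped.
    """
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append((w["word"].strip(), w["start"], w["end"]))
    return words
```

Here `model` is assumed to be the object returned by `whisper.load_model(...)`.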

Use Cases

Automating podcast and video transcription
Transcribing noisy or multilingual audio recordings
Extracting text from meeting audio for notes
Processing audio for multilingual content analysis

Non-Goals

Real-time streaming transcription (faster-whisper is suggested instead)
Speaker diarization (identifying different speakers)
Advanced audio manipulation or editing

Workflow

Load Whisper model
Load audio file
Transcribe audio (optionally specify language, task, prompt)
Process transcription results (text, segments, timestamps)
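
The four steps above can be sketched as below, assuming the openai-whisper package. The function names are hypothetical wrappers; `whisper.load_model` and `model.transcribe` (with `language`, `task`, and `initial_prompt`) are the upstream calls. `model.transcribe` accepts a file path and decodes the audio through ffmpeg itself, so loading the audio is folded into the transcription call here.

```python
from typing import Any, Dict, List, Optional


def load_model(name: str = "base") -> Any:
    """Step 1: load a Whisper model by size name (e.g. "tiny", "base", "turbo")."""
    import whisper  # imported lazily; requires `pip install openai-whisper`

    return whisper.load_model(name)


def transcribe(model: Any, audio_path: str, language: Optional[str] = None,
               task: str = "transcribe", prompt: Optional[str] = None) -> Dict[str, Any]:
    """Steps 2-3: decode the file and transcribe it.

    language=None lets the model detect the language; task may be
    "transcribe" or "translate"; prompt is passed as initial_prompt.
    """
    return model.transcribe(audio_path, language=language, task=task,
                            initial_prompt=prompt)


def format_segments(result: Dict[str, Any]) -> List[str]:
    """Step 4: turn the result's segments into timestamped lines."""
    return [
        f"[{seg['start']:.2f} -> {seg['end']:.2f}] {seg['text'].strip()}"
        for seg in result.get("segments", [])
    ]
```

A typical call would be `result = transcribe(load_model("base"), "audio.mp3")` followed by `print(result["text"])`; the file name is a placeholder.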

Practices

Speech Recognition
Multilingual Processing
Audio Transcription
Model Selection

Prerequisites

Python 3.8-3.11
pip install openai-whisper
ffmpeg (for audio processing)
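
A minimal sketch of checking these prerequisites before loading a model, using only the standard library. The `check_prerequisites` helper and its parameters are hypothetical; the version bounds simply mirror the range listed above.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple


def check_prerequisites(version: Optional[Tuple[int, int]] = None,
                        which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    major_minor = tuple(version or sys.version_info[:2])
    problems = []
    # Python 3.8 through 3.11, as listed in the prerequisites.
    if not ((3, 8) <= major_minor <= (3, 11)):
        problems.append(f"Python {major_minor[0]}.{major_minor[1]} is outside 3.8-3.11")
    # Whisper shells out to ffmpeg to decode audio, so it must be on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the running interpreter and the current PATH.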

Code Execution

info: Error handling. The SKILL.md and examples show basic error handling through Python's try-except blocks for file operations and model loading, but do not show detailed structured error reporting.
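
One way the try-except handling described here could return a structured outcome is sketched below. This is an assumption about shape, not the skill's own code: `TranscriptionOutcome` and `safe_transcribe` are hypothetical names, and the caught exception types are a guess at what file and decoding failures typically raise.

```python
import os
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TranscriptionOutcome:
    ok: bool
    text: str = ""
    error: Optional[str] = None


def safe_transcribe(model: Any, audio_path: str, **options: Any) -> TranscriptionOutcome:
    """Transcribe a file, reporting failures as data instead of raising."""
    if not os.path.isfile(audio_path):
        return TranscriptionOutcome(ok=False, error=f"audio file not found: {audio_path}")
    try:
        result = model.transcribe(audio_path, **options)
    except (RuntimeError, OSError) as exc:
        # Assumed failure modes: undecodable audio (RuntimeError) or a
        # missing ffmpeg executable (OSError / FileNotFoundError).
        return TranscriptionOutcome(ok=False, error=f"{type(exc).__name__}: {exc}")
    return TranscriptionOutcome(ok=True, text=result.get("text", "").strip())
```

Callers can then branch on `outcome.ok` and log `outcome.error` without wrapping every call site in its own try-except.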

Errors

info: Actionable error messages. Error handling is present for basic operations such as file and model loading, but remediation steps for more complex issues are not detailed.

Practical Utility

info: Edge cases. The documentation mentions limitations such as hallucinations, degraded accuracy on long-form audio, and variance across accents, but does not give recovery steps for each.
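
As one possible mitigation for hallucinated output, segments can be filtered after transcription using the per-segment `no_speech_prob`, `compression_ratio`, and `avg_logprob` fields that openai-whisper returns. This is a heuristic sketch, not a documented recovery procedure: the `filter_suspect_segments` helper is hypothetical, and its default thresholds are assumptions borrowed from the library's own decoding defaults.

```python
from typing import Any, Dict, Iterable, List


def filter_suspect_segments(segments: Iterable[Dict[str, Any]],
                            max_no_speech_prob: float = 0.6,
                            max_compression_ratio: float = 2.4,
                            min_avg_logprob: float = -1.0) -> List[Dict[str, Any]]:
    """Drop segments that look like silence, repetition loops, or low-confidence text."""
    kept = []
    for seg in segments:
        if seg.get("no_speech_prob", 0.0) > max_no_speech_prob:
            continue  # model thinks this span is probably not speech
        if seg.get("compression_ratio", 0.0) > max_compression_ratio:
            continue  # highly compressible text often means repeated phrases
        if seg.get("avg_logprob", 0.0) < min_avg_logprob:
            continue  # low average token log-probability
        kept.append(seg)
    return kept
```

For long recordings, passing `condition_on_previous_text=False` to `model.transcribe` is another option that may reduce repetition carrying over between windows, at some cost to consistency.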

Installation

Add the marketplace first:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs

/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

Verified

97/100

Analyzed 1 day ago

Trust Signals

Last commit: 17 days ago

GitHub owner: Orchestra-Research

Stars: 8.3k

Downloads: 0

License: MIT

Website: orchestra-research.com

Similar Extensions

Whisper

Skill

davila7

Transcribe

Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.

Skill

openai

Openai Whisper Api

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

Skill

steipete

Baoyu Imagine

AI image generation with OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream and Replicate APIs. Supports text-to-image, reference images, aspect ratios, and batch generation from saved prompt files. Sequential by default; use batch parallel generation when the user already has multiple prompts or wants stable multi-image throughput. Use when user asks to generate, create, or draw images.

Skill

jimliu

Openai Whisper

Local speech-to-text with the Whisper CLI (no API key).

Skill

steipete

Cli Anything Videocaptioner

AI-powered video captioning — transcribe speech, optimize/translate subtitles, and burn them into video via the stable VideoCaptioner backend. Free ASR and translation included.

Skill

hkuds