
Transcribe

Skill · Verified · Active

Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.

Purpose

Transcribe audio files to text with optional diarization and known-speaker hints, enabling extraction of speech content from recordings.
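Known-speaker hints are usually supplied as short reference clips paired with speaker names. The sketch below prepares such hints using only the standard library. It assumes the OpenAI diarization model (`gpt-4o-transcribe-diarize`) accepts the fields `known_speaker_names` and `known_speaker_references` (reference clips encoded as data URLs) through the SDK's `extra_body` argument. Treat those field names as an assumption to check against the current API reference. `to_data_url` and `known_speaker_hints` are hypothetical helpers, not part of this skill or the `openai` library.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str) -> str:
    """Encode a short reference clip as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    # Fall back to a generic type if the extension is unknown.
    mime = mime or "application/octet-stream"
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def known_speaker_hints(references: dict[str, str]) -> dict[str, list[str]]:
    """Map speaker names to reference clips in the (assumed) extra_body shape.

    ASSUMPTION: the field names below follow our reading of OpenAI's
    diarization docs; verify them before relying on this.
    """
    names = list(references)
    return {
        "known_speaker_names": names,
        "known_speaker_references": [to_data_url(references[n]) for n in names],
    }
```

Under the same assumptions, the result would be passed as `extra_body=known_speaker_hints({"agent": "agent.wav"})` to `client.audio.transcriptions.create(...)` together with `response_format="diarized_json"`. Names and clips are kept in two parallel lists so their order stays aligned.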

Features

  • Transcribe audio to text
  • Speaker diarization
  • Known-speaker hints (reference clips for named speakers)
  • Support for multiple audio formats
  • Configurable response formats (text, JSON, diarized_json)
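With the `diarized_json` format, the transcript comes back as speaker-labeled segments. The sketch below turns such segments into a readable, speaker-labeled transcript by merging consecutive segments from the same speaker into one turn. It assumes a payload shaped like `{"segments": [{"speaker", "start", "end", "text"}, ...]}` with times in seconds; this matches our understanding of the diarized output but should be checked against the current API reference. `Segment`, `segments_from_payload`, and `render_diarized` are hypothetical helpers, not part of the skill.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Segment:
    """One diarized span (assumed shape): speaker label, times in seconds, text."""
    speaker: str
    start: float
    end: float
    text: str


def segments_from_payload(payload: dict) -> list[Segment]:
    """Build Segments from an assumed {"segments": [...]} diarized payload."""
    return [
        Segment(s["speaker"], float(s["start"]), float(s["end"]), s["text"])
        for s in payload.get("segments", [])
    ]


def format_timestamp(seconds: float) -> str:
    """Render a non-negative offset in seconds as HH:MM:SS (fractions truncated)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_diarized(segments: Iterable[Segment]) -> str:
    """Merge consecutive same-speaker segments into labeled turns."""
    lines: list[str] = []
    speaker: Optional[str] = None
    turn_start = 0.0
    words: list[str] = []

    def flush() -> None:
        # Emit the buffered turn, if any, as "[HH:MM:SS] speaker: text".
        if words:
            lines.append(f"[{format_timestamp(turn_start)}] {speaker}: {' '.join(words)}")

    for seg in segments:
        if seg.speaker != speaker:
            flush()
            speaker, turn_start, words = seg.speaker, seg.start, []
        words.append(seg.text.strip())
    flush()
    return "\n".join(lines)
```

Merging by speaker change, rather than printing one line per segment, keeps interview and meeting transcripts compact while preserving each turn's start time.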

Use cases

  • Transcribing meeting recordings to extract key discussion points.
  • Converting interviews or podcasts into searchable text documents.
  • Labeling speakers in audio content for easier content indexing.
  • Extracting spoken text from video files.

Non-goals

  • Performing sentiment analysis on the transcribed text.
  • Translating transcribed audio into other languages.
  • Real-time transcription (focus is on file-based processing).
  • Managing or storing audio files beyond outputting transcripts.

Execution

  • Info (pinned dependencies): While dependency installation is described, explicit pinning via a lockfile is not evident for the 'openai' library.

Installation

npx skills add openai/skills

Runs the Vercel skills CLI (skills.sh) through npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.), and assumes the repository follows the agentskills.io format.

Quality score

Verified
97/100
Analyzed 1 day ago

Trust signals

Last commit: 1 day ago
Stars: 19k
Status

Similar extensions

Whisper

97

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

Skill
Orchestra-Research

Speech Generation Skill

100

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

Skill
openai

YouTube Downloader

100

Download and process YouTube content for research. Use when: downloading competitor videos for analysis; extracting audio for podcasts; getting transcripts for content repurposing; archiving webinars; research content curation.

Skill
guia-matthieu

Sheet Music Publisher

99

Converts mastered audio to sheet music and creates printable songbooks. Use after mastering when the user wants sheet music or a songbook for their album.

Skill
bitwize-music-studio

Elevenlabs Stt

98

ElevenLabs speech-to-text with Scribe models and forced alignment via inference.sh CLI. Models: Scribe v1/v2 (98%+ accuracy, 90+ languages). Capabilities: transcription, speaker diarization, audio event tagging, word-level timestamps, forced alignment, subtitle generation. Use for: meeting transcription, subtitles, podcast transcripts, lip-sync timing, karaoke. Triggers: elevenlabs stt, elevenlabs transcription, scribe, elevenlabs speech to text, forced alignment, word alignment, subtitle timing, diarization, speaker identification, audio event detection, eleven labs transcribe

Skill
inferen-sh

Openai Whisper Api

95

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

Skill
steipete