
Transcribe

Skill Verified Active

Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.

Purpose

Transcribe audio files to text with optional diarization and known-speaker hints, enabling extraction of speech content from recordings.

Features

  • Transcribe audio to text
  • Speaker diarization
  • Known speaker referencing
  • Support for multiple audio formats
  • Configurable response formats (text, JSON, diarized_json)
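The features above can be sketched as a small request builder. It checks the file extension, validates the response format, and attaches known-speaker reference clips as base64 data URLs. This is a minimal sketch, not the skill's own code. The helper names, the default model names (`gpt-4o-mini-transcribe`, `gpt-4o-transcribe-diarize`), the accepted extensions, and the `extra_body` keys (`known_speaker_names`, `known_speaker_references`) are assumptions modeled on the OpenAI Audio API and should be checked against its current documentation.

```python
from __future__ import annotations

import base64
from pathlib import Path

# Assumed set of accepted upload formats; verify against the current API docs.
SUPPORTED_EXTENSIONS = {".flac", ".m4a", ".mp3", ".mp4", ".mpeg", ".mpga", ".ogg", ".wav", ".webm"}
RESPONSE_FORMATS = {"text", "json", "diarized_json"}


def to_data_url(audio: bytes, mime_type: str = "audio/wav") -> str:
    """Encode a short reference clip as a base64 data URL (assumed hint format)."""
    return f"data:{mime_type};base64,{base64.b64encode(audio).decode('ascii')}"


def build_transcription_kwargs(
    path: str,
    response_format: str = "text",
    known_speakers: dict[str, bytes] | None = None,
) -> dict:
    """Hypothetical helper: assemble keyword arguments for a transcription call.

    Model names and extra_body keys are assumptions, not confirmed skill behavior.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {suffix or '(none)'}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unknown response_format: {response_format}")
    if known_speakers and response_format != "diarized_json":
        raise ValueError("known-speaker hints require response_format='diarized_json'")

    diarize = response_format == "diarized_json"
    kwargs = {
        "model": "gpt-4o-transcribe-diarize" if diarize else "gpt-4o-mini-transcribe",
        "response_format": response_format,
    }
    if diarize:
        # Assumed: diarization on longer audio needs a chunking strategy.
        kwargs["chunking_strategy"] = "auto"
    if known_speakers:
        # Names and clips must stay in the same order so each name maps to its clip.
        kwargs["extra_body"] = {
            "known_speaker_names": list(known_speakers),
            "known_speaker_references": [to_data_url(clip) for clip in known_speakers.values()],
        }
    return kwargs
```

The audio file itself is passed separately, for example `client.audio.transcriptions.create(file=f, **build_transcription_kwargs("meeting.mp3"))` with `f` opened in binary mode. Rejecting speaker hints outside `diarized_json` fails early instead of silently dropping them.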

Use Cases

  • Transcribing meeting recordings to extract key discussion points.
  • Converting interviews or podcasts into searchable text documents.
  • Labeling speakers in audio content for easier content indexing.
  • Extracting spoken text from video files.
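For the speaker-labeling use cases, a diarized response can be turned into a readable, indexable transcript by merging consecutive turns from the same speaker. This sketch assumes each segment is a dict with `speaker`, `start`, `end` (seconds), and `text` keys, roughly the shape of a `diarized_json` response. The field names and the `label_speakers` helper are illustrative assumptions, not the skill's documented output.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    start: float  # seconds from the start of the recording
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS, truncating fractional seconds."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def label_speakers(segments: list[dict]) -> list[str]:
    """Merge consecutive segments by the same speaker into one labeled line.

    Assumes each dict has 'speaker', 'start', 'end', and 'text' keys.
    """
    merged: list[Segment] = []
    for raw in sorted(segments, key=lambda s: s["start"]):
        seg = Segment(raw["speaker"], raw["start"], raw["end"], raw["text"].strip())
        if merged and merged[-1].speaker == seg.speaker:
            # Same speaker kept talking: extend the previous turn.
            merged[-1].end = seg.end
            merged[-1].text += " " + seg.text
        else:
            merged.append(seg)
    return [f"[{format_timestamp(s.start)}] {s.speaker}: {s.text}" for s in merged]
```

Merging adjacent turns keeps the output compact for search indexing while the start timestamp still points back into the recording.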

Non-Goals

  • Performing sentiment analysis on the transcribed text.
  • Translating transcribed audio into other languages.
  • Real-time transcription (focus is on file-based processing).
  • Managing or storing audio files beyond outputting transcripts.

Execution

  • Info (pinned dependencies): while dependency installation is described, explicit pinning via a lockfile is not evident for the 'openai' library.
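One way to act on the pinning note is to check that every requirement, including `openai`, is pinned to an exact version. The following is a rough sketch over requirements.txt-style lines. The `unpinned_requirements` helper is hypothetical and is not part of the skill or any packaging tool. It treats only `name==version` (with optional extras) as pinned and does not validate full PEP 508 syntax.

```python
from __future__ import annotations

import re

# "name==version" with optional extras, e.g. "openai==1.40.0" or "uvicorn[standard]==0.30.0".
# A rough check only: environment markers after ";" are ignored, not validated.
PIN_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?\s*==\s*([^\s;#*]+)\s*(;.*)?$")


def unpinned_requirements(lines: list[str]) -> list[str]:
    """Return names of requirements that are not pinned to an exact version."""
    unpinned = []
    for line in lines:
        stripped = line.split("#", 1)[0].strip()
        if not stripped or stripped.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r or --hash
        if not PIN_RE.match(stripped):
            # Take the bare project name before any extras, specifier, or marker.
            name = re.split(r"[\s\[<>=!~;]", stripped, maxsplit=1)[0]
            unpinned.append(name)
    return unpinned
```

A lockfile generated by a resolver remains the more complete fix, since it also pins transitive dependencies; this check only covers the top-level list.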

Installation

npx skills add openai/skills

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality Score

Verified
97/100
Analyzed about 20 hours ago

Trust Signals

Last commit: about 21 hours ago
Stars: 19k

Similar Extensions

Whisper

97

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

Skill
Orchestra-Research

Speech Generation Skill

100

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

Skill
openai

YouTube Downloader

100

Download and process YouTube content for research. Use when: downloading competitor videos for analysis; extracting audio for podcasts; getting transcripts for content repurposing; archiving webinars; research content curation.

Skill
guia-matthieu

Sheet Music Publisher

99

Converts mastered audio to sheet music and creates printable songbooks. Use after mastering when the user wants sheet music or a songbook for their album.

Skill
bitwize-music-studio

Elevenlabs Stt

98

ElevenLabs speech-to-text with Scribe models and forced alignment via inference.sh CLI. Models: Scribe v1/v2 (98%+ accuracy, 90+ languages). Capabilities: transcription, speaker diarization, audio event tagging, word-level timestamps, forced alignment, subtitle generation. Use for: meeting transcription, subtitles, podcast transcripts, lip-sync timing, karaoke. Triggers: elevenlabs stt, elevenlabs transcription, scribe, elevenlabs speech to text, forced alignment, word alignment, subtitle timing, diarization, speaker identification, audio event detection, eleven labs transcribe

Skill
inferen-sh

Openai Whisper Api

95

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

Skill
steipete

© 2025 SkillRepo · Find the right skill, skip the noise.