Transcribe

Skill · Verified · Active

Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.

Purpose

Transcribe audio files to text with optional diarization and known-speaker hints, enabling extraction of speech content from recordings.

Features

  • Transcribe audio to text
  • Speaker diarization
  • Known speaker referencing
  • Support for multiple audio formats
  • Configurable response formats (text, JSON, diarized_json)
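
To illustrate the diarized_json format, here is a minimal parsing sketch. The field names (`segments`, `speaker`, `text`) are assumptions for illustration only, not the skill's documented schema; a real response may use different keys.

```python
# Sketch: rendering a hypothetical diarized_json payload as a labeled
# transcript. The segments/speaker/text field names are assumptions,
# not the skill's confirmed schema.
import json

def format_diarized(payload: str) -> str:
    """Render each segment as 'SPEAKER: text', one segment per line."""
    data = json.loads(payload)
    lines = []
    for seg in data.get("segments", []):
        speaker = seg.get("speaker", "UNKNOWN")
        lines.append(f"{speaker}: {seg['text'].strip()}")
    return "\n".join(lines)

# Example payload with two labeled speakers
example = json.dumps({
    "segments": [
        {"speaker": "A", "text": "Welcome to the meeting."},
        {"speaker": "B", "text": "Thanks, glad to be here."},
    ]
})
print(format_diarized(example))
```

Output of this kind (one `SPEAKER: text` line per segment) is what makes diarized transcripts convenient for indexing and search.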

Use cases

  • Transcribing meeting recordings to extract key discussion points.
  • Converting interviews or podcasts into searchable text documents.
  • Labeling speakers in audio content for easier content indexing.
  • Extracting spoken text from video files.

Non-goals

  • Performing sentiment analysis on the transcribed text.
  • Translating transcribed audio into other languages.
  • Real-time transcription (focus is on file-based processing).
  • Managing or storing audio files beyond outputting transcripts.

Execution

  • info: Pinned dependencies. While dependency installation is described, explicit pinning via a lockfile is not evident for the 'openai' library.

Installation

npx skills add openai/skills

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …), and assumes the repository follows the agentskills.io format.

Quality score

Verified
97/100
Analyzed about 23 hours ago

Trust signals

Last commit: about 24 hours ago
Stars: 19k
Status
View source code

Similar extensions

Whisper

97

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

Skill
Orchestra-Research

Speech Generation Skill

100

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

Skill
openai

YouTube Downloader

100

Download and process YouTube content for research. Use when: downloading competitor videos for analysis; extracting audio for podcasts; getting transcripts for content repurposing; archiving webinars; research content curation

Skill
guia-matthieu

Sheet Music Publisher

99

Converts mastered audio to sheet music and creates printable songbooks. Use after mastering when the user wants sheet music or a songbook for their album.

Skill
bitwize-music-studio

Elevenlabs Stt

98

ElevenLabs speech-to-text with Scribe models and forced alignment via inference.sh CLI. Models: Scribe v1/v2 (98%+ accuracy, 90+ languages). Capabilities: transcription, speaker diarization, audio event tagging, word-level timestamps, forced alignment, subtitle generation. Use for: meeting transcription, subtitles, podcast transcripts, lip-sync timing, karaoke. Triggers: elevenlabs stt, elevenlabs transcription, scribe, elevenlabs speech to text, forced alignment, word alignment, subtitle timing, diarization, speaker identification, audio event detection, eleven labs transcribe

Skill
inferen-sh

Openai Whisper Api

95

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

Skill
steipete