Whisper Transcription

Transcribe audio and video files to text using OpenAI Whisper. Use when: converting podcasts to blog posts; creating video subtitles; extracting quotes from interviews; repurposing video content as text; building searchable audio archives.

Purpose

Convert speech in audio and video files into searchable text using OpenAI Whisper, so that the content can be repurposed and archived.

Features

Transcribe audio and video files
Batch processing of multiple files
Translate transcriptions to specified languages
Extract timestamps with text segments
Support for multiple output formats (txt, srt, vtt, json, tsv)
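
As an illustration of the batch feature, the sketch below collects the media files in a folder so that each one can be handed to the transcriber in turn. The helper name find_media_files and the extension list are assumptions made for this example, not part of the skill's script.

```python
from pathlib import Path

# Assumed set of extensions for illustration; ffmpeg can decode many more.
MEDIA_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".mkv", ".mov"}


def find_media_files(folder):
    """Return the media files directly inside `folder`, sorted by path."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS
    )
```

Matching on the lowercased suffix means files such as EPISODE.MP3 are picked up as well; subfolders are not searched in this sketch.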

Use Cases

Convert podcasts to blog posts
Create video subtitles (SRT/VTT)
Extract quotes from interviews
Build searchable audio archives
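
For the subtitle use case, a minimal sketch of how timestamped segments might be turned into SRT text. It assumes each segment is a dict with "start" and "end" in seconds plus "text", which is the shape of the "segments" entries returned by the openai-whisper Python API; the function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from an iterable of segment dicts."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        # Each cue: sequence number, time range, then the spoken text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

VTT differs mainly in starting with a WEBVTT header line and using a period instead of a comma before the milliseconds.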

Non-Goals

Replacing professional audio engineering
Making subjective creative decisions
Directly accessing or editing audio files
Guaranteeing commercial success of content

Workflow

Specify input file and desired command (transcribe, batch, translate, timestamps).
Select model size, output format, and optionally language.
Execute the command via Python script.
Receive the transcribed text or formatted output file.
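
The steps above can be sketched with the openai-whisper Python API. This is not the skill's own script: the function names, the default model size, and the choice to write a .txt file next to the input are assumptions for illustration. Only the transcribe and translate commands are covered here.

```python
from pathlib import Path
from typing import Optional


def transcribe_options(command: str, language: Optional[str] = None) -> dict:
    """Map a command name to keyword arguments for Whisper's transcribe()."""
    if command not in {"transcribe", "translate"}:
        raise ValueError(f"unsupported command: {command}")
    options = {"task": command}
    if language:
        # Source language hint; Whisper detects it when omitted.
        options["language"] = language
    return options


def run(input_file, command="transcribe", model_size="base", language=None):
    """Transcribe `input_file` and write the text beside it as a .txt file."""
    import whisper  # imported lazily so the helper above works without it

    model = whisper.load_model(model_size)
    result = model.transcribe(str(input_file), **transcribe_options(command, language))
    output = Path(input_file).with_suffix(".txt")
    output.write_text(result["text"].strip() + "\n", encoding="utf-8")
    return output
```

A call such as run("interview.mp3", model_size="small", language="fr") would load the small model and write interview.txt; the file name is a placeholder.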

Prerequisites

Python 3
pip install openai-whisper torch ffmpeg-python click
ffmpeg installed on system
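
Before running the script, it can help to confirm that these prerequisites are in place. The check below is a small sketch using only the standard library; the function name is hypothetical, and the module names are the import names of the packages listed above (openai-whisper imports as whisper, ffmpeg-python as ffmpeg).

```python
import importlib.util
import shutil


def missing_prerequisites(
    modules=("whisper", "torch", "ffmpeg", "click"),
    binaries=("ffmpeg",),
):
    """Return a list describing each Python module or executable not found."""
    missing = []
    for name in modules:
        if importlib.util.find_spec(name) is None:
            missing.append(f"python module: {name}")
    for name in binaries:
        # shutil.which looks the executable up on PATH.
        if shutil.which(name) is None:
            missing.append(f"executable: {name}")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("All prerequisites found." if not problems else "\n".join(problems))
```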

Code Execution

Logging: The script writes progress information to stdout/stderr during execution, covering model loading, transcription progress, and output file creation.

Installation

npx skills add guia-matthieu/clawfu-skills

Runs the Vercel skills CLI (skills.sh) via npx — needs Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality Score

Verified

95/100

Analyzed about 22 hours ago

Trust Signals

Last commit: about 1 month ago

GitHub owner: guia-matthieu

Stars: 104

License: MIT

Website: clawfu.com

Similar Extensions

YouTube Downloader

100

Download and process YouTube content for research. Use when: downloading competitor videos for analysis; extracting audio for podcasts; getting transcripts for content repurposing; archiving webinars; research content curation

Skill

guia-matthieu

Whisper

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

Skill

Orchestra-Research

Openai Whisper Api

Transcribe audio via OpenAI Audio Transcriptions API (Whisper).

Skill

steipete

Speech Generation Skill

100

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

Skill

openai

Ffmpeg

Video and audio processing with FFmpeg. Use for format conversion, resizing, compression, audio extraction, and preparing assets for Remotion. Triggers include converting GIF to MP4, resizing video, extracting audio, compressing files, or any media transformation task.

Skill

digitalsamba

Sheet Music Publisher

Converts mastered audio to sheet music and creates printable songbooks. Use after mastering when the user wants sheet music or a songbook for their album.

Skill

bitwize-music-studio