Video to Text Bcut

Skill Active

Transcribe a video or audio URL to text with timestamps using the Bilibili Bcut ASR API (free, no API key). Preferred for Chinese content: Bcut gives character-level timestamps, whereas Whisper gives word-level ones. Returns text plus segments [{start, end, text}]. Requires yt-dlp and ffmpeg.

Purpose

To transcribe video or audio content, especially Chinese-language media, into text with character-level timestamps for analysis or subtitle generation.

Features

  • Transcribe a video or audio URL to text
  • Provide character-level timestamps
  • Use the free Bilibili Bcut ASR API (no API key)
  • Preferred for Chinese content
  • Return text and segment data
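
The returned data can be pictured as follows. This is a minimal sketch, assuming the output is a JSON object with a `text` field and a `segments` list of `{start, end, text}` entries as described above; the type names `Segment` and `Transcript` and the helpers `build_transcript` and `to_json` are hypothetical and not taken from the skill's source. The unit of `start` and `end` is assumed to be seconds.

```python
import json
from typing import TypedDict


class Segment(TypedDict):
    start: float  # assumed to be seconds from the start of the audio
    end: float
    text: str


class Transcript(TypedDict):
    text: str
    segments: list[Segment]


def build_transcript(segments: list[Segment]) -> Transcript:
    """Combine segments into one object; the full text is the joined segment text."""
    # Joining without a separator suits Chinese text; this is an assumption.
    return {"text": "".join(s["text"] for s in segments), "segments": segments}


def to_json(transcript: Transcript) -> str:
    """Serialize without escaping non-ASCII, so Chinese characters stay readable."""
    return json.dumps(transcript, ensure_ascii=False, indent=2)
```

Passing `ensure_ascii=False` keeps Chinese characters as-is in the JSON instead of `\uXXXX` escapes, which makes the output easier to inspect by eye.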

Use cases

  • Generate subtitles for Chinese videos
  • Extract text content from audio files
  • Analyze spoken content for keywords
  • Create searchable transcripts of video lectures
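
For the subtitle use case, segments in the `{start, end, text}` shape can be rendered as SRT. The sketch below is illustrative only: the function names are hypothetical, and it assumes `start` and `end` are given in seconds, which the listing does not state.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render [{start, end, text}] segments as SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        # Each SRT cue: a 1-based index, a time range, then the subtitle text.
        blocks.append(
            f"{index}\n"
            f"{format_srt_time(seg['start'])} --> {format_srt_time(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

If the timestamps turn out to be in milliseconds, divide by 1000 before calling `format_srt_time`.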

Non-goals

  • Real-time transcription
  • Translation of transcribed text
  • Handling of encrypted or private video content
  • API key management for transcription services

Workflow

  1. Extract audio from the video URL using yt-dlp
  2. Convert the audio to 16 kHz mono WAV using ffmpeg
  3. Upload the audio to the Bcut API and create a transcription task
  4. Poll the Bcut API for task completion and retrieve character-level timestamps
  5. Aggregate characters into sentence-level segments
  6. Return structured JSON output with text and segments
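
Step 5 of the workflow above can be sketched as follows. This is not the skill's actual implementation: the input shape (a list of `{start, end, text}` items, one per character), the set of sentence-ending punctuation, the `max_gap` threshold, and the function names are all assumptions made for illustration. In particular, it is an assumption that the recognizer output contains punctuation; the pause-based split is a fallback if it does not.

```python
# Punctuation treated as a sentence boundary (an assumed, non-exhaustive set).
SENTENCE_END = set("。！？!?；;")


def _merge(items: list[dict]) -> dict:
    """Collapse consecutive character items into one segment."""
    return {
        "start": items[0]["start"],
        "end": items[-1]["end"],
        "text": "".join(item["text"] for item in items),
    }


def aggregate_segments(chars: list[dict], max_gap: float = 0.8) -> list[dict]:
    """Group per-character timestamps into sentence-level segments.

    A segment is closed when a character ends with sentence punctuation,
    or when the pause before the next character exceeds max_gap seconds.
    """
    segments: list[dict] = []
    current: list[dict] = []
    for item in chars:
        # A long silence before this character starts a new segment.
        if current and item["start"] - current[-1]["end"] > max_gap:
            segments.append(_merge(current))
            current = []
        current.append(item)
        # Sentence-ending punctuation closes the segment after this character.
        if item["text"] and item["text"][-1] in SENTENCE_END:
            segments.append(_merge(current))
            current = []
    if current:
        segments.append(_merge(current))
    return segments
```

Each resulting segment takes its `start` from its first character and its `end` from its last, so segment boundaries stay aligned with the underlying character timestamps.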

Practices

  • Transcription
  • ASR

Prerequisites

  • yt-dlp
  • ffmpeg
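
A quick way to check that both tools are on the `PATH`, and to see roughly how they might be invoked for audio extraction and 16 kHz mono conversion, is sketched below. The helper names and file names are hypothetical, and the exact options the skill passes are not documented here; the flags shown (`yt-dlp -x --audio-format -o`, `ffmpeg -y -i -ar -ac`) are standard options of those tools.

```python
import shutil


def missing_tools(tools: tuple[str, ...] = ("yt-dlp", "ffmpeg")) -> list[str]:
    """Return the names of required tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def download_audio_cmd(url: str, out_template: str = "audio.%(ext)s") -> list[str]:
    """Build a yt-dlp command that extracts the audio track as WAV."""
    return ["yt-dlp", "-x", "--audio-format", "wav", "-o", out_template, url]


def to_16k_mono_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that resamples to 16 kHz, one channel."""
    # -y overwrites dst if it exists; -ar sets the sample rate; -ac the channel count.
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]
```

The command lists can be passed to `subprocess.run(cmd, check=True)`. Building them as lists rather than shell strings avoids quoting problems with URLs.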

Trust

  • Warning (Issues Attention): Open issues (17) are significantly higher than closed issues (3) in the last 90 days, indicating slow maintainer response to reported problems.

Installation

npx skills add 0xmariowu/Autosearch

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality score

96/100
Analyzed 1 day ago

Trust signals

Last commit: 3 days ago
Stars: 18
License: MIT

Similar extensions

Whisper

97

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

Skill
Orchestra-Research

Whisper

95

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

Skill
davila7

YouTube Downloader

100

Download and process YouTube content for research. Use when: downloading competitor videos for analysis; extracting audio for podcasts; getting transcripts for content repurposing; archiving webinars; research content curation

Skill
guia-matthieu

Summarize

99

Summarize or transcribe URLs, YouTube/videos, podcasts, articles, transcripts, PDFs, and local files.

Skill
steipete

Cli Anything Videocaptioner

99

AI-powered video captioning — transcribe speech, optimize/translate subtitles, and burn them into video via the stable VideoCaptioner backend. Free ASR and translation included.

Skill
hkuds

Sheet Music Publisher

99

Converts mastered audio to sheet music and creates printable songbooks. Use after mastering when the user wants sheet music or a songbook for their album.

Skill
bitwize-music-studio