Video to Text Bcut

Skill active

Transcribes a video or audio URL to text with word-level timestamps using the Bilibili Bcut ASR API (free, no API key required). Preferred for Chinese content, because Bcut returns character-level timestamps whereas Whisper returns word-level ones. Returns the full text plus a list of segments [{start, end, text}]. Requires yt-dlp and ffmpeg.

Purpose

To quickly and accurately transcribe video or audio content, especially Chinese-language media, into text with precise word-level timestamps for analysis or subtitle generation.

Features

Transcribe video/audio URL to text
Provide word-level timestamps
Utilize Bilibili Bcut ASR API (free)
Preferred for Chinese content
Return text and segment data

Use cases

Generate subtitles for Chinese videos
Extract text content from audio files
Analyze spoken content for keywords
Create searchable transcripts of video lectures
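For the subtitle use case, the returned segments can be rendered directly as an SRT file. The sketch below is a minimal illustration, not part of the skill: it assumes each segment's `start` and `end` are floats in seconds (the listing does not state the unit), and the function names are hypothetical.

```python
def format_srt_time(seconds: float) -> str:
    """Format an offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render [{start, end, text}] segments as an SRT document.

    Assumption: start/end are in seconds. SRT cues are numbered from 1
    and separated by a blank line.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_srt_time(seg['start'])} --> {format_srt_time(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    return "\n".join(cues)


srt = segments_to_srt([
    {"start": 0.0, "end": 1.5, "text": "你好"},
    {"start": 1.5, "end": 3.2, "text": "世界"},
])
print(srt)
```

If the skill actually reports times in milliseconds, divide by 1000 before formatting.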

Non-goals

Real-time transcription
Translation of transcribed text
Handling of encrypted or private video content
API key management for transcription services

Workflow

Extract audio from video URL using yt-dlp
Convert audio to 16kHz mono WAV using ffmpeg
Upload audio to Bcut API and create transcription task
Poll Bcut API for task completion and retrieve word-level timestamps
Aggregate characters into sentence-level segments
Return structured JSON output with text and segments
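The last two steps (aggregating characters into sentences and returning JSON) can be sketched as follows. This is not the skill's actual code: it assumes the Bcut result has already been flattened into per-character records with hypothetical fields `char`, `start`, and `end` (seconds), and it uses two assumed rules for closing a segment: sentence-ending punctuation, or a pause longer than `max_gap`.

```python
from typing import Iterable

# Assumed set of sentence-ending marks (full-width Chinese and ASCII).
SENTENCE_END = set("。！？!?")


def aggregate_chars(chars: Iterable[dict], max_gap: float = 0.8) -> dict:
    """Group character-level timestamps into sentence-level segments.

    Each input item is assumed (hypothetically) to look like
    {"char": str, "start": float, "end": float}, with times in seconds.
    Returns {"text": ..., "segments": [{"start", "end", "text"}, ...]}.
    """
    segments: list[dict] = []
    current: list[dict] = []  # characters of the segment being built

    def flush() -> None:
        # Close the current segment, if any, and record its time span.
        if current:
            segments.append({
                "start": current[0]["start"],
                "end": current[-1]["end"],
                "text": "".join(c["char"] for c in current),
            })
            current.clear()

    for ch in chars:
        # A long silence before this character starts a new segment.
        if current and ch["start"] - current[-1]["end"] > max_gap:
            flush()
        current.append(ch)
        # Sentence-ending punctuation closes the segment it belongs to.
        if ch["char"] in SENTENCE_END:
            flush()
    flush()
    return {"text": "".join(s["text"] for s in segments), "segments": segments}
```

The 0.8 s default gap is an arbitrary choice; it would need tuning to the speaking pace of the material, especially if the ASR output contains no punctuation.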

Practices

Transcription
ASR

Prerequisites

yt-dlp
ffmpeg
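These two prerequisites cover the first two workflow steps (audio extraction and conversion to 16kHz mono WAV). A minimal sketch of how they might be invoked from Python, assuming both binaries are on `PATH`; the helper functions and file names are hypothetical, and the 16-bit PCM codec (`pcm_s16le`) is an assumption beyond the stated "16kHz mono WAV" requirement.

```python
import shutil
import subprocess
from pathlib import Path

REQUIRED_TOOLS = ("yt-dlp", "ffmpeg")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the names of required executables that are not on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def download_cmd(url: str, out_path: Path) -> list[str]:
    # Fetch only the best audio-only stream, written to a fixed file name.
    return ["yt-dlp", "--no-playlist", "-f", "bestaudio", "-o", str(out_path), url]


def to_wav_cmd(in_path: Path, wav_path: Path) -> list[str]:
    # -vn drops any video, -ar 16000 resamples to 16 kHz, -ac 1 downmixes
    # to mono; pcm_s16le (16-bit PCM) is an assumed codec choice.
    return ["ffmpeg", "-y", "-i", str(in_path), "-vn",
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(wav_path)]


def prepare_audio(url: str, workdir: Path) -> Path:
    """Download a URL's audio and convert it to 16 kHz mono WAV (hypothetical helper)."""
    missing = missing_tools()
    if missing:
        raise RuntimeError(f"missing prerequisites: {', '.join(missing)}")
    raw = workdir / "audio_source"          # hypothetical intermediate file
    wav = workdir / "audio_16k_mono.wav"    # hypothetical output file
    subprocess.run(download_cmd(url, raw), check=True)
    subprocess.run(to_wav_cmd(raw, wav), check=True)
    return wav
```

Building the argument lists separately from running them keeps the commands easy to inspect or log before any download starts.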

Trust

Warning (issues): Open issues (17) significantly outnumber closed issues (3) over the last 90 days, suggesting slow maintainer response to reported problems.

Installation

npx skills add 0xmariowu/Autosearch

Runs the Vercel skills CLI (skills.sh) via npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.

Quality score

96/100

Analyzed 1 day ago

Trust signals

Last commit: 3 days ago

GitHub owner: 0xmariowu

Stars: 18

License: MIT

Website: autosearch.dev


Similar extensions

Whisper

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

Skill

Orchestra-Research

Whisper

Skill

davila7

YouTube Downloader

100

Download and process YouTube content for research. Use when: downloading competitor videos for analysis; extracting audio for podcasts; getting transcripts for content repurposing; archiving webinars; research content curation

Skill

guia-matthieu

Summarize

Summarize or transcribe URLs, YouTube/videos, podcasts, articles, transcripts, PDFs, and local files.

Skill

steipete

Cli Anything Videocaptioner

AI-powered video captioning — transcribe speech, optimize/translate subtitles, and burn them into video via the stable VideoCaptioner backend. Free ASR and translation included.

Skill

hkuds

Sheet Music Publisher

Converts mastered audio to sheet music and creates printable songbooks. Use after mastering when the user wants sheet music or a songbook for their album.

Skill

bitwize-music-studio