
Transcription Automation

Skill · Verified
92

Automate audio/video transcription, meeting notes, subtitle generation, and content processing

AI summary

This skill automates audio and video transcription, generates meeting notes with speaker diarization, and creates various subtitle formats (SRT, VTT). It supports multiple transcription engines (Whisper, AssemblyAI, Deepgram) and offers detailed configuration for audio settings, language detection, and output features.
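The summary mentions SRT and VTT output. As a rough illustration of what those two formats contain, here is a minimal Python sketch that renders timestamped transcript segments as SRT and WebVTT. It assumes the transcription engine returns segments with start and end times in seconds plus text. The `Segment` class and the helper functions are hypothetical and are not this skill's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span: start/end in seconds plus its text (hypothetical shape)."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg.start, ",")
        end = format_timestamp(seg.end, ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments: list[Segment]) -> str:
    """Render segments as WebVTT: a 'WEBVTT' header, then unnumbered cues."""
    cues = [
        f"{format_timestamp(s.start, '.')} --> {format_timestamp(s.end, '.')}\n{s.text.strip()}\n"
        for s in segments
    ]
    return "WEBVTT\n\n" + "\n".join(cues)


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, "Hello and welcome."), Segment(2.5, 5.0, "Let's begin.")]
    print(to_srt(demo))
    print(to_vtt(demo))
```

The main differences between the two formats show up directly in the code: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a period.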

Maintenance

  • Warning: Commit recency. There are no commits on the default branch (pushedAt: n/a), indicating the extension may be unmaintained.

Installation

npx skills add claude-office-skills/skills

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js on your machine and at least one skills-compatible agent already installed (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

3 months ago
98 stars
MIT
Updated 5 days ago

Similar extensions

AI Multimodal Processing Skill

95

Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.

Skill
samhvw8

ElevenLabs Speech-to-Text

98

Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.

Skill
elevenlabs

ASR (Speech to Text) Skill

95

Implement speech-to-text (ASR/automatic speech recognition) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build voice input features, or process audio recordings. Supports base64 encoded audio files and returns accurate text transcriptions.

Skill
answerzhao

ElevenLabs Speech-to-Text

95

Transcribe audio to text using ElevenLabs Scribe v2. Use when converting audio/video to text, generating subtitles, transcribing meetings, or processing spoken content.

Skill
elevenlabs

FFmpeg for Video Production

95

Video and audio processing with FFmpeg. Use for format conversion, resizing, compression, audio extraction, and preparing assets for Remotion. Triggers include converting GIF to MP4, resizing video, extracting audio, compressing files, or any media transformation task.

Skill
digitalsamba

ElevenLabs Audio Generation

93

Generate AI voiceovers, sound effects, and music using ElevenLabs APIs. Use when creating audio content for videos, podcasts, or games. Triggers include generating voiceovers, narration, dialogue, sound effects from descriptions, background music, soundtrack generation, voice cloning, or any audio synthesis task.

Skill
digitalsamba