
ElevenLabs Speech-to-Text

Skill verified

Transcribe audio to text using ElevenLabs Scribe v2. Use when converting audio/video to text, generating subtitles, transcribing meetings, or processing spoken content.
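Generating subtitles from word-level timestamps is mostly a grouping problem. The sketch below assumes each word in the transcription result carries `text`, `start`, and `end` (in seconds) plus a `type` that separates real words from spacing tokens and audio events. The `Word` class, `words_to_srt`, and its grouping thresholds are hypothetical names chosen for illustration. They are not part of this skill or the ElevenLabs SDK.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float  # seconds
    type: str = "word"  # assumed values: "word", "spacing", "audio_event"


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 7, max_gap: float = 1.0) -> str:
    """Group word-level timestamps into numbered SRT cues.

    A new cue starts when the current cue already holds max_words words,
    or when the silence before the next word exceeds max_gap seconds.
    """
    cues: list[list[Word]] = []
    current: list[Word] = []
    for w in words:
        if w.type != "word":
            continue  # skip spacing tokens and audio events
        if current and (len(current) >= max_words or w.start - current[-1].end > max_gap):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)

    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(w.text.strip() for w in cue)
        span = f"{format_srt_time(cue[0].start)} --> {format_srt_time(cue[-1].end)}"
        blocks.append(f"{i}\n{span}\n{text}\n")
    # SRT separates cues with a blank line
    return "\n".join(blocks)
```

Splitting on silence as well as word count keeps a cue from stretching across a long pause. Both thresholds are arbitrary defaults to tune per use case.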

AI summary

This skill uses the ElevenLabs Scribe v2 API to convert audio and video files into text, with support for many languages, speaker diarization, and word-level timestamps. It covers both batch and real-time streaming transcription, with documented examples for the Python and JavaScript SDKs and for direct cURL requests.
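For batch transcription without an SDK, a request can be assembled with the Python standard library alone. This is a minimal sketch. The endpoint URL, the `xi-api-key` header, and the form fields `model_id`, `diarize`, and `timestamps_granularity` are assumed from ElevenLabs' public speech-to-text REST API. `scribe_v2` as the model identifier is also an assumption, so verify all of these against the current API reference. `build_stt_request` and `transcribe` are hypothetical helper names.

```python
import json
import urllib.request
import uuid

STT_URL = "https://api.elevenlabs.io/v1/speech-to-text"  # assumed endpoint


def build_stt_request(
    api_key: str,
    audio: bytes,
    filename: str,
    model_id: str = "scribe_v2",  # assumed model identifier for Scribe v2
    diarize: bool = True,
    timestamps_granularity: str = "word",
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for batch transcription."""
    boundary = uuid.uuid4().hex
    fields = {
        "model_id": model_id,
        "diarize": "true" if diarize else "false",
        "timestamps_granularity": timestamps_granularity,
    }
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio file goes in its own part, sent as raw bytes
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        STT_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "xi-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (needs network access and a valid key)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending means the multipart encoding can be checked offline. `transcribe(build_stt_request(key, data, "meeting.mp3"))` then returns the decoded JSON. That JSON is assumed to contain a `text` field and, with word granularity, a `words` list.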

Versioning

Warning: Release Management. There is no explicit versioning information for this skill (for example, a `version` field in SKILL.md or package.json), and the installation instructions refer to `@latest`.

Compliance

Info: GDPR. The extension processes audio data, which may contain personal data, and sends it to a third-party API (ElevenLabs). The API itself likely has its own privacy policies, but the extension does not explicitly sanitize personal data before sending it.

Installation

npx skills add elevenlabs/skills

This runs the Vercel skills CLI (skills.sh) via npx. Node.js and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.) must be installed locally. The command assumes the repository follows the agentskills.io format.

8 days ago

elevenlabs

216 stars

MIT

elevenlabs.io

Updated 6 days ago


Similar extensions

ElevenLabs Speech-to-Text

Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.

Skill

elevenlabs

ElevenLabs Music

Generate music using ElevenLabs Music API. Use when creating instrumental tracks, songs with lyrics, background music, jingles, or any AI-generated music composition. Supports prompt-based generation, composition plans for granular control, and detailed output with metadata.

Skill

elevenlabs

ASR (Speech to Text) Skill

Implement speech-to-text (ASR/automatic speech recognition) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build voice input features, or process audio recordings. Supports base64-encoded audio files and returns accurate text transcriptions.

Skill

answerzhao

Happy Audio Gen

100

Universal AI voice / text-to-speech skill supporting OpenAI TTS (gpt-4o-mini-tts, tts-1), ElevenLabs multilingual TTS with voice cloning, Bailian Qwen TTS (qwen-tts / qwen3-tts-vd with voice-design custom voices, long-text chunking built in), MiniMax speech-02-hd, SiliconFlow CosyVoice / SenseVoice, and PlayHT 2.0. Use this skill whenever the user asks to read text aloud, synthesize speech, generate narration, create voice-over, dub a script, or turn any text into audio (mp3 / wav / ogg / flac). Typical phrases include "read this aloud", "generate voice for ...", "create a narration of ...", "tts this", "把这段念出来" ("read this passage aloud"), "做个配音" ("make a voice-over"), "合成语音" ("synthesize speech"), or mentions of voices / TTS model names like Alloy, Ash, Cherry, Rachel, CosyVoice, PlayHT. Always use this skill even if the user does not specify a provider; pick one from EXTEND.md defaults or available env keys.

Skill

iamzhihuix

ElevenLabs Text-to-Speech

Convert text to speech using ElevenLabs voice AI. Use when generating audio from text, creating voiceovers, building voice apps, or synthesizing speech in 70+ languages.

Skill

elevenlabs

AI Multimodal Processing Skill

Multimodal AI processing via Google Gemini API (2M-token context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.

Skill

samhvw8