ElevenLabs Audio Generation

Skill Verified
93

Generate AI voiceovers, sound effects, and music using ElevenLabs APIs. Use when creating audio content for videos, podcasts, or games. Triggers include generating voiceovers, narration, dialogue, sound effects from descriptions, background music, soundtrack generation, voice cloning, or any audio synthesis task.

AI Summary

This skill leverages the ElevenLabs API to generate various forms of audio content, such as text-to-speech voiceovers, sound effects, music, and voice cloning. It provides Python code snippets for direct API interaction and integrates with video production tools like Remotion by generating per-scene audio files and timing manifests.
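As a rough illustration of what a per-scene timing manifest might look like, the sketch below lays scene audio clips end to end and converts their durations to frame counts. The field names (`scene_id`, `start_frame`, `duration_frames`), the 30 fps default, and the manifest layout are all assumptions made for this example; they are not taken from the skill's actual output format.

```python
import json
import math
from dataclasses import asdict, dataclass


@dataclass
class SceneAudio:
    scene_id: str      # hypothetical scene identifier
    file: str          # path to the generated audio file for this scene
    duration_s: float  # clip length in seconds


def build_manifest(scenes: list[SceneAudio], fps: int = 30) -> dict:
    """Place scenes back to back and record start and length in frames."""
    entries = []
    cursor = 0  # start frame of the next scene
    for scene in scenes:
        # Round first to absorb float noise, then round up so audio is never cut off.
        frames = math.ceil(round(scene.duration_s * fps, 6))
        entries.append({**asdict(scene), "start_frame": cursor, "duration_frames": frames})
        cursor += frames
    return {"fps": fps, "total_frames": cursor, "scenes": entries}


if __name__ == "__main__":
    manifest = build_manifest([
        SceneAudio("intro", "audio/intro.mp3", 2.5),
        SceneAudio("demo", "audio/demo.mp3", 4.0),
    ])
    print(json.dumps(manifest, indent=2))
```

A video composition could then read `start_frame` and `duration_frames` to position each clip. Rounding up keeps every clip fully audible at the cost of, at most, one extra frame per scene.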

Documentation

  • warning: Configuration & parameter reference. The extension requires ELEVENLABS_API_KEY to be set in .env. This is mentioned, but the main SKILL.md does not explicitly document it as a required parameter or state its precedence order.
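Because the precedence order is not documented, the following sketch picks one plausible convention: a value in the process environment wins, and the `.env` file is only a fallback. That ordering, and the helper names `parse_dotenv_text` and `resolve_api_key`, are assumptions for illustration rather than the skill's actual behavior.

```python
import os
from pathlib import Path

KEY_NAME = "ELEVENLABS_API_KEY"


def parse_dotenv_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(env=None, dotenv_path=".env") -> str:
    """Assumed precedence: process environment first, then the .env file."""
    env = os.environ if env is None else env
    key = env.get(KEY_NAME)
    if not key:
        path = Path(dotenv_path)
        if path.is_file():
            key = parse_dotenv_text(path.read_text(encoding="utf-8")).get(KEY_NAME)
    if not key:
        # Fail early with a message that names the variable but never prints a value.
        raise RuntimeError(f"{KEY_NAME} is not set in the environment or in {dotenv_path}")
    return key
```

Failing fast when the key is missing means a misconfiguration surfaces before any API request is attempted.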

Security

  • warning: Secret Management. The extension requires an API key to be provided via a .env file, a common practice that becomes a risk if the file is inadvertently committed or exposed. The secrets are not echoed to stdout/stderr.
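One way to reduce the risk of committing the file is to check that `.gitignore` covers `.env` before running anything. The sketch below is a deliberately crude text check that recognizes only a handful of literal patterns. The function name and pattern set are assumptions, and running `git check-ignore .env` inside the repository is the more reliable test.

```python
IGNORE_RULES = {".env", "/.env", ".env*", "*.env"}  # literal rules this sketch recognizes
NEGATIONS = {"!.env", "!/.env"}                     # rules that re-include the file


def dotenv_is_ignored(gitignore_text: str) -> bool:
    """Return True if the last matching rule in a .gitignore body ignores .env."""
    ignored = False
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if line in IGNORE_RULES:
            ignored = True
        elif line in NEGATIONS:
            ignored = False  # a later negation overrides an earlier ignore rule
    return ignored
```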

Code Execution

  • warning: Validation. Although the Python code uses the ElevenLabs SDK, it shows no explicit schema validation or sanitization of user-provided inputs such as text or prompts beyond what the SDK itself may perform. API key handling relies on environment variables.
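A minimal sketch of the kind of pre-flight validation this warning points at, applied before text is handed to any SDK call. The `clean_prompt` helper and the 5000-character cap are hypothetical. The real limit depends on the model and plan and should be checked against the ElevenLabs documentation.

```python
import unicodedata

MAX_CHARS = 5000  # hypothetical cap, not an official ElevenLabs limit


def clean_prompt(text: str, max_chars: int = MAX_CHARS) -> str:
    """Strip control characters and reject empty or oversized input."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    # Keep newlines and tabs; drop every other control character.
    cleaned = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    ).strip()
    if not cleaned:
        raise ValueError("text is empty after cleaning")
    if len(cleaned) > max_chars:
        raise ValueError(f"text has {len(cleaned)} characters; limit is {max_chars}")
    return cleaned
```

Rejecting oversized input raises an error instead of silently truncating, so a narration script is never cut off mid-sentence without the caller noticing.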

Compliance

  • info: GDPR. The extension processes text inputs for audio generation, which could include personal data. This data is sent only to the ElevenLabs API, and nothing indicates submission to other third parties or a lack of sanitization beyond what the API handles.

Installation

npx skills add digitalsamba/claude-code-video-toolkit

Runs the Vercel skills CLI (skills.sh) via npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.

3 days ago
1.1k stars
MIT
Updated 1 day ago

Similar extensions

Happy Audio Gen

100

Universal AI voice / text-to-speech skill supporting OpenAI TTS (gpt-4o-mini-tts, tts-1), ElevenLabs multilingual TTS with voice cloning, Bailian Qwen TTS (qwen-tts / qwen3-tts-vd with voice-design custom voices, long-text chunking built in), MiniMax speech-02-hd, SiliconFlow CosyVoice / SenseVoice, and PlayHT 2.0. Use this skill whenever the user asks to read text aloud, synthesize speech, generate narration, create voice-over, dub a script, or turn any text into audio (mp3 / wav / ogg / flac). Typical phrases include "read this aloud", "generate voice for ...", "create a narration of ...", "tts this", "把这段念出来", "做个配音", "合成语音", or mentions of voices / TTS model names like Alloy, Ash, Cherry, Rachel, CosyVoice, PlayHT. Always use this skill even if the user does not specify a provider — pick one from EXTEND.md defaults or available env keys.

Skill
iamzhihuix

ElevenLabs Speech-to-Text

98

Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.

Skill
elevenlabs

FFmpeg for Video Production

95

Video and audio processing with FFmpeg. Use for format conversion, resizing, compression, audio extraction, and preparing assets for Remotion. Triggers include converting GIF to MP4, resizing video, extracting audio, compressing files, or any media transformation task.

Skill
digitalsamba

ElevenLabs Text-to-Speech

98

Convert text to speech using ElevenLabs voice AI. Use when generating audio from text, creating voiceovers, building voice apps, or synthesizing speech in 70+ languages.

Skill
elevenlabs

ElevenLabs Music

97

Generate music using ElevenLabs Music API. Use when creating instrumental tracks, songs with lyrics, background music, jingles, or any AI-generated music composition. Supports prompt-based generation, composition plans for granular control, and detailed output with metadata.

Skill
elevenlabs

AI Multimodal Processing Skill

95

Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.

Skill
samhvw8