Google TTS
Skill · Verified · Active
Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".
This skill converts documents and text into audio with Google Cloud Text-to-Speech, supporting narration, general audio generation, and podcast creation.
Features
- Convert text and documents (PDF, DOCX, MD, TXT) to audio
- Generate podcast-style audio with multiple speakers/voices
- Support for various Google Cloud TTS voices (Neural2, WaveNet, Studio)
- Configurable speaking rate, pitch, and audio encoding
- Automatic text chunking for long documents
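The chunking and configuration features above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `chunk_text`, `build_request`, and the 5000-byte limit are assumptions made for the example (check the current API quotas), and the voice name is only an example value. The dictionary follows the JSON field names of the Cloud Text-to-Speech REST `text:synthesize` request.

```python
import re

# Assumed per-request input limit in bytes; verify against current API quotas.
MAX_BYTES = 5000


def chunk_text(text: str, max_bytes: int = MAX_BYTES) -> list[str]:
    """Split text into chunks of at most max_bytes (UTF-8), preferring
    sentence boundaries and falling back to word boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(sentence.encode("utf-8")) <= max_bytes:
            current = sentence
            continue
        # Oversized sentence: split on words. A single word longer than
        # the limit is kept whole rather than cut mid-word.
        for word in sentence.split():
            candidate = f"{current} {word}".strip()
            if len(candidate.encode("utf-8")) <= max_bytes or not current:
                current = candidate
            else:
                chunks.append(current)
                current = word
    if current:
        chunks.append(current)
    return chunks


def build_request(
    text: str,
    voice_name: str = "en-US-Neural2-C",  # example voice, not a skill default
    language_code: str = "en-US",
    speaking_rate: float = 1.0,
    pitch: float = 0.0,
    audio_encoding: str = "MP3",
) -> dict:
    """Build a request body for one chunk (hypothetical helper)."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {
            "audioEncoding": audio_encoding,
            "speakingRate": speaking_rate,
            "pitch": pitch,
        },
    }


if __name__ == "__main__":
    for chunk in chunk_text("First sentence. Second sentence.", max_bytes=20):
        print(build_request(chunk, speaking_rate=1.1))
```

Each chunk would be synthesized separately and the resulting audio segments concatenated in order; how the skill itself joins segments is not stated in this listing.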
Use cases
- Narrate a document or analysis for easier consumption.
- Create audio recordings of documentation or articles.
- Generate podcast episodes from a conversational script.
- Convert written content into speech for accessibility purposes.
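For the podcast use case, a conversational script has to be split into per-speaker segments before each one is sent to a different voice. The sketch below assumes a simple `Speaker: line` script format, which this listing does not specify; `parse_script`, `Segment`, and the speaker-to-voice mapping are hypothetical names and example values.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    speaker: str
    voice: str
    text: str


# Hypothetical mapping from speaker label to voice name (example values).
DEFAULT_VOICES = {"HOST": "en-US-Neural2-D", "GUEST": "en-US-Neural2-F"}

# Matches lines of the assumed form "Speaker: spoken text".
LINE_RE = re.compile(r"^\s*([A-Za-z][\w ]*?)\s*:\s*(.+)$")


def parse_script(
    script: str,
    voices: dict[str, str] = DEFAULT_VOICES,
    fallback_voice: str = "en-US-Neural2-C",
) -> list[Segment]:
    """Turn a 'Speaker: line' script into ordered (speaker, voice, text) segments."""
    segments: list[Segment] = []
    for line in script.splitlines():
        match = LINE_RE.match(line)
        if match:
            speaker = match.group(1).upper()
            voice = voices.get(speaker, fallback_voice)
            segments.append(Segment(speaker, voice, match.group(2).strip()))
        elif line.strip() and segments:
            # A line without a speaker label continues the previous segment.
            last = segments[-1]
            segments[-1] = last._replace(text=f"{last.text} {line.strip()}")
    return segments


if __name__ == "__main__":
    demo = "Host: Welcome back.\nGuest: Thanks for\nhaving me."
    for segment in parse_script(demo):
        print(segment)
```

This sketch treats any `word:` prefix as a speaker label, so a line that starts with something like `Note:` would be misread as a new speaker; a stricter version could accept only labels present in the voice mapping.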
Non-goals
- Performing real-time voice transcription.
- Providing voice modification effects beyond pitch and rate.
- Hosting or distributing generated audio files.
Installation
Add the Marketplace first
/plugin marketplace add sanjay3290/ai-skills
/plugin install ai-skills@ai-skills
Quality score
Verified
Similar extensions
Speech Generation Skill
Score: 100
Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.
Sherpa Onnx Tts
Score: 99
Local text-to-speech via sherpa-onnx (offline, no cloud)
Elevenlabs Tts
Score: 99
ElevenLabs text-to-speech with 22+ premium voices, multilingual support, and voice tuning via inference.sh CLI. Models: eleven_multilingual_v2 (highest quality), eleven_turbo_v2_5 (low latency), eleven_flash_v2_5 (ultra-fast). Capabilities: text-to-speech, voice selection, stability/style control, 32 languages. Use for: voiceovers, audiobooks, video narration, podcasts, accessibility, IVR. Triggers: elevenlabs, eleven labs, elevenlabs tts, premium tts, professional voice, ai voice, high quality tts, multilingual tts, eleven labs voice, voice generation, natural speech, realistic voice, voice over, speech synthesis
Tts
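Score: 96
Use this skill when the user wants to convert text to speech, generate audio from text, or produce a voiceover. Triggers include mentions of 'TTS', 'text to speech', 'speak', 'say', 'voice', 'read aloud', 'audio narration', 'voiceover', or 'dubbing', or a request to turn written content into spoken audio. Also use it when converting EPUB/PDF/SRT/articles to audio, cloning a voice from reference audio, controlling emotion or speaking rate in speech, aligning speech to a subtitle timeline, or generating per-segment voice-mapped audio.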
Characteristic Voice
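Score: 95
Use this skill whenever the user wants speech to sound more human, companion-like, or emotionally expressive. Triggers include any mention of "say it like", "talk like ...", "sound like", "companion voice", "comfort me", "cheer me up", "sound more human", "goodnight voice", or "good morning voice", or a request to add filler words, emotion, or personality to generated speech. Also use it when the user wants to imitate a specific character's voice, apply a speaking-style preset (goodnight, good morning, comfort, celebration, chat), adjust emotional parameters such as tone (for example warm or gentle), or make text-to-speech output sound like a real person talking. Use it if the user asks for a "voice message", "companion audio", or "character voice", or wants a voice with sighs, laughter, hesitation, or genuine warmth. Do not use it for plain text-to-speech without personality, music generation, sound effects, or general coding tasks unrelated to expressive speech.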
Podcast Generation
Score: 100
Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.