Happy Audio Gen
Skill 확인됨Universal AI voice / text-to-speech skill supporting OpenAI TTS (gpt-4o-mini-tts, tts-1), ElevenLabs multilingual TTS with voice cloning, Bailian Qwen TTS (qwen-tts / qwen3-tts-vd with voice-design custom voices, long-text chunking built in), MiniMax speech-02-hd, SiliconFlow CosyVoice / SenseVoice, and PlayHT 2.0. Use this skill whenever the user asks to read text aloud, synthesize speech, generate narration, create voice-over, dub a script, or turn any text into audio (mp3 / wav / ogg / flac). Typical phrases include "read this aloud", "generate voice for ...", "create a narration of ...", "tts this", "把这段念出来", "做个配音", "合成语音", or mentions of voices / TTS model names like Alloy, Ash, Cherry, Rachel, CosyVoice, PlayHT. Always use this skill even if the user does not specify a provider — pick one from EXTEND.md defaults or available env keys.
This skill acts as a unified interface for generating speech from text across six different TTS providers: OpenAI, ElevenLabs, Bailian, MiniMax, SiliconFlow, and PlayHT. It handles API key management, provider selection, and output file generation, with robust error handling and clear documentation for configuration and usage.
설치
먼저 마켓플레이스를 추가하세요
/plugin marketplace add iamzhihuix/happy-claude-skills/plugin install happy-audio-gen@happy-claude-skills유사한 확장
ElevenLabs Audio Generation
93Generate AI voiceovers, sound effects, and music using ElevenLabs APIs. Use when creating audio content for videos, podcasts, or games. Triggers include generating voiceovers, narration, dialogue, sound effects from descriptions, background music, soundtrack generation, voice cloning, or any audio synthesis task.
Characteristic Voice
98Use this skill whenever the user wants speech to sound more human, companion-like, or emotionally expressive. Triggers include: any mention of 'say like', 'talk like', 'speak like', 'companion voice', 'comfort me', 'cheer me up', 'sound more human', 'good night voice', 'good morning voice', or requests to add fillers, emotion, or personality to generated speech. Also use when the user wants to mimic a specific character's voice, apply speaking style presets (goodnight, morning, comfort, celebration, chatting), tune emotional parameters like warmth or tenderness, or make TTS output feel like a real person talking. If the user asks for a 'voice message', 'companion audio', 'character voice', or wants speech that sighs, laughs, hesitates, or sounds genuinely warm, use this skill. Do NOT use for plain text-to-speech without personality, music generation, sound effects, or general coding tasks unrelated to expressive speech.
ElevenLabs Text-to-Speech
98Convert text to speech using ElevenLabs voice AI. Use when generating audio from text, creating voiceovers, building voice apps, or synthesizing speech in 70+ languages.
ElevenLabs Speech-to-Text
98Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.
ElevenLabs Music
97Generate music using ElevenLabs Music API. Use when creating instrumental tracks, songs with lyrics, background music, jingles, or any AI-generated music composition. Supports prompt-based generation, composition plans for granular control, and detailed output with metadata.
Text-to-Speech (TTS)
95Implement text-to-speech (TTS) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to convert text into natural-sounding speech, create audio content, build voice-enabled applications, or generate spoken audio files. Supports multiple voices, adjustable speed, and various audio formats.