Chat with Anyone
Skill
Chat with any real person or fictional character in their own voice by automatically finding their speech online, extracting a clean reference sample, and generating audio replies. Also supports generating a matching voice from an uploaded image. Use when the user says "我想跟xxx聊天", "你来扮演xxx跟我说话", "让xxx给我讲讲这篇文章", "我想跟图片中的人说话", or similar.
This skill enables users to generate audio replies in the voice of real or fictional characters by cloning speech from online videos or from uploaded images. It leverages external tools like ffmpeg, yt-dlp, and a separate TTS skill, requiring specific API keys for advanced features.
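The download-and-extract step described above can be sketched as follows. This is a minimal illustration, not the skill's actual code: the function names, the `source` output filename, and the mono 16 kHz target format are assumptions made for the example. Only the yt-dlp and ffmpeg flags themselves are standard options of those tools.

```python
from pathlib import Path


def build_download_cmd(url: str, out_dir: Path) -> list[str]:
    """Build a yt-dlp command that saves a video's audio track as WAV."""
    return [
        "yt-dlp",
        "-x",                                    # extract audio only
        "--audio-format", "wav",                 # convert extracted audio to WAV
        "-o", str(out_dir / "source.%(ext)s"),   # output template (hypothetical name)
        url,
    ]


def build_trim_cmd(src: Path, dst: Path, start: float, duration: float) -> list[str]:
    """Build an ffmpeg command that cuts a short reference sample from src."""
    return [
        "ffmpeg",
        "-y",                       # overwrite dst if it already exists
        "-ss", f"{start:.2f}",      # seek to the start offset, in seconds
        "-t", f"{duration:.2f}",    # keep this many seconds
        "-i", str(src),
        "-ac", "1",                 # downmix to mono (assumed target format)
        "-ar", "16000",             # resample to 16 kHz (assumed target format)
        str(dst),
    ]
```

Each function only builds an argument list, so the commands can be inspected or logged before anything runs. Passing a list to `subprocess.run(cmd, check=True)` executes it without shell quoting issues.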
License
- critical: License usability. No license file or SPDX identifier is present in the repository, making the licensing status unclear and potentially restrictive.
Versioning
- warning: Release Management. There is no version information (e.g., version field in manifest, CHANGELOG, or GitHub releases) for this skill.
Code Execution
- info: Validation. Input validation appears to be handled by Python's standard argument parsing; no explicit schema validation library is evident for inputs such as file paths.
- info: Tool Fallback. The skill lists `tts` as a dependency and implies that it is used, but does not state whether it is optional or whether there is a fallback when the `tts` skill is unavailable.
Compliance
- info: GDPR. The skill processes audio and image data that could contain personal data. It does not appear to submit PII directly to a third party without consent, but uploaded images and reference audio sources should be handled with care.
Portability
- info: Cross-skill coupling. The skill depends on the `tts` skill, which is listed in the prerequisites, but this skill's documentation does not treat it as an optional dependency or describe a fallback.
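One way to surface missing prerequisites early is a preflight check on the external command-line tools. The sketch below is an assumption about how such a check could look, not something the skill is documented to do. `missing_tools` is a hypothetical helper, and it covers only executables such as ffmpeg and yt-dlp. The `tts` dependency is a skill rather than a binary, so detecting it would need a separate mechanism that the listing does not describe.

```python
import shutil
from typing import Callable, Iterable, Optional


def missing_tools(
    names: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tool names that cannot be found on PATH.

    `which` is injectable so the check can be exercised without the
    real tools installed; by default it is shutil.which.
    """
    return [name for name in names if which(name) is None]
```

Calling `missing_tools(["ffmpeg", "yt-dlp"])` before starting lets the caller report every missing tool at once instead of failing partway through a download.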
Installation
npx skills add noizai/skills
Runs Vercel's skills CLI (skills.sh) through npx. Requires Node.js installed locally and at least one skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repository follows the agentskills.io format.
Similar extensions
Happy Audio Gen
100
Universal AI voice / text-to-speech skill supporting OpenAI TTS (gpt-4o-mini-tts, tts-1), ElevenLabs multilingual TTS with voice cloning, Bailian Qwen TTS (qwen-tts / qwen3-tts-vd with voice-design custom voices, long-text chunking built in), MiniMax speech-02-hd, SiliconFlow CosyVoice / SenseVoice, and PlayHT 2.0. Use this skill whenever the user asks to read text aloud, synthesize speech, generate narration, create voice-over, dub a script, or turn any text into audio (mp3 / wav / ogg / flac). Typical phrases include "read this aloud", "generate voice for ...", "create a narration of ...", "tts this", "把这段念出来", "做个配音", "合成语音", or mentions of voices / TTS model names like Alloy, Ash, Cherry, Rachel, CosyVoice, PlayHT. Always use this skill even if the user does not specify a provider: pick one from EXTEND.md defaults or available env keys.
Characteristic Voice
98
Use this skill whenever the user wants speech to sound more human, companion-like, or emotionally expressive. Triggers include: any mention of 'say like', 'talk like', 'speak like', 'companion voice', 'comfort me', 'cheer me up', 'sound more human', 'good night voice', 'good morning voice', or requests to add fillers, emotion, or personality to generated speech. Also use when the user wants to mimic a specific character's voice, apply speaking style presets (goodnight, morning, comfort, celebration, chatting), tune emotional parameters like warmth or tenderness, or make TTS output feel like a real person talking. If the user asks for a 'voice message', 'companion audio', 'character voice', or wants speech that sighs, laughs, hesitates, or sounds genuinely warm, use this skill. Do NOT use for plain text-to-speech without personality, music generation, sound effects, or general coding tasks unrelated to expressive speech.
ElevenLabs Text-to-Speech
98
Convert text to speech using ElevenLabs voice AI. Use when generating audio from text, creating voiceovers, building voice apps, or synthesizing speech in 70+ languages.
ElevenLabs Voice Isolator
95
Remove background noise and isolate vocals/speech from audio using ElevenLabs Voice Isolator (audio isolation) API. Use when cleaning up noisy recordings, removing music or background ambience from dialogue, isolating speech from field recordings, preparing audio for transcription, extracting vocals, or any "denoise / clean up / isolate voice" task.
ElevenLabs Voice Changer
95
Transform the voice in an audio recording into a different target voice while preserving emotion, timing, and delivery using the ElevenLabs Voice Changer (speech-to-speech) API. Use when converting one voice to another, changing the speaker/narrator of an existing recording, dubbing a voice-over in a different voice, creating character voices from a scratch performance, anonymizing a speaker, or any "voice conversion / voice transfer / speech-to-speech" task. Make sure to use this skill whenever the user mentions voice changing, voice conversion, speech-to-speech, swapping a voice in audio, re-voicing a clip, or applying a different voice to an existing recording, even if they don't explicitly say "voice changer".
Text-to-Speech (TTS)
95
Implement text-to-speech (TTS) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to convert text into natural-sounding speech, create audio content, build voice-enabled applications, or generate spoken audio files. Supports multiple voices, adjustable speed, and various audio formats.