VideoAgent Audio Studio

Skill · Verified · Active

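Tired of juggling multiple audio APIs? This skill provides one-stop access to TTS, music generation, sound effects, and voice cloning. Use it when you want to generate any kind of audio without managing multiple API keys.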

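Purpose

Provide a one-stop way to generate any type of audio content, abstracting away the complexity of managing multiple audio APIs and credentials.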

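Features

Unified access to TTS, music, SFX, and voice cloning
Smart routing to the most suitable AI model
Simplified API key management
Support for multiple audio generation providers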

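Use cases

Generate speech or voiceover from text
Create background music for podcasts or videos
Create specific sound effects on demand
Clone a voice from an audio sample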

Non-goals

Managing individual audio API credentials
Complex audio editing or mixing
Real-time audio streaming beyond the generated audio output

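Workflow

Start the AudioMind server.
Analyze the type of audio generation the user is requesting.
Route the request to the appropriate MCP tool and model.
Generate the audio and return a URL.
The user reviews or uses the generated audio.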
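
The analyze-and-route steps can be illustrated with a minimal sketch. It assumes a simple keyword heuristic; the skill's actual routing logic is not described on this page, and the tool names in `TOOLS` are hypothetical placeholders rather than the names the AudioMind server exposes.

```python
from dataclasses import dataclass

# Hypothetical keyword table, for illustration only. The real skill may
# classify requests in a completely different way.
KEYWORDS = {
    "voice_clone": ("clone", "cloning"),
    "music": ("music", "song", "soundtrack", "jingle"),
    "sfx": ("sound effect", "sfx", "foley"),
}

# Hypothetical MCP tool names; replace with whatever the server exposes.
TOOLS = {
    "tts": "text_to_speech",
    "music": "generate_music",
    "sfx": "generate_sound_effect",
    "voice_clone": "clone_voice",
}


@dataclass(frozen=True)
class Route:
    kind: str  # one of: tts, music, sfx, voice_clone
    tool: str  # hypothetical tool name the request would be sent to


def classify(request: str) -> Route:
    """Pick an audio category for a free-text request."""
    text = request.lower()
    # Categories are checked in the order they appear in KEYWORDS.
    for kind, words in KEYWORDS.items():
        if any(word in text for word in words):
            return Route(kind, TOOLS[kind])
    # Anything without a matching keyword is treated as plain narration.
    return Route("tts", TOOLS["tts"])
```

For example, `classify("Background music for a podcast")` would select the `music` category, while a request with no matching keyword falls back to TTS. A keyword table keeps the sketch easy to read; a production router would more likely rely on the agent's own intent detection.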

Practices

Audio generation
API integration
Prompt engineering

Prerequisites

ELEVENLABS_API_KEY environment variable
FAL_KEY environment variable (optional; used for music/SFX)
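
Before starting the server, it can help to confirm that the variables above are set. The following is a minimal sketch of such a check; the helper name `check_env` and its return shape are illustrative assumptions, not part of the skill.

```python
import os
from typing import Mapping

REQUIRED = ("ELEVENLABS_API_KEY",)
OPTIONAL = ("FAL_KEY",)  # listed above as optional (music/SFX)


def check_env(env: Mapping[str, str] = os.environ) -> dict:
    """Report which of the listed variables are unset or empty.

    Hypothetical helper for illustration; the skill does not ship it.
    """
    missing = [name for name in REQUIRED if not env.get(name)]
    optional_missing = [name for name in OPTIONAL if not env.get(name)]
    return {
        "ok": not missing,  # only required variables affect this flag
        "missing": missing,
        "optional_missing": optional_missing,
    }
```

Calling `check_env()` with no arguments inspects the current process environment; passing a plain dict makes the check easy to exercise in tests.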

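Installation

npx skills add pexoai/pexo-skills

This runs the Vercel skills CLI (skills.sh) through npx. It requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.), and it assumes the repository follows the agentskills.io format.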

Quality score

Verified

98/100

Analyzed 1 day ago

Trust signals

Last commit: about 1 month ago

GitHub owner: pexoai

Stars: 851

License: MIT

Similar extensions

Elevenlabs Tts

ElevenLabs text-to-speech with 22+ premium voices, multilingual support, and voice tuning via inference.sh CLI. Models: eleven_multilingual_v2 (highest quality), eleven_turbo_v2_5 (low latency), eleven_flash_v2_5 (ultra-fast). Capabilities: text-to-speech, voice selection, stability/style control, 32 languages. Use for: voiceovers, audiobooks, video narration, podcasts, accessibility, IVR. Triggers: elevenlabs, eleven labs, elevenlabs tts, premium tts, professional voice, ai voice, high quality tts, multilingual tts, eleven labs voice, voice generation, natural speech, realistic voice, voice over, speech synthesis

Skill

inferen-sh

Google Tts

100

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".

Skill

sanjay3290

Speech Generation Skill

100

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

Skill

openai

Validate Album

100

Validates album directory structure, file locations, and content integrity. Use before release or whenever the user wants to check an album's structural health.

Skill

bitwize-music-studio

Sherpa Onnx Tts

Local text-to-speech via sherpa-onnx (offline, no cloud)

Skill

steipete

Sheet Music Publisher

Converts mastered audio to sheet music and creates printable songbooks. Use after mastering when the user wants sheet music or a songbook for their album.

Skill

bitwize-music-studio