
VideoAgent Audio Studio

Skill · Verified · Active

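Tired of juggling multiple audio APIs? This skill provides one-stop access to TTS, music generation, sound effects, and voice cloning. Use it when you want to generate any kind of audio without managing multiple API keys.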

Purpose

Provide a one-stop way to generate any type of audio content, hiding the complexity of managing multiple audio APIs and credentials.

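Features

  • Unified access to TTS, music, SFX, and voice cloning
  • Smart routing to the best-suited AI model
  • Simplified API key management
  • Support for multiple audio generation providers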

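Use cases

  • Generate speech or voiceover from text
  • Create background music for podcasts or videos
  • Create specific sound effects on demand
  • Clone a voice from an audio sample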

Non-goals

  • Managing individual audio API credentials
  • Complex audio editing or mixing
  • Real-time audio streaming beyond the generated audio output

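Workflow

  1. Start the AudioMind server.
  2. Analyze the type of audio generation the user is requesting.
  3. Route the request to the appropriate MCP tool and model.
  4. Generate the audio and return a URL.
  5. The user reviews or uses the generated audio.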
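
Steps 2 and 3 can be pictured with a small sketch. This is an illustration only, not AudioMind's actual routing logic: the `AudioKind` enum, the `classify_request` function, and the keyword lists are all hypothetical, and a real router would more likely rely on the agent's own reading of the request than on substring matching.

```python
from enum import Enum


class AudioKind(Enum):
    """The four request types named in the listing (identifiers are hypothetical)."""
    TTS = "tts"
    MUSIC = "music"
    SFX = "sfx"
    VOICE_CLONE = "voice_clone"


# Hypothetical keyword hints, checked in order; the first match wins.
_HINTS = [
    (AudioKind.VOICE_CLONE, ("clone", "sound like")),
    (AudioKind.MUSIC, ("music", "soundtrack", "jingle")),
    (AudioKind.SFX, ("sound effect", "sfx")),
]


def classify_request(prompt: str) -> AudioKind:
    """Guess which kind of audio a free-text request is asking for."""
    text = prompt.lower()
    for kind, keywords in _HINTS:
        if any(keyword in text for keyword in keywords):
            return kind
    # Assumption: anything unrecognized is treated as plain text-to-speech.
    return AudioKind.TTS
```

The returned kind would then select the MCP tool and model to call in step 3; that mapping is left out here because the listing does not name the tools.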

Practices

  • Audio generation
  • API integration
  • Prompt engineering

Prerequisites

  • ELEVENLABS_API_KEY environment variable
  • FAL_KEY environment variable (optional; used for music/SFX)
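
A quick pre-flight check for these variables might look like the sketch below. The function name `check_audio_env` and the returned dictionary key are hypothetical, and the code assumes that FAL_KEY is needed only for music and SFX, as the note above suggests.

```python
import os
from typing import Mapping


def check_audio_env(env: Mapping[str, str] = os.environ) -> dict:
    """Verify the required key is present and report whether optional features are available."""
    if not env.get("ELEVENLABS_API_KEY"):
        raise RuntimeError("ELEVENLABS_API_KEY is not set")
    # Assumption: without FAL_KEY, only music and SFX generation are unavailable.
    return {"music_and_sfx_enabled": bool(env.get("FAL_KEY"))}
```

Calling `check_audio_env()` with no arguments reads the current process environment; passing a plain dictionary makes the check easy to exercise without touching real credentials.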

Installation

npx skills add pexoai/pexo-skills

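This runs the Vercel skills CLI (skills.sh) via npx. It requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.), and assumes the repository follows the agentskills.io format.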

Quality score

Verified
98/100
Analyzed 1 day ago

Trust signals

Last commit: about 1 month ago
Stars: 851
License: MIT

Similar extensions

Elevenlabs Tts

99

ElevenLabs text-to-speech with 22+ premium voices, multilingual support, and voice tuning via inference.sh CLI. Models: eleven_multilingual_v2 (highest quality), eleven_turbo_v2_5 (low latency), eleven_flash_v2_5 (ultra-fast). Capabilities: text-to-speech, voice selection, stability/style control, 32 languages. Use for: voiceovers, audiobooks, video narration, podcasts, accessibility, IVR. Triggers: elevenlabs, eleven labs, elevenlabs tts, premium tts, professional voice, ai voice, high quality tts, multilingual tts, eleven labs voice, voice generation, natural speech, realistic voice, voice over, speech synthesis

Skill
inferen-sh

Google Tts

100

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".

Skill
sanjay3290

Speech Generation Skill

100

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

Skill
openai

Validate Album

100

Validates album directory structure, file locations, and content integrity. Use before release or whenever the user wants to check an album's structural health.

Skill
bitwize-music-studio

Sherpa Onnx Tts

99

Local text-to-speech via sherpa-onnx (offline, no cloud)

Skill
steipete

Sheet Music Publisher

99

Converts mastered audio to sheet music and creates printable songbooks. Use after mastering when the user wants sheet music or a songbook for their album.

Skill
bitwize-music-studio