VideoAgent Audio Studio

Skill Verified Active

Tired of juggling multiple audio APIs? This skill gives you one-command access to TTS, music generation, sound effects, and voice cloning. Use when you want to generate any audio without managing multiple API keys.

Purpose

To provide a one-command solution for generating any type of audio content, abstracting away the complexity of managing multiple audio APIs and credentials.

Features

  • Unified access to TTS, music, SFX, and voice cloning
  • Intelligent routing to best-suited AI models
  • Simplified API key management
  • Support for multiple audio generation providers

Use Cases

  • Generate speech or voice-overs from text
  • Compose background music for podcasts or videos
  • Create specific sound effects on demand
  • Clone a voice from an audio sample

Non-Goals

  • Managing individual audio API credentials
  • Complex audio editing or mixing
  • Real-time audio streaming beyond the generated output

Workflow

  1. Start the AudioMind server.
  2. Analyze user request for audio generation type.
  3. Route request to the appropriate MCP tool and model.
  4. Generate audio and return a URL.
  5. User reviews or uses the generated audio.
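
Steps 2 and 3 of the workflow can be sketched as a small classifier that maps a request to one of the four audio types. This is only an illustration under stated assumptions: the keyword hints, the `AudioKind` enum, and `classify_request` are hypothetical names, not the skill's actual routing logic, which the listing does not describe.

```python
from enum import Enum


class AudioKind(Enum):
    TTS = "tts"
    MUSIC = "music"
    SFX = "sfx"
    VOICE_CLONE = "voice_clone"


# Hypothetical keyword hints (assumption); the skill's real routing rules
# are not documented on this page.
_HINTS = [
    (AudioKind.VOICE_CLONE, ("clone", "sound like")),
    (AudioKind.MUSIC, ("music", "soundtrack", "jingle")),
    (AudioKind.SFX, ("sound effect", "sfx", "foley")),
]


def classify_request(text: str) -> AudioKind:
    """Return the first kind whose hint appears in the request; default to TTS."""
    lowered = text.lower()
    for kind, hints in _HINTS:
        if any(hint in lowered for hint in hints):
            return kind
    return AudioKind.TTS


if __name__ == "__main__":
    print(classify_request("Compose background music for my podcast"))
```

A real router would likely lean on the agent's own understanding of the request rather than keyword matching; the point here is only that each request resolves to exactly one audio type before a tool is chosen, with plain speech as the fallback.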

Practices

  • Audio generation
  • API integration
  • Prompt engineering

Prerequisites

  • ELEVENLABS_API_KEY environment variable
  • FAL_KEY environment variable (optional; used for music/SFX)
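
A pre-flight check for these two variables might look like the sketch below. It assumes that `ELEVENLABS_API_KEY` is required for speech and voice cloning and that music and SFX become available only when `FAL_KEY` is set; that split is an interpretation of the list above, and `available_features` is a hypothetical helper, not part of the skill.

```python
import os
from typing import List, Mapping, Optional


def available_features(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List the audio features usable with the credentials found in env.

    Assumption: ELEVENLABS_API_KEY backs TTS and voice cloning, and
    FAL_KEY backs music and SFX. The listing does not confirm this split.
    """
    if env is None:
        env = os.environ
    if not env.get("ELEVENLABS_API_KEY"):
        raise RuntimeError("ELEVENLABS_API_KEY is not set")
    features = ["tts", "voice_clone"]
    if env.get("FAL_KEY"):
        features += ["music", "sfx"]
    return features
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise, for example `available_features({"ELEVENLABS_API_KEY": "placeholder"})`.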

Installation

npx skills add pexoai/pexo-skills

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality Score

Verified
98/100
Analyzed about 15 hours ago

Trust Signals

Last commit: about 1 month ago
Stars: 851
License: MIT
Status

Similar Extensions

Elevenlabs Tts

99

ElevenLabs text-to-speech with 22+ premium voices, multilingual support, and voice tuning via inference.sh CLI. Models: eleven_multilingual_v2 (highest quality), eleven_turbo_v2_5 (low latency), eleven_flash_v2_5 (ultra-fast). Capabilities: text-to-speech, voice selection, stability/style control, 32 languages. Use for: voiceovers, audiobooks, video narration, podcasts, accessibility, IVR. Triggers: elevenlabs, eleven labs, elevenlabs tts, premium tts, professional voice, ai voice, high quality tts, multilingual tts, eleven labs voice, voice generation, natural speech, realistic voice, voice over, speech synthesis

Skill
inferen-sh

Google Tts

100

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".

Skill
sanjay3290

Speech Generation Skill

100

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

Skill
openai

Validate Album

100

Validates album directory structure, file locations, and content integrity. Use before release or whenever the user wants to check an album's structural health.

Skill
bitwize-music-studio

Sherpa Onnx Tts

99

Local text-to-speech via sherpa-onnx (offline, no cloud)

Skill
steipete

Sheet Music Publisher

99

Converts mastered audio to sheet music and creates printable songbooks. Use after mastering when the user wants sheet music or a songbook for their album.

Skill
bitwize-music-studio

© 2025 SkillRepo · Find the right skill, skip the noise.