跳转到主要内容
此内容尚未提供您的语言版本,正在以英文显示。

Text To Speech

技能 已验证 活跃

Convert text to natural speech with Inworld TTS, ElevenLabs, DIA TTS, Kokoro, Chatterbox, and more via inference.sh CLI. Models: Inworld TTS-2 (100+ languages, emotion steering), Inworld TTS 1.5 (ultra-low latency), ElevenLabs (premium, 22+ voices, 32 languages), DIA TTS (conversational), Kokoro TTS, Chatterbox, Higgs Audio, VibeVoice (podcasts). Capabilities: text-to-speech, voice cloning, multi-speaker dialogue, podcast generation, expressive speech, emotion/delivery steering, character voices. Use for: voiceovers, audiobooks, podcasts, accessibility, video narration, IVR, voice assistants, gaming characters, avatar audio. Triggers: text to speech, tts, voice generation, ai voice, speech synthesis, voice over, generate speech, ai narrator, voice cloning, text to audio, elevenlabs, eleven labs, voice ai, ai voiceover, speech generator, natural voice, inworld, inworld tts, character voice, game voice, npc voice

目的

To provide users with a versatile tool for generating high-quality speech from text, supporting a wide range of applications from voiceovers to conversational AI.

功能

  • Text-to-speech conversion
  • Support for multiple TTS models (Inworld, ElevenLabs, DIA, etc.)
  • Emotion and delivery steering
  • Low-latency options for real-time applications
  • Voice cloning and multi-speaker dialogue

使用场景

  • Generating voiceovers for videos and product demos
  • Creating audiobooks and podcast episodes
  • Enabling accessibility features
  • Powering conversational AI and IVR systems
  • Generating character voices for games and avatars

非目标

  • Speech-to-text transcription
  • Music generation
  • Complex audio editing beyond generation

工作流

  1. Log in to inference.sh CLI (`belt login`).
  2. Select an appropriate TTS model and configure input parameters (text, voice, emotion, etc.).
  3. Execute the `belt app run` command with the chosen model and input.
  4. Receive the generated audio output.

先决条件

  • inference.sh CLI (`belt`) installed
  • Valid `belt login` session

Scope

  • info:Tool surface sizeWhile the overall inference.sh project has many skills, the text-to-speech skill itself focuses on a specific set of TTS tools.

安装

npx skills add inferen-sh/skills

通过 npx 运行 Vercel skills CLI(skills.sh)— 需要本地安装 Node.js,以及至少一个兼容 skills 的智能体(Claude Code、Cursor、Codex 等)。前提是仓库遵循 agentskills.io 格式。

质量评分

已验证
97 /100
1 day ago 分析

信任信号

最近提交1 day ago
星标433
许可证MIT
状态
查看源代码

类似扩展

Elevenlabs Tts

99

ElevenLabs text-to-speech with 22+ premium voices, multilingual support, and voice tuning via inference.sh CLI. Models: eleven_multilingual_v2 (highest quality), eleven_turbo_v2_5 (low latency), eleven_flash_v2_5 (ultra-fast). Capabilities: text-to-speech, voice selection, stability/style control, 32 languages. Use for: voiceovers, audiobooks, video narration, podcasts, accessibility, IVR. Triggers: elevenlabs, eleven labs, elevenlabs tts, premium tts, professional voice, ai voice, high quality tts, multilingual tts, eleven labs voice, voice generation, natural speech, realistic voice, voice over, speech synthesis

技能
inferen-sh

AI Voice Generation

95

AI voice generation, text-to-speech, and voice synthesis via inference.sh CLI. Models: Inworld TTS-2 (100+ languages, emotion/non-verbal steering), Inworld TTS 1.5 (ultra-low latency), ElevenLabs (22+ premium voices, 32 languages), Kokoro TTS, DIA, Chatterbox, Higgs, VibeVoice for natural speech. Capabilities: multiple voices, emotions, accents, long-form narration, conversation, voice transformation, delivery mode control, character voices. Use for: voiceovers, audiobooks, podcasts, video narration, accessibility, gaming NPCs, avatar audio, UGC. Triggers: voice cloning, tts, text to speech, ai voice, voice generation, voice synthesis, voice over, narration, speech synthesis, ai narrator, elevenlabs, eleven labs, natural voice, realistic speech, voice ai, voice changer, inworld, inworld tts, character voice, npc voice

技能
inferen-sh

Speech Generation Skill

100

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

技能
openai

Sag

98

ElevenLabs text-to-speech with mac-style say UX.

技能
steipete

Google Tts

100

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".

技能
sanjay3290

Podcast Generation

100

Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.

技能
microsoft