Speech Generation Skill

Skill: verified, active

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API; run the bundled CLI (`scripts/text_to_speech.py`) with built-in voices and require `OPENAI_API_KEY` for live calls. Custom voice creation is out of scope.

Purpose

Use this skill when you need to generate spoken audio from text for purposes like narration, voiceovers, accessibility reads, or batch audio prompt creation.

Features

Generate single text-to-speech clips
Perform batch speech generation from JSONL files
Utilize built-in OpenAI voices
Control voice characteristics via instructions
Run via bundled CLI or OpenAI API
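
As a rough sketch of the direct-API route (as opposed to the bundled CLI), the snippet below uses the OpenAI Python SDK's `audio.speech` endpoint. The model name `gpt-4o-mini-tts`, the voice `alloy`, the helper names `build_speech_request` and `synthesize`, and the output path are assumptions chosen for illustration, not values taken from this skill.

```python
import os
from pathlib import Path


def build_speech_request(text, voice="alloy", instructions=None,
                         model="gpt-4o-mini-tts", response_format="mp3"):
    """Assemble keyword arguments for client.audio.speech.create().

    The model and voice defaults are assumptions for illustration only.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    request = {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }
    if instructions:
        # Free-form delivery guidance (tone, pacing, and so on).
        request["instructions"] = instructions
    return request


def synthesize(text, out_path, **options):
    """Send one request to the Audio API and stream the audio to a file."""
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is required for live calls")
    from openai import OpenAI  # imported lazily so dry runs need no SDK

    client = OpenAI()
    request = build_speech_request(text, **options)
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(Path(out_path))
    return Path(out_path)


if __name__ == "__main__":
    # Dry run: show the request without calling the API.
    print(build_speech_request("Welcome to the demo.",
                               instructions="Calm, clear pacing."))
```

Keeping request construction separate from the network call lets the request be inspected or logged without an API key; only `synthesize` needs credentials and network access.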

Use cases

Create audio narration for project demos or documentation
Generate accessibility-friendly audio descriptions
Produce IVR or phone prompt audio
Batch generate speech for a large number of text inputs

Non-goals

Custom voice creation or fine-tuning
Using non-OpenAI TTS models
Generating audio without an API key or network access (unless in dry-run mode)

Workflow

Determine intent (single vs. batch)
Collect inputs: text, voice, format, constraints
Write temporary JSONL for batch jobs
Augment instructions into a labeled spec
Run bundled CLI (`scripts/text_to_speech.py`)
Validate output: intelligibility, pacing, pronunciation
Iterate with targeted changes
Save/return final outputs
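
The batch and instruction-augmentation steps above might look roughly like the following. The JSONL field names (`input`, `voice`, `instructions`, `out`) and the spec labels are hypothetical; the bundled CLI defines its own schema, which this listing does not show.

```python
import json
import tempfile
from pathlib import Path


def build_labeled_spec(**fields):
    """Turn loose delivery notes into a labeled, one-item-per-line spec.

    The labels are whatever the caller passes; none are mandated by the skill.
    """
    lines = []
    for label, value in fields.items():
        if value:  # skip labels the user did not specify
            lines.append(f"{label.replace('_', ' ').title()}: {value}")
    return "\n".join(lines)


def write_batch_jsonl(jobs, directory=None):
    """Write one JSON object per line to a temporary .jsonl file."""
    handle = tempfile.NamedTemporaryFile(
        "w", suffix=".jsonl", dir=directory, delete=False, encoding="utf-8"
    )
    with handle:
        for job in jobs:
            handle.write(json.dumps(job, ensure_ascii=False) + "\n")
    return Path(handle.name)


if __name__ == "__main__":
    spec = build_labeled_spec(tone="warm", pacing="steady")
    jobs = [
        {"input": "Press 1 for sales.", "voice": "alloy",
         "instructions": spec, "out": "prompt_01.mp3"},
        {"input": "Press 2 for support.", "voice": "alloy",
         "instructions": spec, "out": "prompt_02.mp3"},
    ]
    path = write_batch_jsonl(jobs)
    print(path.read_text(encoding="utf-8"))
```

Writing the jobs to a temporary file keeps the batch input reproducible: the same file can be rerun after a targeted change to a single line.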

Practices

API Integration
Command Line Interface
Speech Synthesis
Reproducible Builds

Prerequisites

OPENAI_API_KEY environment variable set
Python 3 and pip installed
openai Python package installed
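
A small preflight check along these lines could confirm the prerequisites before any live call. The function name `check_prerequisites` is hypothetical and is not part of the bundled CLI.

```python
import importlib.util
import os
import sys


def check_prerequisites(env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < (3,):
        problems.append("Python 3 is required")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if importlib.util.find_spec("openai") is None:
        problems.append("the openai package is not installed (pip install openai)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```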

Installation

npx skills add openai/skills

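Runs the Vercel skills CLI (skills.sh) through npx. This requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, and so on), and assumes the repository follows the agentskills.io format.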

Quality score

Verified

100/100

Analyzed 1 day ago

Trust signals

Last commit: 1 day ago

GitHub owner: openai

Stars: 19k

License: Apache-2.0

Similar extensions

Google Tts

100

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".

Skill

sanjay3290

Elevenlabs Tts

ElevenLabs text-to-speech with 22+ premium voices, multilingual support, and voice tuning via inference.sh CLI. Models: eleven_multilingual_v2 (highest quality), eleven_turbo_v2_5 (low latency), eleven_flash_v2_5 (ultra-fast). Capabilities: text-to-speech, voice selection, stability/style control, 32 languages. Use for: voiceovers, audiobooks, video narration, podcasts, accessibility, IVR. Triggers: elevenlabs, eleven labs, elevenlabs tts, premium tts, professional voice, ai voice, high quality tts, multilingual tts, eleven labs voice, voice generation, natural speech, realistic voice, voice over, speech synthesis

Skill

inferen-sh

Tts

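Use this skill when the user wants to convert text to speech, generate audio from text, or produce a voiceover. Triggers include mentions of 'TTS', 'text to speech', 'speak', 'say', 'voice', 'read aloud', 'audio narration', 'voiceover', 'dubbing', or requests to turn written content into spoken audio. Also use it when converting EPUB/PDF/SRT/articles to audio, cloning a voice from reference audio, controlling emotion or speaking rate in speech, aligning speech to a subtitle timeline, or generating voice-mapped audio for each segment.

Skill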

NoizAI

AI Voice Generation

AI voice generation, text-to-speech, and voice synthesis via inference.sh CLI. Models: Inworld TTS-2 (100+ languages, emotion/non-verbal steering), Inworld TTS 1.5 (ultra-low latency), ElevenLabs (22+ premium voices, 32 languages), Kokoro TTS, DIA, Chatterbox, Higgs, VibeVoice for natural speech. Capabilities: multiple voices, emotions, accents, long-form narration, conversation, voice transformation, delivery mode control, character voices. Use for: voiceovers, audiobooks, podcasts, video narration, accessibility, gaming NPCs, avatar audio, UGC. Triggers: voice cloning, tts, text to speech, ai voice, voice generation, voice synthesis, voice over, narration, speech synthesis, ai narrator, elevenlabs, eleven labs, natural voice, realistic speech, voice ai, voice changer, inworld, inworld tts, character voice, npc voice

Skill

inferen-sh

Podcast Generation

100

Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.

Skill

microsoft

Sherpa Onnx Tts

Local text-to-speech via sherpa-onnx (offline, no cloud)

Skill

steipete