
Speech Generation Skill

Skill · Verified · Active

Use when the user asks for text-to-speech narration or voiceover, accessibility reads, audio prompts, or batch speech generation via the OpenAI Audio API. Runs the bundled CLI (`scripts/text_to_speech.py`) with built-in voices; live calls require `OPENAI_API_KEY`. Custom voice creation is out of scope.

Purpose

Use this skill when you need to generate spoken audio from text for purposes like narration, voiceovers, accessibility reads, or batch audio prompt creation.

Features

  • Generate single text-to-speech clips
  • Perform batch speech generation from JSONL files
  • Use built-in OpenAI voices
  • Control voice characteristics via instructions
  • Run via the bundled CLI or the OpenAI API
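The features above map onto the OpenAI Audio API's speech endpoint. The following is a minimal sketch, not the code of the bundled CLI: it assumes the official `openai` Python package (v1.x), the `gpt-4o-mini-tts` model (which accepts an `instructions` field for steering tone and delivery), and the 4096-character input limit documented at the time of writing. The helper names `build_speech_request` and `synthesize` are hypothetical. The network call is isolated in `synthesize`, so a request can be built and inspected without an API key.

```python
"""Hypothetical single-clip helper for the OpenAI speech endpoint (not the bundled CLI)."""
from pathlib import Path

# Assumptions: model, default voice, formats, and limit reflect the OpenAI docs at the
# time of writing; verify them before use.
DEFAULT_MODEL = "gpt-4o-mini-tts"
DEFAULT_VOICE = "alloy"
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096


def build_speech_request(text, voice=DEFAULT_VOICE, fmt="mp3", instructions=None, model=DEFAULT_MODEL):
    """Assemble keyword arguments for client.audio.speech.create without calling the API."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"text exceeds {MAX_INPUT_CHARS} characters; split it into chunks")
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    request = {"model": model, "voice": voice, "input": text, "response_format": fmt}
    if instructions:
        # Voice/tone steering; assumed to be honored only by instruction-capable models.
        request["instructions"] = instructions
    return request


def synthesize(request, out_path):
    """Send the request and stream audio to out_path (needs OPENAI_API_KEY and network)."""
    from openai import OpenAI  # lazy import so request building works without the package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    path = Path(out_path)
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(path)
    return path
```

For example, `build_speech_request("Welcome to the demo.", instructions="Calm, unhurried narration.")` returns a plain dictionary that can be logged or diffed, and `synthesize(req, "welcome.mp3")` performs the live call. Keeping the request as plain data makes dry runs and reproducible reruns straightforward.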

Use cases

  • Create audio narration for project demos or documentation
  • Generate accessibility-friendly audio descriptions
  • Produce IVR or phone-system audio prompts
  • Batch-generate speech for a large number of text inputs

Non-goals

  • Custom voice creation or fine-tuning
  • Using non-OpenAI TTS models
  • Generating audio without an API key or network access (unless in dry-run mode)

Workflow

  1. Determine intent (single vs. batch)
  2. Collect inputs: text, voice, format, constraints
  3. Write temporary JSONL for batch jobs
  4. Augment instructions into a labeled spec
  5. Run bundled CLI (`scripts/text_to_speech.py`)
  6. Validate output: intelligibility, pacing, pronunciation
  7. Iterate with targeted changes
  8. Save/return final outputs
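Steps 3 and 4 are the parts of the workflow most worth scripting. The sketch below assumes a minimal job schema (`text` and `out` required, `voice` and `instructions` optional) and a `Label: value` spec format; both are illustrative choices, not the documented input format of `scripts/text_to_speech.py`, so check the script before feeding it these files. All helper names are hypothetical.

```python
"""Hypothetical batch helpers: a labeled instruction spec plus a temporary JSONL job file."""
import json
import tempfile
from pathlib import Path

REQUIRED_FIELDS = ("text", "out")  # assumed minimal job schema, not the CLI's documented one


def labeled_spec(**fields):
    """Render free-form guidance as 'Label: value' lines, skipping empty fields."""
    lines = []
    for key, value in fields.items():
        if value:
            lines.append(f"{key.replace('_', ' ').title()}: {value}")
    return "\n".join(lines)


def write_batch_jsonl(jobs, directory=None):
    """Validate jobs, then write one JSON object per line to a temp .jsonl file."""
    jobs = list(jobs)
    for index, job in enumerate(jobs):
        missing = [name for name in REQUIRED_FIELDS if not job.get(name)]
        if missing:
            raise ValueError(f"job {index} is missing {missing}")
    with tempfile.NamedTemporaryFile(
        "w", suffix=".jsonl", dir=directory, delete=False, encoding="utf-8"
    ) as handle:
        for job in jobs:
            # json.dumps escapes embedded newlines, so each job stays on one line.
            handle.write(json.dumps(job, ensure_ascii=False) + "\n")
    return Path(handle.name)


def read_batch_jsonl(path):
    """Load jobs back from a JSONL file, ignoring blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Validation runs before the file is opened, so a malformed job never leaves a half-written file behind. Because the file is created with `delete=False`, it survives for the CLI run; remove it (for example with `Path.unlink()`) once the final outputs are saved.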

Practices

  • API Integration
  • Command Line Interface
  • Speech Synthesis
  • Reproducible Builds

Prerequisites

  • `OPENAI_API_KEY` environment variable set
  • Python 3 and pip installed
  • `openai` Python package installed
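A quick preflight check can confirm these prerequisites before a live run. This is a hypothetical helper, not part of the skill; it assumes a dry run needs neither the API key nor the `openai` package, matching the dry-run exception noted earlier for key-less runs. The `environ` parameter exists only so the check can be exercised without touching the real environment.

```python
"""Hypothetical preflight check for the prerequisites listed above."""
import importlib.util
import os


def preflight(dry_run=False, environ=None):
    """Return a list of unmet prerequisites; an empty list means ready to run."""
    env = os.environ if environ is None else environ
    problems = []
    if dry_run:
        return problems  # assumption: dry runs make no API calls
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if importlib.util.find_spec("openai") is None:
        problems.append("openai package not installed (pip install openai)")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("missing:", problem)
```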

Installation

npx skills add openai/skills

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repository follows the agentskills.io format.

Quality score

Verified
100/100
Analyzed 1 day ago

Trust signals

Last commit: 1 day ago
Stars: 19k
License: Apache-2.0
Status

Similar extensions

Google Tts

100

Convert documents and text to audio using Google Cloud Text-to-Speech. Use this skill when the user wants to: narrate a document, read aloud text, generate audio from a file, convert text to speech, create a recording of documentation or analysis, create a podcast from a document, or use Google TTS/text-to-speech. Trigger phrases: "read this aloud", "narrate this", "create a recording", "text to speech", "TTS", "convert to audio", "audio from document", "listen to this", "generate audio", "google tts", "create a podcast".

Skill
sanjay3290

Elevenlabs Tts

99

ElevenLabs text-to-speech with 22+ premium voices, multilingual support, and voice tuning via inference.sh CLI. Models: eleven_multilingual_v2 (highest quality), eleven_turbo_v2_5 (low latency), eleven_flash_v2_5 (ultra-fast). Capabilities: text-to-speech, voice selection, stability/style control, 32 languages. Use for: voiceovers, audiobooks, video narration, podcasts, accessibility, IVR. Triggers: elevenlabs, eleven labs, elevenlabs tts, premium tts, professional voice, ai voice, high quality tts, multilingual tts, eleven labs voice, voice generation, natural speech, realistic voice, voice over, speech synthesis

Skill
inferen-sh

Tts

96

Use this skill whenever the user wants to convert text to speech, generate audio from text, or create voiceovers. Triggers include any mention of 'TTS', 'text to speech', 'speak', 'say', 'voice', 'read aloud', 'audio narration', 'voiceover', or 'dubbing', and requests to turn written content into spoken audio. Also use it when converting EPUB/PDF/SRT/articles to audio, cloning voices from reference audio, controlling emotion or speed in speech, aligning speech to subtitle timings, or producing voice-mapped audio per segment.

Skill
NoizAI

AI Voice Generation

95

AI voice generation, text-to-speech, and voice synthesis via inference.sh CLI. Models: Inworld TTS-2 (100+ languages, emotion/non-verbal steering), Inworld TTS 1.5 (ultra-low latency), ElevenLabs (22+ premium voices, 32 languages), Kokoro TTS, DIA, Chatterbox, Higgs, VibeVoice for natural speech. Capabilities: multiple voices, emotions, accents, long-form narration, conversation, voice transformation, delivery mode control, character voices. Use for: voiceovers, audiobooks, podcasts, video narration, accessibility, gaming NPCs, avatar audio, UGC. Triggers: voice cloning, tts, text to speech, ai voice, voice generation, voice synthesis, voice over, narration, speech synthesis, ai narrator, elevenlabs, eleven labs, natural voice, realistic speech, voice ai, voice changer, inworld, inworld tts, character voice, npc voice

Skill
inferen-sh

Podcast Generation

100

Generate AI-powered podcast-style audio narratives using Azure OpenAI's GPT Realtime Mini model via WebSocket. Use when building text-to-speech features, audio narrative generation, podcast creation from content, or integrating with Azure OpenAI Realtime API for real audio output. Covers full-stack implementation from React frontend to Python FastAPI backend with WebSocket streaming.

Skill
microsoft

Sherpa Onnx Tts

99

Local text-to-speech via sherpa-onnx (offline, no cloud)

Skill
steipete