
Podcast Generation Skill

Skill (verified)

Use this skill when the user requests to generate, create, or produce podcasts from text content. Converts written content into a two-host conversational podcast audio format with natural dialogue.

AI Summary

It converts written content into a structured JSON script, synthesizes speech using Volcengine TTS, and combines the audio into an MP3 file, also generating a markdown transcript. The skill supports both English and Chinese languages.
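As a rough illustration of the last step in that pipeline, the sketch below renders a markdown transcript from an already-parsed script. The script layout used here (a list of entries with `speaker` and `text` keys) and the function name `render_transcript` are assumptions made for the example; the skill's actual JSON schema is not shown on this page.

```python
def render_transcript(title: str, lines: list[dict]) -> str:
    """Render a markdown transcript from dialogue lines.

    Assumes each entry has hypothetical "speaker" and "text" keys.
    """
    parts = [f"# {title}", ""]
    for line in lines:
        # One bolded speaker label per dialogue turn, followed by a blank line.
        parts.append(f"**{line['speaker']}**: {line['text']}")
        parts.append("")
    return "\n".join(parts).rstrip() + "\n"


if __name__ == "__main__":
    demo = [
        {"speaker": "Host A", "text": "Welcome to the show."},
        {"speaker": "Host B", "text": "Glad to be here."},
    ]
    print(render_transcript("Demo Episode", demo))
```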

Documentation

warning: Configuration & parameter reference

The required environment variables (VOLCENGINE_TTS_APPID, VOLCENGINE_TTS_ACCESS_TOKEN) are documented, and VOLCENGINE_TTS_CLUSTER is optional with a default. The script's `--script-file` and `--output-file` parameters are also documented, but nothing is said about their expected types or validation beyond the requirement that they be absolute paths.

Maintenance

warning: Dependency Management

The Python script imports argparse, base64, json, logging, os, uuid, concurrent.futures, typing, and requests. All of these except requests are part of the Python standard library; requests is a third-party package. The skill's directory contains no explicit dependency management file (such as requirements.txt or pyproject.toml) to declare it, and no mechanism for updating dependencies is apparent.

Security

warning: Secret Management

The script reads sensitive Volcengine TTS credentials from environment variables (VOLCENGINE_TTS_APPID, VOLCENGINE_TTS_ACCESS_TOKEN). This is better than hardcoding them, but the secrets can still be exposed if the environment that stores them is compromised or if logging unintentionally reveals them.
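One way to reduce the logging risk is to mask credentials before they reach any log message. The following is a minimal sketch, not the skill's actual code: the helper names `mask_secret` and `load_credentials` are hypothetical, and only the two environment variable names come from the listing above.

```python
import os

# Variable names taken from the listing; everything else is illustrative.
REQUIRED_VARS = ("VOLCENGINE_TTS_APPID", "VOLCENGINE_TTS_ACCESS_TOKEN")


def mask_secret(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def load_credentials(env=None) -> dict:
    """Read the required variables, failing early with a clear message."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


if __name__ == "__main__":
    fake_env = {
        "VOLCENGINE_TTS_APPID": "1234567890",
        "VOLCENGINE_TTS_ACCESS_TOKEN": "example-token-value",
    }
    creds = load_credentials(fake_env)
    for name, value in creds.items():
        # Only the masked form is ever printed or logged.
        print(f"{name}={mask_secret(value)}")
```

Masking does not protect a compromised environment; it only keeps the raw values out of log output.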

Versioning

warning: Release Management

There is no explicit versioning information (e.g., a version field in a manifest or a CHANGELOG) for this skill. The README indicates that the project is at version 2.0, but the skill itself carries no version of its own.

Code Execution

warning: Validation

The Python script uses argparse for command-line arguments, which checks that arguments are present and treats path arguments as plain strings. However, there is no explicit validation of the paths themselves (e.g., that they are absolute, exist, or are writable) or of the structure of the input JSON script beyond basic dictionary access.
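The missing checks could look roughly like the sketch below. It is an illustration under stated assumptions rather than the skill's implementation: the function names are hypothetical, and the script layout (a top-level `lines` list whose entries have `speaker` and `text` strings) is assumed, since the real schema is not shown on this page.

```python
import json
import os


def validate_paths(script_file: str, output_file: str) -> None:
    """Check the values passed to --script-file and --output-file."""
    if not os.path.isabs(script_file) or not os.path.isabs(output_file):
        raise ValueError("Both paths must be absolute")
    if not os.path.isfile(script_file):
        raise FileNotFoundError(f"Script file not found: {script_file}")
    out_dir = os.path.dirname(output_file)
    if not os.path.isdir(out_dir) or not os.access(out_dir, os.W_OK):
        raise PermissionError(f"Output directory is not writable: {out_dir}")


def load_script(script_file: str) -> list:
    """Parse the script and verify the assumed structure."""
    with open(script_file, encoding="utf-8") as fh:
        try:
            data = json.load(fh)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Malformed JSON in {script_file}: {exc}") from exc
    # "lines", "speaker", and "text" are hypothetical field names.
    lines = data.get("lines") if isinstance(data, dict) else None
    if not isinstance(lines, list) or not lines:
        raise ValueError("Script must contain a non-empty 'lines' list")
    for index, line in enumerate(lines):
        if (
            not isinstance(line, dict)
            or not isinstance(line.get("speaker"), str)
            or not isinstance(line.get("text"), str)
        ):
            raise ValueError(f"Line {index} needs string 'speaker' and 'text' fields")
    return lines
```

Raising distinct exception types for each failure lets the caller report a specific cause instead of surfacing a bare traceback.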

Practical Utility

warning: Edge cases

The script handles some errors (missing env vars, invalid script format, TTS API errors), but the documentation does not describe failure modes such as malformed JSON in the script file or an unusable output path (e.g., a non-writable directory). Recovery in these cases is implicit: the script simply fails.

Portability

warning: Stack assumptions

The Python script assumes a Python 3.12+ environment and requires specific Volcengine TTS environment variables. These variables are a necessary precondition, yet they are listed only under 'Requirements' and are not declared upfront in the main instruction section of SKILL.md.

Installation

npx skills add bytedance/deer-flow

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js installed locally and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.


bytedance

65.2k stars

MIT

deerflow.tech

Updated 6 days ago


Similar extensions

TTS

Use this skill whenever the user wants to convert text into speech, generate audio from text, or produce voiceovers. Triggers include: any mention of 'TTS', 'text to speech', 'speak', 'say', 'voice', 'read aloud', 'audio narration', 'voiceover', 'dubbing', or requests to turn written content into spoken audio. Also use when converting EPUB/PDF/SRT/articles to audio, cloning voices from reference audio, controlling emotion or speed in speech, aligning speech to subtitle timelines, or producing per-segment voice-mapped audio.

Skill

noizai

Document to Narration

Convert written documents to narrated video scripts with TTS audio and word-level timing. Use when preparing essays, blog posts, or articles for video narration. Outputs scene files, audio, and VTT with precise word timestamps. Keywords: narration, voiceover, TTS, scenes, audio, timing, video script, spoken.

Skill

jwynia

Daily News Caster

Fetches the latest news using news-aggregator-skill, formats it into a podcast script in Markdown format, and uses the tts skill to generate a podcast audio file. Use when the user asks to get the latest news and read it out as a podcast.

Skill

noizai

Happy Audio Gen

100

Universal AI voice / text-to-speech skill supporting OpenAI TTS (gpt-4o-mini-tts, tts-1), ElevenLabs multilingual TTS with voice cloning, Bailian Qwen TTS (qwen-tts / qwen3-tts-vd with voice-design custom voices, long-text chunking built in), MiniMax speech-02-hd, SiliconFlow CosyVoice / SenseVoice, and PlayHT 2.0. Use this skill whenever the user asks to read text aloud, synthesize speech, generate narration, create voice-over, dub a script, or turn any text into audio (mp3 / wav / ogg / flac). Typical phrases include "read this aloud", "generate voice for ...", "create a narration of ...", "tts this", "把这段念出来", "做个配音", "合成语音", or mentions of voices / TTS model names like Alloy, Ash, Cherry, Rachel, CosyVoice, PlayHT. Always use this skill even if the user does not specify a provider — pick one from EXTEND.md defaults or available env keys.

Skill

iamzhihuix

Characteristic Voice

Use this skill whenever the user wants speech to sound more human, companion-like, or emotionally expressive. Triggers include: any mention of 'say like', 'talk like', 'speak like', 'companion voice', 'comfort me', 'cheer me up', 'sound more human', 'good night voice', 'good morning voice', or requests to add fillers, emotion, or personality to generated speech. Also use when the user wants to mimic a specific character's voice, apply speaking style presets (goodnight, morning, comfort, celebration, chatting), tune emotional parameters like warmth or tenderness, or make TTS output feel like a real person talking. If the user asks for a 'voice message', 'companion audio', 'character voice', or wants speech that sighs, laughs, hesitates, or sounds genuinely warm, use this skill. Do NOT use for plain text-to-speech without personality, music generation, sound effects, or general coding tasks unrelated to expressive speech.

Skill

noizai

arXiv Search

Searches arXiv for preprints and academic papers, retrieves abstracts, and filters by topic. Use when the user asks to find research papers, search arXiv, look up preprints, find academic articles in physics, math, CS, biology, statistics, or related fields.

Skill

langchain-ai