
Qwen ASR

Verified Skill
92

Transcribe audio files using Qwen ASR. Use when the user sends voice messages and wants them converted to text.

AI Summary

This skill uses Qwen ASR to transcribe audio files, supporting formats like WAV, MP3, and OGG. It can accept audio input directly from files, standard input, or via a URL, and outputs the transcribed text to standard output. No API keys or configuration are required.
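The three input modes described above (a local file, standard input, or a URL, with the transcript going to standard output) can be sketched in Python as follows. This is a hypothetical illustration, not the skill's actual script: `read_audio` is an invented helper name, using `-` to mean stdin is an assumed convention, and the Qwen ASR call itself is omitted because the page does not describe it.

```python
import sys
import urllib.request
from pathlib import Path

# Formats named in the summary; the real script may accept others (assumption).
SUPPORTED_SUFFIXES = {".wav", ".mp3", ".ogg"}


def read_audio(source: str) -> bytes:
    """Return raw audio bytes from stdin ("-"), an http(s) URL, or a local file.

    Hypothetical helper: the real skill's names and rules may differ.
    """
    if source == "-":
        # Binary read from standard input, e.g. `cat voice.ogg | script -`.
        return sys.stdin.buffer.read()
    if source.startswith(("http://", "https://")):
        # urlopen raises urllib.error.URLError (or HTTPError) on failure.
        with urllib.request.urlopen(source, timeout=30) as resp:
            return resp.read()
    path = Path(source)
    # Reject unknown extensions before touching the filesystem.
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported audio format: {path.suffix or '(none)'}")
    return path.read_bytes()
```

Keeping diagnostics off stdout matters here: a caller can pipe the script's output straight into another tool and receive only the transcript.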

Versioning

  • warning (Release Management): No version information is present in the manifest files or changelog, and the installation instructions do not specify a version, which implies installation from the main branch.

Code Execution

  • info (Validation): The script uses argparse for basic argument parsing, but does not use a formal schema validation library for inputs or outputs.
  • info (Logging): The script logs information and warnings to stderr, but does not append to a local audit file.
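The two notes above describe argparse-based parsing and stderr-only logging. A minimal sketch of how such a script could keep stderr logging and optionally append the same records to a local audit file is shown below. The `--audit-log` flag and the `build_parser`/`configure_logging` names are assumptions for illustration; the skill is not known to offer them.

```python
import argparse
import logging
import sys
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI surface; the real script's arguments may differ.
    parser = argparse.ArgumentParser(description="Transcribe audio with Qwen ASR (sketch)")
    parser.add_argument("source", nargs="?", default="-",
                        help="audio file path, URL, or '-' for stdin")
    parser.add_argument("--audit-log", metavar="PATH",
                        help="also append log records to this file (assumed flag)")
    return parser


def configure_logging(audit_log: Optional[str]) -> logging.Logger:
    logger = logging.getLogger("qwen_asr_sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    # Remove handlers from earlier calls so records are not emitted twice.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    # Diagnostics go to stderr so stdout carries only the transcript.
    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setFormatter(fmt)
    logger.addHandler(stderr_handler)
    if audit_log:
        # mode="a" appends, preserving records from earlier runs.
        file_handler = logging.FileHandler(audit_log, mode="a", encoding="utf-8")
        file_handler.setFormatter(fmt)
        logger.addHandler(file_handler)
    return logger
```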

Practical Utility

  • info (Edge cases): The script handles file input and stdin, and basic error logging is present, but specific failure modes and recovery paths are not explicitly documented.
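One way to make failure modes explicit is to map each class of error to its own exit code, so callers can tell a missing file from a network failure. The codes and the `exit_code_for` function below are illustrative assumptions; the skill documents no exit codes.

```python
import urllib.error

# Hypothetical exit codes; the skill itself documents none.
EXIT_OK = 0
EXIT_UNEXPECTED = 1
EXIT_USAGE = 2          # argparse already exits with 2 on usage errors
EXIT_INPUT_MISSING = 3  # file not found, unreadable, or a directory
EXIT_BAD_FORMAT = 4     # unsupported or invalid audio input
EXIT_NETWORK = 5        # URL download failed


def exit_code_for(exc: BaseException) -> int:
    """Map an exception raised while loading audio to a documented exit code."""
    # File errors are checked first; URLError is also an OSError subclass.
    if isinstance(exc, (FileNotFoundError, PermissionError, IsADirectoryError)):
        return EXIT_INPUT_MISSING
    if isinstance(exc, urllib.error.URLError):  # includes HTTPError
        return EXIT_NETWORK
    if isinstance(exc, ValueError):
        return EXIT_BAD_FORMAT
    return EXIT_UNEXPECTED
```

A wrapper would catch the exception, log it to stderr, and pass the result to `sys.exit`, which gives each failure mode a stable, documentable outcome.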

Installation

npx skills add aahl/skills

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js on your computer and at least one skills-compatible agent installed (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io structure.

Updated 6 days ago

Similar extensions

ASR (Speech to Text) Skill

95

Implement speech-to-text (ASR/automatic speech recognition) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build voice input features, or process audio recordings. Supports base64-encoded audio files and returns accurate text transcriptions.

Skill
answerzhao

ElevenLabs Speech-to-Text

98

Transcribe audio to text using ElevenLabs Scribe. Supports batch transcription, realtime streaming from URLs, microphone input, and local files.

Skill
elevenlabs

AI Multimodal Processing Skill

95

Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.

Skill
samhvw8

ElevenLabs Speech-to-Text

95

Transcribe audio to text using ElevenLabs Scribe v2. Use when converting audio/video to text, generating subtitles, transcribing meetings, or processing spoken content.

Skill
elevenlabs

Transcription Automation

92

Automate audio/video transcription, meeting notes, subtitle generation, and content processing

Skill
claude-office-skills

Happy Audio Gen

100

Universal AI voice / text-to-speech skill supporting OpenAI TTS (gpt-4o-mini-tts, tts-1), ElevenLabs multilingual TTS with voice cloning, Bailian Qwen TTS (qwen-tts / qwen3-tts-vd with voice-design custom voices, long-text chunking built in), MiniMax speech-02-hd, SiliconFlow CosyVoice / SenseVoice, and PlayHT 2.0. Use this skill whenever the user asks to read text aloud, synthesize speech, generate narration, create voice-over, dub a script, or turn any text into audio (mp3 / wav / ogg / flac). Typical phrases include "read this aloud", "generate voice for ...", "create a narration of ...", "tts this", "把这段念出来", "做个配音", "合成语音", or mentions of voices / TTS model names like Alloy, Ash, Cherry, Rachel, CosyVoice, PlayHT. Always use this skill even if the user does not specify a provider — pick one from EXTEND.md defaults or available env keys.

Skill
iamzhihuix