ElevenLabs STT

Skill, Verified, Active

ElevenLabs speech-to-text with Scribe models and forced alignment via inference.sh CLI. Models: Scribe v1/v2 (98%+ accuracy, 90+ languages). Capabilities: transcription, speaker diarization, audio event tagging, word-level timestamps, forced alignment, subtitle generation. Use for: meeting transcription, subtitles, podcast transcripts, lip-sync timing, karaoke. Triggers: elevenlabs stt, elevenlabs transcription, scribe, elevenlabs speech to text, forced alignment, word alignment, subtitle timing, diarization, speaker identification, audio event detection, eleven labs transcribe

Purpose

Transcribe audio into text accurately and provide precise timing information for subtitles and lip-sync, using the ElevenLabs Scribe models.

Features

ElevenLabs Scribe v1/v2 models for high accuracy
Support for 90+ languages with auto-detection
Speaker diarization and audio event tagging
Word-level timestamps and forced alignment
Subtitle and karaoke timing generation
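
As an illustration of how word-level timestamps can feed subtitle generation, here is a minimal Python sketch that groups timed words into SRT cues. It does not call the skill or the inference.sh CLI. The `Word` record, the `words_to_srt` helper, and the `max_words` and `max_gap` thresholds are assumptions made for this example, not the skill's actual output schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical word record; the real output schema may differ."""
    text: str
    start: float  # seconds
    end: float    # seconds


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7, max_gap: float = 0.8) -> str:
    """Group words into cues, breaking after max_words or a pause longer than max_gap."""
    cues: List[List[Word]] = []
    for word in words:
        current = cues[-1] if cues else None
        if (
            current is None
            or len(current) >= max_words
            or word.start - current[-1].end > max_gap
        ):
            cues.append([word])  # start a new cue
        else:
            current.append(word)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = format_timestamp(cue[0].start)
        end = format_timestamp(cue[-1].end)
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

Calling `words_to_srt` on a list of `Word` items returns a string that can be written to a `.srt` file. The thresholds are arbitrary starting points and would need tuning for karaoke or fast speech.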

Use cases

Transcribing meeting recordings with speaker identification
Generating timed captions for videos
Creating transcripts for podcasts
Aligning audio for lip-sync in animations
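
For the meeting-transcription use case, diarized output usually has to be collapsed into readable speaker turns. The sketch below shows one way to do that in Python. The dictionary keys (`speaker`, `start`, `end`, `text`) and the `speaker_turns` helper are hypothetical, chosen for illustration rather than taken from the skill's output format.

```python
from itertools import groupby
from typing import Dict, List, Tuple

# (speaker, start_seconds, end_seconds, text)
Turn = Tuple[str, float, float, str]


def speaker_turns(words: List[Dict]) -> List[Turn]:
    """Merge consecutive words from the same speaker into one turn.

    Assumes each word dict carries hypothetical keys:
    'speaker', 'start', 'end', and 'text', ordered by time.
    """
    turns: List[Turn] = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        items = list(group)
        text = " ".join(w["text"] for w in items)
        turns.append((speaker, items[0]["start"], items[-1]["end"], text))
    return turns


if __name__ == "__main__":
    sample = [
        {"speaker": "A", "start": 0.0, "end": 0.3, "text": "Good"},
        {"speaker": "A", "start": 0.3, "end": 0.8, "text": "morning"},
        {"speaker": "B", "start": 1.0, "end": 1.4, "text": "Hi"},
    ]
    for speaker, start, end, text in speaker_turns(sample):
        print(f"[{start:.1f}-{end:.1f}] {speaker}: {text}")
```

Because `groupby` only merges adjacent items, a speaker who talks again later gets a separate turn, which matches how meeting minutes are usually laid out.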

Non-goals

Real-time transcription (focus is on processed audio files)
Translation of transcribed text
Editing or proofreading of transcriptions

Installation

npx skills add inferen-sh/skills

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality score

Verified

98/100

Analyzed about 24 hours ago

Trust signals

Last commit: 1 day ago

GitHub owner: inferen-sh

Stars: 433

Similar extensions

YouTube Downloader

100

Download and process YouTube content for research. Use when: downloading competitor videos for analysis; extracting audio for podcasts; getting transcripts for content repurposing; archiving webinars; research content curation

Skill

guia-matthieu

ElevenLabs TTS

ElevenLabs text-to-speech with 22+ premium voices, multilingual support, and voice tuning via inference.sh CLI. Models: eleven_multilingual_v2 (highest quality), eleven_turbo_v2_5 (low latency), eleven_flash_v2_5 (ultra-fast). Capabilities: text-to-speech, voice selection, stability/style control, 32 languages. Use for: voiceovers, audiobooks, video narration, podcasts, accessibility, IVR. Triggers: elevenlabs, eleven labs, elevenlabs tts, premium tts, professional voice, ai voice, high quality tts, multilingual tts, eleven labs voice, voice generation, natural speech, realistic voice, voice over, speech synthesis

Skill

inferen-sh

Sheet Music Publisher

Converts mastered audio to sheet music and creates printable songbooks. Use after mastering when the user wants sheet music or a songbook for their album.

Skill

bitwize-music-studio

CLI Anything VideoCaptioner

AI-powered video captioning: transcribe speech, optimize or translate subtitles, and burn them into video via the stable VideoCaptioner backend. Free ASR and translation included.

Skill

hkuds

Whisper

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

Skill

Orchestra-Research

Transcribe

Transcribe audio files to text with optional diarization and known-speaker hints. Use when a user asks to transcribe speech from audio/video, extract text from recordings, or label speakers in interviews or meetings.

Skill

openai