AI Multimodal Processing Skill

Verified Skill
95

Multimodal AI processing via Google Gemini API (2M tokens context).

Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing).

Actions: transcribe, analyze, extract, caption, detect, segment, generate from media.

Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen.

Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.

AI Summary

This skill provides a unified command-line interface for interacting with the Google Gemini API, enabling processing of audio, images, videos, and documents, as well as image generation. It includes Python scripts for batch processing, media optimization, and document conversion, with clear instructions for API key setup and usage.
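As a rough illustration of how a single entry point could cover audio, images, video, and documents, the sketch below routes files to a media category by extension. The extension table and both function names are hypothetical and are not taken from the skill's scripts, which may classify inputs differently.

```python
from pathlib import Path

# Hypothetical mapping for illustration; not taken from the skill's scripts.
MODALITY_BY_EXTENSION = {
    ".mp3": "audio", ".wav": "audio", ".flac": "audio",
    ".png": "image", ".jpg": "image", ".jpeg": "image", ".webp": "image",
    ".mp4": "video", ".mov": "video", ".webm": "video",
    ".pdf": "document",
}


def detect_modality(path: str) -> str:
    """Return the media category for a path, based only on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return MODALITY_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None


def group_by_modality(paths):
    """Group paths by media category so each batch can be handled together."""
    groups = {}
    for p in paths:
        groups.setdefault(detect_modality(p), []).append(p)
    return groups
```

Grouping before dispatch keeps batch processing simple: each category can be sent to whichever handler suits it, and unsupported files fail early with a clear error.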

Versioning

  • warning (Release Management): The SKILL.md frontmatter has 'Manifest Version: n/a' and no other versioning signal (like CHANGELOG or GitHub releases) is apparent. Install instructions do not reference a specific version, potentially defaulting to 'main'.

Code Execution

  • info (Validation): The scripts handle command-line arguments and file paths with basic argument parsing, but no explicit schema validation library (such as Zod or Pydantic) is visibly applied to all inputs.
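One way to tighten validation without adding a schema library is to let `argparse` reject bad inputs itself. The sketch below is a minimal example under that assumption; the flag names `--task` and `--files` and the helper `existing_file` are hypothetical, while the task list mirrors the actions named in the skill description.

```python
import argparse
from pathlib import Path

# Actions listed in the skill description.
ALLOWED_TASKS = ("transcribe", "analyze", "extract", "caption",
                 "detect", "segment", "generate")


def existing_file(value: str) -> Path:
    """argparse type: accept only paths that point at an existing regular file."""
    path = Path(value)
    if not path.is_file():
        raise argparse.ArgumentTypeError(f"not a file: {value}")
    return path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that fails fast, before any API call is attempted."""
    parser = argparse.ArgumentParser(description="Validate inputs up front")
    # Hypothetical flag names, chosen for illustration only.
    parser.add_argument("--task", required=True, choices=ALLOWED_TASKS)
    parser.add_argument("--files", required=True, nargs="+", type=existing_file)
    return parser
```

With this shape, an unknown task or a missing file makes the parser exit with a usage error instead of reaching the network.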

Compliance

  • info (GDPR): The extension processes user-provided documents and media and sends this data to the Gemini API. There is no explicit mention of personal data being sanitized before it is sent, though the Gemini API likely has its own privacy measures.
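For text inputs, a pre-send sanitization step could look roughly like the sketch below. It is not part of the skill: the patterns and the `redact` helper are hypothetical, cover only email addresses and phone-like numbers, and do nothing for audio, images, or video.

```python
import re

# Deliberately simple patterns; real personal-data detection needs far more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```

A call such as `redact(prompt_text)` would run on extracted or user-supplied text before it leaves the machine. Regex redaction like this will miss many forms of personal data, so it is a starting point rather than a compliance measure.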

Installation

npx skills add samhvw8/dot-claude

Runs the Vercel skills CLI (skills.sh) via npx; requires Node.js installed locally and at least one skills-compatible agent installed (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

5 months ago
10 stars
MIT
Updated 6 days ago