
AI Multimodal Processing Skill

Skill verified
95

Multimodal AI processing via the Google Gemini API (2M-token context window). Capabilities: audio (transcription, 9.5 hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6 hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.
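The 9.5 hr audio limit and the 2M-token context can be related with simple arithmetic. The sketch below assumes Gemini's documented rate of about 32 tokens per second of audio (check Google's current docs before relying on it). The constants and function names are hypothetical illustrations, not part of the skill.

```python
# Sketch: rough token-budget check for long audio before sending it to Gemini.
# ASSUMPTION: ~32 tokens per second of audio, per Google's Gemini API docs at the
# time of writing. ASSUMPTION: a 2,000,000-token context window, matching the
# figure in the skill description. Neither value is read from the API here.

AUDIO_TOKENS_PER_SECOND = 32
CONTEXT_WINDOW_TOKENS = 2_000_000


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Estimate the input tokens consumed by an audio clip of this length."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return int(round(duration_seconds * AUDIO_TOKENS_PER_SECOND))


def fits_in_context(duration_seconds: float, prompt_tokens: int = 0) -> bool:
    """Return True if the audio plus the prompt fit in the assumed context window."""
    return estimate_audio_tokens(duration_seconds) + prompt_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    nine_and_half_hours = 9.5 * 3600  # the 9.5 hr audio maximum from the description
    print(estimate_audio_tokens(60))                   # 1920
    print(estimate_audio_tokens(nine_and_half_hours))  # 1094400
    print(fits_in_context(nine_and_half_hours))        # True
```

Under these assumptions, a full 9.5 hr recording uses roughly 1.1M tokens, leaving headroom in a 2M-token window for the prompt and output instructions.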

AI Summary

This skill provides a unified command-line interface for interacting with the Google Gemini API, enabling processing of audio, images, videos, and documents, as well as image generation. It includes Python scripts for batch processing, media optimization, and document conversion, with clear instructions for API key setup and usage.
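A minimal sketch of how batch processing might sort mixed inputs by media type before dispatching each group to the right Gemini task. The extension table, the category names, and the functions classify and group_for_batch are hypothetical; the skill's own scripts may organize this differently.

```python
# Sketch: group input files by media type before batch processing.
# ASSUMPTION: the extension table and category names are hypothetical
# illustrations, not taken from the skill's scripts.
from __future__ import annotations

from collections import defaultdict
from pathlib import Path

MEDIA_CATEGORIES = {
    "audio": {".mp3", ".wav", ".flac", ".aac", ".ogg"},
    "image": {".png", ".jpg", ".jpeg", ".webp", ".gif"},
    "video": {".mp4", ".mov", ".webm", ".avi"},
    "document": {".pdf"},
}


def classify(path: str) -> str | None:
    """Return the media category for a file path, or None if unsupported."""
    suffix = Path(path).suffix.lower()  # case-insensitive: ".MP3" -> ".mp3"
    for category, suffixes in MEDIA_CATEGORIES.items():
        if suffix in suffixes:
            return category
    return None


def group_for_batch(paths: list[str]) -> dict[str, list[str]]:
    """Bucket paths by category; unsupported files go under 'unsupported'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        groups[classify(p) or "unsupported"].append(p)
    return dict(groups)


if __name__ == "__main__":
    print(group_for_batch(["talk.MP3", "slide.png", "report.pdf", "notes.txt"]))
```

Keeping unsupported files in their own bucket lets a batch run report them instead of silently dropping them.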

Versioning

  • warning: Release Management. The SKILL.md frontmatter has 'Manifest Version: n/a', and no other versioning signal (such as a CHANGELOG or GitHub releases) is apparent. The install instructions do not reference a specific version, so installs may default to the 'main' branch.
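A minimal sketch of the kind of check described in this finding: read a SKILL.md frontmatter block and report whether it declares a version. It assumes the frontmatter sits between two '---' lines and uses a hypothetical 'version' key; it is a naive line parser, not a YAML library.

```python
# Sketch: detect whether SKILL.md frontmatter declares a version.
# ASSUMPTION: frontmatter is the block between the first two "---" lines, and
# "version" is a hypothetical key name. Naive parsing; no YAML library is used.
from __future__ import annotations


def read_frontmatter(text: str) -> dict[str, str]:
    """Return top-level 'key: value' pairs from the frontmatter, or {} if absent."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter found
        # Skip indented lines, which belong to nested YAML values.
        key, sep, value = line.partition(":")
        if sep and not line.startswith((" ", "\t")):
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter, so treat the file as having no frontmatter


def declared_version(text: str) -> str | None:
    """Return the declared version string, or None if missing or empty."""
    value = read_frontmatter(text).get("version", "")
    return value or None


if __name__ == "__main__":
    sample = "---\nname: ai-multimodal\ndescription: Multimodal AI processing\n---\n# Body\n"
    print(declared_version(sample))  # None
```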

Code Execution

  • info: Validation. The scripts parse command-line arguments and file paths, but no explicit schema validation library (such as Zod or Pydantic) is visibly used for all inputs; validation is limited to basic argument parsing.
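One way to get schema-style validation without Zod or Pydantic is a standard-library dataclass that checks its own fields in __post_init__. This is a swapped-in technique, not what the skill does. The --task and --files options and the Request type are hypothetical, and the allowed task names are taken from the Actions list in the description.

```python
# Sketch: validate CLI inputs up front with a stdlib dataclass (no Pydantic).
# ASSUMPTION: the --task/--files options and the Request type are hypothetical;
# ALLOWED_TASKS mirrors the "Actions" listed in the skill description.
from __future__ import annotations

import argparse
from dataclasses import dataclass
from pathlib import Path

ALLOWED_TASKS = {"transcribe", "analyze", "extract", "caption", "detect", "segment", "generate"}


@dataclass(frozen=True)
class Request:
    task: str
    files: tuple[Path, ...]

    def __post_init__(self) -> None:
        # Reject bad input before any API call is made.
        if self.task not in ALLOWED_TASKS:
            raise ValueError(f"unknown task: {self.task!r}")
        # Text-to-image generation can run without input files; other tasks cannot.
        if self.task != "generate" and not self.files:
            raise ValueError("at least one input file is required")
        missing = [str(f) for f in self.files if not f.is_file()]
        if missing:
            raise ValueError(f"files not found: {', '.join(missing)}")


def parse_request(argv: list[str]) -> Request:
    """Parse argv into a validated Request; raises ValueError on bad input."""
    parser = argparse.ArgumentParser(description="Hypothetical multimodal CLI")
    parser.add_argument("--task", required=True)
    parser.add_argument("--files", nargs="*", default=[])
    args = parser.parse_args(argv)
    return Request(task=args.task, files=tuple(Path(f) for f in args.files))
```

For example, parse_request(["--task", "transcribe", "--files", "talk.mp3"]) returns a Request only if talk.mp3 exists. argparse still exits on structurally malformed arguments, while the dataclass raises ValueError for semantic problems.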

Compliance

  • info: GDPR. The extension sends user-provided documents and media to the Gemini API for processing, but nothing mentions sanitizing personal data before it is sent. The Gemini API likely applies its own privacy measures.
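A minimal sketch of a pre-send sanitization step for extracted text, assuming a hypothetical redact function that runs before any API call. The regexes only catch simple email addresses and long digit runs, so they will miss many kinds of personal data and may also flag things like dates. This illustrates the idea and is not a GDPR control.

```python
# Sketch: redact obvious personal data from text before it is sent to an API.
# ASSUMPTION: this step is hypothetical; the skill does not describe one.
# These patterns are deliberately naive: they miss names and addresses, and
# PHONE_RE can also match other long digit runs such as "2024-01-15".
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional leading "+", then 7 or more digits, optionally separated by space, "." or "-".
PHONE_RE = re.compile(r"\+?\d(?:[\s.-]?\d){6,}")


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)  # emails first, so their digits are gone
    return PHONE_RE.sub("[PHONE]", text)


if __name__ == "__main__":
    print(redact("Contact jane.doe@example.com or +1 555-123-4567."))
```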

Install

npx skills add samhvw8/dot-claude

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.) installed locally. Assumes the repository follows the agentskills.io format.

5 months ago
10 stars
MIT
Updated 5 days ago