[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"extension-skill-samhvw8-ai-multimodal-uk":3,"guides-for-samhvw8-ai-multimodal":223,"similar-k17c4avaab2db2m79et4f4hnwn867qj1":224},{"_creationTime":4,"_id":5,"children":6,"community":7,"display":9,"evaluation":24,"identity":191,"isFallback":196,"parentExtension":197,"providers":198,"relations":202,"repo":204,"workflow":220},1778054812528.7214,"k17c4avaab2db2m79et4f4hnwn867qj1",[],{"reviewCount":8},0,{"description":10,"installMethods":11,"name":12,"sourceUrl":13,"tags":14},"Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. 
Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.",{},"AI Multimodal Processing Skill","https://github.com/samhvw8/dot-claude/tree/HEAD/skills/ai-multimodal",[15,16,17,18,19,20,21,22,23],"gemini-api","multimodal","audio","image","video","document-processing","text-to-image","ocr","transcription",{"_creationTime":25,"_id":26,"extensionId":5,"locale":27,"result":28,"trustSignals":179,"workflow":189},1778054896678.3242,"kn7fq0cw89vg272tbqz80yjwdd8660ha","en",{"checks":29,"evaluatedAt":169,"extensionSummary":170,"promptVersionExtension":171,"promptVersionScoring":172,"rationale":173,"score":174,"summary":175,"tags":176,"targetMarket":177,"tier":178},[30,35,38,41,45,48,52,56,59,62,66,71,74,78,81,84,87,90,93,96,100,104,108,113,118,121,124,127,131,134,137,140,143,146,150,153,156,159,162,166],{"category":31,"check":32,"severity":33,"summary":34},"Practical Utility","Problem relevance","pass","The displayed description clearly names a concrete user problem: processing various media types (audio, image, video, documents) and generating images using the Google Gemini API.",{"category":31,"check":36,"severity":33,"summary":37},"Unique selling proposition","The extension offers a unified interface for Google Gemini's multimodal capabilities, providing significant value beyond directly using the API by abstracting complexities and offering a structured approach across diverse media types.",{"category":31,"check":39,"severity":33,"summary":40},"Production readiness","The extension appears production-ready, covering a wide range of multimedia processing tasks and providing clear instructions for setup, usage, and optimization, including fallback mechanisms and detailed documentation.",{"category":42,"check":43,"severity":33,"summary":44},"Scope","Single responsibility principle","The extension focuses on multimodal AI processing via the Google 
Gemini API, covering audio, image, video, and document tasks, along with image generation, which is a coherent and well-defined scope.",{"category":42,"check":46,"severity":33,"summary":47},"Description quality","The displayed description is comprehensive, accurate, and well-structured, detailing capabilities, keywords, and use cases without being overly verbose or keyword-stuffed.",{"category":49,"check":50,"severity":33,"summary":51},"Invocation","Scoped tools","The extension uses specific Python scripts for distinct tasks (e.g., gemini_batch_process.py, media_optimizer.py, document_converter.py), indicating a good separation of concerns rather than a single generalist tool.",{"category":53,"check":54,"severity":33,"summary":55},"Documentation","Configuration & parameter reference","API key configuration is clearly documented with a priority order, and script parameters are explained via --help messages, covering necessary configurations.",{"category":42,"check":57,"severity":33,"summary":58},"Tool naming","The Python scripts (gemini_batch_process.py, media_optimizer.py, document_converter.py) have descriptive, snake_case names relevant to their function.",{"category":42,"check":60,"severity":33,"summary":61},"Minimal I/O surface","The scripts use command-line arguments for input and output specifications, which are typical for utility scripts and seem well-defined for their respective tasks.",{"category":63,"check":64,"severity":33,"summary":65},"License","License usability","The extension explicitly declares the MIT license in the SKILL.md frontmatter, which is a permissive open-source license.",{"category":67,"check":68,"severity":69,"summary":70},"Maintenance","Commit recency","not_applicable","No commit history available for judging recency. 
The provided data shows 'n/a' for 'Last commit on default branch (pushedAt)'.",{"category":67,"check":72,"severity":33,"summary":73},"Dependency Management","The project includes requirements.txt files for its Python dependencies, facilitating management and updates. The presence of version specifiers suggests an awareness of dependency versions.",{"category":75,"check":76,"severity":33,"summary":77},"Security","Secret Management","The extension correctly handles API keys by checking environment variables and .env files with a defined priority order, avoiding hardcoding secrets in scripts.",{"category":75,"check":79,"severity":33,"summary":80},"Injection","The scripts process user-provided files and prompts, but there's no indication of executing arbitrary code or following instructions within loaded data. File inputs are treated as data for the Gemini API.",{"category":75,"check":82,"severity":33,"summary":83},"Transitive Supply-Chain Grenades","The extension relies on installed Python packages and the Gemini API. It does not fetch remote code or data at runtime for execution, keeping the supply chain contained within the bundle and managed dependencies.",{"category":75,"check":85,"severity":33,"summary":86},"Sandbox Isolation","The scripts operate as standalone Python executables, processing files and interacting with the Gemini API. They do not appear to make file system changes outside of intended output operations or access user-specific paths without explicit configuration.",{"category":75,"check":88,"severity":33,"summary":89},"Sandbox escape primitives","The Python scripts are standard executables and do not contain obvious primitives for sandbox escape such as detached processes or retry loops around denied calls.",{"category":75,"check":91,"severity":33,"summary":92},"Data Exfiltration","The extension's primary function is to send data to the Gemini API. 
It correctly uses API keys from environment variables or .env files and does not appear to have any undocumented outbound calls for telemetry or other purposes.",{"category":75,"check":94,"severity":33,"summary":95},"Hidden Text Tricks","The bundled scripts and markdown files appear to be free of hidden-text tricks or obfuscation techniques designed to mislead the model or curator.",{"category":97,"check":98,"severity":33,"summary":99},"Hooks","Opaque code execution","The Python scripts are provided in plain, readable source code format, with no evidence of obfuscation, base64 payloads, or runtime code fetching.",{"category":101,"check":102,"severity":33,"summary":103},"Portability","Structural Assumption","The scripts correctly locate API keys using a flexible priority order of environment variables and .env files, and find project root for outputting to 'docs/assets/', mitigating assumptions about specific project structures.",{"category":105,"check":106,"severity":69,"summary":107},"Trust","Issues Attention","No issue tracking data is available (n/a for 'Issues Opened' and 'Issues Closed').",{"category":109,"check":110,"severity":111,"summary":112},"Versioning","Release Management","warning","The SKILL.md frontmatter has 'Manifest Version: n/a' and no other versioning signal (like CHANGELOG or GitHub releases) is apparent. Install instructions do not reference a specific version, potentially defaulting to 'main'.",{"category":114,"check":115,"severity":116,"summary":117},"Code Execution","Validation","info","While the scripts handle command-line arguments and file paths, explicit schema validation libraries (like Zod or Pydantic) are not visibly used for all inputs, though basic argument parsing is present.",{"category":75,"check":119,"severity":33,"summary":120},"Unguarded Destructive Operations","The scripts are primarily focused on data processing and generation. 
They do not contain obvious destructive primitives like file deletion or infrastructure changes that would require additional guards.",{"category":114,"check":122,"severity":33,"summary":123},"Error Handling","The scripts include retry logic with exponential backoff for API calls and provide informative error messages for common issues like missing API keys or failed processing.",{"category":114,"check":125,"severity":33,"summary":126},"Logging","The scripts provide verbose output options (-v) which include detailed logs about file processing, uploads, and optimization steps, allowing for user review.",{"category":128,"check":129,"severity":116,"summary":130},"Compliance","GDPR","The extension processes user-provided documents and media. While it sends this data to the Gemini API for processing, there's no explicit mention of personal data sanitization before sending, though the Gemini API likely has its own privacy measures.",{"category":128,"check":132,"severity":33,"summary":133},"Target market","The extension is a general-purpose media processing tool using a global API and does not appear to have any region-specific logic or constraints, making it globally applicable.",{"category":101,"check":135,"severity":33,"summary":136},"Runtime stability","The Python scripts are designed to run with standard Python interpreters and rely on external libraries specified in requirements.txt. 
They handle dependencies and API key discovery flexibly, indicating good portability across environments.",{"category":49,"check":138,"severity":33,"summary":139},"Precise Purpose","The SKILL.md frontmatter and the script docstrings clearly define the extension's purpose (multimodal AI processing via Gemini API) and its specific tasks (transcription, analysis, generation, etc.), including usage guidelines.",{"category":49,"check":141,"severity":33,"summary":142},"Concise Frontmatter","The SKILL.md frontmatter is concise and effectively summarizes the core capabilities and keywords, providing sufficient information for routing.",{"category":53,"check":144,"severity":33,"summary":145},"Concise Body","The SKILL.md body is well-structured with clear sections for capabilities, model selection, quick start, etc., and does not appear excessively long, delegating deeper details to reference files.",{"category":147,"check":148,"severity":33,"summary":149},"Context","Progressive Disclosure","The SKILL.md file effectively outlines the core functionality and links to detailed `references/*.md` files for specific areas like audio processing, image understanding, video analysis, and image generation.",{"category":147,"check":151,"severity":69,"summary":152},"Forked exploration","This extension is a set of utility scripts for processing and generating media; it does not involve deep code review or extensive exploration that would necessitate `context: fork`.",{"category":31,"check":154,"severity":33,"summary":155},"Usage examples","The SKILL.md file provides numerous ready-to-use examples for common tasks across all modalities, including input commands and expected outcomes.",{"category":31,"check":157,"severity":33,"summary":158},"Edge cases","The documentation addresses various aspects like token costs, rate limits, file size limits, video processing limitations, and provides error handling for common issues, indicating consideration for edge cases and failure 
modes.",{"category":114,"check":160,"severity":69,"summary":161},"Tool Fallback","The extension does not rely on external tools like MCP servers; its functionality is self-contained within the provided Python scripts and the Gemini API.",{"category":163,"check":164,"severity":33,"summary":165},"Safety","Halt on unexpected state","The scripts include error handling and retry mechanisms. While not explicitly listing preconditions in a machine-readable checklist, they halt execution and report errors on issues like missing API keys or failed API calls.",{"category":101,"check":167,"severity":33,"summary":168},"Cross-skill coupling","The extension is self-contained and does not implicitly rely on other skills being loaded. Its functionality is independent and clearly defined by the provided scripts and documentation.",1778054836112,"This skill provides a unified command-line interface for interacting with the Google Gemini API, enabling processing of audio, images, videos, and documents, as well as image generation. It includes Python scripts for batch processing, media optimization, and document conversion, with clear instructions for API key setup and usage.","2.0.0","3.4.0","This extension is highly polished, with excellent documentation, clear examples, robust error handling, and a well-defined scope covering multimodal AI processing via the Gemini API. The only minor warning is the lack of explicit versioning, which is a common omission for standalone scripts. 
The provided scripts are well-structured, runnable, and demonstrate a strong understanding of best practices.",95,"A comprehensive and production-ready multimodal AI processing skill leveraging the Google Gemini API.",[15,16,17,18,19,20,21,22,23],"global","verified",{"codeQuality":180,"collectedAt":181,"documentation":182,"maintenance":184,"security":185,"testCoverage":188},{},1778054816353,{"descriptionLength":183,"readmeSize":8},925,{},{"hasNpmPackage":186,"license":187,"smitheryVerified":186},false,"MIT",{"hasCi":186,"hasTests":186},{"updatedAt":190},1778054896678,{"githubOwner":192,"githubRepo":193,"locale":27,"slug":194,"type":195},"samhvw8","dot-claude","ai-multimodal","skill",true,null,{"extract":199,"llm":201},{"commitSha":200,"license":187},"28c76162116d2eedab131c0e1548fdc76a2999f7",{"promptVersionExtension":171,"promptVersionScoring":172,"score":174,"targetMarket":177,"tier":178},{"repoId":203},"kd79ad9dpqazy79y2s6rvajgjn865xek",{"_creationTime":205,"_id":203,"identity":206,"providers":208,"workflow":217},1777995558409.872,{"githubOwner":192,"githubRepo":193,"sourceUrl":207},"https://github.com/samhvw8/dot-claude",{"discover":209,"github":212},{"sources":210},[211],"skills-sh",{"closedIssues90d":8,"forks":8,"openIssues90d":213,"pushedAt":214,"readmeSize":8,"stars":215,"topics":216},1,1765248784000,10,[],{"discoverAt":218,"extractAt":219,"githubAt":219,"updatedAt":219},1777995558409,1778054814968,{"anyEnrichmentAt":221,"extractAt":222,"githubAt":221,"llmAt":190,"updatedAt":190},1778054813688,1778054812528,[],[225,253,280,307,334,363],{"_creationTime":226,"_id":227,"community":228,"display":229,"identity":238,"providers":242,"relations":247,"workflow":249},1778053148350.4788,"k17dj7ajf0w4cb3z4qw4khc5v18662n4",{"reviewCount":8},{"description":230,"name":231,"sourceUrl":232,"tags":233},"Automate audio/video transcription, meeting notes, subtitle generation, and content processing","Transcription 
Automation","https://github.com/claude-office-skills/skills/tree/HEAD/transcription-automation",[23,17,19,234,235,236,237],"meetings","subtitles","multimedia","mcp",{"githubOwner":239,"githubRepo":240,"locale":27,"slug":241,"type":195},"claude-office-skills","skills","transcription-automation",{"extract":243,"llm":245},{"commitSha":244,"license":187},"9c4c7d5cd2813a8936bf2c9fdb174ea883b85a11",{"promptVersionExtension":171,"promptVersionScoring":172,"score":246,"targetMarket":177,"tier":178},92,{"repoId":248},"kd7fw7xbj58qc2z8whrrjptbed8659db",{"anyEnrichmentAt":250,"extractAt":251,"githubAt":250,"llmAt":252,"updatedAt":252},1778053151766,1778053148350,1778053561145,{"_creationTime":254,"_id":255,"community":256,"display":257,"identity":266,"providers":269,"relations":273,"workflow":275},1778000156777.95,"k175n9d53268ka9ydn3pw9fspn864had",{"reviewCount":8},{"description":258,"installMethods":259,"name":260,"sourceUrl":261,"tags":262},"Extension from 0xKaroshi/contendeo-mcp",{},"Contendeo","https://github.com/0xKaroshi/contendeo-mcp",[19,16,23,263,22,237,264,265],"vision","authentication","remote-service",{"githubOwner":267,"githubRepo":268,"locale":27,"slug":268,"type":195},"0xKaroshi","contendeo-mcp",{"extract":270,"smithery":272},{"commitSha":271,"license":187},"fac117ce3d5fb8501df582709809a626b0ccdb23",{"qualityScore":8,"totalActivations":8,"uniqueUsers":8,"useCount":8,"verified":186},{"repoId":274},"kd7cxtghwmakmyhvyt9r5bvdk98643ht",{"anyEnrichmentAt":276,"extractAt":277,"githubAt":278,"invalidatedAt":276,"llmAt":279,"smitheryAt":276,"updatedAt":276},1778007780389,1778000156778,1778000157007,1778006249801,{"_creationTime":281,"_id":282,"community":283,"display":284,"identity":294,"providers":296,"relations":301,"workflow":303},1778053440456.6584,"k17120x7me8p1n30wxpg972esx866b8q",{"reviewCount":8},{"description":285,"installMethods":286,"name":287,"sourceUrl":288,"tags":289},"Transcribe audio to text using ElevenLabs Scribe. 
Supports batch transcription, realtime streaming from URLs, microphone input, and local files.",{},"ElevenLabs Speech-to-Text","https://github.com/elevenlabs/skills/tree/HEAD/openclaw/elevenlabs-transcribe",[23,17,290,291,292,293],"elevenlabs","python","realtime","batch",{"githubOwner":290,"githubRepo":240,"locale":27,"slug":295,"type":195},"elevenlabs-transcribe",{"extract":297,"llm":299},{"commitSha":298,"license":187},"b476f0ccf4be0e22b2e77cc39307665425d1472b",{"promptVersionExtension":171,"promptVersionScoring":172,"score":300,"targetMarket":177,"tier":178},98,{"repoId":302},"kd71z3hz1pg97d1k2d6kaqeqtx864knt",{"anyEnrichmentAt":304,"extractAt":305,"githubAt":304,"llmAt":306,"updatedAt":306},1778053440833,1778053440456,1778053480675,{"_creationTime":308,"_id":309,"community":310,"display":311,"identity":321,"providers":324,"relations":328,"workflow":330},1778054691785.2515,"k17ev68gbw25zazp0w5z2a61hd8662cc",{"reviewCount":8},{"description":312,"installMethods":313,"name":314,"sourceUrl":315,"tags":316},"Implement speech-to-text (ASR/automatic speech recognition) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build voice input features, or process audio recordings. 
Supports base64 encoded audio files and returns accurate text transcriptions.",{},"ASR (Speech to Text) Skill","https://github.com/answerzhao/agent-skills/tree/HEAD/glm-skills/ASR",[317,318,23,319,320,17],"asr","speech-to-text","sdk","cli",{"githubOwner":322,"githubRepo":323,"locale":27,"slug":317,"type":195},"answerzhao","agent-skills",{"extract":325,"llm":327},{"commitSha":326,"license":187},"aad73edbd0d9ffbc3d6a402b6eafa6dab96d5ebb",{"promptVersionExtension":171,"promptVersionScoring":172,"score":174,"targetMarket":177,"tier":178},{"repoId":329},"kd712v2g1pay70swwj0jpv2ggs864zgh",{"anyEnrichmentAt":331,"extractAt":332,"githubAt":331,"llmAt":333,"updatedAt":333},1778054692243,1778054691785,1778054738050,{"_creationTime":335,"_id":336,"community":337,"display":338,"identity":349,"providers":352,"relations":357,"workflow":359},1778054035325.875,"k1786xrb93cze519jccqw2h6hx867q00",{"reviewCount":8},{"description":339,"installMethods":340,"name":341,"sourceUrl":342,"tags":343},"Video and audio processing with FFmpeg. Use for format conversion, resizing, compression, audio extraction, and preparing assets for Remotion. 
Triggers include converting GIF to MP4, resizing video, extracting audio, compressing files, or any media transformation task.",{},"FFmpeg for Video Production","https://github.com/digitalsamba/claude-code-video-toolkit/tree/HEAD/.claude/skills/ffmpeg",[344,19,17,345,346,347,348],"ffmpeg","media","remotion","processing","conversion",{"githubOwner":350,"githubRepo":351,"locale":27,"slug":344,"type":195},"digitalsamba","claude-code-video-toolkit",{"extract":353,"llm":356},{"commitSha":354,"license":355},"dc1bbd251ef137bde9cf460bacb88f13adb3a808","MIT-0",{"promptVersionExtension":171,"promptVersionScoring":172,"score":174,"targetMarket":177,"tier":178},{"repoId":358},"kd77w77a4w1f7nnb9v4fmh2eb1865dn1",{"anyEnrichmentAt":360,"extractAt":361,"githubAt":360,"llmAt":362,"updatedAt":362},1778054036248,1778054035325,1778054079849,{"_creationTime":364,"_id":365,"community":366,"display":367,"identity":373,"providers":374,"relations":377,"workflow":378},1778053440456.66,"k176861yt3z945kzntpp4a5m95866aq8",{"reviewCount":8},{"description":368,"installMethods":369,"name":287,"sourceUrl":370,"tags":371},"Transcribe audio to text using ElevenLabs Scribe v2. Use when converting audio/video to text, generating subtitles, transcribing meetings, or processing spoken content.",{},"https://github.com/elevenlabs/skills/tree/HEAD/speech-to-text",[23,17,290,372,318],"api",{"githubOwner":290,"githubRepo":240,"locale":27,"slug":318,"type":195},{"extract":375,"llm":376},{"commitSha":298,"license":187},{"promptVersionExtension":171,"promptVersionScoring":172,"score":174,"targetMarket":177,"tier":178},{"repoId":302},{"anyEnrichmentAt":304,"extractAt":305,"githubAt":304,"llmAt":306,"updatedAt":306}]