CLIP

Skill Active

OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.

Purpose

To enable AI agents to understand and process images in conjunction with natural language, facilitating tasks like image search and classification without fine-tuning.

Features

  • Zero-shot image classification
  • Image-text similarity and matching
  • Semantic image search
  • Cross-modal retrieval (image-to-text, text-to-image)
  • Content moderation capabilities
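
As a rough illustration of how zero-shot classification with image and text embeddings can work, the sketch below scores one image embedding against one text embedding per candidate label using cosine similarity, then applies a softmax. The toy 3-dimensional vectors, the function names, and the scale value of 100 are all assumptions made for illustration. In practice the embeddings would come from the CLIP image and text encoders, which this sketch does not call.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, scale=100.0):
    """Return {label: probability} for one image embedding.

    `scale` sharpens the distribution; 100.0 is an illustrative assumption.
    """
    labels = list(label_embeddings)
    logits = [
        scale * cosine_similarity(image_embedding, label_embeddings[label])
        for label in labels
    ]
    return dict(zip(labels, softmax(logits)))


# Hypothetical toy vectors standing in for real encoder outputs.
image_embedding = [0.9, 0.1, 0.0]
label_embeddings = {
    "a photo of a dog": [1.0, 0.0, 0.0],
    "a photo of a cat": [0.0, 1.0, 0.0],
    "a photo of a car": [0.0, 0.0, 1.0],
}

probabilities = zero_shot_classify(image_embedding, label_embeddings)
best_label = max(probabilities, key=probabilities.get)
print(best_label)
```

Because the labels are plain text prompts, swapping in a new label set requires no retraining, which is what makes the approach zero-shot.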

Use Cases

  • Use when performing zero-shot image classification on custom datasets.
  • Use when needing to find images semantically related to a text description.
  • Use for content moderation tasks like detecting NSFW or violent imagery.
  • Use for building vision-language applications requiring understanding of image content.
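
The semantic search use case can be sketched as a nearest-neighbor lookup: embed the text query, then rank pre-computed image embeddings by cosine similarity. The example below is a minimal sketch under stated assumptions. The file names, the toy vectors, and the `search` helper are hypothetical, and a real index would hold embeddings produced by the model rather than hand-written numbers.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def search(query_embedding, image_index, top_k=2):
    """Return the top_k (image_id, similarity) pairs, most similar first."""
    query = normalize(query_embedding)
    scored = []
    for image_id, embedding in image_index.items():
        unit = normalize(embedding)
        similarity = sum(q * e for q, e in zip(query, unit))
        scored.append((image_id, similarity))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical pre-computed image embeddings keyed by file name.
image_index = {
    "beach.jpg": [0.8, 0.2, 0.1],
    "forest.jpg": [0.1, 0.9, 0.2],
    "city.jpg": [0.2, 0.1, 0.9],
}

# Hypothetical embedding of a text query such as "sunny coastline".
query_embedding = [0.7, 0.3, 0.0]

results = search(query_embedding, image_index, top_k=2)
for image_id, similarity in results:
    print(f"{image_id}: {similarity:.3f}")
```

The same ranking works in the other direction (image query, text candidates), since both kinds of embedding live in one shared space.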

Non-Goals

  • Do not use for fine-grained object detection or segmentation.
  • Do not use for tasks requiring extensive fine-tuning on domain-specific data.
  • Do not use when only text-based analysis is required.
  • Do not use for tasks where spatial understanding (position, counting) is critical.

Code Execution

  • Validation (info): While the code uses standard libraries and parameter passing, the provided snippets show no explicit schema validation for inputs such as file paths.
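
One way to address the validation gap noted here is to check image paths before they reach the model. The following is a minimal sketch, not the skill's own code: the `validate_image_path` helper and the set of allowed extensions are assumptions chosen for illustration.

```python
from pathlib import Path

# Assumed allow-list of image extensions; adjust to what the pipeline supports.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def validate_image_path(raw_path):
    """Return a resolved Path if raw_path names an existing image file.

    Raises ValueError for malformed input and FileNotFoundError for
    well-formed paths that do not point at a file.
    """
    if not isinstance(raw_path, str) or not raw_path.strip():
        raise ValueError("image path must be a non-empty string")
    path = Path(raw_path).expanduser()
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported image type: {path.suffix!r}")
    if not path.is_file():
        raise FileNotFoundError(f"no such image file: {path}")
    return path.resolve()
```

Failing early with a specific exception keeps bad inputs from surfacing later as opaque decoding or tensor-shape errors.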

Execution

  • Pinned dependencies (warning): Dependencies are listed in the SKILL.md, but they are not pinned to explicit versions or accompanied by a lockfile, which could lead to compatibility issues.
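
A simple way to act on this warning is to record the versions that are actually installed in a working environment and pin them. The sketch below uses the standard-library `importlib.metadata` module to emit `name==version` lines. The package names passed in are assumptions for illustration, since the listing does not say which dependencies the skill declares.

```python
from importlib import metadata


def pin_requirements(package_names):
    """Return requirements-style lines pinned to the installed versions."""
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Leave a visible marker rather than guessing a version.
            lines.append(f"# {name}: not installed, pin manually")
    return lines


# Hypothetical dependency names; replace with the ones the skill lists.
print("\n".join(pin_requirements(["torch", "transformers", "pillow"])))
```

The printed lines can be saved to a requirements file and committed, so later installs reproduce the same versions.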

Installation

First, add the marketplace, then install the plugin:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

98/100
Analyzed about 22 hours ago

Trust Signals

Last commit: 16 days ago
Stars: 8.3k
License: MIT

Similar Extensions

CLIP

95

OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.

Skill
davila7

Blip 2 Vision Language

98

Vision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answering, image-text retrieval, or multimodal chat with state-of-the-art zero-shot performance.

Skill
Orchestra-Research

Baoyu Imagine

99

AI image generation with OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream and Replicate APIs. Supports text-to-image, reference images, aspect ratios, and batch generation from saved prompt files. Sequential by default; use batch parallel generation when the user already has multiple prompts or wants stable multi-image throughput. Use when user asks to generate, create, or draw images.

Skill
jimliu

Whisper

97

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

Skill
Orchestra-Research

Llava

96

Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language chatbots or image understanding tasks. Best for conversational image analysis.

Skill
Orchestra-Research

Segment Anything Model

95

Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as prompts, or automatically generate all object masks in an image.

Skill
davila7

© 2025 SkillRepo · Find the right skill, skip the noise.