CLIP

Skill Active

OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.

Purpose

To enable AI agents to understand and process images in conjunction with natural language, facilitating tasks like image search and classification without fine-tuning.

Features

  • Zero-shot image classification
  • Image-text similarity and matching
  • Semantic image search
  • Cross-modal retrieval (image-to-text, text-to-image)
  • Content moderation capabilities
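A minimal sketch of how the zero-shot classification listed above typically scores an image: each candidate label is written as a text prompt, and the image embedding is compared against every prompt embedding by cosine similarity, followed by a softmax. The embeddings here are made-up toy vectors standing in for real CLIP encoder outputs, and the function names and `scale=100.0` (a stand-in for the model's learned temperature) are assumptions for illustration, not part of this skill.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, label_embeddings, scale=100.0):
    """Return a {label: probability} dict for one image embedding.

    `scale` is an assumed temperature; a real model supplies its own value.
    """
    labels = list(label_embeddings)
    logits = [scale * cosine(image_embedding, label_embeddings[label])
              for label in labels]
    return dict(zip(labels, softmax(logits)))


# Toy embeddings (hypothetical); real ones would come from the image and text encoders.
image_vec = [0.9, 0.1, 0.2]
label_vecs = {
    "a photo of a cat": [1.0, 0.0, 0.1],
    "a photo of a dog": [0.0, 1.0, 0.1],
    "a photo of a car": [0.1, 0.1, 1.0],
}

probs = zero_shot_classify(image_vec, label_vecs)
best_label = max(probs, key=probs.get)
print(best_label)
```

Because the labels are just text prompts, changing the label set requires no retraining, which is what makes the classification "zero-shot".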

Use cases

  • Use when performing zero-shot image classification on custom datasets.
  • Use when you need to find images semantically related to a text description.
  • Use for content moderation tasks like detecting NSFW or violent imagery.
  • Use for building vision-language applications requiring understanding of image content.
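The image-search use case above can be sketched as a nearest-neighbor lookup: embed the text query, then rank pre-computed image embeddings by cosine similarity. This is an illustrative sketch under stated assumptions; the file names, vectors, and the `search` helper are hypothetical, and a real system would obtain the embeddings from a CLIP model and likely use a vector index instead of a linear scan.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search(query_embedding, image_index, top_k=2):
    """Return the top_k (name, score) pairs, best match first."""
    query = normalize(query_embedding)
    scored = []
    for name, embedding in image_index.items():
        candidate = normalize(embedding)
        score = sum(a * b for a, b in zip(query, candidate))
        scored.append((score, name))
    scored.sort(reverse=True)  # highest similarity first
    return [(name, score) for score, name in scored[:top_k]]


# Hypothetical pre-computed image embeddings keyed by file name.
image_index = {
    "beach.jpg": [0.8, 0.1, 0.1],
    "forest.jpg": [0.1, 0.9, 0.2],
    "city.jpg": [0.2, 0.1, 0.9],
}

# Hypothetical embedding of a text query such as "sunny shoreline".
query_vec = [0.7, 0.2, 0.1]

results = search(query_vec, image_index, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Swapping the roles of query and index (an image embedding as the query, text embeddings in the index) gives the image-to-text direction of cross-modal retrieval.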

Non-goals

  • Fine-grained object detection or segmentation.
  • Tasks that require extensive fine-tuning on domain-specific data.
  • Cases where only text-based analysis is required.
  • Tasks where spatial understanding (position, counting) is critical.

Code Execution

  • info: Validation. The code uses standard libraries and parameter passing, but the provided snippets show no explicit schema validation for inputs such as file paths.

Execution

  • warning: Pinned dependencies. Dependencies are listed in SKILL.md, but they are not pinned to specific versions or accompanied by a lockfile, which could lead to compatibility issues.

Installation

Add the marketplace first

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality score

98/100
Analyzed about 22 hours ago

Trust signals

Last commit: 16 days ago
Stars: 8.3k
License: MIT
Status
View source code

Similar extensions

CLIP

95

OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.

Skill
davila7

Blip 2 Vision Language

98

Vision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answering, image-text retrieval, or multimodal chat with state-of-the-art zero-shot performance.

Skill
Orchestra-Research

Baoyu Imagine

99

AI image generation with OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope, Z.AI GLM-Image, MiniMax, Jimeng, Seedream and Replicate APIs. Supports text-to-image, reference images, aspect ratios, and batch generation from saved prompt files. Sequential by default; use batch parallel generation when the user already has multiple prompts or wants stable multi-image throughput. Use when user asks to generate, create, or draw images.

Skill
jimliu

Whisper

97

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M params). Use for speech-to-text, podcast transcription, or multilingual audio processing. Best for robust, multilingual ASR.

Skill
Orchestra-Research

LLaVA

96

Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language chatbots or image understanding tasks. Best for conversational image analysis.

Skill
Orchestra-Research

Segment Anything Model

95

Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as prompts, or automatically generate all object masks in an image.

Skill
davila7