
Segment Anything Model

Skill · Verified · Active

Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as prompts, or automatically generate all object masks in an image.

Purpose

To enable users to perform zero-shot image segmentation on any object in images using flexible prompts, or to automatically generate all object masks, without requiring task-specific training.

Features

  • Zero-shot image segmentation
  • Flexible prompting (points, boxes, masks)
  • Automatic mask generation
  • Support for multiple model variants (ViT-B/L/H)
  • Clear installation and usage instructions
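As a rough illustration of automatic mask generation, the sketch below assumes the `segment_anything` package's `SamAutomaticMaskGenerator`, whose `generate` call returns a list of per-mask dicts (with keys such as `"segmentation"` and `"area"`). The actual model call is shown only in comments because it requires PyTorch and a downloaded checkpoint; the runnable part is a small hypothetical post-processing helper.

```python
# Hypothetical helper: keep only masks whose pixel area meets a minimum.
# Pure Python, so it runs without the model itself.
def filter_by_area(mask_records, min_area):
    """mask_records mimics SamAutomaticMaskGenerator.generate output."""
    return [m for m in mask_records if m["area"] >= min_area]

# The real generation step would look roughly like this (not run here;
# needs torch, segment-anything, and a checkpoint file):
#
#   from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
#   sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
#   generator = SamAutomaticMaskGenerator(sam)
#   records = generator.generate(image_rgb)  # list of per-mask dicts

# Demonstrate the helper on dummy records:
records = [{"area": 10}, {"area": 500}, {"area": 90}]
large = filter_by_area(records, 100)
print(len(large))
```

Filtering tiny masks this way is a common cleanup step before using the generated masks as annotations.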

Use Cases

  • Segmenting any object in images without fine-tuning
  • Building interactive annotation tools
  • Generating training data for computer vision models
  • Processing specialized image domains (medical, satellite)
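For interactive, prompt-driven use (e.g. an annotation tool), the sketch below assumes the `segment_anything` package's `SamPredictor`, whose `predict` call with `multimask_output=True` returns candidate masks plus quality scores. The model call itself is left in comments (it needs PyTorch and a checkpoint); the runnable part is a hypothetical helper that picks the highest-scoring candidate.

```python
import numpy as np

# Hypothetical helper: given candidate masks and their quality scores
# (as SamPredictor.predict returns), select the best-scoring mask.
def best_mask(masks: np.ndarray, scores: np.ndarray) -> np.ndarray:
    return masks[int(np.argmax(scores))]

# The real prediction step would look roughly like this (not run here;
# needs torch, segment-anything, and a checkpoint file):
#
#   from segment_anything import sam_model_registry, SamPredictor
#   sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
#   predictor = SamPredictor(sam)
#   predictor.set_image(image_rgb)              # HxWx3 uint8 RGB array
#   masks, scores, _ = predictor.predict(
#       point_coords=np.array([[500, 375]]),    # (x, y) pixel prompt
#       point_labels=np.array([1]),             # 1 = foreground point
#       multimask_output=True,                  # return 3 candidates
#   )
#   mask = best_mask(masks, scores)

# Demonstrate the helper on dummy candidates:
masks = np.zeros((3, 4, 4), dtype=bool)
masks[1, :2, :2] = True                  # second candidate covers 4 pixels
scores = np.array([0.2, 0.9, 0.5])
print(best_mask(masks, scores).sum())
```

Box and mask prompts follow the same pattern, passed via the predictor's `box` and `mask_input` arguments instead of point coordinates.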

Non-Goals

  • Real-time object detection with predefined classes (use YOLO/Detectron2)
  • Semantic/panoptic segmentation with categories (use Mask2Former)
  • Text-prompted segmentation (use GroundingDINO + SAM)
  • Video segmentation tasks (use SAM 2)

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.), and assumes the repository follows the agentskills.io format.

Quality Score

Verified
95/100
Analyzed 1 day ago

Trust Signals

Last commit: 1 day ago
Stars: 27.2k
License: MIT
Status
View source code

Similar Extensions

Segment Anything Model

99

Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as prompts, or automatically generate all object masks in an image.

Skill
Orchestra-Research

Transformers

98

This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.

Skill
K-Dense-AI

Senior Computer Vision

98

Computer vision engineering skill for object detection, image segmentation, and visual AI systems. Covers CNN and Vision Transformer architectures, YOLO/Faster R-CNN/DETR detection, Mask R-CNN/SAM segmentation, and production deployment with ONNX/TensorRT. Includes PyTorch, torchvision, Ultralytics, Detectron2, and MMDetection frameworks. Use when building detection pipelines, training custom models, optimizing inference, or deploying vision systems.

Skill
alirezarezvani

Clip

98

OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.

Skill
Orchestra-Research

Blip 2 Vision Language

98

Vision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answering, image-text retrieval, or multimodal chat with state-of-the-art zero-shot performance.

Skill
Orchestra-Research

Stable Diffusion Image Generation

95

State-of-the-art text-to-image generation with Stable Diffusion models via HuggingFace Diffusers. Use when generating images from text prompts, performing image-to-image translation, inpainting, or building custom diffusion pipelines.

Skill
davila7