Hugging Face Vision Trainer

Skill · Verified · Active

Trains and fine-tunes vision models for object detection (D-FINE, RT-DETR v2, DETR, YOLOS), image classification (timm models — MobileNetV3, MobileViT, ResNet, ViT/DINOv3 — plus any Transformers classifier), and SAM/SAM2 segmentation using Hugging Face Transformers on Hugging Face Jobs cloud GPUs. Covers COCO-format dataset preparation, Albumentations augmentation, mAP/mAR evaluation, accuracy metrics, SAM segmentation with bbox/point prompts, DiceCE loss, hardware selection, cost estimation, Trackio monitoring, and Hub persistence. Use when users mention training object detection, image classification, SAM, SAM2, segmentation, image matting, DETR, D-FINE, RT-DETR, ViT, timm, MobileNet, ResNet, bounding box models, or fine-tuning vision models on Hugging Face Jobs.

Purpose

Enables users to train and fine-tune vision models on cloud GPUs without local setup, abstracting away complex configurations and providing end-to-end workflow management.

Features

  • Trains object detection, image classification, and SAM/SAM2 segmentation models
  • Uses Hugging Face Transformers and Hugging Face Jobs for cloud-based training
  • Supports the COCO dataset format, Albumentations augmentation, and mAP/mAR evaluation
  • Persists trained models to the Hub automatically and estimates training cost and time
  • Provides sample scripts and detailed guidance for each vision task
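
As a rough illustration of what sits underneath mAP/mAR evaluation, the sketch below computes intersection over union (IoU) for two boxes in COCO layout, `[x_min, y_min, width, height]`. The function names are hypothetical and are not taken from the skill's scripts; a real evaluation would normally rely on an established metrics library rather than hand-written code.

```python
def coco_to_corners(box):
    """Convert a COCO box [x_min, y_min, width, height] to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return x, y, x + w, y + h


def iou(box_a, box_b):
    """Intersection over union of two COCO-format boxes (hypothetical helper)."""
    ax1, ay1, ax2, ay2 = coco_to_corners(box_a)
    bx1, by1, bx2, by2 = coco_to_corners(box_b)

    # Overlap along each axis, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```

A predicted box typically counts as a match for a ground-truth box when their IoU exceeds a chosen threshold; mAP then averages precision over classes and, in the COCO convention, over several such thresholds.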

Use cases

  • Fine-tuning object detection models like D-FINE or RT-DETR on custom datasets.
  • Training image classification models using timm or Transformers classifiers on cloud GPUs.
  • Fine-tuning SAM/SAM2 for segmentation tasks using bounding box or point prompts.
  • Estimating training time and cost before launching a job.
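
The last use case comes down to simple arithmetic, sketched below. Every input here is an assumption supplied by the caller: the throughput figure and the hourly price are placeholders, not values published by Hugging Face or used by the skill, and `estimate_training` is a hypothetical name.

```python
def estimate_training(num_images, epochs, images_per_second, usd_per_hour):
    """Return (hours, cost_usd) for a training run.

    All arguments are caller-supplied assumptions: measure throughput on a
    short trial run and look up the current hourly price for the chosen GPU.
    """
    if images_per_second <= 0:
        raise ValueError("images_per_second must be positive")
    seconds = num_images * epochs / images_per_second
    hours = seconds / 3600.0
    return hours, hours * usd_per_hour


# Placeholder numbers for illustration only.
hours, cost = estimate_training(
    num_images=10_000, epochs=30, images_per_second=50.0, usd_per_hour=1.0
)
print(f"about {hours:.2f} h, about ${cost:.2f}")
```

The estimate ignores evaluation passes, data loading stalls, and job startup time, so it is better treated as a lower bound than as a quote.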

Non-goals

  • Training text or language models (use `hugging-face-llm-trainer` skill).
  • Managing local GPU environments or dependencies.
  • Providing general Hugging Face Hub operations (use `hf-cli` skill).

Workflow

  1. Verify prerequisites (account, token, dataset).
  2. Validate the dataset format using the provided inspector script.
  3. Ask the user about dataset size preferences and the validation split.
  4. Prepare the training script for the task (object detection, image classification, or SAM segmentation).
  5. Save the script locally, submit the job via `hf_jobs` or `HfApi`, and report the job details.
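
To give a feel for what the validation in step 2 might cover, here is a minimal stand-in that checks a COCO-style annotation dictionary. It is not the skill's inspector script, whose checks are not described on this page; `validate_coco` is a hypothetical name, and the sketch assumes plain COCO JSON with `images`, `annotations`, and `categories` lists.

```python
def validate_coco(coco):
    """Return a list of human-readable problems found in a COCO-style dict."""
    problems = []
    for key in ("images", "annotations", "categories"):
        if not isinstance(coco.get(key), list):
            problems.append(f"missing or non-list top-level key: {key}")
    if problems:
        return problems

    images = {img["id"]: img for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}

    for ann in coco["annotations"]:
        ann_id = ann.get("id")
        image = images.get(ann.get("image_id"))
        if image is None:
            problems.append(f"annotation {ann_id}: unknown image_id")
            continue
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id")

        bbox = ann.get("bbox")
        if not (isinstance(bbox, (list, tuple)) and len(bbox) == 4):
            problems.append(f"annotation {ann_id}: bbox must be [x, y, width, height]")
            continue

        x, y, w, h = bbox
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann_id}: non-positive box size")
        elif x < 0 or y < 0 or x + w > image["width"] or y + h > image["height"]:
            problems.append(f"annotation {ann_id}: box extends outside the image")
    return problems
```

An empty list means none of these particular checks failed. Datasets hosted on the Hub often store annotations in a different layout, so a real inspector would likely need to handle more cases than this.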

Practices

  • Model training workflows
  • Dataset preparation and validation
  • Cloud-based ML operations
  • Experiment tracking and monitoring

Prerequisites

  • Hugging Face account on a Pro, Team, or Enterprise plan
  • Authenticated login (check with `hf_whoami`) using a token that has write permissions
  • The token must be passed to the job as a secret
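
The last prerequisite can be pictured as follows. This is a sketch under assumptions, not the skill's own submission code: `build_job_request` and `submit` are hypothetical helpers, `a10g-small` is only an example hardware flavor, and the call to `run_uv_job` from `huggingface_hub` assumes a recent library version that includes the Jobs API with `script`, `flavor`, and `secrets` parameters. Check the installed version's documentation before relying on it.

```python
import os


def build_job_request(script_path, token=None, flavor="a10g-small"):
    """Assemble job arguments, placing the token in `secrets` (hypothetical helper)."""
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        raise ValueError("no token given and HF_TOKEN is not set")
    return {
        "script": script_path,
        "flavor": flavor,  # example flavor only; pick one that fits the model
        # Secrets are meant for sensitive values such as the write token,
        # so the training script can push results to the Hub.
        "secrets": {"HF_TOKEN": token},
    }


def submit(request):
    """Submit the job. Assumes huggingface_hub exposes run_uv_job as described above."""
    from huggingface_hub import run_uv_job  # imported lazily; needs network access

    return run_uv_job(**request)
```

Keeping the request construction separate from submission makes it possible to review the flavor and secrets before anything is launched or billed.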

Installation

/plugin install skills@huggingface-skills

Quality score

Verified
99/100
Analyzed 1 day ago

Trust signals

Last commit: 2 days ago
Stars: 10.5k
License: Apache-2.0

Similar extensions

Transformers

98

This skill should be used when working with pre-trained transformer models for natural language processing, computer vision, audio, or multimodal tasks. Use for text generation, classification, question answering, translation, summarization, image classification, object detection, speech recognition, and fine-tuning models on custom datasets.

Skill
K-Dense-AI

Segment Anything Model

99

Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as prompts, or automatically generate all object masks in an image.

Skill
Orchestra-Research

Huggingface Accelerate

99

Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for DeepSpeed/FSDP/Megatron/DDP. Automatic device placement, mixed precision (FP16/BF16/FP8). Interactive config, single launch command. HuggingFace ecosystem standard.

Skill
davila7

Senior Computer Vision

98

Computer vision engineering skill for object detection, image segmentation, and visual AI systems. Covers CNN and Vision Transformer architectures, YOLO/Faster R-CNN/DETR detection, Mask R-CNN/SAM segmentation, and production deployment with ONNX/TensorRT. Includes PyTorch, torchvision, Ultralytics, Detectron2, and MMDetection frameworks. Use when building detection pipelines, training custom models, optimizing inference, or deploying vision systems.

Skill
alirezarezvani

PyTorch Lightning

100

Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, implement data pipelines, callbacks, logging (W&B, TensorBoard), distributed training (DDP, FSDP, DeepSpeed), for scalable neural network training.

Skill
K-Dense-AI

Hf Cli

100

Hugging Face Hub CLI (`hf`) for downloading, uploading, and managing models, datasets, spaces, buckets, repos, papers, jobs, and more on the Hugging Face Hub. Use when: handling authentication; managing local cache; managing Hugging Face Buckets; running or scheduling jobs on Hugging Face infrastructure; managing Hugging Face repos; discussions and pull requests; browsing models, datasets and spaces; reading, searching, or browsing academic papers; managing collections; querying datasets; configuring spaces; setting up webhooks; or deploying and managing HF Inference Endpoints. Make sure to use this skill whenever the user mentions 'hf', 'huggingface', 'Hugging Face', 'huggingface-cli', or 'hugging face cli', or wants to do anything related to the Hugging Face ecosystem and to AI and ML in general. Also use for cloud storage needs like training checkpoints, data pipelines, or agent traces. Use even if the user doesn't explicitly ask for a CLI command. Replaces the deprecated `huggingface-cli`.

Skill
huggingface