Hugging Face Vision Trainer

Train and fine-tune object detection models (RTDETRv2, YOLOS, DETR, and others) and image classification models (timm and transformers models such as MobileNetV3, MobileViT, ResNet, and ViT/DINOv3) using the Transformers Trainer API, either on Hugging Face Jobs infrastructure or locally. Includes COCO dataset format support, Albumentations augmentation, mAP/mAR metrics, trackio tracking, hardware selection, and Hub persistence.
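The mAP/mAR metrics mentioned above are built on intersection over union (IoU) between predicted and ground-truth boxes. The following is a minimal sketch of that building block for COCO-style boxes, which are stored as [x, y, width, height] in pixels. The function names are hypothetical and are not taken from the plugin; the plugin's own metric code is not shown in this listing.

```python
from typing import Sequence, Tuple


def coco_to_corners(box: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert a COCO box [x, y, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection over union of two COCO-format boxes (hypothetical helper)."""
    ax1, ay1, ax2, ay2 = coco_to_corners(box_a)
    bx1, by1, bx2, by2 = coco_to_corners(box_b)

    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) boxes.
    return intersection / union if union > 0 else 0.0
```

A detection is typically counted as a match when its IoU with a ground-truth box exceeds a chosen threshold; mAP then averages precision over classes and, in the COCO convention, over several such thresholds.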

Purpose

Lets users train and fine-tune computer vision models on Hugging Face's cloud GPUs, without managing local GPU infrastructure.

Features

Train object detection models (RTDETRv2, YOLOS, DETR)
Train image classification models (timm, transformers)
Train SAM/SAM2 segmentation models
Support for COCO dataset format and Albumentations augmentation
Integration with Hugging Face Jobs for cloud GPU training
Automated dataset validation and Hub persistence
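To illustrate what validating a COCO-format detection dataset can involve, here is a small sketch that checks annotations against their images and categories. It assumes the standard COCO layout (top-level "images", "annotations", and "categories" lists, with boxes as [x, y, width, height]). The function name and the specific checks are assumptions for illustration; the plugin's actual validation logic is not described in this listing.

```python
from typing import Any


def find_annotation_problems(coco: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a COCO-style dict (hypothetical helper)."""
    images = {img["id"]: img for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}
    problems: list[str] = []

    for ann in coco.get("annotations", []):
        ann_id = ann.get("id")

        # Every annotation must point at an image that exists.
        image = images.get(ann.get("image_id"))
        if image is None:
            problems.append(f"annotation {ann_id}: unknown image_id")
            continue

        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id")

        bbox = ann.get("bbox", [])
        if len(bbox) != 4:
            problems.append(f"annotation {ann_id}: bbox must have 4 values")
            continue

        x, y, w, h = bbox
        if w <= 0 or h <= 0:
            problems.append(f"annotation {ann_id}: bbox has non-positive size")
        elif x < 0 or y < 0 or x + w > image["width"] or y + h > image["height"]:
            problems.append(f"annotation {ann_id}: bbox extends outside the image")

    return problems


if __name__ == "__main__":
    sample = {
        "images": [{"id": 1, "width": 100, "height": 80}],
        "categories": [{"id": 0, "name": "cat"}],
        "annotations": [
            {"id": 10, "image_id": 1, "category_id": 0, "bbox": [10, 10, 20, 20]},
            {"id": 11, "image_id": 1, "category_id": 0, "bbox": [90, 10, 20, 20]},
        ],
    }
    for problem in find_annotation_problems(sample):
        print(problem)
```

Running a check like this before submitting a job can catch malformed boxes early, before any GPU time is spent on training.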

Use Cases

Fine-tuning object detection models on custom datasets.
Training image classification models for specific tasks.
Experimenting with SAM/SAM2 models for segmentation on new data.
Leveraging cloud GPUs for computationally intensive vision model training.

Non-Goals

Running training jobs on local hardware (though scripts can be run locally for inspection).
Providing a graphical user interface for model training.
Managing or providing datasets; users must supply their own datasets on the Hub.

Installation

First, add the marketplace:

/plugin marketplace add huggingface/skills

Then install the plugin:

/plugin install huggingface-vision-trainer@huggingface-skills

Quality Score

96/100 (Verified)

Analyzed about 14 hours ago

Trust Signals

Last commit: 1 day ago

GitHub owner: huggingface

Stars: 10.5k

License: Apache-2.0

Website: huggingface.co
