LLaVA: Large Language and Vision Assistant
Skill: Active
Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines a CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language chatbots or image understanding tasks. Best for conversational image analysis.
To enable conversational image understanding and visual instruction following through a powerful multimodal large language model.
Features
- Conversational image analysis
- Visual question answering
- Multi-turn image chat
- Visual instruction tuning
- Support for multiple LLaVA models
Use Cases
- Building vision-language chatbots
- Performing visual question answering on images
- Generating detailed image descriptions
- Following visual instructions
Non-Goals
- Providing the highest quality API-based vision models (e.g., GPT-4V)
- Simple zero-shot classification (use CLIP)
- Image captioning only (use BLIP-2)
- Research-only models (use Flamingo)
Workflow
- Load pretrained LLaVA model
- Process input image
- Format conversation prompt with image token
- Generate response using the model
- Decode and return the response (see the sketch below)
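This workflow maps fairly directly onto the Hugging Face transformers API. A minimal sketch, assuming the llava-hf/llava-1.5-7b-hf checkpoint, a recent transformers release, and a local file named example.jpg (the checkpoint and file name are illustrative, not prescribed by this skill):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # illustrative checkpoint; any LLaVA variant works

# 1. Load the pretrained LLaVA model and its processor
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# 2. Process the input image
image = Image.open("example.jpg")  # hypothetical local file

# 3. Format the conversation prompt with the image token
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)

# 4. Generate a response with the model
output_ids = model.generate(**inputs, max_new_tokens=200)

# 5. Decode and return the response
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

For multi-turn chat, the decoded ASSISTANT turn is appended to the prompt and the loop repeats with the same image inputs.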
Prerequisites
- Python 3.8+
- PyTorch
- Transformers
- Pillow
- Sufficient GPU VRAM (e.g., ~4GB for 7B 4-bit, ~14GB for 7B FP16)
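To stay near the ~4GB figure for the 7B model, the weights can be loaded in 4-bit. A sketch, assuming bitsandbytes is installed in addition to the requirements above and using the same illustrative checkpoint:

```python
import torch
from transformers import BitsAndBytesConfig, LlavaForConditionalGeneration

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4-bit at load time
    bnb_4bit_compute_dtype=torch.float16,  # run compute in FP16
)

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",  # illustrative checkpoint
    quantization_config=quant_config,
    device_map="auto",
)
```

Loading without quantization_config in torch.float16 corresponds to the ~14GB FP16 figure.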
Trust
- Warning (issue attention): in the last 90 days, 17 issues were opened and 4 were closed, indicating a slow issue closure rate.
Installation
npx skills add davila7/claude-code-templates
Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, ...). Assumes the repo follows the agentskills.io format.
Quality Score
Trust Signals
Similar Extensions
Llava (score: 96)
Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image chat, visual question answering, and instruction following. Use for vision-language chatbots or image understanding tasks. Best for conversational image analysis.
Blip 2 Vision Language (score: 98)
Vision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answering, image-text retrieval, or multimodal chat with state-of-the-art zero-shot performance.
Clip (score: 98)
OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.
Blip 2 Vision Language (score: 90)
Vision-language pre-training framework bridging frozen image encoders and LLMs. Use when you need image captioning, visual question answering, image-text retrieval, or multimodal chat with state-of-the-art zero-shot performance.
Segment Anything Model (score: 99)
Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as prompts, or automatically generate all object masks in an image.
Azure Ai Contentunderstanding Py (score: 99)
Azure AI Content Understanding SDK for Python. Use for multimodal content extraction from documents, images, audio, and video. Triggers: "azure-ai-contentunderstanding", "ContentUnderstandingClient", "multimodal analysis", "document extraction", "video analysis", "audio transcription".