Smart OCR Skill

Verified Skill
92

>

AI Summary

This skill leverages the PaddleOCR engine to extract text from images, scanned PDFs, and handwritten documents. It supports multilingual processing, provides detailed configuration options, and includes practical examples for business cards, receipts, and multi-language documents.

Scope

  • warning: Description quality. The 'Displayed Description' field is minimal (a single '>'), providing no useful information about the skill's functionality.

Code Execution

  • warning: Validation. While configuration options are documented, the provided Python snippets for processing images and PDFs do not explicitly show the use of a schema library for input validation or sanitization of file paths and arguments.
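
To illustrate the kind of check this warning refers to, here is a minimal sketch of file-path validation that could run before an image or PDF is handed to an OCR engine. It is not taken from the skill itself: the function name `validate_input_path`, the `ALLOWED_SUFFIXES` allow-list, and the idea of a fixed base directory are all assumptions made for illustration, and it uses only the standard library rather than a schema library.

```python
from pathlib import Path

# Assumed allow-list of input types; adjust to whatever the OCR step accepts.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf"}


def validate_input_path(raw_path: str, base_dir: str) -> Path:
    """Return a resolved path inside base_dir, or raise ValueError."""
    base = Path(base_dir).resolve()
    # Resolving collapses ".." segments and symlinks before the containment check.
    candidate = (base / raw_path).resolve()

    # Reject anything that escapes the base directory (requires Python 3.9+).
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes the allowed directory: {raw_path!r}")

    # Reject file types the OCR step is not expected to handle.
    if candidate.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {candidate.suffix!r}")

    # Reject missing files and directories.
    if not candidate.is_file():
        raise ValueError(f"not a regular file: {raw_path!r}")

    return candidate
```

A caller would pass the returned path to the OCR step and treat `ValueError` as a rejected input.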

Compliance

  • info: GDPR. The skill processes document content, which may include personal data. The data is not submitted directly to a third party, but it is processed by the LLM, and no explicit sanitization is detailed.
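
As a rough illustration of what sanitization could look like here, the sketch below masks e-mail addresses and phone-like numbers in extracted text before it is passed on. It is an assumption-laden example, not part of the skill: `redact_personal_data` and both regular expressions are hypothetical, and simple patterns like these will miss many forms of personal data.

```python
import re

# Deliberately simple, assumed patterns; real personal data takes many more forms.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_personal_data(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)
```

Running this on OCR output before any further processing keeps the placeholders in place of the original values.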

Installation

npx skills add claude-office-skills/skills

Runs the Vercel skills CLI (skills.sh) via npx. This requires Node.js installed locally and at least one skills-compatible agent (Claude Code, Cursor, Codex, …). It also assumes the repo follows the agentskills.io format.

3 months ago
98 stars
MIT
Updated 2 days ago

Similar extensions

Document Parser Skill

92

>

Skill
claude-office-skills

PaddleOCR Document Parsing

98

Use this skill to extract structured Markdown/JSON from PDFs and document images—tables with cell-level precision, formulas as LaTeX, figures, seals, charts, headers/footers, multi-column layout and correct reading order. Trigger terms: 文档解析, 版面分析, 版面还原, 表格提取, 公式识别, 多栏排版, 扫描件结构化, 发票, 财报, 复杂 PDF, PDF转Markdown, 图表, 阅读顺序; reading order, formula, LaTeX, layout parsing, structure extraction, PP-StructureV3, PaddleOCR-VL.

Skill
aidenwu0209

AI Multimodal Processing Skill

95

Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentation, visual Q&A), video (scene detection, 6hr max, YouTube URLs, temporal analysis), documents (PDF extraction, tables, forms, charts), image generation (text-to-image, editing). Actions: transcribe, analyze, extract, caption, detect, segment, generate from media. Keywords: Gemini API, audio transcription, image captioning, OCR, object detection, video analysis, PDF extraction, text-to-image, multimodal, speech recognition, visual Q&A, scene detection, YouTube transcription, table extraction, form processing, image generation, Imagen. Use when: transcribing audio/video, analyzing images/screenshots, extracting data from PDFs, processing YouTube videos, generating images from text, implementing multimodal AI features.

Skill
samhvw8

PDF Extraction

95

Extract text, tables, and metadata from PDFs using pdfplumber

Skill
claude-office-skills

Table Extractor

92

>

Skill
claude-office-skills

Layout Analyzer

90

>

Skill
claude-office-skills