Data Extractor

Skill · Warning

AI summary

This skill uses the unstructured Python library to process a wide range of document types, including PDFs, Word documents, emails, and HTML. It automatically detects file types and partitions documents into elements, extracts text and metadata, and supports advanced features such as table structure inference, OCR, and semantic chunking for RAG (retrieval-augmented generation) applications.
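To illustrate what chunking partitioned elements for RAG can look like, here is a minimal sketch of title-based (section-aware) chunking. It is not the library's implementation, and it is not necessarily what the skill means by "semantic chunking". The `Element` dataclass and `chunk_by_title_sketch` function are hypothetical stand-ins. In `unstructured` itself, the corresponding function is `chunk_by_title` in `unstructured.chunking.title`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """Hypothetical stand-in for a partitioned document element."""
    category: str  # e.g. "Title", "NarrativeText", "Table"
    text: str


def chunk_by_title_sketch(elements: List[Element], max_characters: int = 500) -> List[str]:
    """Group element texts into chunks that start at each Title element.

    A new chunk also starts when the next element would push the current
    chunk past max_characters (newline separators are not counted).
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for el in elements:
        # Flush the open chunk at a section boundary or when it would overflow.
        if current and (el.category == "Title" or size + len(el.text) > max_characters):
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(el.text)
        size += len(el.text)
    if current:
        chunks.append("\n".join(current))
    return chunks


doc = [
    Element("Title", "Introduction"),
    Element("NarrativeText", "Unstructured splits files into elements."),
    Element("Title", "Tables"),
    Element("Table", "a | b"),
]
print(chunk_by_title_sketch(doc))
```

Starting chunks at titles keeps each retrieved chunk within one section, so the retrieved context is less likely to mix unrelated topics.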

Scope

critical: Description quality. The description is materially misleading: it consists of a single character ('>') and gives no information about the extension's functionality, which is at odds with the content provided in SKILL.md.

Documentation

info: Configuration & parameter reference. Although SKILL.md provides extensive code examples, it does not explicitly document all configuration options or parameters of the `partition` function and its variants, nor does it specify a precedence order for configuration settings.
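One way to document and enforce a precedence order is sketched below. It assumes three hypothetical layers: built-in defaults, a config file, and per-call arguments, where later layers win. `DEFAULTS` and `resolve_options` are illustrative names, not part of the skill. The keys `strategy`, `infer_table_structure`, and `languages` are real `partition` keyword arguments, but the default values shown here are assumptions.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults: keys mirror partition() keyword arguments,
# values are illustrative assumptions, not the skill's settings.
DEFAULTS: Dict[str, Any] = {
    "strategy": "auto",
    "infer_table_structure": False,
    "languages": ["eng"],
}


def resolve_options(
    file_config: Optional[Dict[str, Any]] = None,
    call_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge option layers with precedence: defaults < file config < call kwargs."""
    merged = dict(DEFAULTS)
    for layer in (file_config or {}, call_kwargs or {}):
        merged.update(layer)  # later layers override earlier ones
    return merged


opts = resolve_options(
    file_config={"strategy": "hi_res", "infer_table_structure": True},
    call_kwargs={"strategy": "fast"},
)
print(opts)
```

The merged dictionary could then be passed as `partition(filename=path, **opts)`. Writing the order down as code makes the precedence explicit instead of leaving it implied by examples.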

Code Execution

info: Validation. SKILL.md demonstrates the use of `unstructured` library functions, which likely perform internal validation of file paths and parameters, but it does not show any explicit schema validation in the skill's own logic.
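Explicit validation in the skill's own logic could be as simple as checking inputs before they reach the library. The sketch below is hypothetical. `validate_request`, `ALLOWED_SUFFIXES`, and `ALLOWED_STRATEGIES` are illustrative names, and the allow-lists are assumptions that would need to match the file types and strategies the skill actually supports.

```python
from pathlib import Path

# Hypothetical allow-lists; adjust to what the skill actually supports.
ALLOWED_SUFFIXES = {".pdf", ".docx", ".eml", ".html", ".htm"}
ALLOWED_STRATEGIES = {"auto", "fast", "hi_res", "ocr_only"}


def validate_request(path: str, strategy: str = "auto") -> Path:
    """Check the file path and strategy up front; raise ValueError on bad input."""
    p = Path(path)
    if not p.is_file():
        raise ValueError(f"no such file: {path}")
    # Compare extensions case-insensitively, so "report.PDF" is accepted.
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {p.suffix or '(none)'}")
    if strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return p
```

Failing early with a clear message is easier to act on than an error raised deep inside a parser.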

Compliance

info: GDPR. The skill extracts data from documents. Although it does not explicitly handle personal data, the extracted content may contain PII, which would be passed to the LLM without any sanitization by the skill itself.
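A minimal sketch of sanitizing extracted text before it reaches the LLM is shown below, assuming regex-based redaction is acceptable as a first pass. `redact_pii`, `EMAIL_RE`, and `PHONE_RE` are hypothetical names. Regex matching is a crude heuristic: it misses names and postal addresses and can over-redact digit runs such as dates. The example illustrates the idea and would not make the skill GDPR-compliant on its own.

```python
import re

# Deliberately simple patterns (assumption): common email addresses and
# phone-number-like digit runs only; dates such as 2024-01-15 also match.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


sample = "Contact Jane at jane.doe@example.com or +1 (555) 123-4567."
print(redact_pii(sample))
```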

Installation

npx skills add claude-office-skills/skills

Runs the Vercel skills CLI (skills.sh) via npx. Requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.). Assumes the repository follows the agentskills.io format.

3 months ago

claude-office-skills

98 stars

MIT

Updated 6 days ago

Similar extensions

Table Extractor

Skill

claude-office-skills

Document Parser Skill

Skill

claude-office-skills

PDF to DOCX Converter

Convert PDF files to editable Word documents using pdf2docx

Skill

claude-office-skills

PaddleOCR Document Parsing

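Use this skill to extract structured Markdown/JSON from PDFs and document images: tables with cell-level precision, formulas as LaTeX, figures, seals, charts, headers/footers, multi-column layout and correct reading order. Trigger terms: 文档解析 (document parsing), 版面分析 (layout analysis), 版面还原 (layout reconstruction), 表格提取 (table extraction), 公式识别 (formula recognition), 多栏排版 (multi-column layout), 扫描件结构化 (structuring scanned documents), 发票 (invoices), 财报 (financial reports), 复杂 PDF (complex PDFs), PDF转Markdown (PDF to Markdown), 图表 (charts), 阅读顺序 (reading order); reading order, formula, LaTeX, layout parsing, structure extraction, PP-StructureV3, PaddleOCR-VL.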

Skill

aidenwu0209

PDF Extraction

Extract text, tables, and metadata from PDFs using pdfplumber

Skill

claude-office-skills

Chat with PDF

Answer questions about PDF content, summarize, and extract information

Skill

claude-office-skills