[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"extension-skill-claude-office-skills-data-extractor-ms":3,"guides-for-claude-office-skills-data-extractor":221,"similar-k1777kfm9xh9cdd70xedsxbres8664js":222},{"_creationTime":4,"_id":5,"children":6,"community":7,"display":9,"evaluation":21,"identity":187,"isFallback":192,"parentExtension":193,"providers":194,"relations":198,"repo":200,"workflow":218},1778053148350.4329,"k1777kfm9xh9cdd70xedsxbres8664js",[],{"reviewCount":8},0,{"description":10,"installMethods":11,"name":12,"sourceUrl":13,"tags":14},">",{},"Data Extractor","https://github.com/claude-office-skills/skills/tree/HEAD/data-extractor",[15,16,17,18,19,20],"parsing","extraction","data","unstructured","pdf","python",{"_creationTime":22,"_id":23,"extensionId":5,"locale":24,"result":25,"trustSignals":175,"workflow":185},1778053561145.62,"kn7d9dchrvfwpzzegffv72bp7h867s9f","en",{"checks":26,"evaluatedAt":165,"extensionSummary":166,"promptVersionExtension":167,"promptVersionScoring":168,"rationale":169,"score":170,"summary":171,"tags":172,"targetMarket":173,"tier":174},[27,32,35,38,42,46,51,56,59,62,66,70,73,77,80,83,86,89,92,95,98,102,106,110,114,117,120,123,127,130,133,136,139,142,146,149,152,155,158,162],{"category":28,"check":29,"severity":30,"summary":31},"Practical Utility","Problem relevance","pass","The description clearly states the skill's purpose: extracting structured data from documents using the unstructured library, addressing the common pain point of handling diverse document formats.",{"category":28,"check":33,"severity":30,"summary":34},"Unique selling proposition","The skill offers significant value beyond a simple prompt by abstracting the complexity of the unstructured library and providing a unified interface for various document formats, enabling consistent data extraction.",{"category":28,"check":36,"severity":30,"summary":37},"Production readiness","The skill is production-ready, offering a clear interface for document extraction, supporting multiple formats, and providing practical examples and best practices for integration.",{"category":39,"check":40,"severity":30,"summary":41},"Scope","Single responsibility principle","The skill is focused solely on document data extraction and parsing, aligning with its name and description without venturing into unrelated domains.",{"category":39,"check":43,"severity":44,"summary":45},"Description quality","critical","The description is materially misleading as it contains only a single character ('>') and provides no actual information about the extension's functionality, which is contrary to the provided content in SKILL.md.",{"category":47,"check":48,"severity":49,"summary":50},"Invocation","Scoped tools","not_applicable","This skill does not expose specific tools; it acts as an interface to the unstructured library.",{"category":52,"check":53,"severity":54,"summary":55},"Documentation","Configuration & parameter reference","info","While the SKILL.md provides extensive code examples, it does not explicitly document all configuration options or parameters for the `partition` function or its variations, nor does it detail precedence order for any potential configurations.",{"category":39,"check":57,"severity":49,"summary":58},"Tool naming","The skill does not expose user-facing tools with distinct names; it acts as a wrapper for the unstructured library's functionality.",{"category":39,"check":60,"severity":30,"summary":61},"Minimal I/O surface","The skill's interface, as implied by its SKILL.md, takes a file path and optional extraction parameters, and returns structured elements, which is a focused I/O surface.",{"category":63,"check":64,"severity":30,"summary":65},"License","License usability","The extension includes an MIT license file and declares MIT in the SKILL.md frontmatter, indicating a permissive open-source license.",{"category":67,"check":68,"severity":49,"summary":69},"Maintenance","Commit recency","No commit history is available for evaluation.",{"category":67,"check":71,"severity":30,"summary":72},"Dependency Management","The SKILL.md clearly lists installation instructions for the `unstructured` library and its dependencies (e.g., `pip install unstructured[all-docs]`), indicating a managed dependency approach.",{"category":74,"check":75,"severity":49,"summary":76},"Security","Secret Management","The skill does not appear to handle or expose any secrets.",{"category":74,"check":78,"severity":30,"summary":79},"Injection","The skill operates on provided documents and uses the `unstructured` library, which is generally designed to treat loaded content as data rather than instructions. The SKILL.md does not indicate any mechanisms for executing arbitrary code from loaded documents.",{"category":74,"check":81,"severity":30,"summary":82},"Transitive Supply-Chain Grenades","The skill relies on the `unstructured` Python library. The SKILL.md does not show any runtime downloads or execution of untrusted remote code. All dependencies are managed via pip.",{"category":74,"check":84,"severity":30,"summary":85},"Sandbox Isolation","The skill operates on provided files and relies on the `unstructured` library, which is expected to operate within its own execution environment and not modify files outside of its intended scope. No evidence of filesystem modification outside the bundle is present.",{"category":74,"check":87,"severity":30,"summary":88},"Sandbox escape primitives","No evidence of sandbox escape primitives like detached processes or retry loops around denied calls was found in the provided SKILL.md.",{"category":74,"check":90,"severity":30,"summary":91},"Data Exfiltration","The skill's function is to extract data from documents; there are no imperative instructions to read and submit confidential data or undocumented outbound calls.",{"category":74,"check":93,"severity":30,"summary":94},"Hidden Text Tricks","The bundled files (SKILL.md, LICENSE) are free of hidden-steering tricks, invisible characters, or obfuscated content.",{"category":74,"check":96,"severity":30,"summary":97},"Opaque code execution","The SKILL.md uses standard Python imports and library calls, with no evidence of obfuscation, base64-decoded payloads, or runtime code fetching.",{"category":99,"check":100,"severity":30,"summary":101},"Portability","Structural Assumption","The skill operates on provided file paths and does not make assumptions about the user's project structure beyond the ability to access the input file.",{"category":103,"check":104,"severity":49,"summary":105},"Trust","Issues Attention","No issue tracker data available for evaluation.",{"category":107,"check":108,"severity":30,"summary":109},"Versioning","Release Management","The SKILL.md frontmatter explicitly declares `version: \"1.0\"`, fulfilling the release management requirement.",{"category":111,"check":112,"severity":54,"summary":113},"Code Execution","Validation","The SKILL.md demonstrates the use of `unstructured` library functions, which likely perform internal validation on file paths and parameters, but explicit schema validation within the skill's logic is not showcased.",{"category":74,"check":115,"severity":30,"summary":116},"Unguarded Destructive Operations","The skill is focused on data extraction and does not perform any destructive operations on files or infrastructure.",{"category":111,"check":118,"severity":30,"summary":119},"Error Handling","The provided Python code examples within SKILL.md show basic error handling (e.g., try-except blocks for file processing and batch operations), suggesting a reasonable approach to unexpected states.",{"category":111,"check":121,"severity":49,"summary":122},"Logging","The skill's primary function is data extraction, and the SKILL.md does not indicate any need for local audit logging of destructive actions or outbound calls.",{"category":124,"check":125,"severity":54,"summary":126},"Compliance","GDPR","The skill extracts data from documents. While it doesn't explicitly handle personal data, the extracted content could potentially contain PII, which would be submitted to the LLM without additional sanitization by this skill itself.",{"category":124,"check":128,"severity":30,"summary":129},"Target market","The skill processes documents based on their format and content, with no apparent regional or jurisdictional limitations. The target market is global.",{"category":99,"check":131,"severity":30,"summary":132},"Runtime stability","The skill relies on standard Python libraries and the `unstructured` package, which is designed for multi-platform compatibility. No specific OS or shell assumptions are evident.",{"category":47,"check":134,"severity":30,"summary":135},"Precise Purpose","The SKILL.md clearly states the purpose (data extraction from any document format using `unstructured`) and provides numerous examples of use cases and prompts, making its scope and utility evident.",{"category":47,"check":137,"severity":30,"summary":138},"Concise Frontmatter","The frontmatter is dense and effectively summarizes the core capability (data extraction via `unstructured`) and lists example prompts, providing clear routing information.",{"category":52,"check":140,"severity":30,"summary":141},"Concise Body","The SKILL.md body is well-structured, using code blocks and sections effectively. While lengthy, it delegates deeper material like code examples to their respective sections and does not appear to bloat token consumption unnecessarily.",{"category":143,"check":144,"severity":30,"summary":145},"Context","Progressive Disclosure","The SKILL.md demonstrates good progressive disclosure by embedding Python code snippets and explanations within the main file rather than excessively linking to external references for core functionality.",{"category":143,"check":147,"severity":49,"summary":148},"Forked exploration","This skill is not designed for deep exploration or code review; it performs a specific extraction task and returns results directly.",{"category":28,"check":150,"severity":30,"summary":151},"Usage examples","The SKILL.md provides numerous end-to-end, ready-to-use Python code examples for various scenarios like paper extraction, invoice parsing, and corpus building, demonstrating clear inputs, invocations, and expected outcomes.",{"category":28,"check":153,"severity":30,"summary":154},"Edge cases","The 'Limitations' section addresses potential edge cases such as complex layouts, OCR quality, large files, unsupported formats, and API rate limits, providing context for users.",{"category":111,"check":156,"severity":49,"summary":157},"Tool Fallback","The skill directly utilizes the `unstructured` Python library and does not appear to have external tool dependencies that would require fallbacks.",{"category":159,"check":160,"severity":30,"summary":161},"Safety","Halt on unexpected state","The provided Python examples include try-except blocks for file processing and batch operations, indicating that the skill is designed to halt and report on unexpected states rather than continuing silently.",{"category":99,"check":163,"severity":30,"summary":164},"Cross-skill coupling","The skill is self-contained and focuses on document parsing; it does not implicitly rely on or cross-link to other skills.",1778053269862,"This skill leverages the unstructured Python library to process a wide range of document types, including PDFs, Word docs, emails, and HTML. It automatically detects and partitions elements, extracts text and metadata, and supports advanced features like table structure inference, OCR, and semantic chunking for RAG applications.","2.0.0","3.4.0","The skill provides robust functionality for document data extraction with comprehensive documentation and examples. The critical finding for 'Description quality' stems from the placeholder description in the provided context, which is directly contradicted by the rich content in SKILL.md. This is a metadata issue rather than a functional one.",85,"A highly capable skill for extracting structured data from various document formats using the unstructured library.",[15,16,17,18,19,20],"global","flagged",{"codeQuality":176,"collectedAt":177,"documentation":178,"maintenance":180,"security":181,"testCoverage":184},{},1778053253015,{"descriptionLength":179,"readmeSize":8},1,{},{"hasNpmPackage":182,"license":183,"smitheryVerified":182},false,"MIT",{"hasCi":182,"hasTests":182},{"updatedAt":186},1778053561145,{"githubOwner":188,"githubRepo":189,"locale":24,"slug":190,"type":191},"claude-office-skills","skills","data-extractor","skill",true,null,{"extract":195,"llm":197},{"commitSha":196,"license":183},"9c4c7d5cd2813a8936bf2c9fdb174ea883b85a11",{"promptVersionExtension":167,"promptVersionScoring":168,"score":170,"targetMarket":173,"tier":174},{"repoId":199},"kd7fw7xbj58qc2z8whrrjptbed8659db",{"_creationTime":201,"_id":199,"identity":202,"providers":204,"workflow":215},1777995558409.8474,{"githubOwner":188,"githubRepo":189,"sourceUrl":203},"https://github.com/claude-office-skills/skills",{"discover":205,"github":208},{"sources":206},[207],"skills-sh",{"closedIssues90d":8,"forks":209,"license":183,"openIssues90d":210,"pushedAt":211,"readmeSize":212,"stars":213,"topics":214},27,2,1769868236000,29630,98,[],{"discoverAt":216,"extractAt":217,"githubAt":217,"updatedAt":217},1777995558409,1778053155657,{"anyEnrichmentAt":219,"extractAt":220,"githubAt":219,"llmAt":186,"updatedAt":186},1778053151766,1778053148350,[],[223,242,263,282,309,331],{"_creationTime":224,"_id":225,"community":226,"display":227,"identity":234,"providers":236,"relations":240,"workflow":241},1778053148350.4768,"k17c4t5g480bzq5t7qrjgbjsys867fb5",{"reviewCount":8},{"description":10,"installMethods":228,"name":229,"sourceUrl":230,"tags":231},{},"Table Extractor","https://github.com/claude-office-skills/skills/tree/HEAD/table-extractor",[19,16,232,233,15],"table","camelot",{"githubOwner":188,"githubRepo":189,"locale":24,"slug":235,"type":191},"table-extractor",{"extract":237,"llm":238},{"commitSha":196,"license":183},{"promptVersionExtension":167,"promptVersionScoring":168,"score":239,"targetMarket":173,"tier":174},92,{"repoId":199},{"anyEnrichmentAt":219,"extractAt":220,"githubAt":219,"llmAt":186,"updatedAt":186},{"_creationTime":243,"_id":244,"community":245,"display":246,"identity":255,"providers":257,"relations":261,"workflow":262},1778053148350.4373,"k1776t2fdx4h35mkwpc5h201dd866zms",{"reviewCount":8},{"description":10,"installMethods":247,"name":248,"sourceUrl":249,"tags":250},{},"Document Parser Skill","https://github.com/claude-office-skills/skills/tree/HEAD/doc-parser",[15,251,19,20,252,16,253,254],"document-processing","ocr","layout-analysis","docling",{"githubOwner":188,"githubRepo":189,"locale":24,"slug":256,"type":191},"doc-parser",{"extract":258,"llm":259},{"commitSha":196,"license":183},{"promptVersionExtension":167,"promptVersionScoring":168,"score":239,"targetMarket":173,"tier":260},"verified",{"repoId":199},{"anyEnrichmentAt":219,"extractAt":220,"githubAt":219,"llmAt":186,"updatedAt":186},{"_creationTime":264,"_id":265,"community":266,"display":267,"identity":275,"providers":277,"relations":280,"workflow":281},1778053148350.4656,"k171nxqak0bb4qq89mkfwf02s5867cf6",{"reviewCount":8},{"description":268,"installMethods":269,"name":270,"sourceUrl":271,"tags":272},"Convert PDF files to editable Word documents using pdf2docx",{},"PDF to DOCX Converter","https://github.com/claude-office-skills/skills/tree/HEAD/pdf-to-docx",[19,273,274,251,20],"docx","conversion",{"githubOwner":188,"githubRepo":189,"locale":24,"slug":276,"type":191},"pdf-to-docx",{"extract":278,"llm":279},{"commitSha":196,"license":183},{"promptVersionExtension":167,"promptVersionScoring":168,"score":213,"targetMarket":173,"tier":260},{"repoId":199},{"anyEnrichmentAt":219,"extractAt":220,"githubAt":219,"llmAt":186,"updatedAt":186},{"_creationTime":283,"_id":284,"community":285,"display":286,"identity":294,"providers":298,"relations":303,"workflow":305},1778053339109.673,"k170fjdnm4zmjtz1rgs8zwq4418663pv",{"reviewCount":8},{"description":287,"installMethods":288,"name":289,"sourceUrl":290,"tags":291},"Use this skill to extract structured Markdown/JSON from PDFs and document images—tables with cell-level precision, formulas as LaTeX, figures, seals, charts, headers/footers, multi-column layout and correct reading order. Trigger terms: 文档解析, 版面分析, 版面还原, 表格提取, 公式识别, 多栏排版, 扫描件结构化, 发票, 财报, 复杂 PDF, PDF转Markdown, 图表, 阅读顺序; reading order, formula, LaTeX, layout parsing, structure extraction, PP-StructureV3, PaddleOCR-VL.",{},"PaddleOCR Document Parsing","https://github.com/aidenwu0209/paddleocr-skills/tree/HEAD/skills/paddleocr-doc-parsing",[19,292,252,253,293,20],"document-parsing","paddleocr",{"githubOwner":295,"githubRepo":296,"locale":24,"slug":297,"type":191},"aidenwu0209","paddleocr-skills","paddleocr-doc-parsing",{"extract":299,"llm":302},{"commitSha":300,"license":301},"ca41406b66e5a475f43b073a5b731dfd1b9c50b1","Apache-2.0",{"promptVersionExtension":167,"promptVersionScoring":168,"score":213,"targetMarket":173,"tier":260},{"repoId":304},"kd7b1t00prnctc7258swvw0hs5865sjq",{"anyEnrichmentAt":306,"extractAt":307,"githubAt":306,"llmAt":308,"updatedAt":308},1778053339393,1778053339109,1778053352237,{"_creationTime":310,"_id":311,"community":312,"display":313,"identity":323,"providers":325,"relations":329,"workflow":330},1778053148350.4636,"k171dtxahnz3h8q0jz3gk6akks867ym1",{"reviewCount":8},{"description":314,"installMethods":315,"name":316,"sourceUrl":317,"tags":318},"Extract text, tables, and metadata from PDFs using pdfplumber",{},"PDF Extraction","https://github.com/claude-office-skills/skills/tree/HEAD/pdf-extraction",[19,16,319,320,321,322,251],"text","tables","metadata","pdfplumber",{"githubOwner":188,"githubRepo":189,"locale":24,"slug":324,"type":191},"pdf-extraction",{"extract":326,"llm":327},{"commitSha":196,"license":183},{"promptVersionExtension":167,"promptVersionScoring":168,"score":328,"targetMarket":173,"tier":260},95,{"repoId":199},{"anyEnrichmentAt":219,"extractAt":220,"githubAt":219,"llmAt":186,"updatedAt":186},{"_creationTime":332,"_id":333,"community":334,"display":335,"identity":345,"providers":347,"relations":350,"workflow":351},1778053148350.4265,"k171agyyd8nv26rt447dvhy0998669wm",{"reviewCount":8},{"description":336,"installMethods":337,"name":338,"sourceUrl":339,"tags":340},"Answer questions about PDF content, summarize, and extract information",{},"Chat with PDF","https://github.com/claude-office-skills/skills/tree/HEAD/chat-with-pdf",[19,341,16,342,343,344],"qa","summarization","mcp","documentation",{"githubOwner":188,"githubRepo":189,"locale":24,"slug":346,"type":191},"chat-with-pdf",{"extract":348,"llm":349},{"commitSha":196,"license":183},{"promptVersionExtension":167,"promptVersionScoring":168,"score":328,"targetMarket":173,"tier":260},{"repoId":199},{"anyEnrichmentAt":219,"extractAt":220,"githubAt":219,"llmAt":186,"updatedAt":186}]