Polaris AI DataInsight — Doc Extract Skill
Extract structured data from Office documents (DOCX, PPTX, XLSX, HWP, HWPX) using the Polaris AI DataInsight Doc Extract API. Use when the user wants to parse, analyze, or extract text, tables, charts, images, or shapes from document files. Invoke this skill whenever the user mentions extracting content from Word, PowerPoint, Excel, HWP, or HWPX files, wants to parse document structure, needs to convert document data for RAG pipelines, or asks about reading tables, charts, or text from Office-format documents, even if they don't explicitly mention "DataInsight" or "Polaris".
The skill lets users parse, analyze, and extract structured content from the supported Office document types through a specialized API.
Features
- Extract text, tables, charts, images, shapes, and equations from documents
- Supports DOCX, PPTX, XLSX, HWP, HWPX file formats
- Provides data in a structured `unifiedSchema` JSON format
- Handles API authentication and response parsing
Use Cases
- Parse and analyze content from Office documents
- Convert document data for RAG pipelines
- Extract tables and charts into structured formats (CSV, JSON)
- Automate data extraction from document files
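As a rough illustration of the table-to-CSV use case, the sketch below serializes table rows with Python's standard `csv` module. It assumes the cells have already been pulled out of the extracted JSON into a plain list of rows; how tables are laid out inside `unifiedSchema` is not described here, so that step is left out. The function name `rows_to_csv` is hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows (each a list of cell strings) to CSV text.

    Assumption: `rows` was built by the caller from the extracted JSON;
    this helper knows nothing about the unifiedSchema layout itself.
    """
    buffer = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" endings.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()
```

Cells that contain commas or quotes are quoted by the `csv` module, which is why it is preferable to joining strings by hand.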
Non-Goals
- Editing or modifying documents
- Extracting data from file formats not listed
- Replacing the Polaris AI DataInsight API itself
Workflow
- Authenticate with the Polaris AI DataInsight API using the provided API key.
- Upload the target document file (DOCX, PPTX, XLSX, HWP, HWPX) via a multipart/form-data POST request.
- Receive a ZIP file response from the API.
- Extract the ZIP file and load the contained `unifiedSchema` JSON.
- Return the structured JSON data, organized by page and element type.
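The ZIP-handling steps of the workflow above (receive the ZIP, extract it, load the JSON) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the skill's actual script: it assumes the response body is already in memory as bytes, and because the name of the JSON file inside the archive is not given here, it simply takes the first `.json` entry it finds. The upload request is omitted because the endpoint URL and authentication header are not stated in this listing. `load_unified_schema` is a hypothetical name.

```python
import io
import json
import zipfile


def load_unified_schema(zip_bytes: bytes) -> dict:
    """Parse the first JSON file found inside a ZIP response body.

    Assumption: the archive contains at least one ".json" entry holding
    the unifiedSchema data; other entries (for example images) are ignored.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        json_names = [
            name for name in archive.namelist()
            if name.lower().endswith(".json")
        ]
        if not json_names:
            raise ValueError("no JSON file found in the ZIP response")
        # Read directly from the archive without writing to disk.
        with archive.open(json_names[0]) as handle:
            return json.load(handle)
```

Reading the archive from memory avoids temporary files; a caller that also needs the extracted images could use `archive.extractall()` on a scratch directory instead.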
Prerequisites
- Polaris AI DataInsight API Key (stored in the `POLARIS_DATAINSIGHT_API_KEY` environment variable)
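One way to check this prerequisite before making any request is to read the key from the environment and fail early when it is missing. The helper below is an illustrative sketch; the name `get_api_key` and the injectable `env` parameter are assumptions made for testability, not part of the skill.

```python
import os


def get_api_key(env=os.environ) -> str:
    """Return the API key from POLARIS_DATAINSIGHT_API_KEY.

    Raises RuntimeError when the variable is unset or blank, so the
    problem surfaces before any upload is attempted.
    """
    key = env.get("POLARIS_DATAINSIGHT_API_KEY", "").strip()
    if not key:
        raise RuntimeError("POLARIS_DATAINSIGHT_API_KEY is not set")
    return key
```

Keeping the key in an environment variable rather than in the script keeps it out of version control.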
Execution
- Pinned dependencies: the Python script relies on standard libraries, but specific versions are not pinned, which could lead to compatibility issues.
Installation
npx skills add jacob-g-park/polaris-datainsight-doc-extract
Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.