
Polaris AI DataInsight — Doc Extract Skill

Skill Verified Active

Extract structured data from Office documents (DOCX, PPTX, XLSX, HWP, HWPX) using the Polaris AI DataInsight Doc Extract API. Use when the user wants to parse, analyze, or extract text, tables, charts, images, or shapes from document files. Invoke this skill whenever the user mentions extracting content from Word, PowerPoint, Excel, HWP, or HWPX files, wants to parse document structure, needs to convert document data for RAG pipelines, or asks about reading tables, charts, or text from Office-format documents — even if they don't explicitly mention "DataInsight" or "Polaris".

Purpose

Enable users to parse, analyze, and extract structured content from supported Office document types through the Polaris AI DataInsight Doc Extract API.

Features

  • Extract text, tables, charts, images, shapes, and equations from documents
  • Supports DOCX, PPTX, XLSX, HWP, HWPX file formats
  • Provides data in a structured `unifiedSchema` JSON format
  • Handles API authentication and response parsing

Use Cases

  • Parse and analyze content from Office documents
  • Convert document data for RAG pipelines
  • Extract tables and charts into structured formats (CSV, JSON)
  • Automate data extraction from document files

Non-Goals

  • Editing or modifying documents
  • Extracting data from file formats not listed
  • Replacing the Polaris AI DataInsight API itself

Workflow

  1. Authenticate with Polaris DataInsight API using the provided API key.
  2. Upload the target document file (DOCX, PPTX, XLSX, HWP, HWPX) via multipart/form-data POST.
  3. Receive a ZIP file response from the API.
  4. Extract the ZIP file and load the contained `unifiedSchema` JSON.
  5. Return the structured JSON data, organized by page and element type.
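
The steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the skill's actual script: the endpoint URL (`API_URL`), the header name that carries the key (`API_KEY_HEADER`), and the form field name (`"file"`) are all hypothetical placeholders, since the listing does not state them. The sketch also assumes the `unifiedSchema` JSON is the first `.json` member of the ZIP.

```python
import io
import json
import os
import urllib.request
import uuid
import zipfile

# Hypothetical placeholder: the real endpoint is not given in this listing.
API_URL = "https://example.invalid/doc-extract"
# Assumption: the header name that carries the API key is not stated here.
API_KEY_HEADER = "x-api-key"


def build_multipart(field_name: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def load_unified_schema(zip_bytes: bytes) -> dict:
    """Open the ZIP response in memory and parse the first JSON member found."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        json_names = [n for n in archive.namelist() if n.lower().endswith(".json")]
        if not json_names:
            raise ValueError("no JSON file found in the ZIP response")
        return json.loads(archive.read(json_names[0]).decode("utf-8"))


def extract_document(path: str) -> dict:
    """Steps 1-5: authenticate, upload, receive the ZIP, return the parsed JSON."""
    api_key = os.environ["POLARIS_DATAINSIGHT_API_KEY"]
    with open(path, "rb") as fh:
        # "file" is an assumed form field name, not confirmed by the listing.
        body, content_type = build_multipart("file", os.path.basename(path), fh.read())
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": content_type, API_KEY_HEADER: api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return load_unified_schema(response.read())
```

Keeping the ZIP handling in `load_unified_schema` separate from the network call means the parsing step can be exercised against a saved response without contacting the API.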

Prerequisites

  • Polaris AI DataInsight API Key (stored in POLARIS_DATAINSIGHT_API_KEY environment variable)
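
As a small illustration of this prerequisite, a script might check the environment variable up front and fail with a clear message when it is missing. The helper name `get_api_key` is hypothetical and not taken from the skill's source.

```python
import os
from collections.abc import Mapping


def get_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the environment, or raise if it is unset or blank."""
    key = env.get("POLARIS_DATAINSIGHT_API_KEY", "").strip()
    if not key:
        raise RuntimeError("POLARIS_DATAINSIGHT_API_KEY is not set")
    return key
```

Accepting the environment as a parameter lets the check be tested with a plain dictionary instead of the real process environment.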

Execution

  • Info (pinned dependencies): The Python script relies on standard libraries, but specific versions are not pinned, which could lead to compatibility issues.

Installation

npx skills add jacob-g-park/polaris-datainsight-doc-extract

Runs the Vercel skills CLI (skills.sh) via npx — needs Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality Score

Verified: 99/100
Analyzed about 22 hours ago

Trust Signals

Last commit: 3 months ago
Stars: 2
License: Apache-2.0

© 2025 SkillRepo · Find the right skill, skip the noise.