Convert Document For RAG Ingestion

Skill Active

Convert a document to clean markdown suitable for chunking and embedding in a RAG pipeline.

Purpose

This skill helps AI teams prepare documents for RAG pipelines by converting them into clean, chunkable markdown.
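
Once a document has been converted, a downstream step typically splits the markdown into chunks before embedding. The following is a minimal sketch of heading-based chunking under stated assumptions, not part of the skill itself: `chunk_markdown` and `max_chars` are hypothetical names chosen for illustration, and the sketch does not special-case `#` lines inside fenced code blocks, which it would treat as headings.

```python
import re

# ATX-style markdown heading: one to six '#' characters followed by a space.
HEADING = re.compile(r"^#{1,6} ")


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[str]:
    """Split markdown into chunks that begin at headings.

    Sections longer than max_chars are split further on blank lines
    (paragraph boundaries). A single paragraph longer than max_chars
    is kept whole rather than cut mid-sentence.
    """
    # Pass 1: group lines into sections, starting a new one at each heading.
    sections: list[list[str]] = []
    for line in markdown.splitlines():
        if HEADING.match(line) or not sections:
            sections.append([])
        sections[-1].append(line)

    # Pass 2: emit each section, splitting oversized ones by paragraph.
    chunks: list[str] = []
    for section in sections:
        text = "\n".join(section).strip()
        if not text:
            continue
        if len(text) <= max_chars:
            chunks.append(text)
            continue
        current = ""
        for paragraph in text.split("\n\n"):
            candidate = f"{current}\n\n{paragraph}" if current else paragraph
            if current and len(candidate) > max_chars:
                chunks.append(current)
                current = paragraph
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

Splitting at headings keeps each chunk tied to one topic, which tends to help retrieval; the character limit is a stand-in for whatever token budget the embedding model imposes.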

Features

Convert documents to markdown
Suitable for RAG chunking and embedding
API integration via SDKs and curl
Example workflows for n8n

Use Cases

Preparing PDF documents for a RAG knowledge base
Converting technical manuals into clean markdown for LLM processing
Automating document cleaning for embedding pipelines

Non-Goals

Performing complex text analysis beyond format conversion
Acting as a general-purpose document editor
Interacting with local file systems directly

Security

Warning: Secret Management

The API key is expected to be provided via an environment variable or directly in code examples, with no explicit guidance on secure secret management practices.
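
One common way to address this warning is to read the key from the environment at startup and fail loudly when it is missing, so the key never appears in source code. The sketch below illustrates that pattern only; the variable name `ITERATION_LAYER_API_KEY` and the helpers `load_api_key` and `redact` are assumptions made for this example, not names documented by the skill.

```python
import os
from collections.abc import Mapping

# Hypothetical variable name; check the skill's own docs for the real one.
API_KEY_ENV_VAR = "ITERATION_LAYER_API_KEY"


def load_api_key(env: Mapping[str, str] = os.environ,
                 name: str = API_KEY_ENV_VAR) -> str:
    """Return the API key from the environment, or raise if it is unset."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable; do not hardcode the API key."
        )
    return key


def redact(secret: str, visible: int = 4) -> str:
    """Mask a secret for logging, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Passing the environment as a parameter keeps the function easy to test, and logging only `redact(key)` avoids leaking the full key into logs.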

Installation

First, add the marketplace:

/plugin marketplace add iterationlayer/skills

Then install the skill:

/plugin install skills@iterationlayer-skills

Quality Score

97/100

Analyzed about 22 hours ago

Trust Signals

Last commit: 16 days ago

GitHub owner: iterationlayer

Stars: 0

License: MIT

Website: iterationlayer.com

Similar Extensions

Convert Resume to Markdown

100

Convert a resume PDF to clean markdown for LLM parsing or candidate pipelines.

Skill

iterationlayer

Extract Fleet Vehicle Registration

100

Extract vehicle identification, owner details, registration dates, and technical specifications from vehicle registration documents.

Skill

iterationlayer

Document Extraction API

Extract structured data from documents using AI-powered field extraction.

Skill

iterationlayer

Convert Contract To Markdown

Convert a contract PDF to clean markdown for clause extraction or LLM analysis.

Skill

iterationlayer

Polaris AI DataInsight — Doc Extract Skill

Extract structured data from Office documents (DOCX, PPTX, XLSX, HWP, HWPX) using the Polaris AI DataInsight Doc Extract API. Use when the user wants to parse, analyze, or extract text, tables, charts, images, or shapes from document files. Invoke this skill whenever the user mentions extracting content from Word, PowerPoint, Excel, HWP, or HWPX files, wants to parse document structure, needs to convert document data for RAG pipelines, or asks about reading tables, charts, or text from Office-format documents — even if they don't explicitly mention "DataInsight" or "Polaris".

Skill

jacob-g-park

PaddleOCR Document Parsing

Use this skill to extract structured Markdown/JSON from PDFs and document images—tables with cell-level precision, formulas as LaTeX, figures, seals, charts, headers/footers, multi-column layout and correct reading order. Trigger terms: 文档解析, 版面分析, 版面还原, 表格提取, 公式识别, 多栏排版, 扫描件结构化, 发票, 财报, 复杂 PDF, PDF转Markdown, 图表, 阅读顺序; reading order, formula, LaTeX, layout parsing, structure extraction, PP-StructureV3, PaddleOCR-VL.

Skill

PaddlePaddle