Convert Document for RAG Ingestion
Skill · Active
Convert a document to clean markdown suitable for chunking and embedding in a RAG pipeline.
Purpose: help AI teams prepare documents for RAG pipelines efficiently by converting them into clean, chunkable markdown.
Features
- Convert documents to markdown
- Produce output suited to RAG chunking and embedding
- API integration via SDKs and curl
- Example workflows for n8n
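To make "chunkable" concrete, here is a minimal sketch of a heading-aware chunker for the markdown this skill returns. It is a generic technique, not part of the skill or its SDKs; the names `chunk_markdown` and `Chunk` and the 1,000-character default are illustrative assumptions. It splits at ATX headings (`#` through `######`), then packs blank-line-separated paragraphs into chunks of at most `max_chars` characters, hard-splitting any paragraph longer than that.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # nearest preceding markdown heading ("" if none)
    text: str     # chunk body, at most max_chars characters


# ATX heading: 1-6 '#' characters, whitespace, then the heading text.
HEADING_RE = re.compile(r"^#{1,6}\s+(.*)$")


def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[Chunk]:
    """Split markdown at headings, then pack paragraphs into size-capped chunks."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Group lines into (heading, body lines) sections; text before the
    # first heading goes into a section with an empty heading.
    sections: list[tuple[str, list[str]]] = [("", [])]
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks: list[Chunk] = []
    for heading, lines in sections:
        # Paragraphs are separated by blank lines.
        paragraphs = [p.strip() for p in "\n".join(lines).split("\n\n") if p.strip()]
        buffer = ""
        for para in paragraphs:
            # Hard-split oversized paragraphs into pieces of at most max_chars.
            pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
            for piece in pieces:
                candidate = f"{buffer}\n\n{piece}" if buffer else piece
                if len(candidate) <= max_chars:
                    buffer = candidate  # piece still fits in the current chunk
                else:
                    chunks.append(Chunk(heading, buffer))  # flush and start anew
                    buffer = piece
        if buffer:
            chunks.append(Chunk(heading, buffer))
    return chunks
```

Keeping the nearest heading with each chunk gives the embedding some context. The sketch does not track fenced code blocks, so a `#` comment inside a fence would be read as a heading; a production chunker would need to handle that and would usually measure tokens rather than characters.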
Use cases
- Preparing PDF documents for a RAG knowledge base
- Converting technical manuals into clean markdown for LLM processing
- Automating document cleaning for embedding pipelines
Non-goals
- Performing complex text analysis beyond format conversion
- Acting as a general-purpose document editor
- Interacting with local file systems directly
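Because the skill does not read local files itself, the caller has to load the document. A minimal sketch, assuming the document is sent as base64 text; whether the API actually expects base64, a URL, or a multipart upload is not stated here, so check its documentation. `encode_document` is a hypothetical helper name.

```python
import base64
from pathlib import Path


def encode_document(path: str) -> str:
    """Read a local file and return its bytes as a base64 ASCII string.

    Hypothetical helper: the transport format the conversion API accepts
    is an assumption, not something this listing documents.
    """
    data = Path(path).read_bytes()
    return base64.b64encode(data).decode("ascii")
```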
Security
- Warning (Secret Management): The API key is expected to be provided via an environment variable or directly in code examples, with no explicit guidance on secure secret-management practices.
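One way to address this warning is to read the key from the environment and fail fast when it is missing, so the key never has to appear in source code. A minimal sketch in Python; the variable name `ITERATIONLAYER_API_KEY` is a hypothetical placeholder, not a name documented by the skill.

```python
import os

# Hypothetical variable name; use whatever name your deployment defines.
API_KEY_ENV_VAR = "ITERATIONLAYER_API_KEY"


def load_api_key(env_var: str = API_KEY_ENV_VAR) -> str:
    """Return the API key from the environment, raising if it is unset or blank."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or inject it from a secret manager "
            "instead of hard-coding the key in source."
        )
    return key
```

In production the variable would typically be populated by a secret manager or the CI platform's secret store rather than by a committed `.env` file.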
Installation
Add the marketplace first, then install the plugin:
/plugin marketplace add iterationlayer/skills
/plugin install skills@iterationlayer-skills
Similar extensions
Convert Resume to Markdown
Quality score: 100. Convert a resume PDF to clean markdown for LLM parsing or candidate pipelines.
Extract Fleet Vehicle Registration
Quality score: 100. Extract vehicle identification, owner details, registration dates, and technical specifications from vehicle registration documents.
Document Extraction API
Quality score: 99. Extract structured data from documents using AI-powered field extraction.
Convert Contract To Markdown
Quality score: 99. Convert a contract PDF to clean markdown for clause extraction or LLM analysis.
Polaris AI DataInsight: Document Extraction Skill
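Quality score: 99. Uses the Polaris AI DataInsight Doc Extract API to extract structured data from Office documents (DOCX, PPTX, XLSX, HWP, HWPX). Use it when the user wants to parse, analyze, or extract text, tables, charts, images, or shapes from document files. Invoke this skill whenever the user mentions extracting content from Word, PowerPoint, Excel, HWP, or HWPX files, wants to parse document structure, needs to convert document data for a RAG pipeline, or asks about reading tables, charts, or text in Office-format documents, even if they do not explicitly mention "DataInsight" or "Polaris".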
PaddleOCR Document Parsing
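Quality score: 99. Use this skill to extract structured Markdown/JSON from PDFs and document images: tables (cell-accurate), formulas (as LaTeX), figures, seals, charts, headers/footers, multi-column layouts, and correct reading order. Trigger terms: document parsing, layout analysis, layout restoration, table extraction, formula recognition, multi-column layout, structuring scanned documents, invoices, financial reports, complex PDFs, PDF to Markdown, charts, reading order, formula, LaTeX, layout parsing, structure extraction, PP-StructureV3, PaddleOCR-VL.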