Convert Website To Markdown For Rag
Skill (active)
Extract a clean title, summary, markdown sections, and source metadata from a public documentation page for RAG ingestion.
This skill transforms public documentation web pages into clean, structured markdown data suitable for ingestion into Retrieval-Augmented Generation (RAG) systems.
Features
- Extracts page title, canonical URL, and summary
- Parses markdown sections with headings
- Extracts last updated date metadata
- Provides multiple language SDK examples for integration
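The listing does not document the skill's output schema, so the following is only a minimal sketch of what the features above could map to on the consuming side. The names `Section`, `PageRecord`, and `split_sections` are hypothetical and not part of the skill. The sketch assumes the extracted markdown uses ATX-style (`#`) headings.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical record shapes: the skill's real output schema is not documented here.
@dataclass
class Section:
    heading: str  # heading text without the leading '#' characters
    level: int    # 1-6 for ATX headings, 0 for text before the first heading
    body: str


@dataclass
class PageRecord:
    title: str
    canonical_url: str
    summary: str
    last_updated: Optional[str]  # kept as a raw string; the date format is unknown
    sections: List[Section] = field(default_factory=list)


HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # code-fence marker, built this way to avoid a literal fence here


def split_sections(markdown: str) -> List[Section]:
    """Split markdown into one Section per ATX heading, ignoring '#' inside code fences."""
    sections: List[Section] = []
    heading, level, body = "", 0, []
    in_fence = False

    def flush() -> None:
        # Emit the section collected so far, skipping an empty preamble.
        if heading or any(line.strip() for line in body):
            sections.append(Section(heading, level, "\n".join(body).strip()))

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            heading, level, body = match.group(2), len(match.group(1)), []
        else:
            body.append(line)
    flush()
    return sections
```

Each `Section` can then be embedded as its own chunk, with the `PageRecord` fields (title, canonical URL, last updated date) attached as source metadata.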
Use Cases
- Preparing website documentation for RAG pipelines
- Ingesting technical docs into a knowledge base
- Automating the conversion of web content for AI processing
- Extracting structured information from API reference pages
Non-Goals
- Processing local files or non-publicly accessible URLs
- Performing complex analysis or summarization beyond data extraction
- Modifying the content of the website pages
Versioning
- Warning (Release Management): There is no explicit versioning in the SKILL.md frontmatter or GitHub releases, and the installation instructions reference 'main'.
Installation
Add the marketplace first:
/plugin marketplace add iterationlayer/skills
/plugin install skills@iterationlayer-skills