Convert Website To Markdown For RAG
Extract a clean title, summary, markdown sections, and source metadata from a public documentation page for RAG ingestion.
This skill transforms public documentation web pages into clean, structured markdown suitable for ingestion into Retrieval Augmented Generation (RAG) systems.
Features
- Extracts page title, canonical URL, and summary
- Parses markdown sections with headings
- Extracts last updated date metadata
- Provides multiple language SDK examples for integration
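The features above imply an output record with a title, canonical URL, summary, last-updated date, and a list of heading-delimited sections. The following is a minimal sketch of that shape in Python, not the skill's actual schema or SDK: the names `Section`, `PageRecord`, and `split_sections` are hypothetical, and the splitter assumes ATX-style (`#`) headings.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Section:
    # Hypothetical field names, chosen for illustration only.
    heading: str
    level: int
    body: str


@dataclass
class PageRecord:
    # Mirrors the listed features; not the skill's real output schema.
    title: str
    canonical_url: str
    summary: str
    last_updated: Optional[str] = None
    sections: List[Section] = field(default_factory=list)


# Matches ATX headings such as "## Setup" (1 to 6 leading '#').
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_sections(markdown: str) -> List[Section]:
    """Split markdown text into one Section per heading."""
    sections: List[Section] = []
    current: Optional[Section] = None
    lines: List[str] = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous section before starting a new one.
            if current is not None:
                current.body = "\n".join(lines).strip()
                sections.append(current)
            current = Section(
                heading=match.group(2).strip(),
                level=len(match.group(1)),
                body="",
            )
            lines = []
        elif current is not None:
            lines.append(line)
    if current is not None:
        current.body = "\n".join(lines).strip()
        sections.append(current)
    return sections
```

This sketch drops any text that appears before the first heading and does not skip `#` lines inside fenced code blocks, so a production parser would need to handle both cases.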
Use Cases
- Preparing website documentation for RAG model training
- Ingesting technical docs into a knowledge base
- Automating the conversion of web content for AI processing
- Extracting structured information from API reference pages
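For the knowledge-base use case, extracted sections are typically turned into retrieval chunks that keep their source metadata. The sketch below shows one possible way to do that, assuming the extracted page is available as a plain dictionary; the keys (`title`, `canonical_url`, `last_updated`, `sections`, `heading`, `body`) and the function `to_chunks` are hypothetical and are not taken from the skill's output format.

```python
from typing import Dict, List


def to_chunks(page: Dict, max_chars: int = 1000) -> List[Dict]:
    """Turn one extracted page into chunks that carry source metadata."""
    chunks: List[Dict] = []

    def emit(heading: str, text: str) -> None:
        chunks.append(
            {
                "text": text,
                "heading": heading,
                # Source metadata travels with every chunk for citation.
                "source_url": page["canonical_url"],
                "page_title": page["title"],
                "last_updated": page.get("last_updated"),
            }
        )

    for section in page["sections"]:
        buffer = ""
        # Pack paragraphs together until adding one would exceed max_chars.
        # A single paragraph longer than max_chars is kept whole.
        for para in section["body"].split("\n\n"):
            if buffer and len(buffer) + len(para) + 2 > max_chars:
                emit(section["heading"], buffer)
                buffer = para
            else:
                buffer = f"{buffer}\n\n{para}" if buffer else para
        if buffer:
            emit(section["heading"], buffer)
    return chunks
```

Splitting on paragraph boundaries within a section keeps each chunk tied to a single heading, which tends to make retrieved passages easier to attribute back to the source page.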
Non-Goals
- Processing local files or non-publicly accessible URLs
- Performing complex analysis or summarization beyond data extraction
- Modifying the content of the website pages
Versioning
- Warning (Release Management): There is no explicit versioning in the SKILL.md frontmatter or GitHub releases, and installation instructions reference 'main'.
Installation
First, add the marketplace:
/plugin marketplace add iterationlayer/skills
Then install the plugin:
/plugin install skills@iterationlayer-skills
Similar Extensions
ChatGPT Search
Search ChatGPT and extract the full response plus the hydration JSON that powers the UI. Attaches to a running Chrome instance (port 9222 by default), opens ChatGPT, submits a query, waits for the streamed response, and returns structured data: messages, product cards, hydration JSON, and API calls. Use when asked to "search chatgpt", "ask chatgpt", "chatgpt search", "get chatgpt response", or "scrape chatgpt".
Website Extraction Api
Extract typed JSON from public website pages using a schema.
Extract Supplier Catalog From Website
Extract SKUs, product names, unit prices, availability, and minimum order quantities from a supplier catalog page.
Extract Real Estate Listing
Extract property address, price, room count, and features from a listing document into structured JSON for MLS and property platforms.
Extract Public Registry Page
Extract organization name, registration number, status, registration date, and officers from a public registry page.
Extract Legal Invoice Data
Extract timekeeper entries, disbursements, matter references, and billing summaries from law firm invoices.