
Extract Article Text

Skill: Active

Part of: Iterationlayer

Extract clean article content (title, author, date, and body text) from PDFs, Word docs, and web pages.

Purpose

To extract clean, structured article content from various document formats for use in content aggregation, summarization, or knowledge base ingestion.

Features

Extracts title, author, and publication date
Extracts main body text, ignoring headers/footers
Supports PDFs, Word docs, and web pages
Defines custom extraction schema for specific fields
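
The fields listed above can be pictured as a small record. The following is a minimal sketch only: `ArticleFields`, `from_response`, and the JSON key names (`title`, `body`, `author`, `published`) are hypothetical and are not taken from the skill's actual schema, which this listing does not show.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArticleFields:
    """Hypothetical container for the fields this listing names."""
    title: str
    body: str
    author: Optional[str] = None
    published: Optional[str] = None  # kept as text; the date format is not documented here


def from_response(payload: dict) -> ArticleFields:
    """Build ArticleFields from an assumed flat JSON object.

    The key names are illustrative assumptions, not the service's real schema.
    """
    # Treat title and body as required for this sketch.
    missing = [key for key in ("title", "body") if not payload.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return ArticleFields(
        title=payload["title"].strip(),
        body=payload["body"].strip(),
        author=payload.get("author"),
        published=payload.get("published"),
    )
```

Treating the title and body as required and the author and date as optional is a design choice of this sketch; many documents carry no byline or date.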

Use cases

Processing articles for a newsletter platform
Ingesting research papers into a knowledge base
Extracting key information from legal documents
Content summarization workflows

Non-goals

Analyzing or interpreting the extracted content
Modifying or generating documents
Performing OCR on image-based documents

Security

Warning: Secret Management. The API key is documented as 'YOUR_API_KEY' in code examples and prompts, but there is no clear guidance on how to securely manage this key, such as using environment variables or a secrets manager.
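
One common way to address this warning is to read the key from an environment variable instead of pasting it into code. This is a minimal sketch, not guidance from the skill itself: the variable name `ITERATIONLAYER_API_KEY` and the helper `load_api_key` are assumptions made for illustration.

```python
import os


def load_api_key(env=None, var_name="ITERATIONLAYER_API_KEY"):
    """Return the API key from the environment, or fail loudly.

    `var_name` is a hypothetical name; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    # Reject both a missing value and the documentation placeholder.
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError(f"set {var_name} to a real API key before running")
    return key
```

Passing `env` explicitly keeps the function easy to test; in normal use, call `load_api_key()` and let it read `os.environ`.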

Compliance

Info: GDPR. While the skill extracts content from documents, it does not explicitly operate on personal data without sanitization. However, the LLM processing of extracted content may involve personal data.

Practical Utility

Info: Edge cases. While the API schema defines required fields and max lengths, the SKILL.md does not explicitly document failure modes for specific edge cases like malformed inputs or API rate limits.
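
Because rate-limit behavior is undocumented, a caller may want to guard requests defensively. The sketch below shows generic retry with exponential backoff; `RateLimited` and `call_with_backoff` are hypothetical names, and how the real service signals a rate limit is not stated in this listing.

```python
import time


class RateLimited(Exception):
    """Hypothetical signal that the service asked the caller to slow down."""


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimited with delays of 1x, 2x, 4x base_delay."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            # Give up after the final attempt and let the caller see the error.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Only the rate-limit case is retried here. A malformed input would fail the same way on every attempt, so it is better surfaced to the caller immediately.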

Installation

Add the marketplace first, then install the plugin:

/plugin marketplace add iterationlayer/skills

/plugin install skills@iterationlayer-skills

Quality score

95/100

Analyzed about 2 months ago

Trust signals

Last commit: 2 months ago

GitHub owner: iterationlayer

Stars: 0

License: MIT

Website: iterationlayer.com


Similar extensions

Extract Fleet Vehicle Registration

Extract vehicle identification, owner details, registration dates, and technical specifications from vehicle registration documents.

Eyeball

Document analysis with inline source screenshots. When you ask Copilot to analyze a document, Eyeball generates a Word doc where every factual claim includes a highlighted screenshot from the source material so you can verify it with your own eyes.

Document Extraction API

Extract structured data from documents using AI-powered field extraction.

Chatgpt Search

Search ChatGPT and extract the full response + hydration JSON that powers the UI. Attaches to a running Chrome instance (port 9222 by default), opens ChatGPT, submits a query, waits for the streamed response, and returns structured data: messages, product cards, hydration JSON, and API calls. Use when asked to "search chatgpt", "ask chatgpt", "chatgpt search", "get chatgpt response", or "scrape chatgpt".

Website Extraction Api

Extract typed JSON from public website pages using a schema.

Extract Supplier Catalog From Website

Extract SKUs, product names, unit prices, availability, and minimum order quantities from a supplier catalog page.