PDF Processing OpenAI

Skill Verified Active

Part of: Pdf Processing Openai

Toolkit for comprehensive PDF reading, reviewing, and creation with visual quality control. Use it to work with PDFs (.pdf files) for: (1) reading or extracting content from existing PDFs, (2) creating new PDF documents with professional formatting, (3) generating reports, documents, or layouts that require precise typography and design, or any other PDF reading or generation task.

Purpose

To enable AI agents to perform comprehensive PDF reading, reviewing, and creation tasks with a focus on visual quality control and professional formatting.

Features

Reading and extracting content from existing PDFs
Creating new PDF documents with formatting
Generating reports and layouts with precise typography
Visual quality control of rendered PDF pages
Programmatic PDF generation using reportlab
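
As a minimal sketch of the reportlab feature listed above, the following writes a one-page report. The helper name `write_report`, the margins, and the font sizes are illustrative assumptions and are not taken from the skill itself.

```python
from pathlib import Path


def write_report(out_path, title, lines):
    """Write a simple one-page PDF and return its path (hypothetical helper)."""
    # Imported lazily so a missing reportlab install surfaces as an ImportError
    # at call time rather than when this module is loaded.
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    _, height = letter  # page size in points (1 point = 1/72 inch)
    pdf = canvas.Canvas(str(out_path), pagesize=letter)

    # Title, one inch in from the left edge and one inch down from the top.
    pdf.setFont("Helvetica-Bold", 16)
    pdf.drawString(72, height - 72, title)

    # Body lines with fixed leading; no wrapping or page breaks in this sketch.
    pdf.setFont("Helvetica", 11)
    y = height - 108
    for line in lines:
        pdf.drawString(72, y, line)
        y -= 16

    pdf.showPage()
    pdf.save()
    return Path(out_path)
```

A call such as `write_report("report.pdf", "Quarterly Summary", ["First line", "Second line"])` would produce a single letter-size page. Real reports with long text would likely need reportlab's higher-level layout tools instead of fixed coordinates.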

Use Cases

Use to extract text and data from PDFs when layout fidelity is important.
Use to programmatically generate professional-looking PDF documents for reports or proposals.
Use to review and validate the visual appearance of generated PDFs before delivery.
Use for any task requiring nuanced PDF content manipulation or generation.
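
The extraction use case might look roughly like the sketch below, which uses pypdf to pull text page by page. The function name `extract_text_by_page` is a hypothetical helper; pdfplumber could be swapped in where layout matters more than speed.

```python
def extract_text_by_page(pdf_path):
    """Return a list with one text string per page (hypothetical helper)."""
    # Imported lazily so a missing pypdf install is reported at call time.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    # extract_text() can yield an empty result for image-only pages,
    # so normalize anything falsy to an empty string.
    return [page.extract_text() or "" for page in reader.pages]
```

Pages that come back empty are a hint that the PDF is scanned or image-based, in which case rendering the page and inspecting it visually is the more reliable check.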

Non-Goals

Editing existing PDF content directly (focus is on creation and extraction).
Handling highly complex interactive PDF forms beyond basic content.
Providing a GUI or visual editor; operates via LLM prompts and scripts.

Workflow

Render PDF pages to PNGs for visual inspection, using pdftoppm if available.
Use reportlab for programmatic PDF creation.
Employ pdfplumber or pypdf for text extraction and quick checks.
Re-render pages after updates to verify alignment, spacing, and legibility.
Clean up or remove intermediate files after final approval.
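
The render step in the workflow above can be sketched as a thin wrapper around pdftoppm. The helper names, the `page` output prefix, and the 150 DPI default are assumptions chosen for illustration; only the `-png` and `-r` flags come from pdftoppm itself.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftoppm_cmd(pdf_path, out_prefix, dpi=150):
    """Build the pdftoppm argument list for PNG output at the given resolution."""
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(out_prefix)]


def render_pages(pdf_path, out_dir, dpi=150):
    """Render each page to a PNG in out_dir.

    Returns the sorted list of PNG paths, or None when pdftoppm is not on PATH.
    """
    if shutil.which("pdftoppm") is None:
        return None
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # pdftoppm appends "-<page number>.png" to the prefix it is given.
    subprocess.run(build_pdftoppm_cmd(pdf_path, out_dir / "page", dpi), check=True)
    return sorted(out_dir.glob("page-*.png"))
```

Pointing `out_dir` at a directory created with `tempfile.TemporaryDirectory()` is one way to make the final cleanup step automatic, since the PNGs disappear when the context manager exits.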

Prerequisites

Python 3 environment
Poppler utils (for rendering PDFs to PNGs)
uv or pip for Python package management
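
A quick way to confirm these prerequisites before running anything is sketched below, using only the standard library. The function names are hypothetical, and the module list simply mirrors the libraries named in the workflow.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Map each required tool or Python module to True if it is available."""
    # pdftoppm ships with Poppler utils and must be on PATH.
    status = {"pdftoppm": shutil.which("pdftoppm") is not None}
    for module in ("reportlab", "pdfplumber", "pypdf"):
        # find_spec locates the module without importing it.
        status[module] = importlib.util.find_spec(module) is not None
    return status


def missing_prerequisites():
    """Return the names of prerequisites that were not found."""
    return [name for name, ok in check_prerequisites().items() if not ok]
```

If `missing_prerequisites()` returns a non-empty list, the agent can name exactly what is missing when it reports the problem to the user.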

Code Execution

Validation (info): The skill relies on external tools for input handling and does not explicitly mention schema validation libraries within its own instructions.
Error Handling (info): The skill mentions telling the user about missing dependencies if installation fails, but doesn't specify structured error reporting for other operational failures.
Logging (info): The skill mentions deleting intermediate files and keeping final artifacts organized, implying some level of state management but no explicit audit logging.

Errors

Actionable error messages (info): The skill mentions informing the user about missing dependencies, but lacks detailed error handling for other potential operational failures.

Execution

Pinned dependencies (info): While Python dependencies are listed, explicit pinning via a lockfile (like `uv.lock` or `poetry.lock`) is not mentioned or evident. System tool installation commands are standard but not pinned to specific versions.

Practical Utility

Usage examples (info): The SKILL.md provides dependency installation commands and a rendering command example, but lacks end-to-end examples demonstrating input, invocation, and output for the core PDF processing tasks.
Edge cases (info): The skill mentions handling missing dependencies as a failure mode and recovery step, but does not explicitly detail other edge cases for PDF processing (e.g., corrupted files, complex layouts).
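
One inexpensive guard for the corrupted-file case noted above is to check for the PDF header before handing a file to any library. This is only a sketch of a pre-check, not a full validity test; the helper name `looks_like_pdf` and the 1024-byte probe window are assumptions.

```python
from pathlib import Path


def looks_like_pdf(path, probe_bytes=1024):
    """Return True if the file exists and has a "%PDF-" marker near its start."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as fh:
        head = fh.read(probe_bytes)
    # A well-formed PDF begins with "%PDF-<version>"; some files carry a few
    # junk bytes first, so search the probe window instead of only offset 0.
    return b"%PDF-" in head
```

A file that passes this check can still be truncated or damaged further in, so the parsing library's own errors should still be caught and reported.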

Installation

First, add the marketplace:

/plugin marketplace add lawvable/awesome-legal-skills

Then install the skill:

/plugin install pdf-processing-openai@lawvable

Quality Score

Verified

85/100

Analyzed about 13 hours ago

Trust Signals

Last commit: 2 months ago

GitHub owner: lawvable

Stars: 349

License: Apache-2.0

Website: lawvable.com