Data Engineering

Plugin Verified Active

ETL pipeline construction, data warehouse design, batch processing workflows, and data-driven feature development

4 Skills 0 MCPs

Purpose

To equip users with the knowledge and patterns required to build robust, scalable, and efficient data engineering solutions, from initial pipeline design to data-driven feature implementation.

Features

ETL pipeline construction patterns
Data warehouse and lakehouse design guidance
Batch and streaming data processing workflows
Data-driven feature development orchestration
Spark, dbt, and Airflow optimization techniques
Data quality framework implementation
API design and integration for data systems

Use Cases

Building a new data pipeline from scratch
Optimizing existing slow Spark jobs
Implementing robust data quality checks
Designing a modern data warehouse schema
Orchestrating complex batch processing workflows

Non-Goals

Providing a fully automated, one-click data pipeline generator
Replacing specialized data engineering tools directly
Offering real-time data visualization dashboards

Compliance

GDPR: The skills focus on data pipeline construction and do not directly handle PII. The data quality patterns mention GDPR compliance, which implies awareness, but they do not detail any explicit sanitization mechanisms.

Installation

First, add the marketplace:

/plugin marketplace add wshobson/agents

Then install the plugin:

/plugin install data-engineering@claude-code-workflows

Contains 4 extensions

Skill (4)

Airflow DAG Patterns Skill

Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment. Use when creating data pipelines, orchestrating workflows, or scheduling batch jobs.

Data Quality Frameworks Skill

Implement data quality validation with Great Expectations, dbt tests, and data contracts. Use when building data quality pipelines, implementing validation rules, or establishing data contracts.
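To make the idea of a validation rule concrete, here is a minimal sketch in plain Python. It does not use Great Expectations or dbt, and it is not taken from the skill itself; the `Check` class, the `run_checks` helper, and the order columns are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """A named rule that a single row either passes or fails (hypothetical helper)."""
    name: str
    predicate: Callable[[dict], bool]


def run_checks(rows, checks):
    """Return a mapping of check name -> indexes of the rows that failed it."""
    failures = {check.name: [] for check in checks}
    for index, row in enumerate(rows):
        for check in checks:
            if not check.predicate(row):
                failures[check.name].append(index)
    return failures


# Illustrative rules for an assumed "orders" dataset.
ORDER_CHECKS = [
    Check("order_id_not_null", lambda r: r.get("order_id") is not None),
    Check(
        "amount_non_negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]

sample_rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": None, "amount": 5.00},
    {"order_id": 3, "amount": -2.50},
]

report = run_checks(sample_rows, ORDER_CHECKS)
```

A pipeline step could fail the batch, or quarantine the offending rows, whenever any list in `report` is non-empty. Dedicated frameworks add scheduling, reporting, and richer rule types on top of this basic shape.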

dbt Transformation Patterns Skill

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.
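The incremental strategy mentioned above can be sketched outside dbt. The example below uses Python's built-in `sqlite3` module to append only the source rows that are newer than the latest row already loaded. It is an assumption-laden illustration, not dbt's implementation: the table names, the `updated_at` column, and the `incremental_load` function are hypothetical.

```python
import sqlite3


def incremental_load(conn):
    """Append source rows newer than the target's high-water mark.

    Returns the number of rows added in this run.
    """
    before = conn.execute("SELECT COUNT(*) FROM target_events").fetchone()[0]
    # An empty string sorts before every ISO-8601 date, so the first run loads everything.
    (high_water_mark,) = conn.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM target_events"
    ).fetchone()
    conn.execute(
        "INSERT INTO target_events (id, updated_at) "
        "SELECT id, updated_at FROM source_events WHERE updated_at > ?",
        (high_water_mark,),
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM target_events").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_events (id INTEGER PRIMARY KEY, updated_at TEXT);
    CREATE TABLE target_events (id INTEGER PRIMARY KEY, updated_at TEXT);
    INSERT INTO source_events VALUES (1, '2024-01-01'), (2, '2024-01-02');
    """
)

first_run = incremental_load(conn)   # full load of the two existing rows
conn.execute("INSERT INTO source_events VALUES (3, '2024-01-03')")
second_run = incremental_load(conn)  # only the new row is appended
third_run = incremental_load(conn)   # nothing new, so nothing is added
```

The trade-off is the usual one for incremental loads: each run scans far less data than a full rebuild, but rows that arrive late with an older `updated_at` are missed unless the filter includes a lookback window.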

Spark Optimization Skill

Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

Quality Score

Verified

98/100

Analyzed about 23 hours ago

Trust Signals

Last commit: 3 days ago

GitHub owner: wshobson

Stars: 35.3k

License: MIT

Website: sethhobson.com

Similar Extensions

Snowflake Development

Snowflake SQL, data pipelines (Dynamic Tables, Streams+Tasks), Cortex AI functions, Snowpark Python, and dbt integration. Includes query helper script, 3 reference guides, and troubleshooting.

Plugin

alirezarezvani

Voltagent Data Ai

Data engineering, ML, and AI specialists - data pipelines, machine learning, LLM architecture

Plugin

VoltAgent

Mongodb

100

Official Claude plugin for MongoDB (MCP Server + Skills). Connect to databases, explore data, manage collections, optimize queries, generate reliable code, implement best practices, develop advanced features, and more.

Plugin

mongodb

Autoresearch Agent

100

Autonomous experiment loop that optimizes any file by a measurable metric. 5 slash commands, 8 evaluators, configurable loop intervals (10min to monthly).

Plugin

alirezarezvani

Train Sentence Transformers

Train or fine-tune sentence-transformers models across all three architectures: SentenceTransformer (bi-encoder embeddings), CrossEncoder (rerankers), and SparseEncoder (SPLADE). Covers loss selection, hard-negative mining, evaluators, distillation, LoRA, Matryoshka, and Hugging Face Hub publishing.

Plugin

huggingface

Build with Claude

Agents for data engineering, machine learning, and AI development

Plugin

davepoon