Data Engineering
Plugin Verified Active
ETL pipeline construction, data warehouse design, batch processing workflows, and data-driven feature development
This plugin equips users with the knowledge and patterns needed to build robust, scalable, and efficient data engineering solutions, from initial pipeline design to data-driven feature implementation.
Features
- ETL pipeline construction patterns
- Data warehouse and lakehouse design guidance
- Batch and streaming data processing workflows
- Data-driven feature development orchestration
- Spark, dbt, and Airflow optimization techniques
- Data quality framework implementation
- API design and integration for data systems
Use Cases
- Building a new data pipeline from scratch
- Optimizing existing slow Spark jobs
- Implementing robust data quality checks
- Designing a modern data warehouse schema
- Orchestrating complex batch processing workflows
Non-Goals
- Providing a fully automated, one-click data pipeline generator
- Replacing specialized data engineering tools directly
- Offering real-time data visualization dashboards
Compliance
- GDPR (info): The skills focus on data pipeline construction and do not directly handle PII. The data quality patterns mention GDPR compliance, which implies awareness, but no explicit sanitization mechanisms are detailed.
Installation
First, add the marketplace:
/plugin marketplace add wshobson/agents
Then install the plugin:
/plugin install data-engineering@claude-code-workflows
Contains 4 extensions
Skill (4)
Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment. Use when creating data pipelines, orchestrating workflows, or scheduling batch jobs.
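As a rough illustration of the dependency ordering an orchestrator resolves before running tasks, here is a minimal sketch using only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow code and does not use Airflow's API. The task names and the `execution_order` helper are hypothetical, chosen only for this example.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each key maps to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_join": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform_join"},
}


def execution_order(deps):
    """Return task names in an order that respects every dependency.

    Raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Both extract tasks come before `transform_join`, and `load_warehouse` comes last. The relative order of the two extract tasks is not guaranteed, since neither depends on the other.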
Implement data quality validation with Great Expectations, dbt tests, and data contracts. Use when building data quality pipelines, implementing validation rules, or establishing data contracts.
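The idea of a data contract can be sketched in plain Python as a list of column rules applied to each row. This is an illustrative sketch only. It does not use the Great Expectations or dbt APIs, and the `ColumnRule` class, the `validate` function, and the sample columns are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ColumnRule:
    """One rule of a hypothetical data contract."""
    column: str
    description: str
    check: Callable[[Any], bool]


def is_non_negative_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and value >= 0
    )


# Hypothetical contract for an "orders" dataset.
CONTRACT = [
    ColumnRule("order_id", "must not be null", lambda v: v is not None),
    ColumnRule("amount", "must be a non-negative number", is_non_negative_number),
]


def validate(rows, contract):
    """Return (row_index, column, description) for every violated rule."""
    failures = []
    for index, row in enumerate(rows):
        for rule in contract:
            # A missing column is treated the same as a null value.
            if not rule.check(row.get(rule.column)):
                failures.append((index, rule.column, rule.description))
    return failures


rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": None, "amount": 5.0},
    {"order_id": 3, "amount": -2},
]
failures = validate(rows, CONTRACT)
for failure in failures:
    print(failure)
```

Collecting every failure, rather than stopping at the first one, makes it easier to report all contract violations in a single pipeline run.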
Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.
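One common incremental strategy processes only source rows newer than what the target already holds, then merges them on a unique key. The sketch below shows that idea in plain Python. It is not dbt code and does not reflect how dbt implements incremental models. The `incremental_merge` function and the `id`, `status`, and `updated_at` columns are hypothetical, and timestamps are assumed to be ISO 8601 date strings so that string comparison matches chronological order.

```python
def incremental_merge(target, source, unique_key, updated_at):
    """Merge into target only the source rows newer than its high-water mark."""
    # High-water mark: latest timestamp already loaded (None on the first run).
    watermark = max((row[updated_at] for row in target), default=None)

    # Index existing rows by key so new rows can insert or overwrite.
    merged = {row[unique_key]: row for row in target}
    for row in source:
        if watermark is None or row[updated_at] > watermark:
            merged[row[unique_key]] = row
    return list(merged.values())


target = [
    {"id": 1, "status": "new", "updated_at": "2024-01-01"},
    {"id": 2, "status": "new", "updated_at": "2024-01-02"},
]
source = [
    {"id": 1, "status": "new", "updated_at": "2024-01-01"},      # not newer, skipped
    {"id": 2, "status": "shipped", "updated_at": "2024-01-03"},  # existing key, updated
    {"id": 3, "status": "new", "updated_at": "2024-01-03"},      # new key, inserted
]
result = incremental_merge(target, source, "id", "updated_at")
for row in result:
    print(row)
```

A strict greater-than comparison skips rows that share the watermark timestamp, so late-arriving rows with the same timestamp would be missed. Whether to use a lookback window instead is a design choice that depends on the source system.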
Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.
Quality Score
Verified
Trust Signals
Similar Extensions
Snowflake Development (98)
Snowflake SQL, data pipelines (Dynamic Tables, Streams+Tasks), Cortex AI functions, Snowpark Python, and dbt integration. Includes query helper script, 3 reference guides, and troubleshooting.
Voltagent Data Ai (97)
Data engineering, ML, and AI specialists: data pipelines, machine learning, LLM architecture
Mongodb (100)
Official Claude plugin for MongoDB (MCP Server + Skills). Connect to databases, explore data, manage collections, optimize queries, generate reliable code, implement best practices, develop advanced features, and more.
Autoresearch Agent (100)
Autonomous experiment loop that optimizes any file by a measurable metric. 5 slash commands, 8 evaluators, configurable loop intervals (10min to monthly).
Train Sentence Transformers (99)
Train or fine-tune sentence-transformers models across all three architectures: SentenceTransformer (bi-encoder embeddings), CrossEncoder (rerankers), and SparseEncoder (SPLADE). Covers loss selection, hard-negative mining, evaluators, distillation, LoRA, Matryoshka, and Hugging Face Hub publishing.
Build with Claude (96)
Agents for data engineering, machine learning, and AI development