Data Engineering
Plugin · Verified · Active
ETL pipeline construction, data warehouse design, batch processing workflows, and data-driven feature development
To equip users with the knowledge and patterns required to build robust, scalable, and efficient data engineering solutions, from initial pipeline design to data-driven feature implementation.
Features
- ETL pipeline construction patterns
- Data warehouse and lakehouse design guidance
- Batch and streaming data processing workflows
- Data-driven feature development orchestration
- Spark, dbt, and Airflow optimization techniques
- Data quality framework implementation
- API design and integration for data systems
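As a rough illustration of the ETL pipeline construction pattern listed above, the following is a minimal sketch that uses only the Python standard library. The sample data, the `orders` table, and the function names are hypothetical and are not taken from the plugin. It assumes a batch load should be idempotent, which is implemented here with an upsert keyed on `order_id`.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read from files or an API.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,BOB,3.00
3,carol,not_a_number
"""

def extract(text):
    """Read raw rows from CSV text as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Normalize names, cast amounts, and drop rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would likely route this to a reject table
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return clean

def load(conn, records):
    """Upsert by primary key so that re-running the same batch is idempotent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

def run_pipeline(conn, text=RAW_CSV):
    """Run extract, transform, and load, then return the row count in the target."""
    load(conn, transform(extract(text)))
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Calling `run_pipeline(sqlite3.connect(":memory:"))` twice on the same connection leaves the table with the same rows, which is the property a scheduler retry usually depends on.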
Use cases
- Building a new data pipeline from scratch
- Optimizing existing slow Spark jobs
- Implementing robust data quality checks
- Designing a modern data warehouse schema
- Orchestrating complex batch processing workflows
Non-goals
- Providing a fully automated, one-click data pipeline generator
- Replacing specialized data engineering tools directly
- Offering real-time data visualization dashboards
Compliance
- info: GDPR. The skills focus on data pipeline construction and do not directly handle PII. The data quality patterns mention GDPR compliance, which implies awareness, but no explicit sanitization mechanisms are detailed.
Installation
Add the marketplace first
/plugin marketplace add wshobson/agents
/plugin install data-engineering@claude-code-workflows
Includes 4 extensions
Skills (4)
- Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment. Use when creating data pipelines, orchestrating workflows, or scheduling batch jobs.
- Implement data quality validation with Great Expectations, dbt tests, and data contracts. Use when building data quality pipelines, implementing validation rules, or establishing data contracts.
- Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.
- Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.
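To make the idea of validation rules and data contracts in the data quality skill more concrete, here is a minimal sketch in plain Python. It is a stand-in for illustration only: it does not use the Great Expectations or dbt APIs, and the `Rule` class, the rule names, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Rule:
    """One column-level expectation in a hypothetical data contract."""
    name: str
    column: str
    check: Callable[[Any], bool]

def validate(rows, rules):
    """Return a list of (row_index, rule_name) for every failed check."""
    failures = []
    for i, row in enumerate(rows):
        for rule in rules:
            if not rule.check(row.get(rule.column)):
                failures.append((i, rule.name))
    return failures

# Hypothetical contract: key present, amount non-negative, status from a fixed set.
RULES = [
    Rule("order_id_not_null", "order_id", lambda v: v is not None),
    Rule("amount_non_negative", "amount",
         lambda v: isinstance(v, (int, float)) and v >= 0),
    Rule("status_in_set", "status", lambda v: v in {"new", "paid", "shipped"}),
]

rows = [
    {"order_id": 1, "amount": 10.5, "status": "paid"},
    {"order_id": None, "amount": 3.0, "status": "new"},
    {"order_id": 3, "amount": -1.0, "status": "refunded"},
]

failures = validate(rows, RULES)
for index, rule_name in failures:
    print(f"row {index} failed {rule_name}")
```

Keeping each rule as named data rather than inline `if` statements makes it possible to report failures by rule name and to share the same contract between producers and consumers of a dataset.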
Quality score
Verified
Similar extensions
Snowflake Development (98)
Snowflake SQL, data pipelines (Dynamic Tables, Streams+Tasks), Cortex AI functions, Snowpark Python, and dbt integration. Includes query helper script, 3 reference guides, and troubleshooting.
Voltagent Data Ai (97)
Data engineering, machine learning, and AI specialists: data pipelines, machine learning, LLM architecture.
MongoDB MCP Server + Skills (100)
Official Claude plugin for MongoDB (MCP Server + Skills). Connect to databases, explore data, manage collections, optimize queries, generate reliable code, implement best practices, develop advanced features, and more.
Autoresearch Agent (100)
Autonomous experiment loop that optimizes any file by a measurable metric. 5 slash commands, 8 evaluators, configurable loop intervals (10min to monthly).
Train Sentence Transformers (99)
Train or fine-tune sentence-transformers models across all three architectures: SentenceTransformer (bi-encoder embeddings), CrossEncoder (rerankers), and SparseEncoder (SPLADE). Covers loss selection, hard-negative mining, evaluators, distillation, LoRA, Matryoshka, and Hugging Face Hub publishing.
Build with Claude (96)
Agents for data engineering, machine learning, and AI development