Senior Data Engineer
Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and the modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.
To empower users with the tools and knowledge needed to design, build, and optimize robust data pipelines and infrastructure.
Features
- Build scalable data pipelines
- Develop ETL/ELT systems
- Orchestrate data workflows
- Implement data quality frameworks
- Optimize data infrastructure performance
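The pipeline-building features above can be illustrated with a minimal extract/transform/load sketch. The CSV source, column names, and in-memory sink are hypothetical stand-ins for real connectors, not code taken from the skill's scripts:

```python
import csv
import io

def extract(raw: str) -> list[dict]:
    """Parse CSV text into rows (hypothetical source)."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list[dict]) -> list[dict]:
    """Cast types and drop rows with missing keys."""
    out = []
    for row in rows:
        if row.get("user_id") and row.get("amount"):
            out.append({"user_id": row["user_id"], "amount": float(row["amount"])})
    return out

def load(rows: list[dict], sink: list) -> int:
    """Append validated rows to an in-memory sink; return the count loaded."""
    sink.extend(rows)
    return len(rows)

raw = "user_id,amount\n1,9.99\n,3.50\n2,12.00\n"
sink: list = []
loaded = load(transform(extract(raw)), sink)
print(loaded)  # 2 (the row with the empty user_id is dropped)
```

In a production pipeline each stage would be a separate task in an orchestrator such as Airflow, but the composable-stage shape stays the same.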
Use Cases
- Designing data architectures
- Building robust data pipelines
- Optimizing data workflow performance
- Implementing data governance and quality checks
Non-Goals
- Real-time data analysis
- Machine learning model deployment
- Application development
Documentation
- Configuration & parameter reference: While SKILL.md provides example CLI commands and code snippets, it does not document all configuration options, parameters, and their precedence.
Code Execution
- Validation: Input validation and sanitization appear in tools such as `etl_performance_optimizer.py` and `data_quality_validator.py` via argument parsing and schema checks, but a formal validation library such as Zod or Pydantic is not demonstrated consistently across the scripts.
- Logging: The Python scripts use the `logging` module for messages, but there is no persistent audit log for tracking destructive actions or outbound calls.
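As a rough illustration of the kind of schema check described above (not the actual code in `data_quality_validator.py`; the field names are made up), a stdlib-only record validator might look like:

```python
# Hypothetical required schema: field name -> expected Python type.
REQUIRED_SCHEMA = {"user_id": int, "email": str}

def validate_record(record: dict, schema: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return errors

print(validate_record({"user_id": 7, "email": "a@b.co"}, REQUIRED_SCHEMA))  # []
print(validate_record({"user_id": "7"}, REQUIRED_SCHEMA))  # two errors
```

A library like Pydantic would add type coercion and nested models on top of this, which is why the listing flags its absence.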
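One way to add the missing persistent audit trail is a dedicated file-backed logger built on the stdlib `logging` module. The log path and JSON record fields below are assumptions for the sketch, not the skill's own format:

```python
import json
import logging
import os
import tempfile
import time

def make_audit_logger(path: str) -> logging.Logger:
    """Create a logger that appends one JSON line per audited action."""
    logger = logging.getLogger("audit")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep audit records out of the console log
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger

def audit(logger: logging.Logger, action: str, target: str) -> None:
    """Record a destructive or outbound action with a timestamp."""
    logger.info(json.dumps({"ts": time.time(), "action": action, "target": target}))

path = os.path.join(tempfile.mkdtemp(), "audit.log")
log = make_audit_logger(path)
audit(log, "drop_table", "staging.events")
print(open(path).read().strip())
```

Writing one JSON object per line keeps the file greppable and easy to ship to a log aggregator later.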
Scope
- Tool surface size: The repository contains several large Python scripts and many reference markdown files, suggesting a broad surface area, though individual tools appear focused.
Errors
- Actionable error messages: The CLI tools emit basic error messages and the scripts log failures, but detailed remediation steps or documentation links are not provided consistently for every error path.
Practical Utility
- Edge cases: The code handles some errors and documents usage patterns, but explicit documentation of failure modes (e.g., malformed input, missing dependencies) with recovery steps is limited.
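A common recovery step for the transient failure modes mentioned above is retry with exponential backoff. This sketch is illustrative only and not part of the skill's scripts; the attempt count and delay policy are arbitrary:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 0.01):
    """Call fn, retrying on any exception with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts:
                raise RuntimeError(f"failed after {attempts} attempts: {exc}") from exc
            time.sleep(base_delay * 2 ** (attempt - 1))

# Simulated flaky dependency: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient upstream error")
    return "ok"

result = with_retries(flaky)
print(result, calls["n"])  # ok 3
```

In a real pipeline the retry policy would usually live in the orchestrator (e.g., Airflow task retries) rather than in ad-hoc wrappers, so failures stay visible in one place.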
Installation
First, add the marketplace, then install the plugin:
/plugin marketplace add alirezarezvani/claude-skills
/plugin install engineering-team@claude-code-skills
Similar Extensions
Data Engineer (94)
Build scalable data pipelines, modern data warehouses, and real-time streaming architectures. Implements Apache Spark, dbt, Airflow, and cloud-native data platforms.
Spark Engineer (99)
Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing pipelines, or big data workloads. Invoke to write DataFrame transformations, optimize Spark SQL queries, implement RDD pipelines, tune shuffle operations, configure executor memory, process .parquet files, handle data partitioning, or build structured streaming analytics.
dbt Transformation Patterns (98)
Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.
Data Quality Frameworks (97)
Implement data quality validation with Great Expectations, dbt tests, and data contracts. Use when building data quality pipelines, implementing validation rules, or establishing data contracts.
Data Quality Auditor (97)
Audit datasets for completeness, consistency, accuracy, and validity. Profile data distributions, detect anomalies and outliers, surface structural issues, and produce an actionable remediation plan.
Airflow DAG Patterns (95)
Build production Apache Airflow DAGs with best practices for operators, sensors, testing, and deployment. Use when creating data pipelines, orchestrating workflows, or scheduling batch jobs.