Spark Engineer

Skill · Verified · Active

Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing pipelines, or big data workloads. Invoke to write DataFrame transformations, optimize Spark SQL queries, implement RDD pipelines, tune shuffle operations, configure executor memory, process .parquet files, handle data partitioning, or build structured streaming analytics.

Purpose

Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing pipelines, or big data workloads.

Features

  • Write DataFrame transformations and RDD pipelines
  • Optimize Spark SQL queries and performance
  • Tune shuffle operations and executor memory
  • Handle data partitioning and caching strategies
  • Build structured streaming analytics
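As a rough illustration of the shuffle and executor-memory tuning listed above, the following Python sketch derives a few starting-point settings from an input size. The helper name `suggest_spark_conf`, the 128 MiB target partition size, and the default executor values are assumptions made for this example and are not taken from the skill itself; only the configuration keys are real Spark properties.

```python
MIB = 1024 * 1024


def suggest_spark_conf(input_bytes, target_partition_bytes=128 * MIB,
                       executor_memory_gb=8, executor_cores=4):
    """Hypothetical helper: derive starting-point Spark settings.

    The sizing rule (one shuffle partition per ~128 MiB of input) is a
    common rule of thumb assumed here, not a guarantee of good performance.
    """
    if input_bytes < 0:
        raise ValueError("input_bytes must be non-negative")
    if target_partition_bytes <= 0:
        raise ValueError("target_partition_bytes must be positive")

    # Integer ceiling division, with a floor of one partition.
    partitions = max(
        1, (input_bytes + target_partition_bytes - 1) // target_partition_bytes
    )

    return {
        "spark.sql.shuffle.partitions": str(partitions),
        "spark.executor.memory": f"{executor_memory_gb}g",
        "spark.executor.cores": str(executor_cores),
        # Lets Spark coalesce small shuffle partitions at runtime.
        "spark.sql.adaptive.enabled": "true",
    }


# Example: settings for roughly 10 GiB of input.
conf = suggest_spark_conf(10 * 1024 * MIB)
for key, value in conf.items():
    print(f"{key}={value}")
```

Each key/value pair could then be passed to `SparkSession.builder.config(key, value)` or to `spark-submit` via `--conf key=value`. Treat the numbers as a first guess to refine against the Spark UI, not as recommended values.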

Use Cases

  • Developing high-performance Spark jobs
  • Debugging distributed data processing bottlenecks
  • Configuring Spark cluster settings for optimal resource utilization
  • Implementing advanced data partitioning and caching techniques

Non-Goals

  • Writing general Python or Scala code
  • Configuring Hadoop or other distributed systems (beyond Spark's interaction)
  • Providing generic data analysis without Spark context

Installation

Add the marketplace first

/plugin marketplace add jeffallan/claude-skills
/plugin install claude-skills@fullstack-dev-skills

Quality Score

Verified
99/100
Analyzed 1 day ago

Trust Signals

Last commit: 13 days ago
Stars: 9k
License: MIT

Similar Extensions

Spark Optimization

99

Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

Skill
wshobson

Dask Data Science

99

Part of the AlterLab Academic Skills suite. Distributed computing for larger-than-RAM pandas/NumPy workflows. Use when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, integration with existing pandas code. For out-of-core analytics on single machine use vaex; for in-memory speed use polars.

Skill
AlterLab-IEU

Senior Data Engineer

95

Data engineering skill for building scalable data pipelines, ETL/ELT systems, and data infrastructure. Expertise in Python, SQL, Spark, Airflow, dbt, Kafka, and modern data stack. Includes data modeling, pipeline orchestration, data quality, and DataOps. Use when designing data architectures, building data pipelines, optimizing data workflows, implementing data governance, or troubleshooting data issues.

Skill
alirezarezvani

Ray Data

95

Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.

Skill
Orchestra-Research

Ray Data

95

Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.

Skill
davila7

Data Engineer

94

Build scalable data pipelines, modern data warehouses, and real-time streaming architectures. Implements Apache Spark, dbt, Airflow, and cloud-native data platforms.

Skill
davila7