Spark Optimization

Skill Verified Active

Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

Purpose

Optimize Apache Spark jobs by providing expert patterns and configurations for partitioning, memory management, shuffle optimization, and caching.
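As one illustration of the partitioning and shuffle tuning mentioned above, the sketch below derives a shuffle partition count from an estimated shuffle size. It is not taken from the skill itself: the helper names `suggest_shuffle_partitions` and `shuffle_conf` are hypothetical, and the 128 MiB target partition size is an assumed rule of thumb that should be adjusted per workload. The configuration keys are standard Spark SQL settings.

```python
MIB = 1024 * 1024


def suggest_shuffle_partitions(shuffle_bytes, total_cores,
                               target_partition_bytes=128 * MIB):
    """Hypothetical helper: pick a shuffle partition count.

    Assumes roughly target_partition_bytes of shuffle data per partition
    (128 MiB is an assumed default, not a value from the skill) and at
    least one partition per available core.
    """
    if shuffle_bytes < 0 or total_cores < 1 or target_partition_bytes < 1:
        raise ValueError("shuffle_bytes must be >= 0; cores and target must be >= 1")
    # Integer ceiling division: partitions needed to stay under the target size.
    by_size = -(-shuffle_bytes // target_partition_bytes)
    # Never go below the core count, so no core is left without a task.
    return max(by_size, total_cores)


def shuffle_conf(shuffle_bytes, total_cores):
    """Return Spark SQL settings as strings, ready for spark-submit --conf."""
    partitions = suggest_shuffle_partitions(shuffle_bytes, total_cores)
    return {
        "spark.sql.shuffle.partitions": str(partitions),
        # Adaptive Query Execution can merge small partitions at runtime.
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.adaptive.coalescePartitions.enabled": "true",
    }


if __name__ == "__main__":
    # Example: about 10 GiB of shuffle data on a cluster with 16 cores.
    for key, value in shuffle_conf(10 * 1024 * MIB, total_cores=16).items():
        print(f"--conf {key}={value}")
```

The count is only a starting point; with adaptive execution enabled, Spark may coalesce partitions further at runtime.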

Features

  • Optimize Apache Spark jobs
  • Improve Spark performance
  • Debug slow Spark jobs
  • Scale data processing pipelines
  • Provide best practices for partitioning, caching, memory, and shuffle tuning

Use Cases

  • Optimizing slow Spark jobs
  • Tuning memory and executor configuration
  • Implementing efficient partitioning strategies
  • Debugging Spark performance issues
  • Scaling Spark pipelines for large datasets
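For the memory and executor tuning use case, a rough sketch of how an executor's memory is divided under Spark's unified memory model may help. The function name `executor_memory_breakdown` is hypothetical, and the sketch assumes Spark's documented defaults (300 MiB reserved, `spark.memory.fraction` of 0.6, `spark.memory.storageFraction` of 0.5, and an overhead of 10% of executor memory with a 384 MiB minimum). Actual values depend on the Spark version and cluster manager.

```python
RESERVED_MIB = 300        # fixed reserved memory inside the executor heap
MIN_OVERHEAD_MIB = 384    # minimum off-heap overhead per executor
OVERHEAD_FACTOR = 0.10    # assumed default overhead factor


def executor_memory_breakdown(executor_memory_mib,
                              memory_fraction=0.6,
                              storage_fraction=0.5):
    """Hypothetical helper: approximate memory regions in MiB.

    executor_memory_mib corresponds to spark.executor.memory; the two
    fractions correspond to spark.memory.fraction and
    spark.memory.storageFraction.
    """
    if executor_memory_mib <= RESERVED_MIB:
        raise ValueError("executor memory must exceed the 300 MiB reserve")
    overhead = max(MIN_OVERHEAD_MIB, int(executor_memory_mib * OVERHEAD_FACTOR))
    # Unified region shared by execution (shuffles, joins) and storage (cache).
    unified_exact = (executor_memory_mib - RESERVED_MIB) * memory_fraction
    unified = int(unified_exact)
    # Storage share that is protected from eviction by execution.
    storage = int(unified_exact * storage_fraction)
    return {
        "container_total": executor_memory_mib + overhead,
        "overhead": overhead,
        "unified": unified,
        "storage": storage,
        "execution": unified - storage,
    }


if __name__ == "__main__":
    # Example: spark.executor.memory=8g
    for region, mib in executor_memory_breakdown(8192).items():
        print(f"{region}: {mib} MiB")
```

The breakdown shows why the container request to the cluster manager is larger than `spark.executor.memory`, and why only part of the heap is available for caching.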

Non-Goals

  • Running Spark jobs directly
  • Managing Spark cluster infrastructure
  • Providing a general-purpose Python coding assistant

Versioning

  • Release Management: While there is no explicit versioning in the skill's frontmatter or CHANGELOG, the installation method refers to 'HEAD' and the code itself is updated frequently.

Installation

First, add the marketplace, then install the plugin:

/plugin marketplace add wshobson/agents
/plugin install data-engineering@claude-code-workflows

Quality Score

Verified
99/100
Analyzed about 12 hours ago

Trust Signals

Last commit: 2 days ago
Stars: 35.3k
License: MIT

Similar Extensions

Spark Engineer

99

Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing pipelines, or big data workloads. Invoke to write DataFrame transformations, optimize Spark SQL queries, implement RDD pipelines, tune shuffle operations, configure executor memory, process .parquet files, handle data partitioning, or build structured streaming analytics.

Skill
jeffallan

Data Engineer

94

Build scalable data pipelines, modern data warehouses, and real-time streaming architectures. Implements Apache Spark, dbt, Airflow, and cloud-native data platforms.

Skill
davila7

Performance Analysis

100

Comprehensive performance analysis, bottleneck detection, and optimization recommendations for Claude Flow swarms

Skill
ruvnet

Oraclaw Solver

100

Industrial-grade scheduling and resource optimization for AI agents. Solve task scheduling with energy matching, budget allocation, and any LP/MIP constraint problem in milliseconds.

Skill
Whatsonyourmind

Oraclaw Decide

100

Decision intelligence for AI agents. Analyze options, map decision dependencies with PageRank, detect when information sources conflict, and find the choices that matter most.

Skill
Whatsonyourmind

MongoDB Connection Optimizer

100

Optimize MongoDB client connection configuration (pools, timeouts, patterns) for any supported driver language. Use this skill when working/updating/reviewing on functions that instantiate or configure a MongoDB client (eg, when calling `connect()`), configuring connection pools, troubleshooting connection errors (ECONNREFUSED, timeouts, pool exhaustion), optimizing performance issues related to connections. This includes scenarios like building serverless functions with MongoDB, creating API endpoints that use MongoDB, optimizing high-traffic MongoDB applications, creating long-running tasks and concurrency, or debugging connection-related failures.

Skill
mongodb

© 2025 SkillRepo · Find the right skill, skip the noise.