
Dask Data Science

Skill · Verified · Active

Part of the AlterLab Academic Skills suite. Distributed computing for larger-than-RAM pandas/NumPy workflows. Use it when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, and integration with existing pandas code. For out-of-core analytics on a single machine, use vaex; for in-memory speed, use polars.

Purpose

Provides an expert assistant for scaling data science workflows with Dask, so that users can process datasets that exceed single-machine memory or that require parallel computation.

Features

  • Distributed computing for pandas/NumPy
  • Larger-than-memory data processing
  • Parallel file processing
  • Integration with existing pandas/NumPy code
  • Scales from laptops to clusters
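
As a rough illustration of the NumPy side of these features, the sketch below builds a chunked Dask array and reduces it lazily. The array shape and chunk size are arbitrary values chosen for the example, not recommendations from the skill itself.

```python
import numpy as np
import dask.array as da

# Illustrative sizes only: a 4000 x 4000 array split into 1000 x 1000 chunks.
x = da.ones((4000, 4000), chunks=(1000, 1000))

# Operations build a task graph over the chunks; nothing is computed yet.
column_means = (x * 2).mean(axis=0)

# compute() runs the graph and returns an ordinary NumPy array.
result = column_means.compute()

print(x.numblocks)    # number of chunks along each axis
print(type(result))   # numpy.ndarray
```

Because each chunk is processed independently, only a few chunks need to be in memory at a time, which is what lets the same NumPy-style code run on arrays larger than RAM.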

Use cases

  • Scaling pandas operations to larger datasets
  • Parallelizing computations for performance
  • Processing multiple files efficiently (CSVs, Parquet, JSON)
  • Distributing workloads across multiple cores or machines

Non-goals

  • Out-of-core analytics on a single machine (use vaex)
  • In-memory speed optimization (use polars)
  • Replacing core pandas/NumPy functionality for in-memory data

Workflow

  1. Load data using Dask's parallel readers (read_csv, read_parquet)
  2. Perform operations (filtering, transformations, aggregations) on Dask DataFrames, Arrays, or Bags
  3. Leverage Dask's lazy evaluation and task graph construction
  4. Trigger computation with .compute() or dask.compute()
  5. Optimize performance through chunking, persist, and scheduler selection
  6. Save results or convert to pandas for final analysis

Installation

npx skills add AlterLab-IEU/AlterLab-Academic-Skills

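This runs the Vercel skills CLI (skills.sh) through npx. It requires a local Node.js installation and at least one skills-compatible agent (Claude Code, Cursor, Codex, etc.), and it assumes the repository follows the agentskills.io format.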

Quality score

Verified
99/100
Analyzed 1 day ago

Trust signals

Last commit: 17 days ago
Stars: 15
License: MIT

Similar extensions

AlterLab Zarr

99

Part of the AlterLab Academic Skills suite. Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

Skill
AlterLab-IEU

Dask

98

Distributed computing for larger-than-RAM pandas/NumPy workflows. Use it when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, and integration with existing pandas code. For out-of-core analytics on a single machine, use vaex; for in-memory speed, use polars.

Skill
K-Dense-AI

Spark Engineer

99

Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing pipelines, or big data workloads. Invoke to write DataFrame transformations, optimize Spark SQL queries, implement RDD pipelines, tune shuffle operations, configure executor memory, process .parquet files, handle data partitioning, or build structured streaming analytics.

Skill
jeffallan

Zarr Python

97

Chunked N-D arrays for cloud storage. Compressed arrays, parallel I/O, S3/GCS integration, NumPy/Dask/Xarray compatible, for large-scale scientific computing pipelines.

Skill
K-Dense-AI

OraClaw Forecast

100

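Time series forecasting for AI agents. ARIMA and Holt-Winters forecasts with confidence intervals. Forecast revenue, traffic, prices, or any sequential data. Inference latency under 5 ms.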

Skill
Whatsonyourmind

SHAP Model Interpretability

100

Model interpretability and explainability using SHAP (SHapley Additive exPlanations). Use this skill when explaining machine learning model predictions, computing feature importance, generating SHAP plots (waterfall, beeswarm, bar, scatter, force, heatmap), debugging models, analyzing model bias or fairness, comparing models, or implementing explainable AI. Works with tree-based models (XGBoost, LightGBM, Random Forest), deep learning (TensorFlow, PyTorch), linear models, and any black-box model.

Skill
K-Dense-AI