
Ray Data

Skill Verified Active

Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.

Purpose

To enable efficient and scalable data processing for machine learning workloads, facilitating batch inference, data preprocessing, multi-modal data loading, and distributed ETL pipelines.

Features

  • Scalable data processing for ML workloads
  • Streaming execution across CPU/GPU
  • Support for Parquet, CSV, JSON, and image formats
  • Integration with Ray Train, PyTorch, and TensorFlow
  • Scales from single machine to hundreds of nodes

Use cases

  • Processing large datasets (>100GB) for ML training
  • Distributed data preprocessing across a cluster
  • Building batch inference pipelines
  • Loading multi-modal data (images, audio, video)

Non-goals

  • Processing small data (<1GB) on a single machine (use Pandas)
  • SQL-like operations on tabular data (use Dask or Spark)
  • Enterprise ETL and complex SQL queries (use Spark)

Trust

  • Issues attention: The repository shows 17 issues opened in the last 90 days and 4 closed, a closure rate below 50%, though the absolute number of open issues is relatively low.

Compliance

  • GDPR: The skill processes datasets that may contain personal data, and while it does not submit this data to third parties, it includes no specific sanitization steps before potential LLM interaction.

Practical Utility

  • Edge cases: While the documentation covers common transformations and integrations, it does not detail failure modes and recovery steps for edge cases (e.g., malformed input, rate limits).

Installation

npx skills add davila7/claude-code-templates

Runs the Vercel skills CLI (skills.sh) via npx. Requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repository follows the agentskills.io format.

Quality score

Verified
95/100
Analyzed about 21 hours ago

Trust signals

Last commit: about 22 hours ago
Stars: 27.2k
License: MIT

Similar extensions

Ray Data

95

Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.

Skill
Orchestra-Research

TimesFM Forecasting

100

Zero-shot time series forecasting with Google's TimesFM foundation model. Use for any univariate time series (sales, sensors, energy, vitals, weather) without training a custom model. Supports CSV/DataFrame/array inputs with point forecasts and prediction intervals. Includes a preflight system checker script to verify RAM/GPU before first use.

Skill
K-Dense-AI

PyTDC (Therapeutics Data Commons)

99

Therapeutics Data Commons. AI-ready drug discovery datasets (ADME, toxicity, DTI), benchmarks, scaffold splits, molecular oracles, for therapeutic ML and pharmacological prediction.

Skill
K-Dense-AI

Polars

99

Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.

Skill
K-Dense-AI

Spark Engineer

99

Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing pipelines, or big data workloads. Invoke to write DataFrame transformations, optimize Spark SQL queries, implement RDD pipelines, tune shuffle operations, configure executor memory, process .parquet files, handle data partitioning, or build structured streaming analytics.

Skill
jeffallan

Build Feature Store

99

Build a feature store using Feast for centralized feature management, configure offline and online stores for batch and real-time serving, define feature views with transformations, and implement point-in-time correct joins for ML pipelines. Use when managing features for multiple ML models, ensuring training-serving consistency, serving low-latency features for real-time inference, reusing feature definitions across projects, or building a feature catalog for discovery and governance.

Skill
pjt222