Ray Data
Skill · Verified · Active
Scalable data processing for ML workloads. Streaming execution across CPU/GPU; supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from a single machine to hundreds of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.
This skill enables efficient and scalable data processing for machine learning workloads, facilitating batch inference, data preprocessing, multi-modal data loading, and distributed ETL pipelines.
Features
- Scalable data processing for ML workloads
- Streaming execution across CPU/GPU
- Support for Parquet, CSV, JSON, and image formats
- Integration with Ray Train, PyTorch, and TensorFlow
- Scales from single machine to hundreds of nodes
Use Cases
- Processing large datasets (>100GB) for ML training
- Distributed data preprocessing across a cluster
- Building batch inference pipelines
- Loading multi-modal data (images, audio, video)
Non-Goals
- Processing small data (<1GB) on a single machine (use Pandas)
- SQL-like operations on tabular data (use Dask or Spark)
- Enterprise ETL and complex SQL queries (use Spark)
Trust
- Issues (attention): The repository shows 17 issues opened in the last 90 days and 4 closed, a closure rate below 50%, though the number of open issues is relatively low.
Compliance
- GDPR: The skill processes datasets that may contain personal data; while it does not submit this data to third parties, it does not include specific sanitization steps before potential LLM interaction.
Practical Utility
- Edge cases: The documentation covers common transformations and integrations, but failure modes and recovery steps for edge cases (e.g., malformed input, rate limits) are not explicitly documented.
Installation
npx skills add davila7/claude-code-templates
Runs the Vercel skills CLI (skills.sh) via npx; requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repository follows the agentskills.io format.
Similar Extensions
Ray Data
95 · Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.
TimesFM Forecasting
100 · Zero-shot time series forecasting with Google's TimesFM foundation model. Use for any univariate time series (sales, sensors, energy, vitals, weather) without training a custom model. Supports CSV/DataFrame/array inputs with point forecasts and prediction intervals. Includes a preflight system checker script to verify RAM/GPU before first use.
PyTDC (Therapeutics Data Commons)
99 · Therapeutics Data Commons. AI-ready drug discovery datasets (ADME, toxicity, DTI), benchmarks, scaffold splits, molecular oracles, for therapeutic ML and pharmacological prediction.
Polars
99 · Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.
Spark Engineer
99 · Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing pipelines, or big data workloads. Invoke to write DataFrame transformations, optimize Spark SQL queries, implement RDD pipelines, tune shuffle operations, configure executor memory, process .parquet files, handle data partitioning, or build structured streaming analytics.
Build Feature Store
99 · Build a feature store using Feast for centralized feature management, configure offline and online stores for batch and real-time serving, define feature views with transformations, and implement point-in-time correct joins for ML pipelines. Use when managing features for multiple ML models, ensuring training-serving consistency, serving low-latency features for real-time inference, reusing feature definitions across projects, or building a feature catalog for discovery and governance.