
Ray Data

Skill · Verified · Active

Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.

Purpose

To enable scalable and efficient processing of large datasets for machine learning workloads, leveraging distributed computing and GPU acceleration.

Features

  • Scalable data processing for ML workloads
  • Streaming execution across CPU/GPU
  • Support for Parquet/CSV/JSON/images
  • Integrates with Ray Train, PyTorch, TensorFlow
  • Scales from single machine to 100s of nodes

Use Cases

  • Batch inference pipelines (see the sketch after this list)
  • Distributed data preprocessing
  • Multi-modal data loading
  • Distributed ETL pipelines
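
A minimal sketch of the batch-inference use case, assuming a recent Ray 2.x release in which map_batches accepts a callable class together with a concurrency argument; the in-memory dataset and the scaling "model" are illustrative placeholders, not part of the Ray API.

import ray

class Predictor:
    def __init__(self):
        # Placeholder for loading a real model checkpoint once per worker.
        self.scale = 2.0

    def __call__(self, batch):
        # Vectorized "inference" over one batch of rows.
        batch["prediction"] = batch["value"] * self.scale
        return batch

# Illustrative in-memory dataset; use read_parquet / read_images for real inputs.
ds = ray.data.from_items([{"value": float(i)} for i in range(1000)])

# Run the stateful predictor on an actor pool; add num_gpus=1 for GPU workers.
preds = ds.map_batches(Predictor, concurrency=2)
preds.write_parquet("/tmp/predictions")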

Non-Goals

  • Processing small datasets on a single machine (use Pandas)
  • Performing SQL-like operations on tabular data (use Dask/Spark)
  • Enterprise ETL and complex SQL queries (use Spark)

Workflow

  1. Read data from various sources (cloud storage, Python objects).
  2. Transform data using vectorized or row-by-row operations, filtering, or grouping.
  3. Optionally accelerate transforms with GPUs.
  4. Write processed data to various formats (Parquet, CSV, JSON).
  5. Integrate with ML frameworks like PyTorch and TensorFlow for training (a sketch of the full workflow follows).
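
A minimal sketch of this workflow in Python, assuming a recent Ray 2.x release with ray[data] and torch installed; the bucket paths and the "value" column are illustrative placeholders, not part of the Ray API.

import ray

# 1. Read data from cloud storage (the path is illustrative).
ds = ray.data.read_parquet("s3://my-bucket/raw/")

# 2. Transform data with a vectorized, batch-wise operation.
def normalize(batch):
    batch["value"] = batch["value"] / batch["value"].max()
    return batch

ds = ds.map_batches(normalize, batch_format="pandas")

# 3. Optionally request GPUs for a transform, e.g.:
#    ds = ds.map_batches(gpu_transform, num_gpus=1)

# 4. Write the processed data out as Parquet.
ds.write_parquet("s3://my-bucket/processed/")

# 5. Iterate over the dataset as framework-native batches, e.g. for PyTorch.
for batch in ds.iter_torch_batches(batch_size=256):
    pass  # training or inference step goes here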

Prerequisites

  • ray[data]
  • pyarrow
  • pandas
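
These can typically be installed with pip, for example:

pip install -U "ray[data]" pyarrow pandas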

Code Execution

  • Validation: While the code demonstrates structured usage of Ray Data APIs, explicit mention or demonstration of schema validation libraries (like Zod or Pydantic) for input parameters is not evident.

Practical Utility

  • Edge cases: While the documentation covers core operations and performance, explicit documentation of failure modes (e.g., malformed input, rate limits on cloud storage) and their recovery steps is not detailed.

Installation

First, add the marketplace:

/plugin marketplace add Orchestra-Research/AI-Research-SKILLs
/plugin install AI-Research-SKILLs@ai-research-skills

Quality Score

Verified
95 / 100
Analyzed 1 day ago

Trust Signals

Last commit: 17 days ago
Stars: 8.3k
License: MIT
Status
View source code

Similar Extensions

Ray Train

99

Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.

Skill
Orchestra-Research

Ray Data

95

Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.

Skill
davila7

Polars

99

Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.

Skill
K-Dense-AI

Spark Engineer

99

Use when writing Spark jobs, debugging performance issues, or configuring cluster settings for Apache Spark applications, distributed data processing pipelines, or big data workloads. Invoke to write DataFrame transformations, optimize Spark SQL queries, implement RDD pipelines, tune shuffle operations, configure executor memory, process .parquet files, handle data partitioning, or build structured streaming analytics.

Skill
jeffallan

Openrlhf Training

99

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.

Skill
Orchestra-Research

Dask

98

Distributed computing for larger-than-RAM pandas/NumPy workflows. Use when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, integration with existing pandas code. For out-of-core analytics on single machine use vaex; for in-memory speed use polars.

Skill
K-Dense-AI