chDB DataStore

Skill Verified Active

Drop-in pandas replacement with ClickHouse performance. Use `import chdb.datastore as pd` (or `from datastore import DataStore`) and write standard pandas code: same API, 10-100x faster on large datasets. Supports 16+ data sources (MySQL, PostgreSQL, S3, MongoDB, ClickHouse, Iceberg, Delta Lake, etc.) and 10+ file formats (Parquet, CSV, JSON, Arrow, ORC, etc.), with joins across sources. Use this skill when the user wants to analyze data with pandas-style syntax, speed up slow pandas code, query remote databases or cloud storage as DataFrames, or join data across different sources, even if they don't explicitly mention chDB or DataStore. Do NOT use it for raw SQL queries, ClickHouse server administration, or non-Python languages.
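A minimal sketch of the import swap described above. It assumes `chdb.datastore` exposes the ordinary pandas calls used here (`DataFrame`, boolean filtering, `groupby`, `sum`, `sort_index`), which follows from the drop-in claim but is not verified in this listing. The `ImportError` fallback to plain pandas and the sample data are illustrative additions so the sketch runs where chDB is not installed.

```python
# Swap the import; everything below is ordinary pandas code.
try:
    import chdb.datastore as pd  # drop-in replacement named in the skill description
except ImportError:
    import pandas as pd  # assumption: fallback so the sketch still runs without chDB

# Invented sample data for illustration only.
sales = pd.DataFrame({
    "region": ["EU", "US", "EU", "APAC"],
    "amount": [10.0, 25.0, 5.0, 40.0],
})

# Standard pandas: keep rows with amount > 6, then total the amount per region.
big = sales[sales["amount"] > 6]
totals = big.groupby("region")["amount"].sum().sort_index()
print(totals)
```

Because only the import line changes, the same script can be run with either library to compare results and timings.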

Purpose

To enable users to perform data analysis with familiar pandas syntax but at ClickHouse speeds, and to easily query and join data from diverse sources.

Features

Drop-in replacement for pandas API
10-100x faster than pandas on large datasets
Connects to 16+ data sources (databases, cloud storage, files)
Supports 10+ file formats (Parquet, CSV, JSON, etc.)
Performs cross-source joins seamlessly

Use Cases

Analyzing large datasets with pandas-style syntax
Speeding up slow pandas code
Querying remote databases or cloud storage as DataFrames
Joining data across different sources (e.g., database table and parquet file)
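The cross-source join use case can be sketched with standard pandas calls: a CSV file plays the file source, and an in-memory frame stands in for a database table (the listing does not show the DataStore connection API for databases, so none is guessed here). As above, the assumption is that `chdb.datastore` accepts `read_csv`, `merge`, and `groupby` like pandas; the fallback import, file name, and data are hypothetical.

```python
import csv
import os
import tempfile

try:
    import chdb.datastore as pd  # drop-in replacement named in the skill description
except ImportError:
    import pandas as pd  # assumption: fallback so the sketch runs without chDB

# Write a small CSV file to act as the file-based source (invented data).
# The directory is left in place so lazily evaluated reads still find the file.
tmpdir = tempfile.mkdtemp()
orders_path = os.path.join(tmpdir, "orders.csv")  # hypothetical file name
with open(orders_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["customer_id", "amount"])
    writer.writerows([[1, 10.0], [2, 25.0], [1, 5.0], [3, 40.0]])

# File source joined with a second source (an in-memory stand-in for a DB table).
orders = pd.read_csv(orders_path)
customers = pd.DataFrame({"customer_id": [1, 2, 3], "region": ["EU", "US", "EU"]})

# Inner join on the shared key, then total revenue per region.
revenue = (
    orders.merge(customers, on="customer_id", how="inner")
    .groupby("region")["amount"]
    .sum()
    .sort_index()
)
print(revenue)
```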

Non-Goals

Performing raw SQL queries (use the chdb-sql skill instead)
ClickHouse server administration
Usage in non-Python languages

Trust

Issues attention: 22 issues opened, 0 closed in the last 90 days, indicating slow response times from maintainers.

Compliance

GDPR: The skill operates on user-provided data sources, which may contain personal data. No explicit sanitization is mentioned, but data is not sent to third parties without user action.

Installation

First, add the marketplace:

/plugin marketplace add clickhouse/agent-skills

Then install the plugin:

/plugin install agent-skills@clickhouse-agent-skills

Quality Score: 95/100 (Verified; analyzed about 22 hours ago)

Trust Signals

Last commit: 1 day ago

GitHub owner: clickhouse

Stars: 425

License: Apache-2.0

Website: clickhouse.ai


Similar Extensions

Polars

Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.

Skill

K-Dense-AI

AlterLab Polars

Part of the AlterLab Academic Skills suite. Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.

Skill

AlterLab-IEU

Survey Insect Population

100

Design and execute insect population surveys covering survey design, sampling methods, field execution, specimen identification, diversity index calculation including Shannon-Wiener and Simpson indices, statistical analysis, and reporting. Covers defining survey objectives, selecting study sites, determining sampling intensity and replication, choosing sampling methods appropriate to target taxa, standardizing collection effort, recording environmental covariates, identifying specimens to the lowest practical taxonomic level, calculating species richness, Shannon-Wiener diversity (H'), Simpson diversity (1-D), evenness, rarefaction curves, multivariate ordination, and producing survey reports with species lists and conservation implications. Use when conducting baseline biodiversity assessments, monitoring insect populations over time, comparing insect communities across habitats or treatments, assessing environmental impact, or supporting conservation planning with quantitative ecological data.

Skill

pjt222

Fit Drift Diffusion Model

100

Fit cognitive drift-diffusion models (Ratcliff DDM) to reaction time and accuracy data with parameter estimation (drift rate, boundary separation, non-decision time), model comparison, and parameter recovery validation. Use when modeling binary decision-making with reaction time data, estimating cognitive parameters from experimental data, comparing sequential sampling model variants, or decomposing speed-accuracy tradeoff effects into latent cognitive components.

Skill

pjt222

Measure Experiment Design

100

Designs an A/B test or experiment with clear hypothesis, variants, success metrics, sample size, and duration. Use when planning experiments to validate product changes or test hypotheses.

Skill

product-on-purpose

PyDESeq2

100

Differential gene expression analysis with a Python implementation of DESeq2: identify DE genes from bulk RNA-seq counts using Wald tests and FDR correction, and produce volcano and MA plots.

Skill

K-Dense-AI