Chdb Datastore
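Skill | Verified | Active
Drop-in pandas replacement with ClickHouse performance. Use `import chdb.datastore as pd` (or `from datastore import DataStore`) and write standard pandas code: the API is the same, and it runs 10-100x faster on large datasets. Supports 16+ data sources (MySQL, PostgreSQL, S3, MongoDB, ClickHouse, Iceberg, Delta Lake, and more) and 10+ file formats (Parquet, CSV, JSON, Arrow, ORC, and more), plus cross-source joins. Use this skill when the user wants to analyze data with pandas-style syntax, speed up slow pandas code, query remote databases or cloud storage as DataFrames, or join data from different sources, even if they do not explicitly mention chdb or DataStore. Do not use it for raw SQL queries, ClickHouse server administration, or non-Python languages.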
Analyze data at ClickHouse speed using familiar pandas syntax, and query and join data from different sources with little extra effort.
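As a rough illustration of the drop-in idea, the sketch below swaps only the import line and leaves the pandas code untouched. The import path `chdb.datastore` comes from the description above. The fallback to plain pandas, the file name `orders.csv`, the column names, and the helper `total_by_region` are assumptions made for this example, and the code relies on the listing's claim that the two APIs match.

```python
import csv
import os
import tempfile

try:
    # Import path quoted in the skill description.
    import chdb.datastore as pd
except ImportError:
    # Assumption for this sketch: fall back to pandas when chdb is absent.
    import pandas as pd


def total_by_region(csv_path, min_amount):
    """Sum order amounts per region, keeping only orders >= min_amount."""
    df = pd.read_csv(csv_path)
    large = df[df["amount"] >= min_amount]
    return large.groupby("region")["amount"].sum()


# Write a tiny hypothetical dataset so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "orders.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["region", "amount"])
    writer.writerows([("east", 120), ("west", 80), ("east", 40), ("west", 200)])

totals = total_by_region(csv_path, min_amount=100)
print(totals)
```

Because only the import differs, existing pandas code can in principle be tried against DataStore without rewriting the analysis itself.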
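Features
- Drop-in replacement for the pandas API
- 10-100x performance improvement
- Connects to 16+ data sources (databases, cloud storage, files)
- Supports 10+ file formats (Parquet, CSV, JSON, and more)
- Seamless cross-source joins
Use cases
- Analyzing large datasets with pandas-style syntax
- Speeding up slow pandas code
- Querying remote databases or cloud storage as DataFrames
- Joining data from different sources (for example, a database table and a Parquet file)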
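The cross-source join use case can be sketched with two local files standing in for two different sources. The listing does not show the connector calls for remote databases or cloud storage, so none are used here. The CSV and JSON files, their column names, and the pandas fallback are all assumptions for illustration, and `merge` is the standard pandas call that DataStore is described as mirroring.

```python
import csv
import json
import os
import tempfile

try:
    # Import path quoted in the skill description.
    import chdb.datastore as pd
except ImportError:
    # Assumption for this sketch: fall back to pandas when chdb is absent.
    import pandas as pd

tmp_dir = tempfile.mkdtemp()
orders_path = os.path.join(tmp_dir, "orders.csv")        # hypothetical source 1
managers_path = os.path.join(tmp_dir, "managers.json")   # hypothetical source 2

# Source 1: orders stored as CSV.
with open(orders_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["order_id", "region", "amount"])
    writer.writerows([(1, "east", 120), (2, "west", 80), (3, "east", 40)])

# Source 2: region managers stored as a JSON list of records.
with open(managers_path, "w") as f:
    json.dump(
        [{"region": "east", "manager": "Ana"},
         {"region": "west", "manager": "Bo"}],
        f,
    )

orders = pd.read_csv(orders_path)
managers = pd.read_json(managers_path)

# A left join keeps every order and attaches the matching manager.
joined = orders.merge(managers, on="region", how="left")
print(joined)
```

A left join is used so that orders without a matching region would still appear in the result rather than being dropped.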
Non-goals
- Running raw SQL queries (use the chdb-sql skill instead)
- ClickHouse server administration
- Use from non-Python languages
Trust
- info: Issues Attention. 22 issues were opened and 0 closed in the past 90 days, which suggests slow maintainer response.
Compliance
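- info: GDPR. The skill operates on user-provided data sources, which may contain personal data. Data sanitization is not explicitly mentioned, but no data is sent to third parties without user action.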
Installation
Add the marketplace first
/plugin marketplace add clickhouse/agent-skills
/plugin install agent-skills@clickhouse-agent-skills
Quality score
Verified