Polars Bio
Skill · Verified · Active
High-performance genomic interval operations and bioinformatics file I/O on Polars DataFrames. Supports overlap, nearest, merge, coverage, complement, and subtract for BED/VCF/BAM/GFF intervals. Streaming and cloud-native; a faster alternative to bioframe.
Purpose: run genomic interval operations and bioinformatics file I/O directly on Polars DataFrames, as a faster and more scalable alternative for bioinformatics data processing.
Features
- Genomic interval operations (overlap, nearest, merge, coverage, complement, subtract)
- High-performance bioinformatics file I/O (BED, VCF, BAM, CRAM, GFF, FASTA, FASTQ)
- Polars DataFrame and LazyFrame integration
- Streaming and out-of-core processing for large datasets
- Cloud-native file access (S3, GCS, Azure)
- SQL interface for genomic data via DataFusion
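To make the interval operations listed above concrete, here is a minimal plain-Python sketch of what overlap and merge compute. It does not use the polars-bio API, whose function signatures are not shown on this page. The function names, the tuple layout, and the 0-based half-open coordinates (as in BED) are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical layout for illustration: (chrom, start, end), 0-based half-open.
Interval = tuple[str, int, int]


def overlap(a: list[Interval], b: list[Interval]) -> list[tuple[Interval, Interval]]:
    """Return every pair (x, y), x from a and y from b, that shares at least one base."""
    by_chrom: dict[str, list[Interval]] = defaultdict(list)
    for iv in b:
        by_chrom[iv[0]].append(iv)

    pairs = []
    for x in a:
        for y in by_chrom.get(x[0], []):
            # Half-open intervals overlap when each one starts before the other ends.
            if x[1] < y[2] and y[1] < x[2]:
                pairs.append((x, y))
    return pairs


def merge(intervals: list[Interval]) -> list[Interval]:
    """Collapse overlapping or directly adjacent intervals on the same chromosome."""
    merged: list[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


reads = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 10, 50)]
genes = [("chr1", 180, 250), ("chr2", 50, 80)]

overlapping_pairs = overlap(reads, genes)
merged_reads = merge(reads)
```

The nested loop is quadratic per chromosome and is only meant to show the semantics; a library built for large datasets would use an indexed or sorted-sweep strategy instead.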
Use cases
- Performing complex genomic interval arithmetic on large datasets.
- Reading, writing, and processing standard bioinformatics file formats.
- Analyzing genomic data that exceeds available RAM using streaming capabilities.
- Querying genomic files directly using SQL.
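The larger-than-RAM use case above rests on processing records as they are read instead of loading the whole file first. The following is a minimal sketch of that idea in plain Python, not polars-bio's implementation: it sums interval lengths per chromosome from BED-formatted lines while holding only one line and a small counter in memory. The function name and the sample data are hypothetical.

```python
import io
from collections import Counter
from typing import Iterable


def total_interval_length(lines: Iterable[str]) -> Counter:
    """Sum (end - start) per chromosome, consuming the input one line at a time."""
    totals: Counter = Counter()
    for line in lines:
        # Skip blank lines and the header/comment lines a BED file may carry.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        totals[chrom] += int(end) - int(start)
    return totals


# An in-memory stand-in for a file handle; any line iterator works the same way.
bed = io.StringIO("chr1\t0\t100\nchr1\t200\t250\nchr2\t10\t20\n")
totals = total_interval_length(bed)
```

Passing an open file handle in place of the `io.StringIO` object gives the same result without reading the file into memory. Note that overlapping intervals are counted twice here; computing unique coverage would require merging first.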
Non-goals
- Replacing general-purpose data analysis libraries (use Polars directly).
- Providing a graphical user interface for bioinformatics analysis.
- Performing wet-lab experimental design or interpretation (focus is on data processing).
Installation
npx skills add K-Dense-AI/claude-scientific-skills
Runs the Vercel skills CLI (skills.sh) via npx; requires Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.
Quality score
Verified
Trust signals
Similar extensions
Pysam
99 · Genomic file toolkit. Read/write SAM/BAM/CRAM alignments, VCF/BCF variants, FASTA/FASTQ sequences, extract regions, calculate coverage, for NGS data processing pipelines.
PyDESeq2
100 · Differential gene expression analysis (Python DESeq2). Identify DE genes from bulk RNA-seq counts, Wald tests, FDR correction, volcano/MA plots, for RNA-seq analysis.
Scanpy
99 · Standard single-cell RNA-seq analysis pipeline. Use for QC, normalization, dimensionality reduction (PCA/UMAP/t-SNE), clustering, differential expression, and visualization. Best for exploratory scRNA-seq analysis with established workflows. For deep learning models use scvi-tools; for data format questions use anndata.
Gtars
99 · High-performance toolkit for genomic interval analysis in Rust with Python bindings. Use when working with genomic regions, BED files, coverage tracks, overlap detection, tokenization for ML models, or fragment analysis in computational genomics and machine learning applications.
Geniml
99 · This skill should be used when working with genomic interval data (BED files) for machine learning tasks. Use for training region embeddings (Region2Vec, BEDspace), single-cell ATAC-seq analysis (scEmbed), building consensus peaks (universes), or any ML-based analysis of genomic regions. Applies to BED file collections, scATAC-seq data, chromatin accessibility datasets, and region-based genomic feature learning.
Biopython
99 · Comprehensive molecular biology toolkit. Use for sequence manipulation, file parsing (FASTA/GenBank/PDB), phylogenetics, and programmatic NCBI/PubMed access (Bio.Entrez). Best for batch processing, custom bioinformatics pipelines, BLAST automation. For quick lookups use gget; for multi-service integration use bioservices.