Nextflow Development
Run nf-core bioinformatics pipelines (rnaseq, sarek, atacseq) on sequencing data. Use when analyzing RNA-seq, WGS/WES, or ATAC-seq data, whether from local FASTQs or from public datasets on GEO/SRA. Triggers on nf-core, Nextflow, FASTQ analysis, variant calling, gene expression, differential expression, GEO reanalysis, GSE/GSM/SRR accessions, or samplesheet creation.
Purpose: simplify and automate complex omics data analysis for researchers by running nf-core pipelines through an AI agent.
Features
- Automated GEO/SRA data acquisition
- FASTQ, BAM, CRAM file processing
- Sample sheet generation for multiple pipelines
- Environment and resource validation
- Pipeline execution orchestration via Nextflow
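As an illustration of the GEO/SRA acquisition feature, the first step is usually deciding what kind of accession the user supplied. The following is a minimal sketch, not the skill's actual implementation: the function name `classify_accession` and the labels are hypothetical, and the patterns cover only the three prefixes named in the description (GSE, GSM, SRR).

```python
import re

# Assumed patterns: a known prefix followed by digits. Only the prefixes
# mentioned in the skill description are covered here.
ACCESSION_PATTERNS = {
    "GEO series": re.compile(r"^GSE\d+$"),
    "GEO sample": re.compile(r"^GSM\d+$"),
    "SRA run": re.compile(r"^SRR\d+$"),
}


def classify_accession(text):
    """Return the accession kind for a GSE/GSM/SRR identifier, or None."""
    token = text.strip().upper()
    for kind, pattern in ACCESSION_PATTERNS.items():
        if pattern.match(token):
            return kind
    return None


print(classify_accession("GSE12345"))
print(classify_accession("reads.fastq.gz"))
```

Anything that does not match would presumably be treated as a local file path rather than a public dataset.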
Use Cases
- Analyzing RNA-seq data for gene expression
- Performing variant calling on WGS/WES data
- Investigating chromatin accessibility with ATAC-seq
- Reanalyzing public datasets from GEO/SRA
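The use cases above line up with the three pipelines named in the description. A small sketch of that mapping, together with a basic launch command, might look like the following. The helper names are hypothetical; `-profile docker`, `--input`, and `--outdir` are standard nf-core options, but a real run usually also needs pipeline-specific parameters (for example a reference genome), which are omitted here.

```python
# Mapping from data type to pipeline, taken from the skill description.
PIPELINES = {
    "rna-seq": "nf-core/rnaseq",
    "wgs": "nf-core/sarek",
    "wes": "nf-core/sarek",
    "atac-seq": "nf-core/atacseq",
}


def suggest_pipeline(data_type):
    """Return the nf-core pipeline name for a data type label."""
    key = data_type.strip().lower()
    if key not in PIPELINES:
        raise ValueError(f"no pipeline known for data type: {data_type!r}")
    return PIPELINES[key]


def build_run_command(data_type, samplesheet, outdir):
    """Build the argument list for a basic Docker-based run (sketch only)."""
    return [
        "nextflow", "run", suggest_pipeline(data_type),
        "-profile", "docker",
        "--input", samplesheet,
        "--outdir", outdir,
    ]


print(" ".join(build_run_command("RNA-seq", "samplesheet.csv", "results")))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting.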
Non-Goals
- Performing the bioinformatics analysis itself (delegated to nf-core pipelines)
- Managing computational infrastructure (relies on Nextflow/Docker)
- Providing direct interpretation of analysis results
Workflow
- Acquire data (if from GEO/SRA)
- Check environment (Docker, Nextflow, Java)
- Detect data type and suggest pipeline
- Generate samplesheet
- Configure and run nf-core pipeline
- Verify outputs
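To make the samplesheet step concrete, here is a hedged sketch that pairs FASTQ files and writes the `sample,fastq_1,fastq_2,strandedness` layout used by nf-core/rnaseq. The `_R1`/`_R2` file-naming convention and the function name `build_samplesheet` are assumptions for illustration, and other pipelines expect different columns.

```python
import csv
import io
import re

# Assumed naming convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz,
# optionally with a numeric chunk suffix such as _001.
READ_PATTERN = re.compile(
    r"^(?P<sample>.+)_R(?P<read>[12])(?:_\d+)?\.f(?:ast)?q\.gz$"
)


def build_samplesheet(fastq_paths, strandedness="auto"):
    """Return CSV text with one row per sample; fastq_2 is empty if single-end."""
    samples = {}
    for path in sorted(fastq_paths):
        name = path.rsplit("/", 1)[-1]
        match = READ_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unrecognized FASTQ name: {name}")
        samples.setdefault(match["sample"], {})[match["read"]] = path

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2", "strandedness"])
    for sample in sorted(samples):
        reads = samples[sample]
        if "1" not in reads:
            raise ValueError(f"sample {sample} has no R1 file")
        writer.writerow([sample, reads["1"], reads.get("2", ""), strandedness])
    return buffer.getvalue()


print(build_samplesheet([
    "data/ctrl_R1.fastq.gz",
    "data/ctrl_R2.fastq.gz",
    "data/treat_R1.fastq.gz",
]))
```

Failing loudly on unrecognized file names is a deliberate choice in this sketch: a silently skipped file would produce a samplesheet that looks valid but is missing samples.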
Practices
- Bioinformatics workflow automation
- Data acquisition and preparation
- Pipeline execution management
Prerequisites
- Docker installed and running
- Nextflow version >= 23.04
- Java version >= 11
- Network access to NCBI, ENA, Docker Hub, and GitHub
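The version requirements above could be checked along these lines. This is a rough sketch under stated assumptions: it parses text resembling the output of `nextflow -version` and `java -version` (the exact formats can vary by release and vendor), and the function names are hypothetical. Capturing the command output and checking Docker and network access are left out.

```python
import re

MIN_NEXTFLOW = (23, 4)  # Nextflow >= 23.04
MIN_JAVA = 11           # Java >= 11


def parse_nextflow_version(output):
    """Extract (major, minor) from text containing 'version 23.10.1'."""
    match = re.search(r"version\s+(\d+)\.(\d+)", output)
    return (int(match[1]), int(match[2])) if match else None


def parse_java_major(output):
    """Extract the Java major version; legacy '1.8' style maps to 8."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', output)
    if match is None:
        return None
    major = int(match[1])
    if major == 1 and match[2]:
        return int(match[2])
    return major


def check_prerequisites(nextflow_output, java_output):
    """Return a list of human-readable problems; empty means both checks pass."""
    problems = []
    nextflow = parse_nextflow_version(nextflow_output)
    if nextflow is None or nextflow < MIN_NEXTFLOW:
        problems.append("Nextflow >= 23.04 is required")
    java = parse_java_major(java_output)
    if java is None or java < MIN_JAVA:
        problems.append("Java >= 11 is required")
    return problems


print(check_prerequisites(
    "version 23.10.1 build 5891",
    'openjdk version "17.0.9" 2023-10-17',
))
```

Comparing versions as integer tuples avoids the string-comparison trap where "23.10" would sort before "23.4".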
Installation
First, add the marketplace:
/plugin marketplace add anthropics/knowledge-work-plugins
Then install the plugin:
/plugin install bio-research@knowledge-work-plugins
Similar Extensions
PyDESeq2 (score: 100)
Differential gene expression analysis (Python DESeq2). Identify DE genes from bulk RNA-seq counts using Wald tests and FDR correction, with volcano/MA plots, for RNA-seq analysis.
Scanpy (score: 99)
Standard single-cell RNA-seq analysis pipeline. Use for QC, normalization, dimensionality reduction (PCA/UMAP/t-SNE), clustering, differential expression, and visualization. Best for exploratory scRNA-seq analysis with established workflows. For deep learning models use scvi-tools; for data format questions use anndata.
Pysam (score: 99)
Genomic file toolkit. Read/write SAM/BAM/CRAM alignments, VCF/BCF variants, and FASTA/FASTQ sequences; extract regions and calculate coverage for NGS data processing pipelines.
Polars Bio (score: 99)
High-performance genomic interval operations and bioinformatics file I/O on Polars DataFrames. Overlap, nearest, merge, coverage, complement, and subtract for BED/VCF/BAM/GFF intervals. Streaming, cloud-native, faster bioframe alternative.
Gtars (score: 99)
High-performance toolkit for genomic interval analysis in Rust with Python bindings. Use when working with genomic regions, BED files, coverage tracks, overlap detection, tokenization for ML models, or fragment analysis in computational genomics and machine learning applications.
Geniml (score: 99)
Use when working with genomic interval data (BED files) for machine learning tasks: training region embeddings (Region2Vec, BEDspace), single-cell ATAC-seq analysis (scEmbed), building consensus peaks (universes), or any ML-based analysis of genomic regions. Applies to BED file collections, scATAC-seq data, chromatin accessibility datasets, and region-based genomic feature learning.