Datamol Cheminformatics Skill

Skill Verified Active

A Pythonic wrapper around RDKit with a simplified interface and sensible defaults. Preferred for standard drug discovery workflows, including SMILES parsing, standardization, descriptors, fingerprints, clustering, 3D conformers, and parallel processing. Returns native rdkit.Chem.Mol objects. For advanced control or custom parameters, use RDKit directly. Part of the AlterLab Academic Skills suite.

Purpose

To simplify complex molecular operations for drug discovery and related research by offering a user-friendly interface with sensible defaults over the RDKit library.

Features

  • SMILES parsing and standardization
  • Molecular descriptor and fingerprint computation
  • 3D conformer generation and analysis
  • Clustering and diversity selection
  • Scaffold and fragment analysis
  • Support for various molecular file formats (SDF, SMILES, CSV, Excel)
  • Remote file support via fsspec
  • Parallel processing for batch operations

Use Cases

  • Standardizing and cleaning molecular datasets
  • Calculating molecular properties for filtering or QSAR
  • Generating diverse sets of molecules for screening
  • Analyzing scaffold diversity in compound libraries
  • Visualizing molecular structures and conformers
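
The scaffold-analysis and diverse-set use cases can be illustrated with a short sketch. It assumes datamol and RDKit are installed and relies on dm.to_mol, dm.to_scaffold_murcko, dm.pick_diverse, and dm.to_smiles with their default settings. The function name scaffolds_and_diverse_subset and its return layout are hypothetical and chosen only for this example.

```python
def scaffolds_and_diverse_subset(smiles_list: list[str], n_pick: int = 2):
    """Return Murcko scaffold SMILES for each parsed molecule and a diverse subset.

    Hypothetical helper for illustration; not part of datamol or the skill.
    """
    # Imported inside the function so this module still loads where
    # datamol is not installed (assumption: datamol is an optional dependency).
    import datamol as dm

    # dm.to_mol returns None for input it cannot parse, so filter those out.
    mols = [m for m in (dm.to_mol(s) for s in smiles_list) if m is not None]

    # Murcko scaffold: ring systems plus the linkers between them.
    scaffolds = [dm.to_smiles(dm.to_scaffold_murcko(m)) for m in mols]

    # pick_diverse returns the chosen indices and the corresponding molecules.
    n_pick = min(n_pick, len(mols))
    picked_idx, picked_mols = dm.pick_diverse(mols, npick=n_pick)

    return scaffolds, [dm.to_smiles(m) for m in picked_mols]


# Example call (toluene, aspirin, piperidine):
# scaffolds, picked = scaffolds_and_diverse_subset(
#     ["Cc1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "C1CCNCC1"], n_pick=2
# )
```

Counting how often each scaffold SMILES occurs (for example with collections.Counter) gives a rough view of scaffold diversity in a library. For custom fingerprints or distance functions, dropping down to RDKit directly is likely the better fit.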

Non-Goals

  • Fully replacing RDKit where advanced, fine-grained control is needed
  • Handling highly specialized or non-standard cheminformatics tasks
  • Advanced quantum chemistry calculations

Workflow

  1. Load molecular data from files or strings
  2. Standardize and sanitize molecules
  3. Compute descriptors or fingerprints
  4. Perform clustering, diversity selection, or scaffold analysis
  5. Generate 3D conformers if needed
  6. Visualize results or export processed data
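
Steps 1 to 3 of this workflow can be sketched with datamol's public helpers. This is a minimal illustration under stated assumptions, not the skill's own implementation: it assumes datamol and RDKit are installed, the function name standardize_and_describe and the row layout are hypothetical, and the standardization and fingerprint settings are whatever defaults the installed datamol version ships with.

```python
from typing import Any


def standardize_and_describe(smiles_list: list[str]) -> list[dict[str, Any]]:
    """Parse, clean, and describe molecules, skipping SMILES that fail to parse.

    Hypothetical helper for illustration; not part of datamol or the skill.
    """
    # Imported inside the function so this module still loads where
    # datamol is not installed (assumption: datamol is an optional dependency).
    import datamol as dm

    rows = []
    for smiles in smiles_list:
        # Step 1: load. dm.to_mol returns None for unparsable input.
        mol = dm.to_mol(smiles)
        if mol is None:
            continue

        # Step 2: standardize and sanitize.
        mol = dm.fix_mol(mol)        # try to repair common valence problems
        mol = dm.sanitize_mol(mol)   # may return None if sanitization fails
        if mol is None:
            continue
        mol = dm.standardize_mol(mol)

        # Step 3: descriptors (a dict of named values) and a fingerprint array.
        rows.append(
            {
                "smiles": dm.to_smiles(mol),
                "descriptors": dm.descriptors.compute_many_descriptors(mol),
                "fingerprint": dm.to_fp(mol),
            }
        )
    return rows


# Example call: the second string is deliberately malformed and is skipped.
# rows = standardize_and_describe(["CCO", "C1CC("])
```

Returning plain dictionaries keeps the sketch easy to load into a pandas DataFrame for filtering or QSAR modelling. For large batches, datamol's parallel helpers are the natural next step.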

Practices

  • Data standardization
  • Molecular descriptor calculation
  • Similarity analysis
  • Structure-activity relationship analysis
  • Machine learning feature generation

Prerequisites

  • Python environment
  • uv installed (provides the uv pip command)

Installation

npx skills add AlterLab-IEU/AlterLab-Academic-Skills

Runs the Vercel skills CLI (skills.sh) via npx — needs Node.js locally and at least one installed skills-compatible agent (Claude Code, Cursor, Codex, …). Assumes the repo follows the agentskills.io format.

Quality Score

Verified: 99/100 (analyzed 1 day ago)

Trust Signals

Last commit: 17 days ago
Stars: 15
License: MIT

