Zum Hauptinhalt springen
Dieser Inhalt ist noch nicht in Ihrer Sprache verfügbar und wird auf Englisch angezeigt.

Nemo Curator

Skill Verifiziert Aktiv

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora.

Zweck

To efficiently prepare high-quality training datasets for LLMs by accelerating data curation tasks like deduplication and filtering, significantly reducing processing time and cost.

Funktionen

  • GPU-accelerated data curation
  • Multimodal data support (text, image, video, audio)
  • Fast fuzzy deduplication (16x speedup)
  • Advanced quality filtering (30+ heuristics)
  • PII redaction and NSFW detection

Anwendungsfälle

  • Preparing LLM training data from web scrapes
  • Cleaning and deduplicating large corpora
  • Curating multi-modal datasets for AI models
  • Filtering low-quality or sensitive content from datasets

Nicht-Ziele

  • CPU-based data processing
  • Basic data cleaning without advanced curation features
  • Data processing focused on non-LLM use cases
  • Acting as a general data analysis tool

Trust

  • info:Issues AttentionThere are 17 open issues and 4 closed issues in the last 90 days, indicating moderate engagement but a lower closure rate.

Compliance

  • info:GDPRThe tool processes data which may include personal data, but it is for curation within a training dataset and not submitted to a third party without user approval.

Installation

npx skills add davila7/claude-code-templates

Führt das Vercel skills CLI (skills.sh) via npx aus — benötigt Node.js lokal und mindestens einen installierten skills-kompatiblen Agent (Claude Code, Cursor, Codex, …). Setzt voraus, dass das Repo dem agentskills.io-Format folgt.

Qualitätspunktzahl

Verifiziert
98 /100
Analysiert 1 day ago

Vertrauenssignale

Letzter Commit1 day ago
Sterne27.2k
LizenzMIT
Status
Quellcode ansehen

Ähnliche Erweiterungen

Nemo Curator

95

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora.

Skill
Orchestra-Research

Create Spatial Visualization

100

Create interactive maps, elevation profiles, and spatial visualizations from GPX tracks, waypoints, or route data using R (sf, leaflet, tmap) or Observable (D3, deck.gl). Covers data import, coordinate system handling, map styling, and export to HTML or image formats. Use when visualizing a planned or completed tour route on an interactive map, creating elevation profiles for hiking or cycling routes, overlaying waypoints and POIs on a basemap, or building a web-based trip dashboard.

Skill
pjt222

PyTDC (Therapeutics Data Commons)

99

Therapeutics Data Commons. AI-ready drug discovery datasets (ADME, toxicity, DTI), benchmarks, scaffold splits, molecular oracles, for therapeutic ML and pharmacological prediction.

Skill
K-Dense-AI

Pysam

99

Genomic file toolkit. Read/write SAM/BAM/CRAM alignments, VCF/BCF variants, FASTA/FASTQ sequences, extract regions, calculate coverage, for NGS data processing pipelines.

Skill
K-Dense-AI

Polars Bio

99

High-performance genomic interval operations and bioinformatics file I/O on Polars DataFrames. Overlap, nearest, merge, coverage, complement, subtract for BED/VCF/BAM/GFF intervals. Streaming, cloud-native, faster bioframe alternative.

Skill
K-Dense-AI

Polars

99

Fast in-memory DataFrame library for datasets that fit in RAM. Use when pandas is too slow but data still fits in memory. Lazy evaluation, parallel execution, Apache Arrow backend. Best for 1-100GB datasets, ETL pipelines, faster pandas replacement. For larger-than-RAM data use dask or vaex.

Skill
K-Dense-AI