What it does
- Takes paired-end FASTQ files (R1, R2) or a single concatenated FASTQ as input
- Runs Kraken2 taxonomic classification against a prebuilt index (e.g., Standard-8, PlusPF)
- Re-estimates species-level read counts and relative abundances from the Kraken2 report with Bracken
- Detects antimicrobial resistance genes with RGI against the CARD database
- Classifies detected ARGs by WHO critical priority pathogen association
- Optionally runs HUMAnN3 for functional pathway profiling (MetaCyc + UniRef)
- Generates three publication-quality figures:
  - Figure 1: Taxonomy bar chart of the top 20 species by relative abundance
  - Figure 2: Resistome heatmap of ARG families grouped by drug class, shaded by abundance
  - Figure 3: WHO-critical ARG summary, a priority-tier breakdown of detected resistance genes
- Produces a full reproducibility bundle (commands.sh, environment.yml, checksums.sha256)
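The checksums.sha256 file in the bundle can be produced with a few lines of standard-library Python. The sketch below is an assumption about how such a file might be written, not the skill's actual code: it hashes each output in fixed-size chunks and writes `<digest>  <path>` lines, the two-space layout that GNU `sha256sum -c checksums.sha256` verifies. The function names `sha256_of` and `write_checksums` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading 1 MiB chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(paths: list[Path], out_file: Path) -> None:
    """Write one '<digest>  <path>' line per file, the layout `sha256sum -c` reads."""
    # Sorting keeps the file byte-identical across runs with the same outputs.
    lines = [f"{sha256_of(p)}  {p}" for p in sorted(paths)]
    out_file.write_text("\n".join(lines) + "\n")
```

For example, `write_checksums(list(Path("results").rglob("*.png")), Path("checksums.sha256"))` would hash every PNG under a hypothetical `results/` directory.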
Why this exists
If you ask a general AI to "analyse a metagenome," it will:
- Not know which Kraken2 database to use or how to set confidence thresholds
- Hallucinate Bracken parameters for read length and taxonomic level
- Miss the connection between detected ARGs and WHO priority pathogen lists
- Skip HUMAnN3 entirely (or misconfigure its database paths)
- Produce a single bar chart with no resistance context
- Not provide a reproducibility bundle
This skill encodes the correct methodological decisions:
- Kraken2 confidence threshold of 0.2 (reduces false positives in environmental samples)
- Bracken re-estimation at species level, with a minimum of 10 reads per species
- RGI MAIN with "Perfect" and "Strict" hit criteria only (no "Loose" hits)
- WHO Critical Priority Pathogen list mapped to detected ARG families
- HUMAnN3 MetaCyc pathway abundances, stratified by contributing species, for pathway-level functional context
- Thread count auto-detected from available CPUs
- Full reproducibility bundle for every run
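The parameter choices above can be pinned in one place and turned into argument lists. The sketch below is hypothetical: the function names, output suffixes (`.kreport`, `.kraken`, `.bracken`) and database path are placeholders, and `strict_rgi_hits` assumes RGI main's tab-delimited `.txt` report labels each hit in a `Cut_Off` column. The Kraken2 options (`--db`, `--threads`, `--confidence`, `--paired`, `--report`, `--output`) and Bracken options (`-d`, `-i`, `-o`, `-r`, `-l`, `-t`) are the tools' documented flags; Bracken's `-r` should match a read length for which the database has a k-mer distribution file.

```python
import csv
import os
import shlex
from pathlib import Path

KRAKEN2_CONFIDENCE = 0.2                  # reduces false positives in environmental samples
BRACKEN_LEVEL = "S"                       # species-level re-estimation
BRACKEN_MIN_READS = 10                    # Bracken -t: minimum reads per species
RGI_KEPT_CUTOFFS = {"Perfect", "Strict"}  # "Loose" hits are discarded


def detect_threads() -> int:
    """Count CPUs this process may use; fall back to os.cpu_count() where affinity is unavailable."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def kraken2_cmd(db: str, r1: str, r2: str, prefix: str, threads: int) -> list[str]:
    """Kraken2 argument list for one paired-end sample (output names are placeholders)."""
    return [
        "kraken2", "--db", db,
        "--threads", str(threads),
        "--confidence", str(KRAKEN2_CONFIDENCE),
        "--paired",
        "--report", f"{prefix}.kreport",
        "--output", f"{prefix}.kraken",
        r1, r2,
    ]


def bracken_cmd(db: str, prefix: str, read_len: int) -> list[str]:
    """Bracken argument list that re-estimates the Kraken2 report at species level."""
    return [
        "bracken", "-d", db,
        "-i", f"{prefix}.kreport",
        "-o", f"{prefix}.bracken",
        "-r", str(read_len),
        "-l", BRACKEN_LEVEL,
        "-t", str(BRACKEN_MIN_READS),
    ]


def render_commands_sh(commands: list[list[str]]) -> str:
    """Render argument lists as a bash script body suitable for commands.sh."""
    header = "#!/usr/bin/env bash\nset -euo pipefail\n"
    return header + "\n".join(shlex.join(c) for c in commands) + "\n"


def strict_rgi_hits(rgi_txt: Path) -> list[dict[str, str]]:
    """Keep RGI main rows whose Cut_Off is Perfect or Strict.

    Assumes the RGI main .txt report is tab-delimited with a 'Cut_Off' column.
    """
    with rgi_txt.open(newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row for row in reader if row.get("Cut_Off") in RGI_KEPT_CUTOFFS]
```

Building argument lists rather than shell strings lets the same values feed both `subprocess.run(cmd, check=True)` and the `commands.sh` text, so the bundle records exactly what was run.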
Validated on
The skill is designed for any shotgun metagenome, but it has been validated on:
- Peru sewage metagenomics study (6 samples, 3 collection sites: Lima, Cusco, Iquitos)
- Environmental sewage samples with mixed microbial communities
- Read depths ranging from 2M to 15M paired-end reads per sample