claw-ancestry-pca

Maintainer FreedomIntelligence · Last updated April 1, 2026

Ancestry decomposition PCA against the Simons Genome Diversity Project.

Original source

FreedomIntelligence/OpenClaw-Medical-Skills

https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/claw-ancestry-pca

License
MIT

Skill Snapshot

Key Details From SKILL.md

Key Notes

  • Place your study cohort in global genetic context by computing a joint PCA against the Simons Genome Diversity Project (SGDP) — 345 samples from 164 populations spanning every inhabited continent.
  • Panel A: PC1 vs PC2 — main population structure of your cohort.
  • Panel B: PC3 vs PC2 with regional groupings and confidence ellipses.
  • Panel C: PC3 vs PC1 with language/cultural groupings.
  • Panel D: Global context — your samples (circles) vs SGDP (triangles).
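A confidence ellipse like those in Panel B can be derived from the covariance of a group's coordinates on two principal components. The sketch below is illustrative only and is not taken from the skill's plotting code: the function name `confidence_ellipse`, the 95% default level, and the bivariate-normal assumption behind the chi-square scaling are all assumptions.

```python
import math

def confidence_ellipse(xs, ys, level=0.95):
    """Return (centre_x, centre_y, semi_major, semi_minor, angle_degrees).

    Hypothetical helper: assumes the points are roughly bivariate normal.
    """
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least 3 paired points")
    # Chi-square quantile with 2 degrees of freedom has the closed form -2*ln(1 - p).
    chi2 = -2.0 * math.log(1.0 - level)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major = half_trace + spread
    lam_minor = max(half_trace - spread, 0.0)
    # Orientation of the major axis, measured from the x-axis.
    angle = 0.5 * math.degrees(math.atan2(2 * sxy, sxx - syy))
    return mx, my, math.sqrt(chi2 * lam_major), math.sqrt(chi2 * lam_minor), angle
```

The returned centre, semi-axes, and angle are the values a plotting library needs to draw the ellipse for each regional group. Note that most libraries expect full width and height, which are twice the semi-axes.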

Source Doc

Excerpt From SKILL.md

Why this exists

If you ask ChatGPT to "run a PCA against a global reference panel," it will:

  • Not know which reference panel to use
  • Hallucinate PLINK flags for merging datasets with different variant sets
  • Skip IBD removal (related individuals distort PCA)
  • Not normalise contig names between your VCF and the reference
  • Produce a single scatter plot with no population labels

This skill encodes the correct methodological decisions:

  • Uses SGDP (the gold-standard reference for global diversity)
  • Handles contig normalisation (chr1 vs 1)
  • Filters to common biallelic SNPs shared between datasets
  • Removes related individuals via IBD checks
  • Produces publication-quality multi-panel figures with confidence ellipses
  • Differentiates your samples (circles) from reference (triangles)
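The three preprocessing decisions in this list (contig normalisation, restriction to shared biallelic SNPs, and removal of related individuals) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the skill's implementation: every function name is hypothetical, variants are assumed to arrive as `(chrom, pos, ref, alt)` tuples, relatedness is assumed to be supplied as pairwise IBD proportions (such as PI_HAT values), and the 0.1875 cut-off is an assumed default.

```python
BASES = {"A", "C", "G", "T"}

def normalise_contig(name):
    """Strip a leading 'chr' so that 'chr1' and '1' compare equal."""
    name = name.strip()
    return name[3:] if name.lower().startswith("chr") else name

def is_biallelic_snp(ref, alt):
    """True when REF and ALT are single, different nucleotides (one ALT only)."""
    ref, alt = ref.upper(), alt.upper()
    return ref in BASES and alt in BASES and ref != alt

def shared_biallelic_snps(cohort, reference):
    """Intersect two iterables of (chrom, pos, ref, alt) records."""
    def keys(records):
        return {
            (normalise_contig(c), int(p), r.upper(), a.upper())
            for c, p, r, a in records
            if is_biallelic_snp(r, a)
        }
    return sorted(keys(cohort) & keys(reference))

def prune_related(samples, pi_hat, threshold=0.1875):
    """Greedily drop the sample with the most relatives until no related pair remains.

    pi_hat maps (sample_a, sample_b) pairs to an IBD proportion; the threshold
    is an assumed default, not a value taken from the skill.
    """
    related = {s: set() for s in samples}
    for (a, b), value in pi_hat.items():
        if value > threshold:
            related[a].add(b)
            related[b].add(a)
    kept = set(samples)
    while True:
        # Pick the kept sample with the most kept relatives (name breaks ties).
        worst = max(kept, key=lambda s: (len(related[s] & kept), s), default=None)
        if worst is None or not (related[worst] & kept):
            break
        kept.remove(worst)
    return sorted(kept)
```

This sketch matches alleles exactly, so sites where the two datasets disagree on strand or REF/ALT orientation are dropped rather than reconciled.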

Reference Panel

The skill bundles the SGDP v4 dataset (Mallick et al., 2016, Nature):

  • 345 samples from 164 populations
  • Whole-genome sequencing at high coverage
  • MAF > 0.1% filter applied
  • Populations span: Africa, Americas, Central/South Asia, East Asia, Europe, Middle East, Oceania
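To make the MAF filter concrete, the following sketch computes a minor allele frequency from diploid genotypes and applies a 0.1% threshold. It illustrates the arithmetic only: the function names and the 0/1/2 ALT-count encoding (with `None` for missing calls) are assumptions, not the skill's data format.

```python
def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as ALT-allele counts (0, 1, 2); None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of REF/ALT is rarer.
    return min(alt_freq, 1.0 - alt_freq)

def passes_maf(genotypes, threshold=0.001):
    """True when the site's MAF is strictly above the threshold (0.1% by default)."""
    return minor_allele_frequency(genotypes) > threshold
```

With 345 fully called diploid samples, a single ALT copy already gives 1/690, about 0.14%, so under those assumptions this threshold excludes only monomorphic sites.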

Demo (works out of the box)

The demo uses pre-computed PCA results from the Peruvian Genome Project (736 samples, 28 populations) and generates the full 4-panel figure instantly.

Use cases

  • Use claw-ancestry-pca to place a study cohort in global genetic context via a joint PCA against SGDP.
  • Apply claw-ancestry-pca to population-structure and ancestry analyses in genomics workflows.

Not for

  • Do not rely on this catalog entry alone for installation or maintenance details.

Related skills

  • aav-vector-design-agent: AAV vector design: capsid selection, promoter optimization, payload capacity.
  • agentd-drug-discovery: AgentD autonomous drug discovery: target identification, hit finding, ADMET optimization.
  • bio-clinical-databases-hla-typing: Call HLA alleles from NGS data using OptiType, HLA-HD, or arcasHLA for immunogenomics applications. Use when determining HLA genotype for tr…
  • bio-clinical-databases-pharmacogenomics: Query PharmGKB and CPIC for drug-gene interactions, pharmacogenomic annotations, and dosing guidelines. Use when predicting drug response fr…