geniml

Maintainer K-Dense Inc. · Last updated April 1, 2026

Geniml is a Python package for building machine learning models on genomic interval data from BED files. It provides unsupervised methods for learning embeddings of genomic regions, single cells, and metadata labels, enabling similarity searches, clustering, and downstream ML tasks.
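To make "embeddings of single cells" concrete: one common way to turn region embeddings into a single vector for a cell (or a whole BED file) is to average the vectors of the regions it contains. The sketch below assumes that mean-pooling rule rather than reproducing geniml's code. pool_cell_embedding, the region IDs and the vectors are all hypothetical.

```python
from typing import Dict, List, Sequence


def pool_cell_embedding(region_ids: Sequence[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Mean of the embeddings of a cell's regions (assumed pooling rule, not geniml's API).

    Regions missing from `vectors` (out of vocabulary) are skipped.
    """
    known = [vectors[r] for r in region_ids if r in vectors]
    if not known:
        raise ValueError("none of the cell's regions have an embedding")
    dim = len(known[0])
    # Average each dimension across the cell's known regions.
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    # Hypothetical region-token embeddings, e.g. from a trained Region2Vec model.
    vectors = {
        "chr1_100_600": [1.0, 0.0],
        "chr1_900_1400": [0.0, 1.0],
        "chr2_50_550": [1.0, 1.0],
    }
    cell = ["chr1_100_600", "chr1_900_1400", "chrX_1_2"]  # the last region is not in the vocabulary
    print(pool_cell_embedding(cell, vectors))  # [0.5, 0.5]
```

Mean pooling ignores region order, which suits a set of accessible regions; regions the model never saw are simply skipped in this sketch.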

Tags: Claude Code, OpenClaw, NanoClaw, Analysis, Reproduction, geniml, bioinformatics, package, bioinformatics & genomics

Original source

K-Dense-AI/claude-scientific-skills

https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/geniml

License
BSD-2-Clause license

Skill Snapshot

Key Details From SKILL.md

Key Notes

  • Tokenize a folder of BED files against a universe reference with hard_tokenization(src_folder='bed_files/', dst_folder='tokens/', universe_file='universe.bed', p_value_threshold=1e-9)
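Conceptually, tokenizing against a universe replaces each region in an input BED file with the universe regions it overlaps, so every file ends up in one shared vocabulary. The stdlib-only sketch below shows that mapping for in-memory regions. It is not geniml's implementation: it ignores the threshold argument (p_value_threshold in the hard_tokenization call), treats any overlap of at least 1 bp as a hit, and parse_bed, tokenize and the chrom_start_end token format are hypothetical.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end) in BED's 0-based, half-open coordinates


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Keep the first three BED columns; skip blank, comment, track and browser lines."""
    regions: List[Region] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def tokenize(regions: List[Region], universe: List[Region]) -> List[str]:
    """Replace each region by the universe regions it overlaps by at least 1 bp.

    Simplified overlap rule (assumption, not geniml's); a linear scan, fine for toy inputs.
    """
    tokens: List[str] = []
    seen = set()
    for chrom, start, end in regions:
        for u_chrom, u_start, u_end in universe:
            # Half-open intervals overlap when each one starts before the other ends.
            if chrom == u_chrom and start < u_end and u_start < end:
                token = f"{u_chrom}_{u_start}_{u_end}"  # hypothetical token format
                if token not in seen:
                    seen.add(token)
                    tokens.append(token)
    return tokens


if __name__ == "__main__":
    universe = parse_bed(["chr1\t100\t600", "chr1\t900\t1400", "chr2\t50\t550"])
    sample = parse_bed(["track name=demo", "chr1\t550\t950", "chr2\t600\t700"])
    print(tokenize(sample, universe))  # ['chr1_100_600', 'chr1_900_1400']
```

Real universes hold hundreds of thousands of regions, so production tools use sorted sweeps or interval indexes instead of this nested scan.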

Source Doc

Excerpt From SKILL.md

Installation

Install geniml using uv:
uv pip install geniml

For ML dependencies (PyTorch, etc.):

Development version from GitHub:

Core Capabilities

Geniml provides five primary capabilities, each detailed in dedicated reference files:

1. Region2Vec: Genomic Region Embeddings

Train unsupervised embeddings of genomic regions using word2vec-style learning.

Use for: Dimensionality reduction of BED files, region similarity analysis, feature vectors for downstream ML.
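With embeddings in hand, region similarity analysis typically comes down to ranking regions by cosine similarity to a query region. A minimal sketch, assuming the embeddings are already loaded into a dict from region token to vector; cosine, most_similar and the toy vectors are hypothetical and not part of geniml.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query: str, vectors: Dict[str, Sequence[float]], top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank every other region by cosine similarity to `query`, highest first."""
    q = vectors[query]
    scored = [(name, cosine(q, v)) for name, v in vectors.items() if name != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical 2-D embeddings keyed by region token.
    vectors = {
        "chr1_100_600": [1.0, 0.0],
        "chr1_900_1400": [0.9, 0.1],
        "chr2_50_550": [0.0, 1.0],
    }
    for name, score in most_similar("chr1_100_600", vectors, top_k=2):
        print(name, round(score, 3))
```

Cosine similarity ignores vector length, so it compares only the direction of two embeddings; the same ranking works for file-level or cell-level vectors.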

Workflow:

  1. Tokenize BED files using a universe reference
  2. Train Region2Vec model on tokens
  3. Generate embeddings for regions
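The three workflow steps can be illustrated end to end without geniml. The sketch below is a plain skip-gram with negative sampling (SGNS), the classic word2vec objective, in pure Python: each tokenized BED file is one "sentence" of region tokens, reshuffled on every pass on the assumption that region order within a file carries no meaning, and the function returns one vector per universe region. It is not geniml's Region2Vec trainer; train_region2vec_sketch and its defaults (dim, window, negatives, epochs, lr) are hypothetical.

```python
import math
import random
from typing import Dict, List


def sigmoid(x: float) -> float:
    x = max(-30.0, min(30.0, x))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-x))


def train_region2vec_sketch(
    documents: List[List[str]],
    dim: int = 8,
    window: int = 2,
    negatives: int = 3,
    epochs: int = 50,
    lr: float = 0.05,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Skip-gram with negative sampling over region tokens (hypothetical helper, not geniml's API)."""
    rng = random.Random(seed)
    vocab = sorted({tok for doc in documents for tok in doc})
    emb = {t: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for t in vocab}
    ctx = {t: [0.0] * dim for t in vocab}  # separate "context" vectors, as in word2vec
    for _ in range(epochs):
        for doc in documents:
            tokens = list(doc)
            rng.shuffle(tokens)  # assumption: region order within a file carries no meaning
            for i, center in enumerate(tokens):
                v = emb[center]
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j == i:
                        continue
                    # One true context token (label 1) plus random negative tokens (label 0).
                    samples = [(tokens[j], 1.0)] + [(rng.choice(vocab), 0.0) for _ in range(negatives)]
                    grad_v = [0.0] * dim
                    for tok, label in samples:
                        u = ctx[tok]
                        g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                        for k in range(dim):
                            grad_v[k] += g * u[k]  # uses u before its own update
                            u[k] += g * v[k]
                    for k in range(dim):
                        v[k] += grad_v[k]
    return emb


if __name__ == "__main__":
    # Toy corpus: each inner list stands for one tokenized BED file.
    corpus = [["chr1_100_600", "chr1_900_1400", "chr1_2000_2500"]] * 20 + [["chr2_50_550", "chr2_800_1200"]] * 20
    vectors = train_region2vec_sketch(corpus)
    print(len(vectors), "regions,", len(next(iter(vectors.values()))), "dimensions")
```

Keeping separate input and context vectors and returning only the input vectors mirrors word2vec; the pure-Python loops are for readability and are far too slow for real universes.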

Reference: See references/region2vec.md for detailed workflow, parameters, and examples.

Use cases

  • Use for training region embeddings (Region2Vec, BEDspace), single-cell ATAC-seq analysis (scEmbed), building consensus peaks (universes), or any ML-based analysis of genomic regions.
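Building a universe (consensus peak set) can be illustrated with the simplest rule, a coverage cutoff: keep every stretch of the genome that at least k input BED files cover. The sketch below is a simplified stand-in rather than geniml's universe builder; merge and coverage_cutoff_universe are hypothetical helpers, and coordinates are assumed to be BED's 0-based, half-open intervals.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end) in BED's half-open coordinates


def merge(regions: List[Region]) -> List[Region]:
    """Merge overlapping or touching regions within one BED file."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def coverage_cutoff_universe(bed_sets: List[List[Region]], cutoff: int) -> List[Region]:
    """Return maximal intervals covered by at least `cutoff` of the input BED files."""
    if cutoff < 1:
        raise ValueError("cutoff must be at least 1")
    deltas: DefaultDict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for regions in bed_sets:
        for chrom, start, end in merge(regions):  # merged, so each file counts once per base
            deltas[chrom][start] += 1
            deltas[chrom][end] -= 1
    universe: List[Region] = []
    for chrom in sorted(deltas):
        depth, open_at = 0, None
        for pos in sorted(deltas[chrom]):
            depth += deltas[chrom][pos]  # apply all starts and ends at this position together
            if depth >= cutoff and open_at is None:
                open_at = pos
            elif depth < cutoff and open_at is not None:
                universe.append((chrom, open_at, pos))
                open_at = None
    return universe


if __name__ == "__main__":
    files = [
        [("chr1", 100, 300)],
        [("chr1", 150, 250), ("chr1", 240, 400)],
        [("chr1", 280, 500)],
    ]
    print(coverage_cutoff_universe(files, cutoff=2))  # [('chr1', 150, 400)]
```

Raising the cutoff trades coverage for consensus: cutoff=1 returns the union of all files, while a cutoff equal to the number of files returns only their intersection.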

Not for

  • Do not rely on this catalog entry alone for installation or maintenance details.

Related skills

AnnData

AnnData is a Python package for handling annotated data matrices, storing experimental measurements (X) alongside observation metadata (obs)…

Tags: Claude Code, OpenClaw, Analysis
Source: K-Dense-AI/claude-scientific-skills

Arboreto

Arboreto is a computational library for inferring gene regulatory networks (GRNs) from gene expression data using paralleli…

Tags: Claude Code, OpenClaw, Analysis
Source: K-Dense-AI/claude-scientific-skills

bio-imaging-mass-cytometry-cell-segmentation

Cell segmentation from multiplexed tissue images. Covers deep learning (Cellpose, Mesmer) and classical approaches for nuclear and whole-cel…

Tags: OpenClaw, NanoClaw, Analysis
Source: FreedomIntelligence/OpenClaw-Medical-Skills

bio-read-qc-umi-processing

Extract, process, and deduplicate reads using Unique Molecular Identifiers (UMIs) with umi_tools. Use when library prep includes UMIs and ac…

Tags: OpenClaw, NanoClaw, Analysis
Source: FreedomIntelligence/OpenClaw-Medical-Skills