polars-bio

Maintainer K-Dense Inc. · Last updated April 1, 2026

polars-bio is a high-performance Python library for genomic interval operations and bioinformatics file I/O, built on Polars, Apache Arrow, and Apache DataFusion. It provides a familiar DataFrame-centric API for interval arithmetic (overlap, nearest, merge, coverage, complement, subtract) and for reading and writing common bioinformatics formats (BED, VCF, BAM, CRAM, GFF/GTF, FASTA, FASTQ).

Claude Code · Analysis · Writing · polars-bio · data-analysis · package · data analysis & visualization

Original source

K-Dense-AI/claude-scientific-skills

https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/polars-bio

License
https://github.com/biodatageeks/polars-bio/blob/main/LICENSE

Skill Snapshot

Key Details From SKILL.md

Key Notes

  • 6-38x faster than bioframe on real-world genomic benchmarks.
  • Streaming/out-of-core support for large genomes via DataFusion.
  • Cloud-native file I/O (S3, GCS, Azure) with predicate pushdown.
  • Two API styles: functional (pb.overlap(df1, df2)) and method-chaining (df1.lazy().pb.overlap(df2)).
  • SQL interface for genomic data via DataFusion SQL engine.

Source Doc

Excerpt From SKILL.md

When to Use This Skill

Use this skill when:

  • Performing genomic interval operations (overlap, nearest, merge, coverage, complement, subtract)
  • Reading/writing bioinformatics file formats (BED, VCF, BAM, CRAM, GFF/GTF, FASTA, FASTQ)
  • Processing large genomic datasets that don't fit in memory (streaming mode)
  • Running SQL queries on genomic data files
  • Migrating from bioframe to a faster alternative
  • Computing read depth/pileup from BAM/CRAM files
  • Working with Polars DataFrames containing genomic intervals
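
Of the interval operations listed above, merge is perhaps the easiest to picture. The following is a minimal plain-Python sketch of the idea, not the polars-bio implementation: `merge_intervals` and the `regions` data are hypothetical names invented for illustration, and the sketch assumes that intervals which overlap or share an endpoint are combined. The library's own handling of adjacent intervals and coordinate conventions may differ.

```python
def merge_intervals(intervals):
    """Collapse overlapping or touching (chrom, start, end) tuples.

    Illustrative only: assumes intervals sharing an endpoint are merged.
    """
    merged = []
    # Sorting by (chrom, start, end) puts candidates for merging next to each other.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same chromosome and reaching into the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical input, made up for this sketch.
regions = [("chr1", 1, 6), ("chr1", 5, 9), ("chr1", 22, 30), ("chr2", 4, 7)]
merged_regions = merge_intervals(regions)
print(merged_regions)
# [('chr1', 1, 9), ('chr1', 22, 30), ('chr2', 4, 7)]
```

A sort followed by a single linear pass is the usual way to express this operation; a DataFrame library would typically do the equivalent per chromosome.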

Basic Overlap Example

import polars as pl
import polars_bio as pb

# Create two interval DataFrames
df1 = pl.DataFrame({
    "chrom": ["chr1", "chr1", "chr1"],
    "start": [1, 5, 22],
    "end":   [6, 9, 30],
})

df2 = pl.DataFrame({
    "chrom": ["chr1", "chr1"],
    "start": [3, 25],
    "end":   [8, 28],
})
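
The excerpt ends before any operation is applied to df1 and df2. As a rough guide to what an overlap join over these two frames should pair up, here is a plain-Python sketch that uses the same coordinates. It is not the polars-bio API: `overlap_pairs` is a hypothetical helper, and the closed-interval test (both ends inclusive) is an assumption. For this particular data a half-open test yields the same three pairs.

```python
def overlap_pairs(left, right):
    """Return every (left, right) pair of intervals that overlap.

    Intervals are (chrom, start, end) tuples. Assumes closed coordinates,
    i.e. both start and end are part of the interval.
    """
    return [
        (a, b)
        for a in left
        for b in right
        if a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]
    ]


# Same values as df1 and df2 in the example above.
df1_rows = [("chr1", 1, 6), ("chr1", 5, 9), ("chr1", 22, 30)]
df2_rows = [("chr1", 3, 8), ("chr1", 25, 28)]

pairs = overlap_pairs(df1_rows, df2_rows)
for a, b in pairs:
    print(a, "overlaps", b)
# ('chr1', 1, 6) overlaps ('chr1', 3, 8)
# ('chr1', 5, 9) overlaps ('chr1', 3, 8)
# ('chr1', 22, 30) overlaps ('chr1', 25, 28)
```

The nested loop is quadratic and only meant to make the semantics concrete; interval libraries generally rely on indexed or sorted-sweep algorithms to avoid comparing every pair.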

Use cases

  • Performing genomic interval operations (overlap, nearest, merge, coverage, complement, subtract).
  • Reading/writing bioinformatics file formats (BED, VCF, BAM, CRAM, GFF/GTF, FASTA, FASTQ).
  • Processing large genomic datasets that don't fit in memory (streaming mode).
  • Running SQL queries on genomic data files.

Not for

  • Do not rely on this catalog entry alone for installation or maintenance details.

Related skills

Data & Repro · Scientific Visualization

bio-chipseq-visualization

Visualize ChIP-seq data using deepTools, Gviz, and ChIPseeker. Create heatmaps, profile plots, and genome browser tracks. Visualize signal a…

OpenClaw · NanoClaw · Analysis
FreedomIntelligence/OpenClaw-Medical-Skills

Data & Repro · Scientific Visualization

bio-consensus-sequences

Generate consensus FASTA sequences by applying VCF variants to a reference using bcftools consensus. Use when creating sample-specific refer…

OpenClaw · NanoClaw · Analysis
FreedomIntelligence/OpenClaw-Medical-Skills

Data & Repro · Scientific Visualization

bio-copy-number-cnv-visualization

Visualize copy number profiles, segments, and compare across samples. Create publication-quality plots of CNV data from CNVkit, GATK, or oth…

OpenClaw · NanoClaw · Analysis
FreedomIntelligence/OpenClaw-Medical-Skills

Data & Repro · Scientific Visualization

bio-data-visualization-circos-plots

Circular genome visualization with Circos or pycirclize.

OpenClaw · NanoClaw · Analysis
FreedomIntelligence/OpenClaw-Medical-Skills