Polars

Maintainer: K-Dense Inc. · Last updated: April 1, 2026

polars-bio is a high-performance Python library for genomic interval operations and bioinformatics file I/O, built on Polars, Apache Arrow, and Apache DataFusion. It provides a familiar DataFrame-centric API for interval arithmetic (overlap, nearest, merge, coverage, complement, subtract) and for reading and writing common bioinformatics formats (BED, VCF, BAM, CRAM, GFF/GTF, FASTA, FASTQ).

Source

K-Dense-AI/claude-scientific-skills

https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/polars-bio

License: https://github.com/biodatageeks/polars-bio/blob/main/LICENSE

Skill summary

Key information from SKILL.md

Key points

  • 6-38x faster than bioframe on real-world genomic benchmarks.
  • Streaming/out-of-core support for large genomes via DataFusion.
  • Cloud-native file I/O (S3, GCS, Azure) with predicate pushdown.
  • Two API styles: functional (pb.overlap(df1, df2)) and method-chaining (df1.lazy().pb.overlap(df2)).
  • SQL interface for genomic data via the DataFusion SQL engine.

Original documentation

SKILL.md excerpt

When to Use This Skill

Use this skill when:

  • Performing genomic interval operations (overlap, nearest, merge, coverage, complement, subtract)
  • Reading/writing bioinformatics file formats (BED, VCF, BAM, CRAM, GFF/GTF, FASTA, FASTQ)
  • Processing large genomic datasets that don't fit in memory (streaming mode)
  • Running SQL queries on genomic data files
  • Migrating from bioframe to a faster alternative
  • Computing read depth/pileup from BAM/CRAM files
  • Working with Polars DataFrames containing genomic intervals

Basic Overlap Example

import polars as pl
import polars_bio as pb

# Create two interval DataFrames
df1 = pl.DataFrame({
    "chrom": ["chr1", "chr1", "chr1"],
    "start": [1, 5, 22],
    "end":   [6, 9, 30],
})

df2 = pl.DataFrame({
    "chrom": ["chr1", "chr1"],
    "start": [3, 25],
    "end":   [8, 28],
})
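The excerpt ends before the overlap step itself. According to the summary, the functional form of that call is pb.overlap(df1, df2); its exact parameters and output columns are not shown in this entry. As a rough illustration of what an overlap join computes, the sketch below pairs the same intervals in plain Python. It assumes half-open [start, end) coordinates, which may not match the polars-bio default, and the helper name `overlaps` is hypothetical rather than part of any library.

```python
# Plain-Python sketch of what an interval overlap join computes.
# Assumption: intervals are half-open [start, end); polars-bio's own
# coordinate convention and output column names may differ.

def overlaps(a, b):
    """Return True when two (chrom, start, end) intervals share a position."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

# Same intervals as df1 and df2 in the excerpt, as (chrom, start, end) tuples.
intervals_1 = [("chr1", 1, 6), ("chr1", 5, 9), ("chr1", 22, 30)]
intervals_2 = [("chr1", 3, 8), ("chr1", 25, 28)]

# Naive all-pairs comparison, kept simple for illustration only;
# production tools typically rely on an interval index instead.
overlap_pairs = [
    (a, b) for a in intervals_1 for b in intervals_2 if overlaps(a, b)
]

for a, b in overlap_pairs:
    print(a, "overlaps", b)
```

For this input the pairing is the same under closed-interval coordinates, so the choice of convention does not change which rows match here; it would matter for intervals that only touch at an endpoint.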

Use cases

  • Performing genomic interval operations (overlap, nearest, merge, coverage, complement, subtract).
  • Reading/writing bioinformatics file formats (BED, VCF, BAM, CRAM, GFF/GTF, FASTA, FASTQ).
  • Processing large genomic datasets that don't fit in memory (streaming mode).
  • Running SQL queries on genomic data files.

When not to use

  • Do not rely on this catalog entry alone for installation or maintenance details.

Related skills

  • bio-chipseq-visualization: Visualize ChIP-seq data using deepTools, Gviz, and ChIPseeker. Create heatmaps, profile plots, and genome browser… (Source: FreedomIntelligence/OpenClaw-Medical-Skills)

  • bio-consensus-sequences: Generate consensus FASTA sequences by applying VCF variants to a reference using bcftools consensus. Suitable for cr… (Source: FreedomIntelligence/OpenClaw-Medical-Skills)

  • bio-copy-number-cnv-visualization: Visualize copy number profiles and segments, and compare across samples. Create publication-quality plo… (Source: FreedomIntelligence/OpenClaw-Medical-Skills)

  • bio-data-visualization-circos-plots: Circular genome visualization with Circos or pycirclize. (Source: FreedomIntelligence/OpenClaw-Medical-Skills)