bio-chipseq-visualization: Visualize ChIP-seq data using deepTools, Gviz, and ChIPseeker. Create heatmaps, profile plots, and genome browser…
Polars
Maintainer: K-Dense Inc. · Last updated: April 1, 2026
Polars: polars-bio is a high-performance Python library for genomic interval operations and bioinformatics file I/O, built on Polars, Apache Arrow, and Apache DataFusion. It provides a familiar DataFrame-centric API for interval arithmetic (overlap, nearest, merge, coverage, complement, subtract) and for reading and writing common bioinformatics formats (BED, VCF, BAM, CRAM, GFF/GTF, FASTA, FASTQ).
Original source
K-Dense-AI/claude-scientific-skills
https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/polars-bio
- Maintainer: K-Dense Inc.
- License: https://github.com/biodatageeks/polars-bio/blob/main/LICENSE
- Last updated: April 1, 2026
Skill summary
Key information from SKILL.md
Core notes
- 6-38x faster than bioframe on real-world genomic benchmarks.
- Streaming/out-of-core support for large genomes via DataFusion.
- Cloud-native file I/O (S3, GCS, Azure) with predicate pushdown.
- Two API styles: functional (pb.overlap(df1, df2)) and method-chaining (df1.lazy().pb.overlap(df2)).
- SQL interface for genomic data via the DataFusion SQL engine.
Original documentation
SKILL.md excerpt
When to Use This Skill
Use this skill when:
- Performing genomic interval operations (overlap, nearest, merge, coverage, complement, subtract)
- Reading/writing bioinformatics file formats (BED, VCF, BAM, CRAM, GFF/GTF, FASTA, FASTQ)
- Processing large genomic datasets that don't fit in memory (streaming mode)
- Running SQL queries on genomic data files
- Migrating from bioframe to a faster alternative
- Computing read depth/pileup from BAM/CRAM files
- Working with Polars DataFrames containing genomic intervals
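Of the interval operations listed above, merge is perhaps the easiest to misread, so here is a minimal dependency-free sketch of what it computes. This is a naive reference in plain Python, not polars-bio's implementation or API; the function name `naive_merge`, the sample data, and the choice to treat intervals that share an endpoint as overlapping are all assumptions of this illustration.

```python
# Naive illustration of interval "merge" semantics; NOT the polars-bio API.
# Intervals are (chrom, start, end) tuples. Assumption: two intervals that
# merely share an endpoint are treated as overlapping and get merged.
def naive_merge(intervals):
    """Collapse overlapping intervals on the same chromosome."""
    merged = []
    # Sorting by (chrom, start, end) puts candidates for merging next to each other.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical sample data for illustration only.
example_intervals = [
    ("chr1", 1, 6),
    ("chr1", 5, 9),
    ("chr1", 22, 30),
    ("chr2", 2, 4),
]
merged_example = naive_merge(example_intervals)
print(merged_example)
```

The first two chr1 intervals collapse into one because they share positions 5 to 6; the other two stay separate. A library such as polars-bio performs the same kind of operation on DataFrames rather than on Python lists.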
Basic Overlap Example
import polars as pl
import polars_bio as pb

# Create two interval DataFrames
df1 = pl.DataFrame({
    "chrom": ["chr1", "chr1", "chr1"],
    "start": [1, 5, 22],
    "end": [6, 9, 30],
})
df2 = pl.DataFrame({
    "chrom": ["chr1", "chr1"],
    "start": [3, 25],
    "end": [8, 28],
})
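The excerpt stops after building the two tables. To show which pairs an overlap query over them should report, the following sketch re-states the same intervals as plain tuples and checks every pair. It is a naive reference in plain Python, not polars-bio's algorithm or API: the name `naive_overlap` is hypothetical, and the closed-interval convention (both ends inclusive) is an assumption of this sketch rather than a statement about the library's defaults.

```python
# Naive reference for what an overlap query reports; NOT polars-bio's algorithm.
# The tuples mirror the rows of df1 and df2 from the excerpt above.
intervals_1 = [("chr1", 1, 6), ("chr1", 5, 9), ("chr1", 22, 30)]
intervals_2 = [("chr1", 3, 8), ("chr1", 25, 28)]


def naive_overlap(left, right):
    """Return every (left, right) pair on the same chromosome that overlaps.

    Assumption: intervals are closed, so both endpoints are inclusive.
    """
    pairs = []
    for chrom_l, start_l, end_l in left:
        for chrom_r, start_r, end_r in right:
            same_chrom = chrom_l == chrom_r
            # Two closed intervals overlap when each starts before the other ends.
            if same_chrom and start_l <= end_r and start_r <= end_l:
                pairs.append(((chrom_l, start_l, end_l), (chrom_r, start_r, end_r)))
    return pairs


overlap_pairs = naive_overlap(intervals_1, intervals_2)
for left_interval, right_interval in overlap_pairs:
    print(left_interval, right_interval)
```

For this data the check yields three pairs: (1, 6) and (5, 9) each overlap (3, 8), and (22, 30) overlaps (25, 28). A half-open convention gives the same three pairs here, because none of the intervals merely touch at an endpoint. The nested loop is quadratic and only meant to pin down the semantics.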
When to use
- Performing genomic interval operations (overlap, nearest, merge, coverage, complement, subtract).
- Reading/writing bioinformatics file formats (BED, VCF, BAM, CRAM, GFF/GTF, FASTA, FASTQ).
- Processing large genomic datasets that don't fit in memory (streaming mode).
- Running SQL queries on genomic data files.
When not to use
- Do not rely on this catalog entry alone for installation or maintenance details.
Related skills
bio-consensus-sequences: Generate consensus FASTA sequences by applying VCF variants to a reference using bcftools consensus. Suitable for cr…
bio-copy-number-cnv-visualization: Visualize copy number profiles and segments, and compare across samples. Create publication-quality plo…
bio-data-visualization-circos-plots: Circular genome visualization with Circos or pycirclize.