
bio-sequence-statistics

Maintainer FreedomIntelligence · Last updated March 31, 2026

Calculate sequence statistics (N50, length distribution, GC content, summary reports) using Biopython. Use when analyzing sequence datasets, generating QC reports, or comparing assemblies.

OpenClaw · NanoClaw · Analysis · Reproduction · bio-sequence-statistics · 🧬 bioinformatics (gptomics bio-* suite) · bioinformatics: sequencing & read QC · calculate

Original source

FreedomIntelligence/OpenClaw-Medical-Skills

https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-sequence-statistics

Maintainer: FreedomIntelligence
License: MIT
Last updated: March 31, 2026

Skill Snapshot

Key Details From SKILL.md


Key Notes

"Calculate N50 and other assembly statistics" → Compute sequence count, length distribution, N50/L50, GC content, and nucleotide composition for FASTA datasets.
Python entry points: `SeqIO.parse()` and `gc_fraction()` (Biopython).
Length histogram: with `bin_size = 100`, map each length to the lower edge of its bin using `bins = [(l // bin_size) * bin_size for l in lengths]`, then count the bins with `histogram = Counter(bins)`.
Print the histogram by iterating over `sorted(histogram.keys())` and printing `f'{length_bin}-{length_bin + bin_size}: {histogram[length_bin]}'` for each bin.
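
The notes mention N50/L50 without spelling out the calculation. As a minimal sketch, assuming the usual definition (sort lengths from longest to shortest and stop once the running total reaches half of all bases), it could look like the following. The function name `n50_l50` and the example lengths are illustrative and do not come from SKILL.md.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    N50 is the length of the sequence at which the running total of
    lengths, sorted longest first, reaches half of the total bases.
    L50 is the number of sequences needed to get there.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= half_total:
            return length, count


# Illustrative lengths only (total 54, so the halfway point is 27).
contig_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50_l50(contig_lengths))  # (8, 3): 10 + 9 + 8 = 27
```

This version needs only the standard library, so it can be checked independently of how the lengths were read.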

Source Doc

Excerpt From SKILL.md

## Length Histogram Data

from collections import Counter
from Bio import SeqIO

lengths = [len(r.seq) for r in SeqIO.parse('sequences.fasta', 'fasta')]
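
The excerpt stops before the binning step. Based on the bin logic quoted in the Key Notes, a sketch of the remaining part might look like this; the helper names `length_histogram` and `print_histogram` and the example lengths are assumptions made for illustration.

```python
from collections import Counter


def length_histogram(lengths, bin_size=100):
    """Count sequences per length bin; keys are the lower edge of each bin."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return Counter((length // bin_size) * bin_size for length in lengths)


def print_histogram(histogram, bin_size=100):
    # Each bin covers [start, start + bin_size): the upper bound is exclusive.
    for start in sorted(histogram):
        print(f"{start}-{start + bin_size}: {histogram[start]}")


# Illustrative values only; in practice pass the `lengths` list built from the FASTA file.
example_lengths = [42, 99, 100, 250, 299, 1000]
hist = length_histogram(example_lengths)
print_histogram(hist)
```

Empty bins are simply absent from the `Counter`, so the printed output skips ranges that contain no sequences.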

## Comprehensive Summary Report

**Goal:** Generate a complete QC summary (counts, lengths, N50, GC) for any FASTA file.

**Approach:** Load all records, compute length and GC arrays, derive N50/L50 from cumulative sorted lengths, and package into a dictionary.

**Reference (BioPython 1.83+):**
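
The reference code is not reproduced in this excerpt. As a stand-in, here is a minimal sketch of the approach described above, assuming `SeqIO.parse()` and `gc_fraction()` from Biopython. The dictionary keys, the `fmt` parameter, and the choice to report GC as an unweighted mean of per-sequence fractions are assumptions, not details confirmed by SKILL.md.

```python
from io import StringIO
from statistics import mean, median

from Bio import SeqIO
from Bio.SeqUtils import gc_fraction


def sequence_summary(source, fmt="fasta"):
    """Summarize a sequence file (path or open text handle) as a dictionary."""
    lengths, gc_values = [], []
    for record in SeqIO.parse(source, fmt):
        lengths.append(len(record.seq))
        gc_values.append(gc_fraction(record.seq))
    if not lengths:
        raise ValueError("no sequences found")

    # N50/L50: walk lengths from longest to shortest until half the bases are covered.
    total = sum(lengths)
    running = 0
    for l50, n50 in enumerate(sorted(lengths, reverse=True), start=1):
        running += n50
        if running >= total / 2:
            break

    return {
        "count": len(lengths),
        "total_bp": total,
        "min_length": min(lengths),
        "max_length": max(lengths),
        "mean_length": mean(lengths),
        "median_length": median(lengths),
        "n50": n50,
        "l50": l50,
        # Unweighted mean of per-sequence GC fractions, as a percentage.
        "mean_gc_percent": 100 * mean(gc_values),
    }


# Tiny in-memory FASTA, made up for illustration; pass a file path for real data.
demo = StringIO(">a\nGGCC\n>b\nATATATAT\n>c\nGC\n")
summary = sequence_summary(demo)
print(summary["count"], summary["total_bp"], summary["n50"], summary["l50"])
```

For assemblies with very uneven contig sizes, a length-weighted GC value (total G+C bases divided by total bases) may be more informative than the unweighted mean used here.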

## Compare Multiple Assemblies

**Goal:** Generate a side-by-side comparison table of key metrics across multiple assembly files.

**Approach:** Run `sequence_summary` on each file and format results into an aligned table.

**Reference (BioPython 1.83+):**
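
The reference code is not reproduced in this excerpt. One possible sketch of the table formatting is shown below. It assumes `summarize` is a callable such as `sequence_summary` that takes a file path and returns a dictionary of numeric metrics; the function name `compare_assemblies` and the default metric keys are hypothetical.

```python
def compare_assemblies(paths, summarize, metrics=("count", "total_bp", "n50", "l50")):
    """Return an aligned text table with one row per metric and one column per file."""
    summaries = {path: summarize(path) for path in paths}
    label_width = max(len(m) for m in metrics)
    # Each column is wide enough for the file name and its longest formatted value.
    widths = {
        path: max(len(path), *(len(f"{s[m]:,}") for m in metrics))
        for path, s in summaries.items()
    }
    header = " " * label_width + "".join(f"  {p:>{widths[p]}}" for p in paths)
    rows = [header]
    for m in metrics:
        cells = "".join(f"  {summaries[p][m]:>{widths[p]},}" for p in paths)
        rows.append(f"{m:<{label_width}}{cells}")
    return "\n".join(rows)
```

A call might look like `print(compare_assemblies(['asm1.fasta', 'asm2.fasta'], sequence_summary))`, where the file names are placeholders. Passing the summary function as an argument keeps the formatting testable without reading any files.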

Use cases

Use when analyzing sequence datasets, generating QC reports, or comparing assemblies.

Not for

Do not rely on this catalog entry alone for installation or maintenance details.

Upstream Related Skills

read-sequences - Parse sequences for statistics calculation
batch-processing - Calculate stats across multiple files
fastq-quality - Quality score statistics for FASTQ files
sequence-manipulation/sequence-properties - Per-sequence GC content and properties
alignment-files - samtools stats/flagstat for alignment statistics