
bio-sequence-statistics

Maintainer FreedomIntelligence · Last updated April 1, 2026

Calculate sequence statistics (N50, length distribution, GC content, summary reports) using Biopython. Use when analyzing sequence datasets, generating QC reports, or comparing assemblies.

OpenClaw · NanoClaw · Analysis · Reproduction · 🧬 bioinformatics (gptomics bio-* suite) · bioinformatics: sequencing & read qc · calculate

Original source

FreedomIntelligence/OpenClaw-Medical-Skills

https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-sequence-statistics

License
MIT

Skill Snapshot

Key Details From SKILL.md


Key Notes

  • Compute sequence count, length distribution, N50/L50, GC content, and nucleotide composition for FASTA datasets. Python: `SeqIO.parse()`, `gc_fraction()` (Biopython).
  • Calculate comprehensive statistics for sequence datasets using Biopython.
  • Bin sequence lengths for a histogram: `bin_size = 100`, `bins = [(l // bin_size) * bin_size for l in lengths]`, `histogram = Counter(bins)`.
  • Print each bin and its count: `for length_bin in sorted(histogram.keys()): count = histogram[length_bin]; print(f'{length_bin}-{length_bin + bin_size}: {count}')`.
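A minimal sketch of the nucleotide-composition step, assuming the sequences are already available as plain strings (for example `str(r.seq)` for each record yielded by `SeqIO.parse()`). The function name `nucleotide_composition` is hypothetical; ambiguity codes such as `N` are counted as their own symbols rather than dropped.

```python
from collections import Counter
from typing import Dict, Iterable


def nucleotide_composition(seqs: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of each symbol across all sequences (case-insensitive)."""
    counts = Counter()
    for seq in seqs:
        counts.update(seq.upper())  # count every character, including N and other codes
    total = sum(counts.values())
    if total == 0:
        return {}
    # Sort symbols alphabetically so the report order is stable
    return {base: counts[base] / total for base in sorted(counts)}


composition = nucleotide_composition(["ACGT", "GGCCNN"])
for base, frac in composition.items():
    print(f"{base}: {frac:.3f}")
```

Pooling counts across all sequences weights long sequences more heavily than short ones, which matches how a dataset-wide composition is usually reported.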

Source Doc

Excerpt From SKILL.md

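## Length Histogram Data

```python
from collections import Counter

from Bio import SeqIO

# Length of every record in the FASTA file, used as input for binning
lengths = [len(r.seq) for r in SeqIO.parse('sequences.fasta', 'fasta')]
```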
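The `Counter`-based binning from the key notes can be sketched as a small reusable function. `length_histogram` is a hypothetical name, and the sample `lengths` list stands in for the values read from `sequences.fasta`. Bins are half-open, so a 100 bp sequence lands in the `100-200` bin, and bins that contain no sequences are simply absent from the output.

```python
from collections import Counter
from typing import Dict, List


def length_histogram(lengths: List[int], bin_size: int = 100) -> Dict[int, int]:
    """Count sequences per fixed-width length bin, keyed by each bin's lower bound."""
    # Integer division floors each length to the start of its bin
    bins = [(length // bin_size) * bin_size for length in lengths]
    return dict(sorted(Counter(bins).items()))


lengths = [50, 99, 100, 150, 250, 1000]
bin_size = 100
histogram = length_histogram(lengths, bin_size)
for length_bin, count in histogram.items():
    # Each bin covers [length_bin, length_bin + bin_size): the upper bound is exclusive
    print(f"{length_bin}-{length_bin + bin_size}: {count}")
```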

## Comprehensive Summary Report

**Goal:** Generate a complete QC summary (counts, lengths, N50, GC) for any FASTA file.

**Approach:** Load all records, compute length and GC arrays, derive N50/L50 from the cumulative sum of lengths sorted longest first, and package the results into a dictionary.
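A minimal sketch of this approach, assuming the sequences are passed in as plain strings. `summarize_sequences` and its dictionary keys are hypothetical. Two deliberate simplifications: GC is pooled over all A/C/G/T bases (ambiguous codes such as `N` are excluded) and counted directly instead of calling Biopython's `gc_fraction()`, so the sketch runs without Biopython; N50 follows the common convention of the first length at which the longest-first cumulative sum reaches at least half of the total length, and L50 is the number of sequences needed to get there.

```python
from typing import Dict, Iterable


def summarize_sequences(seqs: Iterable[str]) -> Dict[str, float]:
    """Summarize sequences: count, length stats, N50/L50 and pooled GC percent."""
    seq_list = [s.upper() for s in seqs]
    if not seq_list:
        raise ValueError("no sequences to summarize")
    lengths = sorted((len(s) for s in seq_list), reverse=True)  # longest first
    total = sum(lengths)

    # Walk the sorted lengths until the cumulative sum reaches half the total
    n50 = l50 = 0
    cumulative = 0
    for i, length in enumerate(lengths, start=1):
        cumulative += length
        if cumulative >= total / 2:
            n50, l50 = length, i
            break

    # Pooled GC over unambiguous bases only; N and other codes are ignored
    gc = sum(s.count("G") + s.count("C") for s in seq_list)
    acgt = sum(sum(s.count(b) for b in "ACGT") for s in seq_list)

    return {
        "num_sequences": len(lengths),
        "total_length": total,
        "min_length": lengths[-1],
        "max_length": lengths[0],
        "mean_length": total / len(lengths),
        "n50": n50,
        "l50": l50,
        "gc_percent": 100 * gc / acgt if acgt else 0.0,
    }


example = summarize_sequences(["GC" * 200, "AT" * 150, "GA" * 100, "N" * 100])
print(example)
```

With Biopython installed, the function could be fed straight from a file, e.g. `summarize_sequences(str(r.seq) for r in SeqIO.parse('sequences.fasta', 'fasta'))`.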

**Reference (BioPython 1.83+):**

## Compare Multiple Assemblies

**Goal:** Generate a side-by-side comparison table of key metrics across multiple assembly files.

**Approach:** Run `sequence_summary` on each file and format results into an aligned table.
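A minimal sketch of the table-formatting step, assuming each assembly has already been summarized into a dictionary of metrics (for example by running `sequence_summary` on each file). `format_comparison` is a hypothetical helper, and the assembly names and metric values in the example are invented for illustration only.

```python
from typing import List, Mapping


def format_comparison(summaries: Mapping[str, Mapping[str, object]],
                      metrics: List[str]) -> str:
    """Render one row per metric and one right-aligned column per assembly."""
    names = list(summaries)
    rows = [["metric"] + names]
    for metric in metrics:
        row = [metric]
        for name in names:
            value = summaries[name].get(metric, "")  # blank cell if a metric is missing
            # Floats get two decimals; ints and strings are printed unchanged
            row.append(f"{value:.2f}" if isinstance(value, float) else str(value))
        rows.append(row)

    # Each column is as wide as its widest cell
    widths = [max(len(row[col]) for row in rows) for col in range(len(rows[0]))]
    return "\n".join(
        "  ".join(cell.ljust(widths[col]) if col == 0 else cell.rjust(widths[col])
                  for col, cell in enumerate(row))
        for row in rows
    )


# Made-up example values, for illustration only
summaries = {
    "asm_v1": {"num_sequences": 120, "n50": 45000, "gc_percent": 41.237},
    "asm_v2": {"num_sequences": 85, "n50": 61000, "gc_percent": 41.5},
}
table = format_comparison(summaries, ["num_sequences", "n50", "gc_percent"])
print(table)
```

Putting metrics in rows and assemblies in columns keeps the table narrow when many metrics are compared; swap the loops if one row per assembly suits the report better.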

**Reference (BioPython 1.83+):**

Use cases

  • Use when analyzing sequence datasets, generating QC reports, or comparing assemblies.

Not for

  • Do not rely on this catalog entry alone for installation or maintenance details.

Upstream Related Skills

  • read-sequences - Parse sequences for statistics calculation
  • batch-processing - Calculate stats across multiple files
  • fastq-quality - Quality score statistics for FASTQ files
  • sequence-manipulation/sequence-properties - Per-sequence GC content and properties
  • alignment-files - samtools stats/flagstat for alignment statistics

Related skills


arxiv-database

This skill provides Python tools for searching and retrieving preprints from arXiv.org via its public Atom API. It supports keyword search,…

Claude Code · Analysis · K-Dense-AI/claude-scientific-skills

bayesian-optimizer

Bayesian optimization for experimental design and hyperparameter tuning in biomedical research.

OpenClaw · NanoClaw · Analysis · FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-files-bam-statistics

Compute alignment statistics: flagstat, idxstats, coverage depth.

OpenClaw · NanoClaw · Analysis · FreedomIntelligence/OpenClaw-Medical-Skills

bio-alignment-msa-statistics

Calculate alignment statistics including sequence identity, conservation scores, substitution matrices, and similarity metrics. Use when com…

OpenClaw · NanoClaw · Analysis · FreedomIntelligence/OpenClaw-Medical-Skills