Arboreto
Arboreto is a computational library for inferring gene regulatory networks (GRNs) from gene expression data using paralleli.
Maintainer FreedomIntelligence · Last updated April 1, 2026
Extract, process, and deduplicate reads using Unique Molecular Identifiers (UMIs) with umi_tools. Use when library prep includes UMIs and accurate molecule counting is needed, such as in single-cell RNA-seq, low-input RNA-seq, or targeted sequencing to distinguish PCR from biological duplicates.
Original source
https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-read-qc-umi-processing
Skill Snapshot
Source Doc
umi_tools extract
--stdin=R1.fastq.gz
--read2-in=R2.fastq.gz
--stdout=R1_extracted.fastq.gz
--read2-out=R2_extracted.fastq.gz
--bc-pattern2=NNNNNNNN
umi_tools extract
--stdin=R1.fastq.gz
--read2-in=R2.fastq.gz
--stdout=R1_extracted.fastq.gz
--read2-out=R2_extracted.fastq.gz
--bc-pattern=NNNNNNNN
--bc-pattern2=NNNNNNNN
## UMI Pattern Syntax
| Pattern | Meaning |
|---------|---------|
| `N` | UMI base (extracted) |
| `C` | Cell barcode (extracted, kept separate) |
| `X` | Discard base |
| `NNNNNNNN` | 8bp UMI |
| `CCCCCCCCNNNNNNNN` | 8bp cell barcode + 8bp UMI |
| `NNNXXXNNN` | 3bp UMI, skip 3bp, 3bp UMI |
Related skills
Arboreto is a computational library for inferring gene regulatory networks (GRNs) from gene expression data using paralleli.
Cell segmentation from multiplexed tissue images. Covers deep learning (Cellpose, Mesmer) and classical approaches for nuclear and whole-cel…
Integrate multiple scRNA-seq samples/batches using Harmony, scVI, Seurat anchors, and fastMNN. Remove technical variation while preserving b…
Automated cell type annotation using reference-based methods including CellTypist, scPred, SingleR, and Azimuth for consistent, reproducible…