Dask

Maintainer: K-Dense Inc. · Last updated: April 1, 2026

Dask is a Python library for parallel and distributed computing that enables three critical capabilities: larger-than-memory execution on single machines for data exceeding available RAM, parallel processing for improved computational speed across multiple cores, and distributed computation supporting terabyte-scale datasets across multiple machines. Dask scales from laptops (processing ~100 GiB) to clust…

Tags: Claude Code, OpenClaw, NanoClaw, analysis and processing, writing and organizing, dask, data-analysis, package, data analysis & visualization

Original source

K-Dense-AI/claude-scientific-skills

https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/dask

Maintainer: K-Dense Inc.
License: BSD-3-Clause license
Last updated: April 1, 2026

Skill summary

Key information from SKILL.md

Key points

  • Dask is a Python library for parallel and distributed computing that enables three critical capabilities.
  • Larger-than-memory execution on single machines, for data exceeding available RAM.
  • Parallel processing, for improved computational speed across multiple cores.
  • Distributed computation supporting terabyte-scale datasets across multiple machines.
  • Dask scales from laptops (processing ~100 GiB) to clusters (processing ~100 TiB) while maintaining familiar Python APIs.

Original documentation

SKILL.md excerpt

When to Use This Skill

Use this skill to:

  • Process datasets that exceed available RAM
  • Scale pandas or NumPy operations to larger datasets
  • Parallelize computations for performance improvements
  • Process multiple files efficiently (CSVs, Parquet, JSON, text logs)
  • Build custom parallel workflows with task dependencies
  • Distribute workloads across multiple cores or machines
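The "custom parallel workflows with task dependencies" case can be sketched with `dask.delayed`. This is a minimal illustration, not taken from SKILL.md: the `load`, `total`, and `combine` functions are hypothetical stand-ins for real work such as reading and summarizing one file per task.

```python
from dask import delayed


@delayed
def load(part_id):
    # Hypothetical stand-in for reading one file or chunk.
    return list(range(part_id * 3, part_id * 3 + 3))


@delayed
def total(values):
    # Per-chunk reduction; each call becomes one task in the graph.
    return sum(values)


@delayed
def combine(totals):
    # Final step that depends on every per-chunk task.
    return sum(totals)


# Nothing runs yet: these calls only build a lazy task graph.
partials = [total(load(i)) for i in range(4)]
graph = combine(partials)

# compute() executes the graph; independent tasks may run in parallel.
result = graph.compute()
print(result)
```

Because the four `load`/`total` chains do not depend on each other, the scheduler is free to run them concurrently before `combine` gathers the results.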

Core Capabilities

Dask provides five main components, each suited to different use cases:

1. DataFrames - Parallel Pandas Operations

Purpose: Scale pandas operations to larger datasets through parallel processing.

When to Use:

  • Tabular data exceeds available RAM
  • Need to process multiple CSV/Parquet files together
  • Pandas operations are slow and need parallelization
  • Scaling from pandas prototype to production

Reference Documentation: For comprehensive guidance on Dask DataFrames, refer to references/dataframes.md which includes:

  • Reading data (single files, multiple files, glob patterns)
  • Common operations (filtering, groupby, joins, aggregations)
  • Custom operations with map_partitions
  • Performance optimization tips
  • Common patterns (ETL, time series, multi-file processing)

Quick Example:

import dask.dataframe as dd

When to use

  • Process datasets that exceed available RAM.
  • Scale pandas or NumPy operations to larger datasets.

When not to use

  • Do not rely on this catalog entry alone for installation or maintenance details.

Related skills


bgpt-paper-search

BGPT is a remote MCP server that searches a curated database of scientific papers built from raw experimenta…

Tags: Claude Code, OpenClaw, analysis and processing
Source: K-Dense-AI/claude-scientific-skills

bio-data-visualization-ggplot2-fundamentals

R ggplot2 for publication-quality genomics and omics figures.

Tags: OpenClaw, NanoClaw, analysis and processing
Source: FreedomIntelligence/OpenClaw-Medical-Skills

bio-data-visualization-upset-plots

UpSet plots for multi-set intersection visualization.

Tags: OpenClaw, NanoClaw, analysis and processing
Source: FreedomIntelligence/OpenClaw-Medical-Skills

bio-data-visualization-volcano-customization

Customized volcano plots with ggplot2 or matplotlib for DE results.

Tags: OpenClaw, NanoClaw, analysis and processing
Source: FreedomIntelligence/OpenClaw-Medical-Skills