Dask
Maintainer: K-Dense Inc. · Last updated: March 31, 2026
Dask is a Python library for parallel and distributed computing that enables three critical capabilities: larger-than-memory execution on single machines for data exceeding available RAM, parallel processing for improved computational speed across multiple cores, and distributed computation supporting terabyte-scale datasets across multiple machines. Dask scales from laptops (processing ~100 GiB) to clust…
Original source
K-Dense-AI/claude-scientific-skills
https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/dask
- Maintainer: K-Dense Inc.
- License: BSD-3-Clause license
- Last updated: March 31, 2026
Skill summary
Key information from SKILL.md
Core notes
- Dask is a Python library for parallel and distributed computing that enables three critical capabilities:
- Larger-than-memory execution on single machines for data exceeding available RAM.
- Parallel processing for improved computational speed across multiple cores.
- Distributed computation supporting terabyte-scale datasets across multiple machines.
- Dask scales from laptops (processing ~100 GiB) to clusters (processing ~100 TiB) while maintaining familiar Python APIs.
Original documentation
SKILL.md excerpt
When to Use This Skill
Use this skill to:
- Process datasets that exceed available RAM
- Scale pandas or NumPy operations to larger datasets
- Parallelize computations for performance improvements
- Process multiple files efficiently (CSVs, Parquet, JSON, text logs)
- Build custom parallel workflows with task dependencies
- Distribute workloads across multiple cores or machines
Core Capabilities
Dask provides five main components, each suited to different use cases:
1. DataFrames - Parallel Pandas Operations
Purpose: Scale pandas operations to larger datasets through parallel processing.
When to Use:
- Tabular data exceeds available RAM
- Need to process multiple CSV/Parquet files together
- Pandas operations are slow and need parallelization
- Scaling from pandas prototype to production
Reference Documentation: For comprehensive guidance on Dask DataFrames, refer to references/dataframes.md which includes:
- Reading data (single files, multiple files, glob patterns)
- Common operations (filtering, groupby, joins, aggregations)
- Custom operations with map_partitions
- Performance optimization tips
- Common patterns (ETL, time series, multi-file processing)
Quick Example:
import dask.dataframe as dd
When to use
- Process datasets that exceed available RAM.
- Scale pandas or NumPy operations to larger datasets.
When not to use
- Do not rely on this catalog entry alone for installation or maintenance details.
Related skills
GeoPandas
GeoPandas extends pandas to enable spatial operations on geometric types. It combines capabilities of pandas, shapely,…
NetworkX
NetworkX is a Python package for creating, manipulating, and analy…
Polars
Polars is a lightning-fast DataFrame library for Python and Rust built on Apache Arrow. Work with Polars' expression-based API, la…