PyTDC

Maintainer K-Dense Inc. · Last updated April 1, 2026

PyTDC is an open-science platform providing AI-ready datasets and benchmarks for drug discovery and development. Access curated datasets spanning the entire therapeutics pipeline with standardized evaluation metrics and meaningful data splits.

Tags: Claude Code, OpenClaw, NanoClaw, Analysis, Reproduction, pytdc, chemistry, package, cheminformatics & drug discovery

Original source

K-Dense-AI/claude-scientific-skills

https://github.com/K-Dense-AI/claude-scientific-skills/tree/main/scientific-skills/pytdc

Maintainer: K-Dense Inc.
License: MIT license
Last updated: April 1, 2026

Skill Snapshot

Key Details From SKILL.md

Key Notes

  • PyTDC is an open-science platform providing AI-ready datasets and benchmarks for drug discovery and development. Access curated datasets spanning the entire therapeutics pipeline with standardized evaluation metrics and meaningful data splits, organized into three categories: single-instance prediction (molecular/protein properties), multi-instance prediction (drug-target interactions, DDI), and generation (molecule generation, retrosynthesis).
  • Caco2 - Intestinal permeability.
  • HIA - Human intestinal absorption.
  • Bioavailability - Oral bioavailability.
  • Lipophilicity - Octanol-water partition coefficient.

Source Doc

Excerpt From SKILL.md

When to Use This Skill

This skill should be used when:

  • Working with drug discovery or therapeutic ML datasets
  • Benchmarking machine learning models on standardized pharmaceutical tasks
  • Predicting molecular properties (ADME, toxicity, bioactivity)
  • Predicting drug-target or drug-drug interactions
  • Generating novel molecules with desired properties
  • Accessing curated datasets with proper train/test splits (scaffold, cold-split)
  • Using molecular oracles for property optimization

Installation & Setup

Install PyTDC using pip:

pip install PyTDC

To upgrade to the latest version:

pip install PyTDC --upgrade

Core dependencies (automatically installed):

  • numpy, pandas, tqdm, seaborn, scikit_learn, fuzzywuzzy

Additional packages are installed automatically as needed for specific features.
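To confirm that the core dependencies listed above are present after installation, a small check along these lines may help. It is a minimal sketch using only the standard library; the helper name `check_core_dependencies` is hypothetical, and it assumes that `scikit_learn` is imported under the module name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the core dependencies listed above.
# Assumption: the scikit_learn package is imported as "sklearn".
CORE_MODULES = ["numpy", "pandas", "tqdm", "seaborn", "sklearn", "fuzzywuzzy"]


def check_core_dependencies(modules=CORE_MODULES):
    """Map each module name to True if it can be located in this environment."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Print a short report, one dependency per line.
    for name, present in check_core_dependencies().items():
        print(f"{name:12s} {'ok' if present else 'missing'}")
```

Any module reported as missing can be installed with pip before loading datasets.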

Quick Start

The basic pattern for accessing any TDC dataset follows this structure:

from tdc.<problem> import <Task>
data = <Task>(name='<Dataset>')

Where:

  • <problem>: One of single_pred, multi_pred, or generation
  • <Task>: Specific task category (e.g., ADME, DTI, MolGen)
  • <Dataset>: Dataset name within that task
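The three placeholders map directly onto a module path, a class name, and a constructor argument. The sketch below makes that mapping explicit with a dynamic import. The helper names `loader_path` and `load_dataset` are hypothetical and not part of PyTDC. `load_dataset` needs PyTDC installed and may download data when called.

```python
import importlib

# Problem categories named in the list above.
KNOWN_PROBLEMS = ("single_pred", "multi_pred", "generation")


def loader_path(problem: str, task: str) -> str:
    """Return the dotted path 'tdc.<problem>.<Task>' after a basic sanity check."""
    if problem not in KNOWN_PROBLEMS:
        raise ValueError(
            f"unknown problem {problem!r}; expected one of {KNOWN_PROBLEMS}"
        )
    return f"tdc.{problem}.{task}"


def load_dataset(problem: str, task: str, dataset: str):
    """Import tdc.<problem> lazily and instantiate <Task>(name=<Dataset>).

    PyTDC is only imported when this function is called.
    """
    loader_path(problem, task)  # validates the problem name
    module = importlib.import_module(f"tdc.{problem}")
    task_cls = getattr(module, task)
    return task_cls(name=dataset)
```

With PyTDC installed, `load_dataset("single_pred", "ADME", "Caco2_Wang")` is intended to be equivalent to importing `ADME` from `tdc.single_pred` and calling `ADME(name='Caco2_Wang')`.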

Example - Loading ADME data:

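from tdc.single_pred import ADME

# Load the Caco2_Wang intestinal-permeability dataset (downloaded on first use).
data = ADME(name='Caco2_Wang')
# Scaffold split: returns a dict with 'train', 'valid', and 'test' DataFrames.
split = data.get_split(method='scaffold')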
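A scaffold split keeps molecules that share a core structure in the same subset, so the test set probes generalization to unseen chemotypes. The following is a minimal, standard-library sketch of that group-aware idea and is not PyTDC's implementation. The function `group_split` and the `scaffold` labels in the toy data are hypothetical; in practice scaffolds would be derived from SMILES with a cheminformatics toolkit.

```python
import random
from collections import defaultdict


def group_split(records, key, frac=(0.7, 0.1, 0.2), seed=0):
    """Split records so every group (e.g. a scaffold) lands in exactly one subset."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    # Shuffle group order reproducibly, then place larger groups first.
    names = sorted(groups)
    random.Random(seed).shuffle(names)
    names.sort(key=lambda g: len(groups[g]), reverse=True)

    n = len(records)
    train_cap = round(frac[0] * n)
    valid_cap = round(frac[1] * n)

    split = {"train": [], "valid": [], "test": []}
    for name in names:
        members = groups[name]
        if len(split["train"]) + len(members) <= train_cap:
            split["train"].extend(members)
        elif len(split["valid"]) + len(members) <= valid_cap:
            split["valid"].extend(members)
        else:
            # Whatever does not fit in train or valid goes to test.
            split["test"].extend(members)
    return split


# Toy data: 40 molecules spread evenly over 10 made-up scaffold labels.
toy = [{"Drug": f"mol{i}", "scaffold": f"s{i % 10}"} for i in range(40)]
split = group_split(toy, key="scaffold")
print({name: len(rows) for name, rows in split.items()})
```

Because whole groups are assigned at once, the realized fractions only approximate the requested ones when group sizes are uneven.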

Use cases

  • Working with drug discovery or therapeutic ML datasets.
  • Benchmarking machine learning models on standardized pharmaceutical tasks.

Not for

  • Do not rely on this catalog entry alone for installation or maintenance details.

Related skills

agentd-drug-discovery

AgentD autonomous drug discovery: target identification, hit finding, ADMET optimization.

Tags: OpenClaw, NanoClaw, Analysis · Source: FreedomIntelligence/OpenClaw-Medical-Skills

bio-clinical-databases-hla-typing

Call HLA alleles from NGS data using OptiType, HLA-HD, or arcasHLA for immunogenomics applications. Use when determining HLA genotype for tr…

Tags: OpenClaw, NanoClaw, Analysis · Source: FreedomIntelligence/OpenClaw-Medical-Skills

bio-clinical-databases-pharmacogenomics

Query PharmGKB and CPIC for drug-gene interactions, pharmacogenomic annotations, and dosing guidelines. Use when predicting drug response fr…

Tags: OpenClaw, NanoClaw, Analysis · Source: FreedomIntelligence/OpenClaw-Medical-Skills

chematagent-drug-discovery

CheMatAgent: chemistry-aware drug design with retrosynthesis and property optimization.

Tags: OpenClaw, NanoClaw, Analysis · Source: FreedomIntelligence/OpenClaw-Medical-Skills