claw-semantic-sim

Maintainer: FreedomIntelligence · Last updated: April 1, 2026

Semantic Similarity Index for disease research literature using PubMedBERT embeddings.

Tags: OpenClaw · NanoClaw · Analysis & processing · Reproducible experiments · claw-semantic-sim · clawbio pipelines · structural biology & literature · semantic

Original source

FreedomIntelligence/OpenClaw-Medical-Skills

https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/claw-semantic-sim

License: MIT

Skill summary

Key information from SKILL.md

Core description

  • Measure how isolated or connected disease research is across the global biomedical literature, using PubMedBERT embeddings of PubMed abstracts spanning 175 GBD (Global Burden of Disease) diseases.
  • Semantic Isolation Index (SII): average cosine distance to the k nearest disease neighbours; higher = more isolated, less connected research.
  • Knowledge Transfer Potential (KTP): cross-disease centroid similarity; higher = more potential for research spillover.
  • Research Clustering Coefficient (RCC): within-disease embedding variance; higher = more diverse research approaches.
  • Temporal Semantic Drift: cosine distance between yearly centroids; measures how research focus evolves.
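
The four metrics above can be illustrated with a minimal sketch in plain Python. This is not the skill's own code: the function names, the choice of k, averaging KTP over all other diseases, and treating "variance" as the mean squared distance to the centroid are all assumptions made for illustration, and the 2-D vectors stand in for real PubMedBERT embeddings.

```python
import math

def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def semantic_isolation_index(centroids, disease, k):
    """SII: mean cosine distance from one centroid to its k nearest neighbours."""
    dists = sorted(
        cosine_distance(centroids[disease], c)
        for name, c in centroids.items()
        if name != disease
    )
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

def knowledge_transfer_potential(centroids, disease):
    """KTP (assumed form): mean cosine similarity to every other centroid."""
    sims = [
        1.0 - cosine_distance(centroids[disease], c)
        for name, c in centroids.items()
        if name != disease
    ]
    return sum(sims) / len(sims)

def research_clustering_coefficient(vectors):
    """RCC (assumed form): mean squared Euclidean distance to the centroid."""
    c = centroid(vectors)
    return sum(sum((a - b) ** 2 for a, b in zip(v, c)) for v in vectors) / len(vectors)

def temporal_drift(yearly_vectors):
    """Cosine distance between centroids of consecutive years."""
    years = sorted(yearly_vectors)
    cents = [centroid(yearly_vectors[y]) for y in years]
    return [
        (years[i], years[i + 1], cosine_distance(cents[i], cents[i + 1]))
        for i in range(len(years) - 1)
    ]

# Toy centroids (hypothetical, not real disease embeddings).
toy = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
sii_a = semantic_isolation_index(toy, "a", k=1)
ktp_a = knowledge_transfer_potential(toy, "a")
```

In this toy setup, disease "a" is closest to "c", so its SII with k=1 is the cosine distance between those two centroids; raising k pulls in the more distant "b" and increases the index.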

Original documentation

SKILL.md excerpt

Why this exists

If you ask ChatGPT to "measure research neglect for diseases," it will:

  • Not know which embedding model to use for biomedical text
  • Hallucinate metrics that sound plausible but have no methodological grounding
  • Skip quality filtering (year coverage, abstract coverage, minimum papers)
  • Not handle MPS acceleration or checkpointed batch processing
  • Produce a single scatter plot with no disease classification

This skill encodes the correct methodological decisions:

  • Uses PubMedBERT (the gold-standard biomedical language model)
  • Fetches from PubMed with exponential backoff and NCBI rate limiting
  • Quality filters: year coverage >= 70%, abstract coverage >= 95%, minimum 50 papers
  • Batch embedding with Apple MPS acceleration and CPU fallback
  • Checkpointed processing (resume after interruption)
  • HDF5 storage with gzip compression and SHA-256 checksums
  • Classification against WHO NTD list and Global South priority diseases
  • Statistical significance testing (Welch's t-test, Cohen's d)
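
Two of the decisions listed above, the quality filters and retrying with exponential backoff, can be sketched as follows. The thresholds come from the list; everything else is an assumption for illustration: the record layout (`year`, `abstract`), the reading of "year coverage" as the share of years in the study window that have at least one paper, and the helper names `passes_quality_filters` and `with_backoff`. The skill's actual implementation may differ.

```python
import random
import time

MIN_YEAR_COVERAGE = 0.70      # from the list above
MIN_ABSTRACT_COVERAGE = 0.95  # from the list above
MIN_PAPERS = 50               # from the list above

def passes_quality_filters(papers, first_year, last_year):
    """Return True if a disease's paper set meets all three thresholds.

    `papers` is a list of dicts with 'year' (int) and 'abstract' (str or None);
    this layout is hypothetical.
    """
    if len(papers) < MIN_PAPERS:
        return False
    years_expected = last_year - first_year + 1
    years_seen = {p["year"] for p in papers if first_year <= p["year"] <= last_year}
    year_coverage = len(years_seen) / years_expected
    abstract_coverage = sum(1 for p in papers if p.get("abstract")) / len(papers)
    return (
        year_coverage >= MIN_YEAR_COVERAGE
        and abstract_coverage >= MIN_ABSTRACT_COVERAGE
    )

def with_backoff(fetch, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on OSError wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_retries):
        try:
            return fetch()
        except OSError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting; a caller would wrap its own PubMed request function in `with_backoff` and leave the default `time.sleep` in place.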

Key Finding

Neglected tropical diseases (NTDs) are significantly more semantically isolated than other conditions (P < 0.001, Cohen's d = 0.8+). They exist in knowledge silos with limited cross-disciplinary research bridges. The 25 most isolated diseases are disproportionately Global South priority conditions.
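
For readers unfamiliar with the two statistics quoted above, here is a minimal standard-library sketch of Welch's t statistic (with Welch-Satterthwaite degrees of freedom) and Cohen's d with a pooled standard deviation. The sample values are made-up toy numbers, not the skill's data, and the function names are hypothetical. Turning t into a P value needs the Student t distribution, which the standard library does not provide, so that step is left out.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

# Toy SII-like values for two groups (illustrative only).
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 2.0, 3.0]
t_stat, dof = welch_t(group_a, group_b)
d = cohens_d(group_a, group_b)
```

Welch's test does not assume equal variances in the two groups, which suits comparisons between groups of different size and spread; Cohen's d then expresses the mean difference in units of pooled standard deviation, with 0.8 conventionally read as a large effect.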

Demo (works out of the box)

The demo uses pre-computed embeddings and metrics for 175 GBD diseases and generates the full 4-panel figure instantly.

When to use

  • Use claw-semantic-sim in research workflows that align with this subject area.
  • Follow the upstream documentation for the full working procedure.

When not to use

  • Do not rely on this catalog entry alone for installation or maintenance details.

Related skills

Adaptyv

Adaptyv is a cloud laboratory platform that provides automated protein testing and validation services. Submit protein s…

Categories: Data & Reproducibility · Protein Structure & Design
Tags: Claude Code · OpenClaw · Analysis & processing
Source: K-Dense-AI/claude-scientific-skills

alphafold

Validate protein designs using AlphaFold2 structure prediction. Use this skill when: (1) Validating designed sequences f…

Categories: Data & Reproducibility · Protein Structure & Design
Tags: OpenClaw · NanoClaw · Analysis & processing
Source: FreedomIntelligence/OpenClaw-Medical-Skills

antibody-design-agent

Antibody design: epitope mapping, CDR engineering, bispecific construction.

Categories: Data & Reproducibility · Protein Structure & Design
Tags: OpenClaw · NanoClaw · Analysis & processing
Source: FreedomIntelligence/OpenClaw-Medical-Skills

bindcraft

End-to-end binder design using BindCraft hallucination. Use this skill when: (1) Designing protein binders with built-in…

Categories: Data & Reproducibility · Protein Structure & Design
Tags: OpenClaw · NanoClaw · Analysis & processing
Source: FreedomIntelligence/OpenClaw-Medical-Skills