bio-machine-learning-model-validation

Maintainer: FreedomIntelligence · Last updated: April 1, 2026

Cross-validation, AUC-ROC, calibration, and permutation testing.

Tags: OpenClaw · NanoClaw · Analysis and processing · Reproducible experiments · bio-machine-learning-model-validation · bioos extended suite · bioos extended bioinformatics suite · cross

Original source

FreedomIntelligence/OpenClaw-Medical-Skills

https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-machine-learning-model-validation

Maintainer
FreedomIntelligence
License
MIT
Last updated
April 1, 2026

Skill summary

Key information from SKILL.md

2 min

Core instructions

  • Define the splitters: outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42) and inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42).
  • Start with nested_scores = []. For each train_idx, test_idx in outer_cv.split(X, y), set X_train, X_test = X.iloc[train_idx], X.iloc[test_idx] and y_train, y_test = y[train_idx], y[test_idx].
  • Within each outer fold: grid = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring='roc_auc', n_jobs=-1), then grid.fit(X_train, y_train), score = grid.score(X_test, y_test), and nested_scores.append(score).
  • Report the nested estimate: print(f'Nested CV AUC: {np.mean(nested_scores):.3f} +/- {np.std(nested_scores):.3f}').
  • For plain stratified CV: cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42), scores = cross_val_score(pipe, X, y, cv=cv, scoring='roc_auc'), then print(f'CV AUC: {scores.mean():.3f} +/- {scores.std():.3f}').
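The bullets above can be assembled into one runnable script. This is a minimal sketch rather than the skill's own code: the synthetic dataset from make_classification, the reduced parameter grid, n_jobs=1, and the use of .iloc on y (which assumes y is a pandas Series) are all assumptions made so that the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for a small omics matrix (samples x features).
X_arr, y_arr = make_classification(
    n_samples=120, n_features=30, n_informative=5, random_state=42
)
X = pd.DataFrame(X_arr)
y = pd.Series(y_arr)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(random_state=42)),
])

# Assumption: a smaller grid than the SKILL.md excerpt, to keep runtime short.
param_grid = {
    'clf__n_estimators': [50, 100],
    'clf__max_depth': [5, None],
}

outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

nested_scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    # Positional indexing; .iloc avoids label-based lookups on a Series.
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

    # Hyperparameters are tuned on the outer-training portion only.
    grid = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring='roc_auc', n_jobs=1)
    grid.fit(X_train, y_train)

    # The refit best model is scored (ROC AUC) on the held-out outer fold.
    nested_scores.append(grid.score(X_test, y_test))

print(f'Nested CV AUC: {np.mean(nested_scores):.3f} +/- {np.std(nested_scores):.3f}')

# Plain stratified CV with the pipeline's default hyperparameters (no tuning).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv, scoring='roc_auc')
print(f'CV AUC: {scores.mean():.3f} +/- {scores.std():.3f}')
```

Because the scaler sits inside the pipeline, it is refit on each training split, so no statistics from the held-out fold leak into preprocessing.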

Original documentation

SKILL.md excerpt

Why Nested CV Matters

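A single train/test split gives a noisy performance estimate on small omics datasets, and the estimate becomes optimistically biased when the same data are also used to tune hyperparameters. Nested CV reduces this bias by separating hyperparameter tuning (inner loop) from performance evaluation (outer loop).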
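To illustrate that separation, the sketch below compares the best inner-CV score reported by GridSearchCV (tuning and evaluation on the same folds) with a nested estimate obtained by passing the search object itself to cross_val_score. The random features, the balanced labels that are unrelated to them, the LogisticRegression model, and the grid over C are illustrative assumptions, chosen so that the true AUC is about 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: pure-noise data, 60 samples x 200 features, balanced labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))
y = np.repeat([0, 1], 30)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000)),
])
param_grid = {'clf__C': [0.001, 0.01, 0.1, 1.0, 10.0]}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Non-nested: the score used to pick C is also reported as the performance.
grid = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring='roc_auc')
grid.fit(X, y)
non_nested = grid.best_score_

# Nested: the whole search is re-run inside each outer training split,
# and the tuned model is scored on data the search never saw.
nested = cross_val_score(grid, X, y, cv=outer_cv, scoring='roc_auc')

print(f'Non-nested best AUC: {non_nested:.3f}')
print(f'Nested CV AUC: {nested.mean():.3f} +/- {nested.std():.3f}')
```

The non-nested figure is the maximum over several noisy estimates, so on data like this it tends to sit above the nested mean; the size of the gap depends on the seed and the grid.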

Nested Cross-Validation

from sklearn.model_selection import cross_val_score, StratifiedKFold, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import numpy as np

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(random_state=42))
])

param_grid = {
    'clf__n_estimators': [50, 100, 200],
    'clf__max_depth': [5, 10, None]
}

## Stratified K-Fold

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
```

When to use

  • Use bio-machine-learning-model-validation for genomics and bioinformatics workflows.
  • Apply bio-machine-learning-model-validation to sequencing, variant, or omics analysis tasks.

When not to use

  • Do not rely on this catalog entry alone for installation or maintenance details.

Related upstream skills

  • machine-learning/omics-classifiers - Model training
  • experimental-design/multiple-testing - Multiple hypothesis correction
  • machine-learning/biomarker-discovery - Feature selection within CV
