
bio-machine-learning-model-validation

Maintainer FreedomIntelligence · Last updated April 1, 2026

Cross-validation, AUC-ROC, calibration, and permutation testing.


Original source

FreedomIntelligence/OpenClaw-Medical-Skills

https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-machine-learning-model-validation

Maintainer: FreedomIntelligence
License: MIT
Last updated: April 1, 2026

Skill Snapshot

Key Details From SKILL.md


Key Notes

outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

nested_scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    grid = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring='roc_auc', n_jobs=-1)
    grid.fit(X_train, y_train)
    score = grid.score(X_test, y_test)
    nested_scores.append(score)

print(f'Nested CV AUC: {np.mean(nested_scores):.3f} +/- {np.std(nested_scores):.3f}')

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv, scoring='roc_auc')
print(f'CV AUC: {scores.mean():.3f} +/- {scores.std():.3f}')
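The plain stratified cross-validation from the last note can be tried end to end with the sketch below. It is not taken from SKILL.md: the synthetic data from `make_classification`, the dataset size, and the class weights are assumptions chosen only to make the example self-contained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a small, imbalanced omics matrix (assumption:
# 120 samples x 200 features, roughly 30% positives).
X, y = make_classification(
    n_samples=120, n_features=200, n_informative=10,
    weights=[0.7, 0.3], random_state=42,
)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(n_estimators=100, random_state=42)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Stratification keeps the class ratio of each test fold close to the overall ratio.
for fold, (train_idx, test_idx) in enumerate(cv.split(X, y)):
    print(f'fold {fold}: positive fraction in test = {y[test_idx].mean():.2f}')

# The scaler is refit inside each training fold because it is part of the pipeline.
scores = cross_val_score(pipe, X, y, cv=cv, scoring='roc_auc')
print(f'CV AUC: {scores.mean():.3f} +/- {scores.std():.3f}')
```

Keeping the scaler inside the pipeline means no statistics from a test fold leak into preprocessing.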

Source Doc

Excerpt From SKILL.md

Why Nested CV Matters

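A single train/test split gives a noisy performance estimate on small omics datasets, and the estimate becomes optimistic when the same data are used both to tune hyperparameters and to report the score. Nested CV avoids this bias by tuning hyperparameters in an inner loop and evaluating in an outer loop on folds the tuning never saw.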

Nested Cross-Validation

from sklearn.model_selection import cross_val_score, StratifiedKFold, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import numpy as np

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(random_state=42))
])

param_grid = {
    'clf__n_estimators': [50, 100, 200],
    'clf__max_depth': [5, 10, None]
}
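
The pipeline and grid above are meant to be used in a nested loop: an inner search tunes hyperparameters, and an outer loop scores the tuned model on held-out folds. A minimal runnable sketch follows, assuming synthetic NumPy data from `make_classification` and a reduced parameter grid (both are illustrative choices, not part of SKILL.md).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption). NumPy arrays, so plain integer indexing works.
X, y = make_classification(n_samples=100, n_features=50, n_informative=5, random_state=42)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(random_state=42)),
])

# Reduced grid (assumption) so the sketch runs quickly.
param_grid = {'clf__n_estimators': [25, 50], 'clf__max_depth': [5, None]}

outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

nested_scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]

    # Inner loop: tune hyperparameters using the outer training fold only.
    grid = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring='roc_auc')
    grid.fit(X_train, y_train)

    # Outer loop: score the refit best model on the held-out fold (ROC AUC).
    nested_scores.append(grid.score(X_test, y_test))

print(f'Nested CV AUC: {np.mean(nested_scores):.3f} +/- {np.std(nested_scores):.3f}')
```

With a pandas DataFrame for `X`, the row selection would use `X.iloc[train_idx]` instead of `X[train_idx]`.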

## Stratified K-Fold

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
```

Use cases

Use bio-machine-learning-model-validation to validate classifiers in genomics and bioinformatics workflows.
Apply bio-machine-learning-model-validation when evaluating models trained on sequencing, variant, or omics data.
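
The skill summary also lists permutation testing, which the excerpt does not show. One possible sketch uses scikit-learn's `permutation_test_score`, which refits the model on label-shuffled copies of the data to check whether the cross-validated AUC is better than chance. The synthetic data, the small forest, and the number of permutations are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, permutation_test_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=100, n_features=50, n_informative=5, random_state=42)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(n_estimators=25, random_state=42)),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# score: CV AUC on the true labels.
# perm_scores: CV AUC for each label-shuffled copy (the null distribution).
# p_value: (number of permutations scoring >= score + 1) / (n_permutations + 1).
score, perm_scores, p_value = permutation_test_score(
    pipe, X, y, cv=cv, scoring='roc_auc', n_permutations=100, random_state=42,
)

print(f'True AUC: {score:.3f}')
print(f'Permuted AUC mean: {perm_scores.mean():.3f}')
print(f'Permutation p-value: {p_value:.3f}')
```

The smallest attainable p-value is 1 / (n_permutations + 1), so more permutations are needed to report smaller values.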