
bio-machine-learning-model-validation

Maintainer FreedomIntelligence · Last updated April 1, 2026

Cross-validation, AUC-ROC, calibration, and permutation testing.
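The excerpt on this page covers cross-validation in detail but does not show the calibration or permutation-testing steps named above. A minimal sketch, assuming scikit-learn and a synthetic dataset from `make_classification` standing in for an omics matrix, uses the library's own helpers. `permutation_test_score` refits the model on label-shuffled copies of `y` to build a null distribution for the CV AUC. `calibration_curve` compares out-of-fold predicted probabilities with observed positive rates. The logistic-regression model, dataset size, permutation count, and bin count are illustrative assumptions, not the skill's settings.

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import (StratifiedKFold, cross_val_predict,
                                     permutation_test_score)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a small omics matrix (assumption).
X, y = make_classification(n_samples=120, n_features=500, n_informative=10,
                           random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Permutation test: CV AUC on the true labels vs. CV AUCs after shuffling y.
score, perm_scores, p_value = permutation_test_score(
    model, X, y, cv=cv, scoring="roc_auc", n_permutations=100, random_state=0)
print(f"CV AUC {score:.3f}, permutation p = {p_value:.3f}")

# Calibration: out-of-fold probabilities vs. observed positive rate per bin.
proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
frac_pos, mean_pred = calibration_curve(y, proba, n_bins=5)
print(f"Out-of-fold AUC {roc_auc_score(y, proba):.3f}, "
      f"Brier score {brier_score_loss(y, proba):.3f}")
for pred, obs in zip(mean_pred, frac_pos):
    print(f"predicted {pred:.2f} -> observed {obs:.2f}")
```

The permutation p-value is computed as (C + 1) / (n_permutations + 1), so with 100 permutations it can never fall below about 0.0099. Empty calibration bins are dropped, so fewer than five points may be printed.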


Original source

FreedomIntelligence/OpenClaw-Medical-Skills

https://github.com/FreedomIntelligence/OpenClaw-Medical-Skills/tree/main/skills/bio-machine-learning-model-validation

License
MIT

Skill Snapshot

Key Details From SKILL.md


Key Notes

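  • Splitters: outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42) estimates performance; inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42) tunes hyperparameters.
  • Outer loop: start with nested_scores = [], then for train_idx, test_idx in outer_cv.split(X, y), set X_train, X_test = X.iloc[train_idx], X.iloc[test_idx] and y_train, y_test = y[train_idx], y[test_idx]. X is a pandas DataFrame; y is indexed by position, so it should be a NumPy array (or be indexed with y.iloc).
  • Inside each outer fold: grid = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring='roc_auc', n_jobs=-1); grid.fit(X_train, y_train); score = grid.score(X_test, y_test), which applies the same ROC AUC scorer to the held-out outer fold; nested_scores.append(score).
  • Report: print(f'Nested CV AUC: {np.mean(nested_scores):.3f} +/- {np.std(nested_scores):.3f}')
  • Plain stratified CV of the pipeline with its default hyperparameters (no tuning): cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42); scores = cross_val_score(pipe, X, y, cv=cv, scoring='roc_auc'); print(f'CV AUC: {scores.mean():.3f} +/- {scores.std():.3f}')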
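StratifiedKFold, used for every split in these notes, keeps each fold's class proportions close to those of the full label vector, which matters when one class is rare, as is common in case/control cohorts. A quick check, assuming an imbalanced synthetic label vector (`weights=[0.8, 0.2]` in `make_classification`, an illustrative choice), prints the positive-class fraction of each test fold for StratifiedKFold and for plain KFold, then runs the untuned single-level CV from the last note.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced synthetic data (assumption): roughly 20% positives.
X, y = make_classification(n_samples=100, n_features=50, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
plain = KFold(n_splits=5, shuffle=True, random_state=42)

print(f"overall positive fraction: {y.mean():.2f}")
fractions = {}
for name, splitter in [("stratified", cv), ("plain", plain)]:
    # Positive-class fraction in each test fold.
    fractions[name] = [y[test].mean() for _, test in splitter.split(X, y)]
    print(f"{name:>10}: " + " ".join(f"{f:.2f}" for f in fractions[name]))

# Untuned pipeline scored with single-level stratified CV, as in the last note.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(random_state=42)),
])
scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(f"CV AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

With plain KFold, some folds can end up with very few positives, which makes per-fold AUC noisy or, in the extreme, undefined when a fold has none.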

Source Doc

Excerpt From SKILL.md

Why Nested CV Matters

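Simple train/test splits give unreliable estimates on small omics datasets: a single split has high variance, and tuning hyperparameters on the same data used to report performance biases the estimate upward. Nested CV gives a nearly unbiased estimate by separating the two: an inner loop tunes hyperparameters on each outer training fold, and the outer loop scores the tuned model on a fold the tuning never saw.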
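The difference is easiest to see on data with no signal. In this sketch, the random labels and the logistic-regression grid over `C` are illustrative assumptions, not part of the skill. GridSearchCV's `best_score_` is the best of several noisy inner-CV estimates, picked and reported on the same folds. Passing the GridSearchCV object to `cross_val_score` instead refits the whole search inside each outer training fold, giving a nested estimate. On pure noise the nested mean should sit near the chance level of 0.5, while the non-nested `best_score_` tends to sit above it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Pure noise (assumption): 60 samples, 200 features, labels unrelated to X.
X = rng.normal(size=(60, 200))
y = np.array([0, 1] * 30)
rng.shuffle(y)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {"logisticregression__C": np.logspace(-4, 4, 9)}
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

grid = GridSearchCV(model, param_grid, cv=inner_cv, scoring="roc_auc")

# Non-nested: best of 9 configurations, scored on the folds used to pick it.
grid.fit(X, y)
print(f"Non-nested best_score_: {grid.best_score_:.3f}")

# Nested: each outer fold clones and refits the whole search on its training part.
nested = cross_val_score(grid, X, y, cv=outer_cv, scoring="roc_auc")
print(f"Nested AUC: {nested.mean():.3f} +/- {nested.std():.3f}")
```

This compact form performs the same nesting as the explicit outer loop in the Key Notes; the loop is only needed when per-fold objects such as `best_params_` should be kept.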

Nested Cross-Validation

```python
from sklearn.model_selection import cross_val_score, StratifiedKFold, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import numpy as np

# Preprocessing lives inside the pipeline so it is refit on every training fold.
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', RandomForestClassifier(random_state=42))
])

# Grid searched by the inner CV loop; the 'clf__' prefix targets the classifier step.
param_grid = {
    'clf__n_estimators': [50, 100, 200],
    'clf__max_depth': [5, 10, None]
}
```
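The excerpt stops after defining `pipe` and `param_grid`; the outer loop that uses them appears only in the Key Notes. A runnable sketch joining the two follows. It assumes synthetic data from `make_classification`, with `X` wrapped in a pandas DataFrame so `X.iloc` works and `y` left as a NumPy array, and trims the grid to keep runtime short. The definitions are repeated so the block runs on its own, and it also records the hyperparameters chosen in each outer fold.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): X as a DataFrame so X.iloc works, y as a NumPy array.
X_arr, y = make_classification(n_samples=100, n_features=50, n_informative=5,
                               random_state=0)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(X_arr.shape[1])])

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(random_state=42)),
])
# Trimmed from the excerpt's 3 x 3 grid to keep the sketch fast.
param_grid = {"clf__n_estimators": [50, 100], "clf__max_depth": [5, None]}

outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

nested_scores, chosen_params = [], []
for train_idx, test_idx in outer_cv.split(X, y):
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # The inner search sees only the outer training fold.
    grid = GridSearchCV(pipe, param_grid, cv=inner_cv, scoring="roc_auc", n_jobs=-1)
    grid.fit(X_train, y_train)
    # Scored with the same ROC AUC scorer on the held-out outer fold.
    nested_scores.append(grid.score(X_test, y_test))
    chosen_params.append(grid.best_params_)

print(f"Nested CV AUC: {np.mean(nested_scores):.3f} +/- {np.std(nested_scores):.3f}")
for fold, params in enumerate(chosen_params, start=1):
    print(f"fold {fold}: {params}")
```

If the chosen hyperparameters change from fold to fold, the nested score describes the tuning procedure as a whole rather than any single configuration.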

Stratified K-Fold

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score
```

Use cases

  • Use bio-machine-learning-model-validation for genomics and bioinformatics workflows.
  • Apply bio-machine-learning-model-validation to sequencing, variant, or omics analysis tasks.

Not for

  • Do not rely on this catalog entry alone for installation or maintenance details.

Upstream Related Skills

  • machine-learning/omics-classifiers - Model training
  • experimental-design/multiple-testing - Multiple hypothesis correction
  • machine-learning/biomarker-discovery - Feature selection within CV
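The biomarker-discovery pointer above concerns feature selection within CV. The reason selection must sit inside the loop can be shown on noise. In this sketch, the pure-noise data and the SelectKBest plus LogisticRegression model are illustrative assumptions, not part of this skill. Picking features on the full dataset before cross-validation leaks label information into every test fold, so the AUC typically comes out well above 0.5. Refitting the selector inside each training fold typically brings it back near chance.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Pure noise (assumption): 60 samples, 5000 features, labels unrelated to X.
X = rng.normal(size=(60, 5000))
y = np.array([0, 1] * 30)
rng.shuffle(y)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Leaky: the 20 top-ranked features are picked using every label, test folds included.
X_top = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_top, y,
                        cv=cv, scoring="roc_auc")

# Honest: the selector is refit on each training fold inside the pipeline.
honest_model = make_pipeline(SelectKBest(f_classif, k=20),
                             LogisticRegression(max_iter=1000))
honest = cross_val_score(honest_model, X, y, cv=cv, scoring="roc_auc")

print(f"Selection outside CV: AUC {leaky.mean():.3f}")
print(f"Selection inside CV:  AUC {honest.mean():.3f}")
```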

Related skills

  • agent-browser (FreedomIntelligence/OpenClaw-Medical-Skills) - Browse the web for any task: research topics, read articles, interact with web apps, fill forms, take screenshots, extract data, and test w…
  • alpha-vantage (K-Dense-AI/claude-scientific-skills) - Access 20+ years of global financial data: equities, options, forex, crypto, commodities, economic indicators, and 50+ technical indicators.
  • bio-alignment-filtering (FreedomIntelligence/OpenClaw-Medical-Skills) - Filter alignments by flag, quality, region, or paired status.
  • bio-alignment-indexing (FreedomIntelligence/OpenClaw-Medical-Skills) - Index BAM/CRAM files with samtools index for random access.