Data Science / Analysis

Fibroid Risk Prediction

ML Engineer (Research)

XGBoost, SMOTE, SHAP, Scikit-learn, Python

About the Project

A research project developing a machine learning model to predict uterine fibroid disease risk using clinical patient data. The model analyzes features like age, BMI, blood pressure, and symptom profiles to classify risk levels. Built on 49 anonymized patient records with severe class imbalance, the system uses SMOTE oversampling and gradient boosting methods with SHAP-based model interpretability for clinical transparency.

Key Highlights

  • Built classification pipeline with XGBoost on imbalanced clinical data
  • Applied SMOTE oversampling to handle severe class imbalance in patient records
  • Implemented SHAP explainability analysis for clinical decision transparency
  • Engineered structured predictive features from raw symptom text
  • Evaluated with 5-fold cross-validation and MinMaxScaler normalization
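
To make the oversampling step concrete, here is a minimal pure-Python sketch of the idea behind SMOTE: each synthetic record is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is an illustration only, not the project's code (which presumably relied on a library implementation). The function name `smote_sketch`, the feature values, and the choice of `k` are all assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # tiny classes may have fewer than k neighbours
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding the point itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical, already min-max-scaled (age, BMI) rows for the rare class.
minority_train = [(0.20, 0.35), (0.25, 0.40), (0.30, 0.30), (0.80, 0.90)]
new_rows = smote_sketch(minority_train, n_new=6)
```

Because every synthetic row is derived from real rows, a common precaution is to oversample only the training portion of each cross-validation fold, so that no validation record influences the synthetic data the model is trained on.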

Technical Challenges

Working with only 49 patient records and severe class imbalance required creative approaches: SMOTE for synthetic oversampling and careful cross-validation to avoid overfitting. SHAP analysis was critical for clinical acceptance, since doctors need to understand why a model makes a prediction.
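
SHAP explanations are built on Shapley values, which split the difference between one patient's prediction and a baseline prediction across the input features. The sketch below computes them exactly by enumerating every feature subset. That is only feasible for a handful of features, whereas the SHAP library computes them efficiently for tree models. The scoring function `toy_risk`, its coefficients, and the patient and baseline values are hypothetical stand-ins, not the project's model or data.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a subset are set to their baseline value."""
    names = list(x)
    n = len(names)

    def value(subset):
        mixed = {f: (x[f] if f in subset else baseline[f]) for f in names}
        return predict(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical toy risk score, NOT the project's model.
def toy_risk(p):
    score = 0.02 * p["age"] + 0.03 * p["bmi"]
    if p["bmi"] > 30 and p["systolic_bp"] > 140:
        score += 0.5  # interaction between high BMI and high blood pressure
    return score


patient = {"age": 45, "bmi": 32, "systolic_bp": 150}
baseline = {"age": 35, "bmi": 24, "systolic_bp": 120}
phi = shapley_values(toy_risk, patient, baseline)
```

The attributions in `phi` sum to `toy_risk(patient) - toy_risk(baseline)`, and the interaction bonus is shared equally between BMI and blood pressure. This additive breakdown is what lets a clinician see which features pushed a given prediction up or down.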