Update README.md

2026-01-14 13:54:06 +01:00 · 2025-06-06 21:58:45 +02:00
parent fd24b9c04c
commit fc397e89f4
1 changed files with 62 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,62 @@
-# breast-cancer-detection
+# 🩺 Early Breast Cancer Detection using Blood Biomarkers
-Binary classification of breast cancer based on biomedical markers using logistic regression and statistical learning techniques.
+
 **Université Paris Dauphine - PSL**  
 Statistical Learning Project — Academic Year 2024–2025  
 **Supervisors:** Prof. Gabriel Turinici, Dr. Laetitia Comminges  
 ## Project Objective
 The goal of this project is to predict the presence of breast cancer using blood-based biomarkers. Several supervised classification models are compared, with a strong focus on **recall** to reflect the clinical need to minimize false negatives (i.e., undiagnosed patients).
 ## Dataset
 - **Source:** Breast Cancer Coimbra Dataset (116 patients, 9 biomarkers)
 - **Target variable:** `Classification` (0 = healthy, 1 = cancer)
 - **Preprocessing steps:**
  - Logarithmic transformation of skewed variables
  - Standardization (Z-score)
  - Stratified train/test split (92/24)
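The three preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the data is a synthetic stand-in, the class balance is invented, the log transform is applied to every column (the README does not say which variables are skewed), and fitting the scaler on the training split only is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 116 x 9 biomarker matrix (NOT the real data).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(116, 9))
# Illustrative labels only: 0 = healthy, 1 = cancer.
y = np.array([0] * 52 + [1] * 64)

# 1. Log-transform skewed variables (assumption: all columns, all strictly positive).
X_log = np.log(X)

# 2. Stratified split into 92 training and 24 test patients.
X_train, X_test, y_train, y_test = train_test_split(
    X_log, y, test_size=24, stratify=y, random_state=0
)

# 3. Z-score standardization, with statistics estimated on the training split
#    only so that no test-set information leaks into the scaler (assumption).
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

print(X_train_std.shape, X_test_std.shape)
```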
 ## Models Compared
 | Model                        | Recall | F1-score | AUC   |
 |-----------------------------|--------|----------|-------|
 | k-Nearest Neighbors (k=23)  | 0.92   | 0.788    | 0.88  |
 | Neural Network (MLP)        | 0.92   | 0.69     | 0.83  |
 | Logistic Regression (L2)    | 0.69   | 0.75     | 0.79  |
 | Gaussian Naïve Bayes        | 0.58   | 0.68     | 0.72  |
 **Best model for clinical usage (recall priority):** k-NN (k = 23)
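One way a value such as k = 23 could be chosen is a cross-validated search that scores candidates on recall. The sketch below is hypothetical: it runs on synthetic data, and the use of `GridSearchCV`, the grid of odd k values, the 5-fold setup, and the pipeline step names are all assumptions rather than the procedure used in `knn.ipynb`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the dataset's shape (116 x 9); NOT the real data.
X, y = make_classification(
    n_samples=116, n_features=9, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=24, stratify=y, random_state=0
)

# Scaling lives inside the pipeline so it is refit on each training fold.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])

# Hypothetical grid: odd k from 1 to 29, scored on recall of the positive class.
param_grid = {"knn__n_neighbors": list(range(1, 30, 2))}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, scoring="recall", cv=cv)
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
print("best k:", best_k, "mean CV recall:", round(search.best_score_, 3))
```

Scoring on recall alone can favour a classifier that labels almost every patient positive, so F1-score and AUC are worth checking alongside it.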
 ## Repository Structure
 - `eda_analysis.ipynb` – Data exploration, visualization, and preprocessing
 - `logistic_regression.ipynb` – Logistic regression (basic and optimized via GridSearchCV)
 - `knn.ipynb` – k-Nearest Neighbors with cross-validation and performance tuning
 - `neural_network.ipynb` – Feedforward neural network (MLPClassifier)
 - `naive_bayes.ipynb` – Gaussian Naïve Bayes with log-transformed inputs
 - `svm.ipynb` – Preliminary experiments with SVM (bonus, not included in the final report)
 - `Subject_3_Ouabdesselam_Forest_Durousseau_Danjou_vonSiemens.pdf` – Final report (comprehensive analysis and conclusions)
 - `README.md` – This file
 ## Evaluation Metrics
 - **Recall:** TP / (TP + FN), prioritized because each false negative is a missed cancer case
 - **F1-score:** balances precision and recall
 - **ROC & AUC:** overall discriminative ability
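As a small worked example of these three metrics, the snippet below evaluates a toy set of eight predictions with scikit-learn. The labels, scores, and the 0.5 decision threshold are made up for illustration and are unrelated to the project's results.

```python
from sklearn.metrics import f1_score, recall_score, roc_auc_score

# Toy ground truth and predicted probabilities (illustrative values only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

# Hard predictions at an assumed 0.5 threshold: TP = 3, FN = 1, FP = 1, TN = 3.
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

recall = recall_score(y_true, y_pred)  # 3 / (3 + 1) = 0.75
f1 = f1_score(y_true, y_pred)          # precision = recall = 0.75, so F1 = 0.75
auc = roc_auc_score(y_true, y_score)   # 13 of 16 (positive, negative) pairs ranked correctly

print(recall, f1, auc)
```

Recall and F1-score depend on the chosen threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds.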
 ## Clinical Recommendation
 Assuming false positives are an acceptable price for not missing cancer cases, the k-NN (k = 23) model is preferred: it ties the MLP for the highest recall (0.92) and has the highest F1-score (0.788) and AUC (0.88) of the four models compared.
 ## Authors
 Erwan Ouabdesselam  
 Antonin Durousseau  
 Moritz von Siemens  
 Arthur Danjou  
 Thaïs Forest
 ---
 Project completed as part of the Statistical Learning course in the Master’s program in Applied Mathematics at Université Paris Dauphine - PSL.