artsite/content/projects/sl-breast-cancer.md
Arthur DANJOU 5a4a4f380f feat: Add CLAUDE.md for project guidance and update project files
2026-02-16 19:48:31 +01:00


slug: sl-breast-cancer
title: Breast Cancer Detection
type: Academic Project
description: Prediction of breast cancer presence by comparing several supervised classification models using machine learning techniques.
shortDescription: A project comparing supervised classification models to predict breast cancer presence using machine learning.
publishedAt: 2025-06-06
readingTime: 2
status: Completed
tags: Python, Machine Learning, Classification, Healthcare
icon: i-ph-heart-half-duotone

This project was carried out as part of the Statistical Learning course at Paris-Dauphine PSL University. The objective is to identify the most effective model for predicting or explaining the presence of breast cancer based on a set of biological and clinical features.

📊 Project Objectives

Develop and evaluate several supervised classification models to predict the presence of breast cancer based on biological features extracted from the Breast Cancer Coimbra dataset, provided by the UCI Machine Learning Repository.

The dataset contains 116 observations divided into two classes:

  • 1: healthy individuals (controls)
  • 2: patients diagnosed with breast cancer

There are 9 explanatory variables, all clinical or biological measurements, including age, insulin level, leptin level, and insulin resistance.
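As a rough illustration of the data layout described above, the sketch below parses a CSV with 9 numeric features and a class column coded 1/2, and recodes the label to 0 (control) and 1 (patient). The column names are an assumption based on the usual layout of the UCI file and are not given in this page; the two sample rows are made up for illustration and are not real measurements.

```python
import csv
import io
from collections import Counter

# Assumed column names (not stated in this page); adjust to match your copy of the file.
FEATURES = ["Age", "BMI", "Glucose", "Insulin", "HOMA",
            "Leptin", "Adiponectin", "Resistin", "MCP.1"]
TARGET = "Classification"


def load_rows(handle):
    """Parse CSV rows into feature vectors and labels (0 = control, 1 = patient)."""
    X, y = [], []
    for row in csv.DictReader(handle):
        X.append([float(row[name]) for name in FEATURES])
        y.append(int(row[TARGET]) - 1)  # dataset codes 1/2 become 0/1
    return X, y


# Two made-up rows, for illustration only.
sample = io.StringIO(
    "Age,BMI,Glucose,Insulin,HOMA,Leptin,Adiponectin,Resistin,MCP.1,Classification\n"
    "50,23.0,90,3.0,0.7,9.0,10.0,8.0,400.0,1\n"
    "60,28.0,105,6.0,1.6,20.0,7.0,12.0,500.0,2\n"
)
X, y = load_rows(sample)
print(len(X[0]), Counter(y))
```

Recoding the label to 0/1 with the patient class as 1 makes "positive" mean "cancer present", which is the convention most metric implementations expect.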

🔍 Methodology

The project compares four classification algorithms:

  • Logistic Regression
  • k-Nearest Neighbors (k-NN)
  • Naive Bayes
  • Artificial Neural Network (MLP with a 16-8-1 architecture)
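To give a sense of the size of the neural network relative to the dataset, here is a small worked count of its trainable parameters. It assumes that "16-8-1" means two fully connected hidden layers of 16 and 8 units followed by a single output unit, each with biases, and that all 9 explanatory variables are fed as inputs; the report may define the architecture differently.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a fully connected feed-forward network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total


# Assumed layout: 9 inputs -> 16 -> 8 -> 1 output.
sizes = [9, 16, 8, 1]
n_params = dense_param_count(sizes)
print(n_params)  # 160 + 136 + 9 = 305
```

Under these assumptions the network has 305 parameters for 116 observations, which is one reason regularization matters for this model.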

Model evaluation is primarily based on the F1-score, which is more suitable than plain accuracy in a medical context where identifying positive cases is crucial. Particular attention was paid to stratified cross-validation, to handling class imbalance through class weights, and to limiting overfitting through regularization (L2 penalty, early stopping).
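The three evaluation ingredients mentioned above can be sketched in plain Python. This is a minimal illustration, not the project's code: the function names are hypothetical, the class-weight formula is the common "balanced" heuristic (the page does not say which weighting was used), and the labels are toy values.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds while keeping class proportions similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)  # deal each class round-robin
    return folds


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * m) for c, m in counts.items()}


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 4 controls (0) and 6 patients (1).
labels = [0] * 4 + [1] * 6
folds = stratified_folds(labels, k=2)
weights = balanced_class_weights(labels)
score = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
print([Counter(labels[i] for i in fold) for fold in folds], weights, score)
```

Each fold receives the same share of each class, the minority class gets the larger weight, and the F1-score ignores true negatives, so a model cannot score well simply by predicting the majority class.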

This project illustrates a concrete application of data science techniques to a public health issue, while implementing a rigorous methodology for supervised modeling.

📚 Resources

You can find the code here: Breast Cancer Detection

📄 Detailed Report