artsite/content/projects/breast-cancer.md at 17306539fea5d6045ec5575680bbbb4fa9e1f5d4

mirror of https://github.com/ArthurDanjou/artsite.git synced 2026-01-27 06:54:17 +01:00


Arthur DANJOU ba91408b6d feat: Add personal profile, projects, and skills documentation

- Created index.md for personal introduction and interests.
- Added languages.json to specify language proficiencies.
- Developed profile.md detailing academic background, skills, and career goals.
- Introduced multiple project markdown files showcasing personal and academic projects, including ArtChat, ArtHome, and various data science initiatives.
- Implemented skills.json to outline technical skills and competencies.
- Compiled uses.md to document hardware and software tools utilized for development and personal projects.

2025-12-22 19:39:36 +01:00


| slug | title | type | description | publishedAt | readingTime | status | tags | emoji |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| breast-cancer | Breast Cancer Detection | Academic Project | Prediction of breast cancer presence by comparing several supervised classification models using machine learning techniques. | 2025-06-06 | | Completed | Python, Machine Learning, Data Science, Classification, Healthcare | 💉 |

This project was carried out as part of the Statistical Learning course at Paris-Dauphine PSL University. Its objective is to identify the most effective model for predicting or explaining the presence of breast cancer from a set of biological and clinical features.

To that end, it develops and evaluates several supervised classification models on the Breast Cancer Coimbra dataset, provided by the UCI Machine Learning Repository.

The dataset contains 116 observations divided into two classes:

- 1: healthy individuals (controls)
- 2: patients diagnosed with breast cancer

There are 9 explanatory variables, all clinical or biological measurements, including age, insulin level, leptin, and insulin resistance.

The project follows a comparative approach between several algorithms:

- Logistic Regression
- k-Nearest Neighbors (k-NN)
- Naive Bayes
- Artificial Neural Network (MLP with a 16-8-1 architecture)
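To give a sense of scale for the 16-8-1 network listed above, the sketch below counts its trainable parameters. It assumes a fully connected network with bias terms that takes the 9 explanatory variables as input; the function name `mlp_parameter_count` is hypothetical and the project's actual implementation may differ.

```python
def mlp_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected feed-forward network.

    layer_sizes lists the number of units per layer, input layer first.
    """
    return sum(
        n_in * n_out + n_out  # weight matrix plus one bias per output unit
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Assumption: 9 input features feeding hidden layers of 16 and 8 units,
# then a single output unit.
sizes = [9, 16, 8, 1]
print(mlp_parameter_count(sizes))
```

Under these assumptions the network has a few hundred parameters for only 116 observations, which is one plausible reason to pair it with regularization.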

Model evaluation is primarily based on the F1-score, which balances precision and recall and is therefore better suited than plain accuracy to a medical context where identifying positive cases is crucial. Particular attention was paid to stratified cross-validation, to handling class imbalance through class weights, and to regularization (L2 penalty, early stopping).
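The evaluation ideas in the paragraph above can be sketched in plain Python. This is a minimal illustration, not the project's code: the helper names are hypothetical, the class counts (52 controls, 64 patients) are an assumption taken from the UCI dataset description, and the weighting rule shown is the common "balanced" heuristic rather than a confirmed choice of the author.

```python
from collections import Counter


def f1_score(y_true, y_pred, positive=2):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


def stratified_folds(y, n_splits=5):
    """Assign sample indices to folds so each fold keeps similar class ratios.

    Indices of each class are dealt round-robin across the folds.
    No shuffling is done here; shuffle the data beforehand if needed.
    """
    folds = [[] for _ in range(n_splits)]
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)
    return folds


# Assumed label vector: 52 controls (class 1) and 64 patients (class 2).
y = [1] * 52 + [2] * 64
print(balanced_class_weights(y))
print([len(fold) for fold in stratified_folds(y)])
```

In practice a library such as scikit-learn provides equivalents of all three helpers; writing them out only makes explicit what each step computes.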

This project illustrates a concrete application of data science techniques to a public health issue, while implementing a rigorous methodology for supervised modeling.

You can find the code here: Breast Cancer Detection
