| slug | title | description | publishedAt | readingTime | tags |
|---|---|---|---|---|---|
| breast-cancer | 💉 Breast Cancer Detection | Prediction of breast cancer presence by comparing several supervised classification models. | 2025/06/06 | 2 | |
This project was carried out as part of the Statistical Learning course at Paris-Dauphine PSL University. It develops and evaluates several supervised classification models in order to identify the one that best predicts or explains the presence of breast cancer from a set of biological and clinical features.
The data come from the Breast Cancer Coimbra dataset, provided by the UCI Machine Learning Repository.
The dataset contains 116 observations divided into two classes:
- 1: healthy individuals (controls)
- 2: patients diagnosed with breast cancer
There are 9 explanatory variables: age, along with clinical and biological measurements such as insulin, leptin, and insulin resistance.
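Because the classes are coded 1 and 2, it is often convenient to recode them to 0 and 1 so that breast cancer is the positive class when computing metrics. The snippet below is a minimal sketch of that step; the `raw_labels` values and the `recode` helper are hypothetical and are not taken from the project's code.

```python
from collections import Counter

# Hypothetical label column using the dataset's coding: 1 = control, 2 = patient.
raw_labels = [1, 2, 2, 1, 2, 1, 2, 2]


def recode(label: int) -> int:
    """Map the 1/2 coding to 0/1, where 1 is the positive class (breast cancer)."""
    if label not in (1, 2):
        raise ValueError(f"unexpected class label: {label}")
    return label - 1


y = [recode(value) for value in raw_labels]

# Counting the classes gives a first look at how balanced the data are.
class_counts = Counter(y)
print(y)
print(class_counts)
```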
The project follows a comparative approach between several algorithms:
- Logistic Regression
- k-Nearest Neighbors (k-NN)
- Naive Bayes
- Artificial Neural Network (MLP with a 16-8-1 architecture)
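To make one of the listed models concrete, here is a minimal plain-Python sketch of k-NN classification by majority vote. It is not the project's implementation: the `knn_predict` function, the choice of `k=3`, and the toy data are all assumptions made for illustration, and a real comparison would more likely rely on a machine learning library.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training point, paired with that point's label.
    distances = sorted(
        (math.dist(row, x), label) for row, label in zip(X_train, y_train)
    )
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data with two hypothetical features; 0 = control, 1 = patient.
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
y_train = [0, 0, 0, 1, 1, 1]

pred_near_controls = knn_predict(X_train, y_train, [1.1, 1.0])
pred_near_patients = knn_predict(X_train, y_train, [3.0, 3.0])
print(pred_near_controls, pred_near_patients)
```

Since k-NN relies on distances, features measured on different scales (age versus insulin, for example) are usually standardized before fitting.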
Model evaluation is primarily based on the F1-score, which combines precision and recall and is therefore more suitable than plain accuracy in a medical context where identifying positive cases is crucial. Particular attention was paid to stratified cross-validation, to handling class imbalance through class weights, and to limiting overfitting through regularization techniques (L2, early stopping).
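The evaluation ideas above can be sketched in plain Python. The three helpers below are hypothetical and only illustrate the concepts: the F1-score as the harmonic mean of precision and recall, a simple round-robin way of building stratified folds (without shuffling), and a "balanced" class-weight heuristic, `n_samples / (n_classes * class_count)`. The source does not say which weighting scheme the project used, so that formula is an assumption.

```python
from collections import Counter, defaultdict


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def stratified_folds(y, n_folds):
    """Split sample indices into folds that keep roughly the overall class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        # Deal each class's indices round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    return {label: len(y) / (len(counts) * c) for label, c in counts.items()}


# Toy labels: 4 controls (0) and 6 patients (1).
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
folds = stratified_folds(y, n_folds=2)
weights = balanced_class_weights(y)
score = f1_score([1, 1, 1, 0, 0, 1, 0, 1], [1, 0, 1, 0, 1, 1, 0, 1])
print(folds, weights, score)
```

With this heuristic the minority class receives the larger weight, so errors on it count more during training.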
This project illustrates a concrete application of data science techniques to a public health issue, while implementing a rigorous methodology for supervised modeling.
You can find the code here: Breast Cancer Detection