mirror of
https://github.com/ArthurDanjou/ArtStudies.git
synced 2026-01-14 18:59:59 +01:00
3670 lines
95 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "8750d15b",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Course 3: Machine Learning - Supervised Algorithms (1/2)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "f7c08ae5",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Preamble"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ec7ecb4b",
|
|
"metadata": {},
|
|
"source": [
|
|
"The objectives of this session (3h) are:\n",
"* Prepare the modelling datasets (sampling)\n",
"* Apply a simple supervised model\n",
"* Build a Machine Learning model (cross-validation and hyperparameter tuning) to solve a regression problem\n",
"* Analyse the model's performance"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "4e99c600",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Workspace setup"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "c1b01045",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Library imports"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 157,
|
|
"id": "97d58527",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Data\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Plots\n",
"import plotly.express as px\n",
"import plotly.graph_objects as gp\n",
"import seaborn as sns\n",
"\n",
"# Statistics\n",
"from scipy.stats import chi2_contingency\n",
"\n",
"# Machine Learning\n",
"import sklearn.metrics as metrics\n",
"import sklearn.preprocessing as preproc\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.ensemble import RandomForestRegressor\n",
"from sklearn.model_selection import KFold, train_test_split\n",
"from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor\n",
"\n",
"# Apply seaborn's default plotting theme\n",
"sns.set()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "06153286",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Function definitions"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "c67db932",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "985e4e97",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Constants"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 158,
|
|
"id": "c9597b48",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"input_path = \"./1_inputs\"\n",
|
|
"output_path = \"./2_outputs\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "b2b035d2",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Data import"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 159,
|
|
"id": "8051b5f4",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"path = input_path + '/base_retraitee.csv'\n",
"data_retraitee = pd.read_csv(path, sep=\",\", decimal=\".\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "a2578ba1",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Supervised algorithm: CART"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "aaa0b27d",
|
|
"metadata": {},
|
|
"source": [
|
|
"In this part, the goal is to build a simple model (the CART algorithm) in order to walk through the steps needed to fit a model.\n",
"We will model the claim cost directly."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "a0458a05",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Building the model"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "b3715c37",
|
|
"metadata": {},
|
|
"source": [
|
|
"The first step is to compute the average cost per claim (the target, or response variable). This is the variable we will predict from the explanatory variables."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 160,
|
|
"id": "c427a4b8",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"(824, 14)"
|
|
]
|
|
},
|
|
"execution_count": 160,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"data_model = data_retraitee.copy()\n",
"\n",
"# Keep only the rows with at least one claim (NB > 0)\n",
"data_model = data_model[data_model[\"NB\"] > 0]\n",
"\n",
"# Compute the \"theoretical\" average cost per claim\n",
"data_model[\"CM\"] = data_model[\"CHARGE\"] / data_model[\"NB\"]\n",
"data_model = data_model.drop([\"CHARGE\", \"NB\", \"EXPO\"], axis=1)\n",
"data_model.shape"
|
|
]
|
|
},
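The target construction in the cell above can be sketched in plain Python on a few made-up records. This is only an illustration under stated assumptions: the `records` list and the `average_claim_cost` helper are hypothetical and not part of the notebook; only the column names `NB` and `CHARGE` come from it.

```python
# Hypothetical toy records; the values are invented for illustration.
records = [
    {"NB": 0, "CHARGE": 0.0},     # no claim: dropped by the filter
    {"NB": 1, "CHARGE": 1200.0},
    {"NB": 2, "CHARGE": 5000.0},
]


def average_claim_cost(rows):
    """Return CHARGE / NB for every row with at least one claim."""
    return [row["CHARGE"] / row["NB"] for row in rows if row["NB"] > 0]


cm = average_claim_cost(records)
print(cm)
```

Filtering on `NB > 0` before dividing is what keeps the ratio well defined: rows without a claim would otherwise trigger a division by zero.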
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "e3e85088",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Exercise:** compute descriptive statistics for the dataset used."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 161,
|
|
"id": "c8fd3ee1",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"application/vnd.microsoft.datawrangler.viewer.v0+json": {
|
|
"columns": [
|
|
{
|
|
"name": "index",
|
|
"rawType": "object",
|
|
"type": "string"
|
|
},
|
|
{
|
|
"name": "ANNEE_CTR",
|
|
"rawType": "float64",
|
|
"type": "float"
|
|
},
|
|
{
|
|
"name": "CONTRAT_ANCIENNETE",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "FREQUENCE_PAIEMENT_COTISATION",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "GROUPE_KM",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "ZONE_RISQUE",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "AGE_ASSURE_PRINCIPAL",
|
|
"rawType": "float64",
|
|
"type": "float"
|
|
},
|
|
{
|
|
"name": "GENRE",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "DEUXIEME_CONDUCTEUR",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "ANCIENNETE_PERMIS",
|
|
"rawType": "float64",
|
|
"type": "float"
|
|
},
|
|
{
|
|
"name": "ANNEE_CONSTRUCTION",
|
|
"rawType": "float64",
|
|
"type": "float"
|
|
},
|
|
{
|
|
"name": "ENERGIE",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "EQUIPEMENT_SECURITE",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "VALEUR_DU_BIEN",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "CM",
|
|
"rawType": "float64",
|
|
"type": "float"
|
|
}
|
|
],
|
|
"ref": "e80a8f38-8160-41fb-bbfa-ae1f7b39de11",
|
|
"rows": [
|
|
[
|
|
"count",
|
|
"824.0",
|
|
"824",
|
|
"824",
|
|
"824",
|
|
"824",
|
|
"824.0",
|
|
"824",
|
|
"824",
|
|
"824.0",
|
|
"824.0",
|
|
"824",
|
|
"824",
|
|
"824",
|
|
"824.0"
|
|
],
|
|
[
|
|
"unique",
|
|
null,
|
|
"5",
|
|
"3",
|
|
"4",
|
|
"14",
|
|
null,
|
|
"2",
|
|
"2",
|
|
null,
|
|
null,
|
|
"3",
|
|
"2",
|
|
"6",
|
|
null
|
|
],
|
|
[
|
|
"top",
|
|
null,
|
|
"(0,1]",
|
|
"MENSUEL",
|
|
"[0;20000[",
|
|
"C",
|
|
null,
|
|
"M",
|
|
"False",
|
|
null,
|
|
null,
|
|
"ESSENCE",
|
|
"FAUX",
|
|
"[10000;15000[",
|
|
null
|
|
],
|
|
[
|
|
"freq",
|
|
null,
|
|
"297",
|
|
"398",
|
|
"391",
|
|
"269",
|
|
null,
|
|
"483",
|
|
"663",
|
|
null,
|
|
null,
|
|
"413",
|
|
"517",
|
|
"213",
|
|
null
|
|
],
|
|
[
|
|
"mean",
|
|
"2018.384708737864",
|
|
null,
|
|
null,
|
|
null,
|
|
null,
|
|
"44.383495145631066",
|
|
null,
|
|
null,
|
|
"35.68810679611651",
|
|
"2015.2123786407767",
|
|
null,
|
|
null,
|
|
null,
|
|
"4246.01697815534"
|
|
],
|
|
[
|
|
"std",
|
|
"1.515832735580178",
|
|
null,
|
|
null,
|
|
null,
|
|
null,
|
|
"13.808216667998865",
|
|
null,
|
|
null,
|
|
"19.370620845496358",
|
|
"3.1637823115731556",
|
|
null,
|
|
null,
|
|
null,
|
|
"6869.61691660173"
|
|
],
|
|
[
|
|
"min",
|
|
"2016.0",
|
|
null,
|
|
null,
|
|
null,
|
|
null,
|
|
"19.0",
|
|
null,
|
|
null,
|
|
"1.0",
|
|
"1998.0",
|
|
null,
|
|
null,
|
|
null,
|
|
"7.5"
|
|
],
|
|
[
|
|
"25%",
|
|
"2017.0",
|
|
null,
|
|
null,
|
|
null,
|
|
null,
|
|
"34.0",
|
|
null,
|
|
null,
|
|
"18.0",
|
|
"2014.0",
|
|
null,
|
|
null,
|
|
null,
|
|
"1159.96125"
|
|
],
|
|
[
|
|
"50%",
|
|
"2018.0",
|
|
null,
|
|
null,
|
|
null,
|
|
null,
|
|
"43.0",
|
|
null,
|
|
null,
|
|
"35.0",
|
|
"2016.0",
|
|
null,
|
|
null,
|
|
null,
|
|
"2541.6499999999996"
|
|
],
|
|
[
|
|
"75%",
|
|
"2020.0",
|
|
null,
|
|
null,
|
|
null,
|
|
null,
|
|
"53.0",
|
|
null,
|
|
null,
|
|
"53.0",
|
|
"2017.0",
|
|
null,
|
|
null,
|
|
null,
|
|
"4193.797500000001"
|
|
],
|
|
[
|
|
"max",
|
|
"2021.0",
|
|
null,
|
|
null,
|
|
null,
|
|
null,
|
|
"94.0",
|
|
null,
|
|
null,
|
|
"70.0",
|
|
"2021.0",
|
|
null,
|
|
null,
|
|
null,
|
|
"83421.85"
|
|
]
|
|
],
|
|
"shape": {
|
|
"columns": 14,
|
|
"rows": 11
|
|
}
|
|
},
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>ANNEE_CTR</th>\n",
|
|
" <th>CONTRAT_ANCIENNETE</th>\n",
|
|
" <th>FREQUENCE_PAIEMENT_COTISATION</th>\n",
|
|
" <th>GROUPE_KM</th>\n",
|
|
" <th>ZONE_RISQUE</th>\n",
|
|
" <th>AGE_ASSURE_PRINCIPAL</th>\n",
|
|
" <th>GENRE</th>\n",
|
|
" <th>DEUXIEME_CONDUCTEUR</th>\n",
|
|
" <th>ANCIENNETE_PERMIS</th>\n",
|
|
" <th>ANNEE_CONSTRUCTION</th>\n",
|
|
" <th>ENERGIE</th>\n",
|
|
" <th>EQUIPEMENT_SECURITE</th>\n",
|
|
" <th>VALEUR_DU_BIEN</th>\n",
|
|
" <th>CM</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>count</th>\n",
|
|
" <td>824.000000</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824.000000</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824.000000</td>\n",
|
|
" <td>824.000000</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824.000000</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>unique</th>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>5</td>\n",
|
|
" <td>3</td>\n",
|
|
" <td>4</td>\n",
|
|
" <td>14</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>3</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>6</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>top</th>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>(0,1]</td>\n",
|
|
" <td>MENSUEL</td>\n",
|
|
" <td>[0;20000[</td>\n",
|
|
" <td>C</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>M</td>\n",
|
|
" <td>False</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>ESSENCE</td>\n",
|
|
" <td>FAUX</td>\n",
|
|
" <td>[10000;15000[</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>freq</th>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>297</td>\n",
|
|
" <td>398</td>\n",
|
|
" <td>391</td>\n",
|
|
" <td>269</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>483</td>\n",
|
|
" <td>663</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>413</td>\n",
|
|
" <td>517</td>\n",
|
|
" <td>213</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>mean</th>\n",
|
|
" <td>2018.384709</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>44.383495</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>35.688107</td>\n",
|
|
" <td>2015.212379</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>4246.016978</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>std</th>\n",
|
|
" <td>1.515833</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>13.808217</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>19.370621</td>\n",
|
|
" <td>3.163782</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>6869.616917</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>min</th>\n",
|
|
" <td>2016.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>19.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>1.000000</td>\n",
|
|
" <td>1998.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>7.500000</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>25%</th>\n",
|
|
" <td>2017.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>34.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>18.000000</td>\n",
|
|
" <td>2014.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>1159.961250</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>50%</th>\n",
|
|
" <td>2018.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>43.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>35.000000</td>\n",
|
|
" <td>2016.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>2541.650000</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>75%</th>\n",
|
|
" <td>2020.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>53.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>53.000000</td>\n",
|
|
" <td>2017.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>4193.797500</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>max</th>\n",
|
|
" <td>2021.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>94.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>70.000000</td>\n",
|
|
" <td>2021.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>83421.850000</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
"          ANNEE_CTR  CONTRAT_ANCIENNETE  FREQUENCE_PAIEMENT_COTISATION  \\\n",
"count    824.000000                 824                            824  \n",
"unique          NaN                   5                              3  \n",
"top             NaN               (0,1]                        MENSUEL  \n",
"freq            NaN                 297                            398  \n",
"mean    2018.384709                 NaN                            NaN  \n",
"std        1.515833                 NaN                            NaN  \n",
"min     2016.000000                 NaN                            NaN  \n",
"25%     2017.000000                 NaN                            NaN  \n",
"50%     2018.000000                 NaN                            NaN  \n",
"75%     2020.000000                 NaN                            NaN  \n",
"max     2021.000000                 NaN                            NaN  \n",
"\n",
"        GROUPE_KM  ZONE_RISQUE  AGE_ASSURE_PRINCIPAL  GENRE  DEUXIEME_CONDUCTEUR  \\\n",
"count         824          824            824.000000    824                  824  \n",
"unique          4           14                   NaN      2                    2  \n",
"top     [0;20000[            C                   NaN      M                False  \n",
"freq          391          269                   NaN    483                  663  \n",
"mean          NaN          NaN             44.383495    NaN                  NaN  \n",
"std           NaN          NaN             13.808217    NaN                  NaN  \n",
"min           NaN          NaN             19.000000    NaN                  NaN  \n",
"25%           NaN          NaN             34.000000    NaN                  NaN  \n",
"50%           NaN          NaN             43.000000    NaN                  NaN  \n",
"75%           NaN          NaN             53.000000    NaN                  NaN  \n",
"max           NaN          NaN             94.000000    NaN                  NaN  \n",
"\n",
"        ANCIENNETE_PERMIS  ANNEE_CONSTRUCTION  ENERGIE  EQUIPEMENT_SECURITE  \\\n",
"count          824.000000          824.000000      824                  824  \n",
"unique                NaN                 NaN        3                    2  \n",
"top                   NaN                 NaN  ESSENCE                 FAUX  \n",
"freq                  NaN                 NaN      413                  517  \n",
"mean            35.688107         2015.212379      NaN                  NaN  \n",
"std             19.370621            3.163782      NaN                  NaN  \n",
"min              1.000000         1998.000000      NaN                  NaN  \n",
"25%             18.000000         2014.000000      NaN                  NaN  \n",
"50%             35.000000         2016.000000      NaN                  NaN  \n",
"75%             53.000000         2017.000000      NaN                  NaN  \n",
"max             70.000000         2021.000000      NaN                  NaN  \n",
"\n",
"        VALEUR_DU_BIEN            CM  \n",
"count              824    824.000000  \n",
"unique               6           NaN  \n",
"top      [10000;15000[           NaN  \n",
"freq               213           NaN  \n",
"mean               NaN   4246.016978  \n",
"std                NaN   6869.616917  \n",
"min                NaN      7.500000  \n",
"25%                NaN   1159.961250  \n",
"50%                NaN   2541.650000  \n",
"75%                NaN   4193.797500  \n",
"max                NaN  83421.850000  "
|
|
]
|
|
},
|
|
"execution_count": 161,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"data_model.describe(include='all')"
|
|
]
|
|
},
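In the summary above, the mean of `CM` (about 4246) is well above its median (about 2542), and the maximum (83421.85) is far from the third quartile, which suggests a right-skewed target. The sketch below reproduces that pattern on invented numbers using only the standard library. The `costs` list is hypothetical, and using `statistics.quantiles` with `method="inclusive"` to mirror the linearly interpolated quartiles of `describe` is an assumption of this sketch.

```python
import statistics

# Hypothetical claim costs with one large claim; the values are invented.
costs = [100, 200, 300, 400, 1000]

summary = {
    "mean": statistics.mean(costs),
    "std": statistics.stdev(costs),  # sample standard deviation (n - 1)
    "quartiles": statistics.quantiles(costs, n=4, method="inclusive"),
    "max": max(costs),
}
print(summary)
```

A single large value pulls the mean above the median while leaving the quartiles unchanged, which is one reason to look at the quantiles of a claim-cost target and not only at its mean.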
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "92d6156a",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Correlation analysis among the explanatory variables"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d7327570",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Question:** In your view, why should we look at the correlation between variables?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "475e141b",
|
|
"metadata": {},
|
|
"source": [
|
|
"*Answer*: To get a model that fits better, to spot potential relationships between the features and the target (correlation alone does not establish causality), and to select a subset of variables."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 162,
|
|
"id": "1b156435",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"(824, 13)"
|
|
]
|
|
},
|
|
"execution_count": 162,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"data_set = data_model.drop(\"CM\", axis=1)\n",
|
|
"data_set.shape"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 163,
|
|
"id": "0ef0fcc0",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Split the columns into numeric and categorical variables\n",
"variables_na = []  # columns containing missing values\n",
"variables_numeriques = []\n",
"variables_01 = []  # declared but never filled in this cell\n",
"variables_categorielles = []\n",
"numeric_dtypes = [\"int32\", \"int64\", \"float64\"]\n",
"for colu in data_set.columns:\n",
"    column = data_set[colu]\n",
"    if column.isna().any():\n",
"        variables_na.append(column)\n",
"    elif str(column.dtype) in numeric_dtypes and column.nunique() != 2:\n",
"        variables_numeriques.append(column)\n",
"    else:\n",
"        # Binary numeric columns and all non-numeric columns are treated as categorical\n",
"        variables_categorielles.append(column)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "e82fcade",
|
|
"metadata": {},
|
|
"source": [
|
|
"##### Correlation of the categorical variables:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 164,
|
|
"id": "e130aae5",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"vars_categorielles = pd.DataFrame(variables_categorielles).transpose()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 165,
|
|
"id": "c39e2ad0",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"application/vnd.plotly.v1+json": {
|
|
"config": {
|
|
"plotlyServerURL": "https://plot.ly"
|
|
},
|
|
"data": [
|
|
{
|
|
"coloraxis": "coloraxis",
|
|
"hovertemplate": "x: %{x}<br>y: %{y}<br>color: %{z}<extra></extra>",
|
|
"name": "0",
|
|
"texttemplate": "%{z:.2f}",
|
|
"type": "heatmap",
|
|
"x": [
|
|
"CONTRAT_ANCIENNETE",
|
|
"FREQUENCE_PAIEMENT_COTISATION",
|
|
"GROUPE_KM",
|
|
"ZONE_RISQUE",
|
|
"GENRE",
|
|
"DEUXIEME_CONDUCTEUR",
|
|
"ENERGIE",
|
|
"EQUIPEMENT_SECURITE",
|
|
"VALEUR_DU_BIEN"
|
|
],
|
|
"xaxis": "x",
|
|
"y": [
|
|
"CONTRAT_ANCIENNETE",
|
|
"FREQUENCE_PAIEMENT_COTISATION",
|
|
"GROUPE_KM",
|
|
"ZONE_RISQUE",
|
|
"GENRE",
|
|
"DEUXIEME_CONDUCTEUR",
|
|
"ENERGIE",
|
|
"EQUIPEMENT_SECURITE",
|
|
"VALEUR_DU_BIEN"
|
|
],
|
|
"yaxis": "y",
|
|
"z": {
|
|
"bdata": "AAAAAAAA8D8AAAAAAAAAACoCGzzITrA/jS6+t390sj/aAKYMJa2eP5RMqUS3uZs/ytNpsBVXkz8AAAAAAAAAAJsekiMPM4I/AAAAAAAAAAAAAAAAAADwPwAAAAAAAAAAAAAAAAAAAABgNwyfFOK3Px3tLvtk1qI/VTS7w965nj/DbHQwNU6sP6xOyIjBVMQ/KwIbPMhOsD8AAAAAAAAAAAAAAAAAAPA/JGwWgOwjwz/Y12crRVC2P1AU8aUpk3Y/tZ25v8HgyT9++YWBDBq6PxMKBP1KAMk/ki6+t390sj8AAAAAAAAAACNsFoDsI8M/AAAAAAAA8D8AAAAAAAAAAOzpAHMW1bU/OToUIB5twT+gpoD1ZjrEP/5ATjN+vpg/0gCmDCWtnj9gNwyfFOK3P9jXZytFULY/AAAAAAAAAAAAAAAAAADwPwAAAAAAAAAA2p0N4q1bwz/UsLoqS0u5PxFqf8IHB9E/lEypRLe5mz8d7S77ZNaiP1AU8aUpk3Y/7OkAcxbVtT8AAAAAAAAAAAAAAAAAAPA/AAAAAAAAAAAAAAAAAAAAAOYlMsJ0brs/ytNpsBVXkz9RNLvD3rmeP7edub/B4Mk/OjoUIB5twT/anQ3irVvDPwAAAAAAAAAAAAAAAAAA8D8nEbUEUmnAP+SA2g/TvNE/AAAAAAAAAADDbHQwNU6sP335hYEMGro/oKaA9WY6xD/UsLoqS0u5PwAAAAAAAAAAJxG1BFJpwD8AAAAAAADwP+fmCf6XRco/mx6SIw8zgj+rTsiIwVTEPxIKBP1KAMk//kBOM36+mD8Ran/CBwfRP+YlMsJ0brs/5YDaD9O80T/n5gn+l0XKPwAAAAAAAPA/",
|
|
"dtype": "f8",
|
|
"shape": "9, 9"
|
|
}
|
|
}
|
|
],
|
|
"layout": {
|
|
"coloraxis": {
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"rgb(5,48,97)"
|
|
],
|
|
[
|
|
0.1,
|
|
"rgb(33,102,172)"
|
|
],
|
|
[
|
|
0.2,
|
|
"rgb(67,147,195)"
|
|
],
|
|
[
|
|
0.3,
|
|
"rgb(146,197,222)"
|
|
],
|
|
[
|
|
0.4,
|
|
"rgb(209,229,240)"
|
|
],
|
|
[
|
|
0.5,
|
|
"rgb(247,247,247)"
|
|
],
|
|
[
|
|
0.6,
|
|
"rgb(253,219,199)"
|
|
],
|
|
[
|
|
0.7,
|
|
"rgb(244,165,130)"
|
|
],
|
|
[
|
|
0.8,
|
|
"rgb(214,96,77)"
|
|
],
|
|
[
|
|
0.9,
|
|
"rgb(178,24,43)"
|
|
],
|
|
[
|
|
1,
|
|
"rgb(103,0,31)"
|
|
]
|
|
]
|
|
},
|
|
"template": {
|
|
"data": {
|
|
"bar": [
|
|
{
|
|
"error_x": {
|
|
"color": "#2a3f5f"
|
|
},
|
|
"error_y": {
|
|
"color": "#2a3f5f"
|
|
},
|
|
"marker": {
|
|
"line": {
|
|
"color": "#E5ECF6",
|
|
"width": 0.5
|
|
},
|
|
"pattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
}
|
|
},
|
|
"type": "bar"
|
|
}
|
|
],
|
|
"barpolar": [
|
|
{
|
|
"marker": {
|
|
"line": {
|
|
"color": "#E5ECF6",
|
|
"width": 0.5
|
|
},
|
|
"pattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
}
|
|
},
|
|
"type": "barpolar"
|
|
}
|
|
],
|
|
"carpet": [
|
|
{
|
|
"aaxis": {
|
|
"endlinecolor": "#2a3f5f",
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"minorgridcolor": "white",
|
|
"startlinecolor": "#2a3f5f"
|
|
},
|
|
"baxis": {
|
|
"endlinecolor": "#2a3f5f",
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"minorgridcolor": "white",
|
|
"startlinecolor": "#2a3f5f"
|
|
},
|
|
"type": "carpet"
|
|
}
|
|
],
|
|
"choropleth": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"type": "choropleth"
|
|
}
|
|
],
|
|
"contour": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "contour"
|
|
}
|
|
],
|
|
"contourcarpet": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"type": "contourcarpet"
|
|
}
|
|
],
|
|
"heatmap": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "heatmap"
|
|
}
|
|
],
|
|
"histogram": [
|
|
{
|
|
"marker": {
|
|
"pattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
}
|
|
},
|
|
"type": "histogram"
|
|
}
|
|
],
|
|
"histogram2d": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "histogram2d"
|
|
}
|
|
],
|
|
"histogram2dcontour": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "histogram2dcontour"
|
|
}
|
|
],
|
|
"mesh3d": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"type": "mesh3d"
|
|
}
|
|
],
|
|
"parcoords": [
|
|
{
|
|
"line": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "parcoords"
|
|
}
|
|
],
|
|
"pie": [
|
|
{
|
|
"automargin": true,
|
|
"type": "pie"
|
|
}
|
|
],
|
|
"scatter": [
|
|
{
|
|
"fillpattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
},
|
|
"type": "scatter"
|
|
}
|
|
],
|
|
"scatter3d": [
|
|
{
|
|
"line": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatter3d"
|
|
}
|
|
],
|
|
"scattercarpet": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattercarpet"
|
|
}
|
|
],
|
|
"scattergeo": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattergeo"
|
|
}
|
|
],
|
|
"scattergl": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattergl"
|
|
}
|
|
],
|
|
"scattermap": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattermap"
|
|
}
|
|
],
|
|
"scattermapbox": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattermapbox"
|
|
}
|
|
],
|
|
"scatterpolar": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatterpolar"
|
|
}
|
|
],
|
|
"scatterpolargl": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatterpolargl"
|
|
}
|
|
],
|
|
"scatterternary": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatterternary"
|
|
}
|
|
],
|
|
"surface": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "surface"
|
|
}
|
|
],
|
|
"table": [
|
|
{
|
|
"cells": {
|
|
"fill": {
|
|
"color": "#EBF0F8"
|
|
},
|
|
"line": {
|
|
"color": "white"
|
|
}
|
|
},
|
|
"header": {
|
|
"fill": {
|
|
"color": "#C8D4E3"
|
|
},
|
|
"line": {
|
|
"color": "white"
|
|
}
|
|
},
|
|
"type": "table"
|
|
}
|
|
]
|
|
},
|
|
"layout": {
|
|
"annotationdefaults": {
|
|
"arrowcolor": "#2a3f5f",
|
|
"arrowhead": 0,
|
|
"arrowwidth": 1
|
|
},
|
|
"autotypenumbers": "strict",
|
|
"coloraxis": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"colorscale": {
|
|
"diverging": [
|
|
[
|
|
0,
|
|
"#8e0152"
|
|
],
|
|
[
|
|
0.1,
|
|
"#c51b7d"
|
|
],
|
|
[
|
|
0.2,
|
|
"#de77ae"
|
|
],
|
|
[
|
|
0.3,
|
|
"#f1b6da"
|
|
],
|
|
[
|
|
0.4,
|
|
"#fde0ef"
|
|
],
|
|
[
|
|
0.5,
|
|
"#f7f7f7"
|
|
],
|
|
[
|
|
0.6,
|
|
"#e6f5d0"
|
|
],
|
|
[
|
|
0.7,
|
|
"#b8e186"
|
|
],
|
|
[
|
|
0.8,
|
|
"#7fbc41"
|
|
],
|
|
[
|
|
0.9,
|
|
"#4d9221"
|
|
],
|
|
[
|
|
1,
|
|
"#276419"
|
|
]
|
|
],
|
|
"sequential": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"sequentialminus": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
]
|
|
},
|
|
"colorway": [
|
|
"#636efa",
|
|
"#EF553B",
|
|
"#00cc96",
|
|
"#ab63fa",
|
|
"#FFA15A",
|
|
"#19d3f3",
|
|
"#FF6692",
|
|
"#B6E880",
|
|
"#FF97FF",
|
|
"#FECB52"
|
|
],
|
|
"font": {
|
|
"color": "#2a3f5f"
|
|
},
|
|
"geo": {
|
|
"bgcolor": "white",
|
|
"lakecolor": "white",
|
|
"landcolor": "#E5ECF6",
|
|
"showlakes": true,
|
|
"showland": true,
|
|
"subunitcolor": "white"
|
|
},
|
|
"hoverlabel": {
|
|
"align": "left"
|
|
},
|
|
"hovermode": "closest",
|
|
"mapbox": {
|
|
"style": "light"
|
|
},
|
|
"paper_bgcolor": "white",
|
|
"plot_bgcolor": "#E5ECF6",
|
|
"polar": {
|
|
"angularaxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
},
|
|
"bgcolor": "#E5ECF6",
|
|
"radialaxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"scene": {
|
|
"xaxis": {
|
|
"backgroundcolor": "#E5ECF6",
|
|
"gridcolor": "white",
|
|
"gridwidth": 2,
|
|
"linecolor": "white",
|
|
"showbackground": true,
|
|
"ticks": "",
|
|
"zerolinecolor": "white"
|
|
},
|
|
"yaxis": {
|
|
"backgroundcolor": "#E5ECF6",
|
|
"gridcolor": "white",
|
|
"gridwidth": 2,
|
|
"linecolor": "white",
|
|
"showbackground": true,
|
|
"ticks": "",
|
|
"zerolinecolor": "white"
|
|
},
|
|
"zaxis": {
|
|
"backgroundcolor": "#E5ECF6",
|
|
"gridcolor": "white",
|
|
"gridwidth": 2,
|
|
"linecolor": "white",
|
|
"showbackground": true,
|
|
"ticks": "",
|
|
"zerolinecolor": "white"
|
|
}
|
|
},
|
|
"shapedefaults": {
|
|
"line": {
|
|
"color": "#2a3f5f"
|
|
}
|
|
},
|
|
"ternary": {
|
|
"aaxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
},
|
|
"baxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
},
|
|
"bgcolor": "#E5ECF6",
|
|
"caxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"title": {
|
|
"x": 0.05
|
|
},
|
|
"xaxis": {
|
|
"automargin": true,
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": "",
|
|
"title": {
|
|
"standoff": 15
|
|
},
|
|
"zerolinecolor": "white",
|
|
"zerolinewidth": 2
|
|
},
|
|
"yaxis": {
|
|
"automargin": true,
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": "",
|
|
"title": {
|
|
"standoff": 15
|
|
},
|
|
"zerolinecolor": "white",
|
|
"zerolinewidth": 2
|
|
}
|
|
}
|
|
},
|
|
"title": {
|
|
"text": "Matrice de corrélation des variables catégorielles (V de Cramér)"
|
|
},
|
|
"xaxis": {
|
|
"anchor": "y",
|
|
"domain": [
|
|
0,
|
|
1
|
|
]
|
|
},
|
|
"yaxis": {
|
|
"anchor": "x",
|
|
"autorange": "reversed",
|
|
"domain": [
|
|
0,
|
|
1
|
|
]
|
|
}
|
|
}
|
|
}
|
|
},
|
|
"metadata": {},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Correlation matrix for the categorical variables (Cramér's V)\n",
|
|
"def cramers_v(confusion_matrix):\n",
|
|
"    \"\"\"Compute the bias-corrected Cramér's V from a contingency table.\"\"\"\n",
|
|
"    chi2 = chi2_contingency(confusion_matrix)[0]  # chi-squared statistic\n",
|
|
"    n = confusion_matrix.sum().sum()  # total number of observations\n",
|
|
"    phi2 = chi2 / n\n",
|
|
"    r, k = confusion_matrix.shape\n",
|
|
"    phi2corr = max(0, phi2 - ((k-1)*(r-1))/(n-1))  # bias-corrected phi squared\n",
|
|
"    rcorr = r - ((r-1)**2)/(n-1)\n",
|
|
"    kcorr = k - ((k-1)**2)/(n-1)\n",
|
|
"    return np.sqrt(phi2corr / min((kcorr-1), (rcorr-1)))\n",
|
|
"\n",
|
|
"# Build the correlation matrix\n",
|
|
"categorical_cols = vars_categorielles.columns\n",
|
|
"n_vars = len(categorical_cols)\n",
|
|
"cramers_matrix = np.zeros((n_vars, n_vars))\n",
|
|
"\n",
|
|
"for i, col1 in enumerate(categorical_cols):\n",
|
|
"    for j, col2 in enumerate(categorical_cols):\n",
|
|
"        if i == j:\n",
|
|
"            cramers_matrix[i, j] = 1.0\n",
|
|
"        else:\n",
|
|
"            confusion_matrix = pd.crosstab(vars_categorielles[col1], vars_categorielles[col2])\n",
|
|
"            cramers_matrix[i, j] = cramers_v(confusion_matrix)\n",
|
|
"\n",
|
|
"# Build the correlation DataFrame\n",
|
|
"correlation_cat = pd.DataFrame(cramers_matrix,\n",
|
|
"                               index=categorical_cols,\n",
|
|
"                               columns=categorical_cols)\n",
|
|
"\n",
|
|
"# Visualize with Plotly\n",
|
|
"fig = px.imshow(correlation_cat,\n",
|
|
"                text_auto='.2f', # type: ignore\n",
|
|
"                aspect=\"auto\",\n",
|
|
"                color_continuous_scale='RdBu_r',\n",
|
|
"                title=\"Correlation matrix of the categorical variables (Cramér's V)\")\n",
|
|
"fig.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "8f615121",
|
|
"metadata": {},
|
|
"source": [
|
|
"##### Correlation of the numerical variables:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 166,
|
|
"id": "a16215ab",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"vars_numeriques = pd.DataFrame(variables_numeriques).transpose()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 167,
|
|
"id": "532ca6c4",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"application/vnd.plotly.v1+json": {
|
|
"config": {
|
|
"plotlyServerURL": "https://plot.ly"
|
|
},
|
|
"data": [
|
|
{
|
|
"coloraxis": "coloraxis",
|
|
"hovertemplate": "x: %{x}<br>y: %{y}<br>color: %{z}<extra></extra>",
|
|
"name": "0",
|
|
"texttemplate": "%{z}",
|
|
"type": "heatmap",
|
|
"x": [
|
|
"ANNEE_CTR",
|
|
"AGE_ASSURE_PRINCIPAL",
|
|
"ANCIENNETE_PERMIS",
|
|
"ANNEE_CONSTRUCTION"
|
|
],
|
|
"xaxis": "x",
|
|
"y": [
|
|
"ANNEE_CTR",
|
|
"AGE_ASSURE_PRINCIPAL",
|
|
"ANCIENNETE_PERMIS",
|
|
"ANNEE_CONSTRUCTION"
|
|
],
|
|
"yaxis": "y",
|
|
"z": {
|
|
"bdata": "AAAAAAAA8D+ybZcEUUCbP/CBLCtO46Q/qr2Q49LN2D+ybZcEUUCbPwAAAAAAAPA/slV7SAtP4T84L73yETWgv/CBLCtO46Q/slV7SAtP4T8AAAAAAADwP0I6y25dD6E/qr2Q49LN2D84L73yETWgv0I6y25dD6E/AAAAAAAA8D8=",
|
|
"dtype": "f8",
|
|
"shape": "4, 4"
|
|
}
|
|
}
|
|
],
|
|
"layout": {
|
|
"coloraxis": {
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"rgb(5,48,97)"
|
|
],
|
|
[
|
|
0.1,
|
|
"rgb(33,102,172)"
|
|
],
|
|
[
|
|
0.2,
|
|
"rgb(67,147,195)"
|
|
],
|
|
[
|
|
0.3,
|
|
"rgb(146,197,222)"
|
|
],
|
|
[
|
|
0.4,
|
|
"rgb(209,229,240)"
|
|
],
|
|
[
|
|
0.5,
|
|
"rgb(247,247,247)"
|
|
],
|
|
[
|
|
0.6,
|
|
"rgb(253,219,199)"
|
|
],
|
|
[
|
|
0.7,
|
|
"rgb(244,165,130)"
|
|
],
|
|
[
|
|
0.8,
|
|
"rgb(214,96,77)"
|
|
],
|
|
[
|
|
0.9,
|
|
"rgb(178,24,43)"
|
|
],
|
|
[
|
|
1,
|
|
"rgb(103,0,31)"
|
|
]
|
|
]
|
|
},
|
|
"template": {
|
|
"data": {
|
|
"bar": [
|
|
{
|
|
"error_x": {
|
|
"color": "#2a3f5f"
|
|
},
|
|
"error_y": {
|
|
"color": "#2a3f5f"
|
|
},
|
|
"marker": {
|
|
"line": {
|
|
"color": "#E5ECF6",
|
|
"width": 0.5
|
|
},
|
|
"pattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
}
|
|
},
|
|
"type": "bar"
|
|
}
|
|
],
|
|
"barpolar": [
|
|
{
|
|
"marker": {
|
|
"line": {
|
|
"color": "#E5ECF6",
|
|
"width": 0.5
|
|
},
|
|
"pattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
}
|
|
},
|
|
"type": "barpolar"
|
|
}
|
|
],
|
|
"carpet": [
|
|
{
|
|
"aaxis": {
|
|
"endlinecolor": "#2a3f5f",
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"minorgridcolor": "white",
|
|
"startlinecolor": "#2a3f5f"
|
|
},
|
|
"baxis": {
|
|
"endlinecolor": "#2a3f5f",
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"minorgridcolor": "white",
|
|
"startlinecolor": "#2a3f5f"
|
|
},
|
|
"type": "carpet"
|
|
}
|
|
],
|
|
"choropleth": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"type": "choropleth"
|
|
}
|
|
],
|
|
"contour": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "contour"
|
|
}
|
|
],
|
|
"contourcarpet": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"type": "contourcarpet"
|
|
}
|
|
],
|
|
"heatmap": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "heatmap"
|
|
}
|
|
],
|
|
"histogram": [
|
|
{
|
|
"marker": {
|
|
"pattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
}
|
|
},
|
|
"type": "histogram"
|
|
}
|
|
],
|
|
"histogram2d": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "histogram2d"
|
|
}
|
|
],
|
|
"histogram2dcontour": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "histogram2dcontour"
|
|
}
|
|
],
|
|
"mesh3d": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"type": "mesh3d"
|
|
}
|
|
],
|
|
"parcoords": [
|
|
{
|
|
"line": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "parcoords"
|
|
}
|
|
],
|
|
"pie": [
|
|
{
|
|
"automargin": true,
|
|
"type": "pie"
|
|
}
|
|
],
|
|
"scatter": [
|
|
{
|
|
"fillpattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
},
|
|
"type": "scatter"
|
|
}
|
|
],
|
|
"scatter3d": [
|
|
{
|
|
"line": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatter3d"
|
|
}
|
|
],
|
|
"scattercarpet": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattercarpet"
|
|
}
|
|
],
|
|
"scattergeo": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattergeo"
|
|
}
|
|
],
|
|
"scattergl": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattergl"
|
|
}
|
|
],
|
|
"scattermap": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattermap"
|
|
}
|
|
],
|
|
"scattermapbox": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattermapbox"
|
|
}
|
|
],
|
|
"scatterpolar": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatterpolar"
|
|
}
|
|
],
|
|
"scatterpolargl": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatterpolargl"
|
|
}
|
|
],
|
|
"scatterternary": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatterternary"
|
|
}
|
|
],
|
|
"surface": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "surface"
|
|
}
|
|
],
|
|
"table": [
|
|
{
|
|
"cells": {
|
|
"fill": {
|
|
"color": "#EBF0F8"
|
|
},
|
|
"line": {
|
|
"color": "white"
|
|
}
|
|
},
|
|
"header": {
|
|
"fill": {
|
|
"color": "#C8D4E3"
|
|
},
|
|
"line": {
|
|
"color": "white"
|
|
}
|
|
},
|
|
"type": "table"
|
|
}
|
|
]
|
|
},
|
|
"layout": {
|
|
"annotationdefaults": {
|
|
"arrowcolor": "#2a3f5f",
|
|
"arrowhead": 0,
|
|
"arrowwidth": 1
|
|
},
|
|
"autotypenumbers": "strict",
|
|
"coloraxis": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"colorscale": {
|
|
"diverging": [
|
|
[
|
|
0,
|
|
"#8e0152"
|
|
],
|
|
[
|
|
0.1,
|
|
"#c51b7d"
|
|
],
|
|
[
|
|
0.2,
|
|
"#de77ae"
|
|
],
|
|
[
|
|
0.3,
|
|
"#f1b6da"
|
|
],
|
|
[
|
|
0.4,
|
|
"#fde0ef"
|
|
],
|
|
[
|
|
0.5,
|
|
"#f7f7f7"
|
|
],
|
|
[
|
|
0.6,
|
|
"#e6f5d0"
|
|
],
|
|
[
|
|
0.7,
|
|
"#b8e186"
|
|
],
|
|
[
|
|
0.8,
|
|
"#7fbc41"
|
|
],
|
|
[
|
|
0.9,
|
|
"#4d9221"
|
|
],
|
|
[
|
|
1,
|
|
"#276419"
|
|
]
|
|
],
|
|
"sequential": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"sequentialminus": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
]
|
|
},
|
|
"colorway": [
|
|
"#636efa",
|
|
"#EF553B",
|
|
"#00cc96",
|
|
"#ab63fa",
|
|
"#FFA15A",
|
|
"#19d3f3",
|
|
"#FF6692",
|
|
"#B6E880",
|
|
"#FF97FF",
|
|
"#FECB52"
|
|
],
|
|
"font": {
|
|
"color": "#2a3f5f"
|
|
},
|
|
"geo": {
|
|
"bgcolor": "white",
|
|
"lakecolor": "white",
|
|
"landcolor": "#E5ECF6",
|
|
"showlakes": true,
|
|
"showland": true,
|
|
"subunitcolor": "white"
|
|
},
|
|
"hoverlabel": {
|
|
"align": "left"
|
|
},
|
|
"hovermode": "closest",
|
|
"mapbox": {
|
|
"style": "light"
|
|
},
|
|
"paper_bgcolor": "white",
|
|
"plot_bgcolor": "#E5ECF6",
|
|
"polar": {
|
|
"angularaxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
},
|
|
"bgcolor": "#E5ECF6",
|
|
"radialaxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"scene": {
|
|
"xaxis": {
|
|
"backgroundcolor": "#E5ECF6",
|
|
"gridcolor": "white",
|
|
"gridwidth": 2,
|
|
"linecolor": "white",
|
|
"showbackground": true,
|
|
"ticks": "",
|
|
"zerolinecolor": "white"
|
|
},
|
|
"yaxis": {
|
|
"backgroundcolor": "#E5ECF6",
|
|
"gridcolor": "white",
|
|
"gridwidth": 2,
|
|
"linecolor": "white",
|
|
"showbackground": true,
|
|
"ticks": "",
|
|
"zerolinecolor": "white"
|
|
},
|
|
"zaxis": {
|
|
"backgroundcolor": "#E5ECF6",
|
|
"gridcolor": "white",
|
|
"gridwidth": 2,
|
|
"linecolor": "white",
|
|
"showbackground": true,
|
|
"ticks": "",
|
|
"zerolinecolor": "white"
|
|
}
|
|
},
|
|
"shapedefaults": {
|
|
"line": {
|
|
"color": "#2a3f5f"
|
|
}
|
|
},
|
|
"ternary": {
|
|
"aaxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
},
|
|
"baxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
},
|
|
"bgcolor": "#E5ECF6",
|
|
"caxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"title": {
|
|
"x": 0.05
|
|
},
|
|
"xaxis": {
|
|
"automargin": true,
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": "",
|
|
"title": {
|
|
"standoff": 15
|
|
},
|
|
"zerolinecolor": "white",
|
|
"zerolinewidth": 2
|
|
},
|
|
"yaxis": {
|
|
"automargin": true,
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": "",
|
|
"title": {
|
|
"standoff": 15
|
|
},
|
|
"zerolinecolor": "white",
|
|
"zerolinewidth": 2
|
|
}
|
|
}
|
|
},
|
|
"title": {
|
|
"text": "Correlation matrix of the numerical variables"
|
|
},
|
|
"xaxis": {
|
|
"anchor": "y",
|
|
"domain": [
|
|
0,
|
|
1
|
|
]
|
|
},
|
|
"yaxis": {
|
|
"anchor": "x",
|
|
"autorange": "reversed",
|
|
"domain": [
|
|
0,
|
|
1
|
|
]
|
|
}
|
|
}
|
|
}
|
|
},
|
|
"metadata": {},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Pearson correlation matrix of the numerical variables\n",
|
|
"fig = px.imshow(vars_numeriques.corr(),\n",
|
|
"                text_auto=True,\n",
|
|
"                aspect=\"auto\",\n",
|
|
"                color_continuous_scale='RdBu_r',\n",
|
|
"                title='Correlation matrix of the numerical variables')\n",
|
|
"fig.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "98c7dba6",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Question:** what are your observations?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "67406b54",
|
|
"metadata": {},
|
|
"source": [
|
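|
"*Answer*: No pair of variables appears strongly correlated. Among the numerical variables, the largest coefficients are moderate: about 0.54 between `AGE_ASSURE_PRINCIPAL` and `ANCIENNETE_PERMIS`, and about 0.39 between `ANNEE_CTR` and `ANNEE_CONSTRUCTION`."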
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "212209ec",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Preprocessing"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "65aca700",
|
|
"metadata": {},
|
|
"source": [
|
|
"Two steps are required before training a model; together they are known as *preprocessing*:\n",
|
|
"\n",
|
|
"* The models provided by the \"sklearn\" library generally expect numerical inputs. Categorical variables must therefore be converted into numerical ones; this conversion is called *One Hot Encoding*.\n",
|
|
"* Normalize (standardize) the numerical data."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "95f5cc9f",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Exercise:** write a code snippet that one-hot encodes the categorical variables. You can use the \"preproc.OneHotEncoder\" class from the sklearn library."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 168,
|
|
"id": "b8530717",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"encoder = preproc.OneHotEncoder()\n",
|
|
"encoder.fit(vars_categorielles)\n",
|
|
"vars_categorielles_enc = encoder.transform(vars_categorielles)\n",
|
|
"vars_categorielles_enc = pd.DataFrame(vars_categorielles_enc.toarray(), columns=encoder.get_feature_names_out(vars_categorielles.columns))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "b70abc5c",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Exercise:** write a code snippet that normalizes the numerical variables in the dataset. You can use the \"preproc.StandardScaler\" class from the sklearn library."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 169,
|
|
"id": "4ff3847d",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"scaler = preproc.StandardScaler()\n",
|
|
"scaler.fit(vars_numeriques)\n",
|
|
"vars_numeriques_scaled = scaler.transform(vars_numeriques)\n",
|
|
"vars_numeriques_scaled = pd.DataFrame(vars_numeriques_scaled, columns=vars_numeriques.columns)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "62d49546",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Sampling"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "64d229f4",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Exercise:** write a code snippet that builds the training set (80% of the data) and the test set (20%)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 170,
|
|
"id": "6a1c7907",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"X = data_model_preprocessed = vars_numeriques_scaled.merge(vars_categorielles_enc, left_index=True, right_index=True) # type: ignore\n",
|
|
"Y = data_model[\"CM\"]\n",
|
|
"\n",
|
|
"X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "84dc7a07",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Fitting"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "97c7b783",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Exercise:** write a code snippet that builds the model."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 171,
|
|
"id": "053e013c",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/html": [
|
|
"<style>#sk-container-id-4 {\n",
|
|
" /* Definition of color scheme common for light and dark mode */\n",
|
|
" --sklearn-color-text: #000;\n",
|
|
" --sklearn-color-text-muted: #666;\n",
|
|
" --sklearn-color-line: gray;\n",
|
|
" /* Definition of color scheme for unfitted estimators */\n",
|
|
" --sklearn-color-unfitted-level-0: #fff5e6;\n",
|
|
" --sklearn-color-unfitted-level-1: #f6e4d2;\n",
|
|
" --sklearn-color-unfitted-level-2: #ffe0b3;\n",
|
|
" --sklearn-color-unfitted-level-3: chocolate;\n",
|
|
" /* Definition of color scheme for fitted estimators */\n",
|
|
" --sklearn-color-fitted-level-0: #f0f8ff;\n",
|
|
" --sklearn-color-fitted-level-1: #d4ebff;\n",
|
|
" --sklearn-color-fitted-level-2: #b3dbfd;\n",
|
|
" --sklearn-color-fitted-level-3: cornflowerblue;\n",
|
|
"\n",
|
|
" /* Specific color for light theme */\n",
|
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
|
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
|
|
" --sklearn-color-icon: #696969;\n",
|
|
"\n",
|
|
" @media (prefers-color-scheme: dark) {\n",
|
|
" /* Redefinition of color scheme for dark theme */\n",
|
|
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
|
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
|
|
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
|
|
" --sklearn-color-icon: #878787;\n",
|
|
" }\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 {\n",
|
|
" color: var(--sklearn-color-text);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 pre {\n",
|
|
" padding: 0;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 input.sk-hidden--visually {\n",
|
|
" border: 0;\n",
|
|
" clip: rect(1px 1px 1px 1px);\n",
|
|
" clip: rect(1px, 1px, 1px, 1px);\n",
|
|
" height: 1px;\n",
|
|
" margin: -1px;\n",
|
|
" overflow: hidden;\n",
|
|
" padding: 0;\n",
|
|
" position: absolute;\n",
|
|
" width: 1px;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-dashed-wrapped {\n",
|
|
" border: 1px dashed var(--sklearn-color-line);\n",
|
|
" margin: 0 0.4em 0.5em 0.4em;\n",
|
|
" box-sizing: border-box;\n",
|
|
" padding-bottom: 0.4em;\n",
|
|
" background-color: var(--sklearn-color-background);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-container {\n",
|
|
" /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
|
|
" but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
|
|
" so we also need the `!important` here to be able to override the\n",
|
|
" default hidden behavior on the sphinx rendered scikit-learn.org.\n",
|
|
" See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
|
|
" display: inline-block !important;\n",
|
|
" position: relative;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-text-repr-fallback {\n",
|
|
" display: none;\n",
|
|
"}\n",
|
|
"\n",
|
|
"div.sk-parallel-item,\n",
|
|
"div.sk-serial,\n",
|
|
"div.sk-item {\n",
|
|
" /* draw centered vertical line to link estimators */\n",
|
|
" background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
|
|
" background-size: 2px 100%;\n",
|
|
" background-repeat: no-repeat;\n",
|
|
" background-position: center center;\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Parallel-specific style estimator block */\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-parallel-item::after {\n",
|
|
" content: \"\";\n",
|
|
" width: 100%;\n",
|
|
" border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
|
|
" flex-grow: 1;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-parallel {\n",
|
|
" display: flex;\n",
|
|
" align-items: stretch;\n",
|
|
" justify-content: center;\n",
|
|
" background-color: var(--sklearn-color-background);\n",
|
|
" position: relative;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-parallel-item {\n",
|
|
" display: flex;\n",
|
|
" flex-direction: column;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-parallel-item:first-child::after {\n",
|
|
" align-self: flex-end;\n",
|
|
" width: 50%;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-parallel-item:last-child::after {\n",
|
|
" align-self: flex-start;\n",
|
|
" width: 50%;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-parallel-item:only-child::after {\n",
|
|
" width: 0;\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Serial-specific style estimator block */\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-serial {\n",
|
|
" display: flex;\n",
|
|
" flex-direction: column;\n",
|
|
" align-items: center;\n",
|
|
" background-color: var(--sklearn-color-background);\n",
|
|
" padding-right: 1em;\n",
|
|
" padding-left: 1em;\n",
|
|
"}\n",
|
|
"\n",
|
|
"\n",
|
|
"/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
|
|
"clickable and can be expanded/collapsed.\n",
|
|
"- Pipeline and ColumnTransformer use this feature and define the default style\n",
|
|
"- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
|
|
"*/\n",
|
|
"\n",
|
|
"/* Pipeline and ColumnTransformer style (default) */\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-toggleable {\n",
|
|
" /* Default theme specific background. It is overwritten whether we have a\n",
|
|
" specific estimator or a Pipeline/ColumnTransformer */\n",
|
|
" background-color: var(--sklearn-color-background);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Toggleable label */\n",
|
|
"#sk-container-id-4 label.sk-toggleable__label {\n",
|
|
" cursor: pointer;\n",
|
|
" display: flex;\n",
|
|
" width: 100%;\n",
|
|
" margin-bottom: 0;\n",
|
|
" padding: 0.5em;\n",
|
|
" box-sizing: border-box;\n",
|
|
" text-align: center;\n",
|
|
" align-items: start;\n",
|
|
" justify-content: space-between;\n",
|
|
" gap: 0.5em;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 label.sk-toggleable__label .caption {\n",
|
|
" font-size: 0.6rem;\n",
|
|
" font-weight: lighter;\n",
|
|
" color: var(--sklearn-color-text-muted);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 label.sk-toggleable__label-arrow:before {\n",
|
|
" /* Arrow on the left of the label */\n",
|
|
" content: \"▸\";\n",
|
|
" float: left;\n",
|
|
" margin-right: 0.25em;\n",
|
|
" color: var(--sklearn-color-icon);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 label.sk-toggleable__label-arrow:hover:before {\n",
|
|
" color: var(--sklearn-color-text);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Toggleable content - dropdown */\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-toggleable__content {\n",
|
|
" max-height: 0;\n",
|
|
" max-width: 0;\n",
|
|
" overflow: hidden;\n",
|
|
" text-align: left;\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-toggleable__content.fitted {\n",
|
|
" /* fitted */\n",
|
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-toggleable__content pre {\n",
|
|
" margin: 0.2em;\n",
|
|
" border-radius: 0.25em;\n",
|
|
" color: var(--sklearn-color-text);\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-toggleable__content.fitted pre {\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
|
|
" /* Expand drop-down */\n",
|
|
" max-height: 200px;\n",
|
|
" max-width: 100%;\n",
|
|
" overflow: auto;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
|
|
" content: \"▾\";\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Pipeline/ColumnTransformer-specific style */\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|
" color: var(--sklearn-color-text);\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Estimator-specific style */\n",
|
|
"\n",
|
|
"/* Colorize estimator box */\n",
|
|
"#sk-container-id-4 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
|
|
" /* fitted */\n",
|
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-label label.sk-toggleable__label,\n",
|
|
"#sk-container-id-4 div.sk-label label {\n",
|
|
" /* The background is the default theme color */\n",
|
|
" color: var(--sklearn-color-text-on-default-background);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* On hover, darken the color of the background */\n",
|
|
"#sk-container-id-4 div.sk-label:hover label.sk-toggleable__label {\n",
|
|
" color: var(--sklearn-color-text);\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Label box, darken color on hover, fitted */\n",
|
|
"#sk-container-id-4 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
|
|
" color: var(--sklearn-color-text);\n",
|
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Estimator label */\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-label label {\n",
|
|
" font-family: monospace;\n",
|
|
" font-weight: bold;\n",
|
|
" display: inline-block;\n",
|
|
" line-height: 1.2em;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-label-container {\n",
|
|
" text-align: center;\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Estimator-specific */\n",
|
|
"#sk-container-id-4 div.sk-estimator {\n",
|
|
" font-family: monospace;\n",
|
|
" border: 1px dotted var(--sklearn-color-border-box);\n",
|
|
" border-radius: 0.25em;\n",
|
|
" box-sizing: border-box;\n",
|
|
" margin-bottom: 0.5em;\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-0);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-estimator.fitted {\n",
|
|
" /* fitted */\n",
|
|
" background-color: var(--sklearn-color-fitted-level-0);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* on hover */\n",
|
|
"#sk-container-id-4 div.sk-estimator:hover {\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 div.sk-estimator.fitted:hover {\n",
|
|
" /* fitted */\n",
|
|
" background-color: var(--sklearn-color-fitted-level-2);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
|
|
"\n",
|
|
"/* Common style for \"i\" and \"?\" */\n",
|
|
"\n",
|
|
".sk-estimator-doc-link,\n",
|
|
"a:link.sk-estimator-doc-link,\n",
|
|
"a:visited.sk-estimator-doc-link {\n",
|
|
" float: right;\n",
|
|
" font-size: smaller;\n",
|
|
" line-height: 1em;\n",
|
|
" font-family: monospace;\n",
|
|
" background-color: var(--sklearn-color-background);\n",
|
|
" border-radius: 1em;\n",
|
|
" height: 1em;\n",
|
|
" width: 1em;\n",
|
|
" text-decoration: none !important;\n",
|
|
" margin-left: 0.5em;\n",
|
|
" text-align: center;\n",
|
|
" /* unfitted */\n",
|
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
|
"}\n",
|
|
"\n",
|
|
".sk-estimator-doc-link.fitted,\n",
|
|
"a:link.sk-estimator-doc-link.fitted,\n",
|
|
"a:visited.sk-estimator-doc-link.fitted {\n",
|
|
" /* fitted */\n",
|
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* On hover */\n",
|
|
"div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
|
|
".sk-estimator-doc-link:hover,\n",
|
|
"div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
|
|
".sk-estimator-doc-link:hover {\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
|
" color: var(--sklearn-color-background);\n",
|
|
" text-decoration: none;\n",
|
|
"}\n",
|
|
"\n",
|
|
"div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
|
|
".sk-estimator-doc-link.fitted:hover,\n",
|
|
"div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
|
|
".sk-estimator-doc-link.fitted:hover {\n",
|
|
" /* fitted */\n",
|
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
|
" color: var(--sklearn-color-background);\n",
|
|
" text-decoration: none;\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* Span, style for the box shown on hovering the info icon */\n",
|
|
".sk-estimator-doc-link span {\n",
|
|
" display: none;\n",
|
|
" z-index: 9999;\n",
|
|
" position: relative;\n",
|
|
" font-weight: normal;\n",
|
|
" right: .2ex;\n",
|
|
" padding: .5ex;\n",
|
|
" margin: .5ex;\n",
|
|
" width: min-content;\n",
|
|
" min-width: 20ex;\n",
|
|
" max-width: 50ex;\n",
|
|
" color: var(--sklearn-color-text);\n",
|
|
" box-shadow: 2pt 2pt 4pt #999;\n",
|
|
" /* unfitted */\n",
|
|
" background: var(--sklearn-color-unfitted-level-0);\n",
|
|
" border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
|
|
"}\n",
|
|
"\n",
|
|
".sk-estimator-doc-link.fitted span {\n",
|
|
" /* fitted */\n",
|
|
" background: var(--sklearn-color-fitted-level-0);\n",
|
|
" border: var(--sklearn-color-fitted-level-3);\n",
|
|
"}\n",
|
|
"\n",
|
|
".sk-estimator-doc-link:hover span {\n",
|
|
" display: block;\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* \"?\"-specific style due to the `<a>` HTML tag */\n",
|
|
"\n",
|
|
"#sk-container-id-4 a.estimator_doc_link {\n",
|
|
" float: right;\n",
|
|
" font-size: 1rem;\n",
|
|
" line-height: 1em;\n",
|
|
" font-family: monospace;\n",
|
|
" background-color: var(--sklearn-color-background);\n",
|
|
" border-radius: 1rem;\n",
|
|
" height: 1rem;\n",
|
|
" width: 1rem;\n",
|
|
" text-decoration: none;\n",
|
|
" /* unfitted */\n",
|
|
" color: var(--sklearn-color-unfitted-level-1);\n",
|
|
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 a.estimator_doc_link.fitted {\n",
|
|
" /* fitted */\n",
|
|
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
|
|
" color: var(--sklearn-color-fitted-level-1);\n",
|
|
"}\n",
|
|
"\n",
|
|
"/* On hover */\n",
|
|
"#sk-container-id-4 a.estimator_doc_link:hover {\n",
|
|
" /* unfitted */\n",
|
|
" background-color: var(--sklearn-color-unfitted-level-3);\n",
|
|
" color: var(--sklearn-color-background);\n",
|
|
" text-decoration: none;\n",
|
|
"}\n",
|
|
"\n",
|
|
"#sk-container-id-4 a.estimator_doc_link.fitted:hover {\n",
|
|
" /* fitted */\n",
|
|
" background-color: var(--sklearn-color-fitted-level-3);\n",
|
|
"}\n",
|
|
"</style><div id=\"sk-container-id-4\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>DecisionTreeRegressor()</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-4\" type=\"checkbox\" checked><label for=\"sk-estimator-id-4\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow\"><div><div>DecisionTreeRegressor</div></div><div><a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.6/modules/generated/sklearn.tree.DecisionTreeRegressor.html\">?<span>Documentation for DecisionTreeRegressor</span></a><span class=\"sk-estimator-doc-link fitted\">i<span>Fitted</span></span></div></label><div class=\"sk-toggleable__content fitted\"><pre>DecisionTreeRegressor()</pre></div> </div></div></div></div>"
|
|
],
|
|
"text/plain": [
|
|
"DecisionTreeRegressor()"
|
|
]
|
|
},
|
|
"execution_count": 171,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"tree = DecisionTreeRegressor()\n",
|
|
"tree.fit(X_train, y_train)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "8d624704",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Exercice :** proposez un bout de code permettant d'évaluer les performances du modèle (MAE, MSE et RMSE)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 172,
|
|
"id": "c4ca2cf9",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"MAE: 0.00\n",
|
|
"MSE: 0.00\n",
|
|
"RMSE: 0.00\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# Prédictions sur l'ensemble d'entraînement\n",
|
|
"y_pred_train = tree.predict(X_train)\n",
|
|
"\n",
|
|
"mae = metrics.mean_absolute_error(y_train, y_pred_train)\n",
|
|
"mse = metrics.mean_squared_error(y_train, y_pred_train)\n",
|
|
"rmse = metrics.root_mean_squared_error(y_train, y_pred_train)\n",
|
|
"\n",
|
|
"print(f\"MAE: {mae:.2f}\")\n",
|
|
"print(f\"MSE: {mse:.2f}\")\n",
|
|
"print(f\"RMSE: {rmse:.2f}\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 173,
|
|
"id": "4b739d5b",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"MAE: 5950.05\n",
|
|
"MSE: 160067768.70\n",
|
|
"RMSE: 12651.79\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"y_pred_test = tree.predict(X_test)\n",
|
|
"\n",
|
|
"mae = metrics.mean_absolute_error(y_test, y_pred_test)\n",
|
|
"mse = metrics.mean_squared_error(y_test, y_pred_test)\n",
|
|
"rmse = metrics.root_mean_squared_error(y_test, y_pred_test)\n",
|
|
"\n",
|
|
"print(f\"MAE: {mae:.2f}\")\n",
|
|
"print(f\"MSE: {mse:.2f}\")\n",
|
|
"print(f\"RMSE: {rmse:.2f}\")\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "fb2fe98c",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Question :** que pensez-vous des performances de ce modèle ?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "7ecba832",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Algorithme supervisé : Random Forest "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "efcb8987",
|
|
"metadata": {},
|
|
"source": [
|
|
"A ce stade, nous avons vu les différentes étapes pour lancer un algorithme de Machine Learning. Néanmoins, ces étapes ne sont pas suffisantes pour construire un modèle performant. \n",
|
|
"En effet, afin de construire un modèle performant le Data Scientist doit agir sur l'apprentissage du modèle. Dans ce qui suit nous :\n",
|
|
"* Changerons d'algorithme pour utiliser un algorithme plus performant (Random Forest)\n",
|
|
"* Raliserons un *grid search* sur les paramètres du modèle\n",
|
|
"* Appliquerons l'apprentissage par validation croisée\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d6723a2f",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Modèle avec Validation Croisée"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "3716b09f",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Sampling"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "ab1e1367",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "3f5d735e",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Fitting avec Cross-Validation"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "bc819f8f",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Exercice :** construisez un modèle RF (RandomForestRegressor) en implémentant la technique de validation croisée. Pensez à enregistrer au sein d'une variable/liste les performances (MAE, MSE & RMSE) du modèle au sein de chaque fold."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 174,
|
|
"id": "b515460e",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#Initialisation\n",
|
|
"# Nombre de sous-échantillons pour la cross-validation\n",
|
|
"num_splits = 5\n",
|
|
"\n",
|
|
"# Random Forest regressor\n",
|
|
"rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)\n",
|
|
"\n",
|
|
"# Initialisation du KFold cross-validation splitter\n",
|
|
"kf = KFold(n_splits=num_splits)\n",
|
|
"\n",
|
|
"# Listes pour enregistrer les performances du modèle\n",
|
|
"MAE_scores = []\n",
|
|
"MSE_scores = []\n",
|
|
"RMSE_scores = []"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 175,
|
|
"id": "eebb394f",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Entrainement avec cross-validation\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 176,
|
|
"id": "b067126c",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Métriques sur tous les folds\n",
|
|
"\n",
|
|
"#MAE\n",
|
|
"for fold, mae in enumerate(MAE_scores, start=1):\n",
|
|
" print(f\"Fold {fold} MAE:\", mae)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 177,
|
|
"id": "6597152c",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#MSE\n",
|
|
"for fold, mse in enumerate(MSE_scores, start=1):\n",
|
|
" print(f\"Fold {fold} MSE:\", mse)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 178,
|
|
"id": "63ff1c9d",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#RMSE\n",
|
|
"for fold, rmse in enumerate(RMSE_scores, start=1):\n",
|
|
" print(f\"Fold {fold} RMSE:\", rmse)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ec1961c2",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Question :** Commentez les résultats."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5a8163ef",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Ajout d'un Grid Search pour les hyper paramètres"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5a6adbfe",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Sampling"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "d9342ad6",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "dce52b11",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Fitting avec Cross-Validation et *Grid Search*"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "7e3a9dd0",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Exercice :** Intégrez la technique de Grid Search pour rechercher les paramètres optimaux du modèle."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 179,
|
|
"id": "6d58dbc2",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#Initialisation\n",
|
|
"# Nombre de sous-échantillons pour la cross-validation\n",
|
|
"num_splits = 5\n",
|
|
"\n",
|
|
"# Initialisation du KFold cross-validation splitter\n",
|
|
"kf = KFold(n_splits=num_splits)\n",
|
|
"\n",
|
|
"# Listes pour enregistrer les performances du modèle\n",
|
|
"MAE_scores = []\n",
|
|
"MSE_scores = []\n",
|
|
"RMSE_scores = []\n",
|
|
"\n",
|
|
"# Hyperparamètres à tester\n",
|
|
"n_estimators_values = [] #Complétez ici par les paramètres à tester\n",
|
|
"max_depth_values = [] #Complétez ici par les paramètres à tester\n",
|
|
"min_samples_split_values = [] #Complétez ici par les paramètres à tester\n",
|
|
"\n",
|
|
"# Liste pour sauveagrder les meilleurs résultats\n",
|
|
"best_score = np.inf\n",
|
|
"best_params = {}\n",
|
|
"\n",
|
|
"MAE_best_score = []\n",
|
|
"MSE_best_score = []\n",
|
|
"RMSE_best_score = []"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 180,
|
|
"id": "47da5172",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#Complétez ici avec votre code"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 181,
|
|
"id": "d4936c46",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Meilleurs paramètres: {}\n",
|
|
"Meilleure RMSE : inf\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# Meilleurs résultats\n",
|
|
"print(\"Meilleurs paramètres:\", best_params)\n",
|
|
"print(\"Meilleure RMSE :\", best_score)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 182,
|
|
"id": "3215c463",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Métriques sur tous les folds\n",
|
|
"\n",
|
|
"#RMSE\n",
|
|
"for fold, rmse in enumerate(RMSE_best_score, start=1):\n",
|
|
" print(f\"Fold {fold} RMSE:\", rmse)\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 183,
|
|
"id": "bb9a5c9b",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#MAE\n",
|
|
"for fold, mse in enumerate(MSE_best_score, start=1):\n",
|
|
" print(f\"Fold {fold} MSE:\", mse)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 184,
|
|
"id": "0f0768ad",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#MSE\n",
|
|
"for fold, mae in enumerate(MAE_best_score, start=1):\n",
|
|
" print(f\"Fold {fold} MAE:\", mae)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "802a625f",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Question :** Commentez les résultats"
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "studies",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.13.3"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|