{
"cells": [
{
"cell_type": "markdown",
"id": "8750d15b",
"metadata": {},
"source": [
"# Course 3: Machine Learning - Supervised Algorithms (1/2)"
]
},
{
"cell_type": "markdown",
"id": "f7c08ae5",
"metadata": {},
"source": [
"## Preamble"
]
},
{
"cell_type": "markdown",
"id": "ec7ecb4b",
"metadata": {},
"source": [
"The objectives of this session (3 h) are:\n",
"* Prepare the modelling datasets (sampling)\n",
"* Apply a simple supervised model\n",
"* Build a Machine Learning model (cross-validation and hyperparameter tuning) to solve a regression problem\n",
"* Analyse the model's performance"
]
},
{
"cell_type": "markdown",
"id": "4e99c600",
"metadata": {},
"source": [
"## Workspace setup"
]
},
{
"cell_type": "markdown",
"id": "c1b01045",
"metadata": {},
"source": [
"### Library imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "97d58527",
"metadata": {},
"outputs": [],
"source": [
"# Data\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Plotting\n",
"import seaborn as sns\n",
"\n",
"sns.set()\n",
"import plotly.express as px\n",
"import plotly.graph_objects as gp\n",
"\n",
"# Statistics\n",
"from scipy.stats import chi2_contingency\n",
"\n",
"# Machine Learning\n",
"import sklearn.preprocessing as preproc\n",
"from sklearn import metrics\n",
"from sklearn.cluster import KMeans\n",
"from sklearn.ensemble import RandomForestRegressor\n",
"from sklearn.model_selection import KFold, train_test_split\n",
"from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor"
]
},
{
"cell_type": "markdown",
"id": "06153286",
"metadata": {},
"source": [
"### Function definitions"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c67db932",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "985e4e97",
"metadata": {},
"source": [
"### Constants"
]
},
{
"cell_type": "code",
"execution_count": 91,
"id": "c9597b48",
"metadata": {},
"outputs": [],
"source": [
"input_path = \"./1_inputs\"\n",
"output_path = \"./2_outputs\""
]
},
{
"cell_type": "markdown",
"id": "b2b035d2",
"metadata": {},
"source": [
"### Data import"
]
},
{
"cell_type": "code",
"execution_count": 92,
"id": "8051b5f4",
"metadata": {},
"outputs": [],
"source": [
"path = input_path + \"/base_retraitee.csv\"\n",
"data_retraitee = pd.read_csv(path, sep=\",\", decimal=\".\")"
]
},
{
"cell_type": "markdown",
"id": "a2578ba1",
"metadata": {},
"source": [
"## Supervised algorithm: CART"
]
},
{
"cell_type": "markdown",
"id": "aaa0b27d",
"metadata": {},
"source": [
"In this part, the goal is to build a simple model (the CART algorithm) in order to walk through the steps needed to fit a model.\n",
"We will model the claim cost directly."
]
},
{
"cell_type": "markdown",
"id": "a0458a05",
"metadata": {},
"source": [
"### Building the model"
]
},
{
"cell_type": "markdown",
"id": "b3715c37",
"metadata": {},
"source": [
"The first step is to compute the average cost per claim (the target, or response variable). This is the variable to predict from the explanatory variables."
]
},
{
"cell_type": "code",
"execution_count": 93,
"id": "c427a4b8",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(824, 14)"
]
},
"execution_count": 93,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_model = data_retraitee.copy()\n",
"\n",
"# Keep only the rows with at least one claim (NB > 0)\n",
"data_model = data_model[data_model[\"NB\"] > 0]\n",
"\n",
"# Compute the \"theoretical\" average claim cost\n",
"data_model[\"CM\"] = data_model[\"CHARGE\"] / data_model[\"NB\"]\n",
"data_model = data_model.drop([\"CHARGE\", \"NB\", \"EXPO\"], axis=1)\n",
"data_model.shape"
]
},
{
"cell_type": "markdown",
"id": "e3e85088",
"metadata": {},
"source": [
"**Exercise:** compute descriptive statistics for the dataset used."
]
},
{
"cell_type": "code",
"execution_count": 94,
"id": "c8fd3ee1",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.microsoft.datawrangler.viewer.v0+json": {
"columns": [
{
"name": "index",
"rawType": "object",
"type": "string"
},
{
"name": "ANNEE_CTR",
"rawType": "float64",
"type": "float"
},
{
"name": "CONTRAT_ANCIENNETE",
"rawType": "object",
"type": "unknown"
},
{
"name": "FREQUENCE_PAIEMENT_COTISATION",
"rawType": "object",
"type": "unknown"
},
{
"name": "GROUPE_KM",
"rawType": "object",
"type": "unknown"
},
{
"name": "ZONE_RISQUE",
"rawType": "object",
"type": "unknown"
},
{
"name": "AGE_ASSURE_PRINCIPAL",
"rawType": "float64",
"type": "float"
},
{
"name": "GENRE",
"rawType": "object",
"type": "unknown"
},
{
"name": "DEUXIEME_CONDUCTEUR",
"rawType": "object",
"type": "unknown"
},
{
"name": "ANCIENNETE_PERMIS",
"rawType": "float64",
"type": "float"
},
{
"name": "ANNEE_CONSTRUCTION",
"rawType": "float64",
"type": "float"
},
{
"name": "ENERGIE",
"rawType": "object",
"type": "unknown"
},
{
"name": "EQUIPEMENT_SECURITE",
"rawType": "object",
"type": "unknown"
},
{
"name": "VALEUR_DU_BIEN",
"rawType": "object",
"type": "unknown"
},
{
"name": "CM",
"rawType": "float64",
"type": "float"
}
],
"ref": "8d8166c3-6828-4361-92de-ebce2dadb512",
"rows": [
[
"count",
"824.0",
"824",
"824",
"824",
"824",
"824.0",
"824",
"824",
"824.0",
"824.0",
"824",
"824",
"824",
"824.0"
],
[
"unique",
null,
"5",
"3",
"4",
"14",
null,
"2",
"2",
null,
null,
"3",
"2",
"6",
null
],
[
"top",
null,
"(0,1]",
"MENSUEL",
"[0;20000[",
"C",
null,
"M",
"False",
null,
null,
"ESSENCE",
"FAUX",
"[10000;15000[",
null
],
[
"freq",
null,
"297",
"398",
"391",
"269",
null,
"483",
"663",
null,
null,
"413",
"517",
"213",
null
],
[
"mean",
"2018.384708737864",
null,
null,
null,
null,
"44.383495145631066",
null,
null,
"35.68810679611651",
"2015.2123786407767",
null,
null,
null,
"4246.01697815534"
],
[
"std",
"1.515832735580178",
null,
null,
null,
null,
"13.808216667998865",
null,
null,
"19.370620845496358",
"3.1637823115731556",
null,
null,
null,
"6869.61691660173"
],
[
"min",
"2016.0",
null,
null,
null,
null,
"19.0",
null,
null,
"1.0",
"1998.0",
null,
null,
null,
"7.5"
],
[
"25%",
"2017.0",
null,
null,
null,
null,
"34.0",
null,
null,
"18.0",
"2014.0",
null,
null,
null,
"1159.96125"
],
[
"50%",
"2018.0",
null,
null,
null,
null,
"43.0",
null,
null,
"35.0",
"2016.0",
null,
null,
null,
"2541.6499999999996"
],
[
"75%",
"2020.0",
null,
null,
null,
null,
"53.0",
null,
null,
"53.0",
"2017.0",
null,
null,
null,
"4193.797500000001"
],
[
"max",
"2021.0",
null,
null,
null,
null,
"94.0",
null,
null,
"70.0",
"2021.0",
null,
null,
null,
"83421.85"
]
],
"shape": {
"columns": 14,
"rows": 11
}
},
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ANNEE_CTR</th>\n",
" <th>CONTRAT_ANCIENNETE</th>\n",
" <th>FREQUENCE_PAIEMENT_COTISATION</th>\n",
" <th>GROUPE_KM</th>\n",
" <th>ZONE_RISQUE</th>\n",
" <th>AGE_ASSURE_PRINCIPAL</th>\n",
" <th>GENRE</th>\n",
" <th>DEUXIEME_CONDUCTEUR</th>\n",
" <th>ANCIENNETE_PERMIS</th>\n",
" <th>ANNEE_CONSTRUCTION</th>\n",
" <th>ENERGIE</th>\n",
" <th>EQUIPEMENT_SECURITE</th>\n",
" <th>VALEUR_DU_BIEN</th>\n",
" <th>CM</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>824.000000</td>\n",
" <td>824</td>\n",
" <td>824</td>\n",
" <td>824</td>\n",
" <td>824</td>\n",
" <td>824.000000</td>\n",
" <td>824</td>\n",
" <td>824</td>\n",
" <td>824.000000</td>\n",
" <td>824.000000</td>\n",
" <td>824</td>\n",
" <td>824</td>\n",
" <td>824</td>\n",
" <td>824.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>unique</th>\n",
" <td>NaN</td>\n",
" <td>5</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" <td>14</td>\n",
" <td>NaN</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>6</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>top</th>\n",
" <td>NaN</td>\n",
" <td>(0,1]</td>\n",
" <td>MENSUEL</td>\n",
" <td>[0;20000[</td>\n",
" <td>C</td>\n",
" <td>NaN</td>\n",
" <td>M</td>\n",
" <td>False</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>ESSENCE</td>\n",
" <td>FAUX</td>\n",
" <td>[10000;15000[</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>freq</th>\n",
" <td>NaN</td>\n",
" <td>297</td>\n",
" <td>398</td>\n",
" <td>391</td>\n",
" <td>269</td>\n",
" <td>NaN</td>\n",
" <td>483</td>\n",
" <td>663</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>413</td>\n",
" <td>517</td>\n",
" <td>213</td>\n",
" <td>NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>2018.384709</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>44.383495</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>35.688107</td>\n",
" <td>2015.212379</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>4246.016978</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>1.515833</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>13.808217</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>19.370621</td>\n",
" <td>3.163782</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>6869.616917</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>2016.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>19.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1.000000</td>\n",
" <td>1998.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>7.500000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>2017.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>34.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>18.000000</td>\n",
" <td>2014.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>1159.961250</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>2018.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>43.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>35.000000</td>\n",
" <td>2016.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>2541.650000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>2020.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>53.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>53.000000</td>\n",
" <td>2017.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>4193.797500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>2021.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>94.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>70.000000</td>\n",
" <td>2021.000000</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>NaN</td>\n",
" <td>83421.850000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"          ANNEE_CTR  CONTRAT_ANCIENNETE  FREQUENCE_PAIEMENT_COTISATION  \\\n",
"count    824.000000                 824                            824  \n",
"unique          NaN                   5                              3  \n",
"top             NaN               (0,1]                        MENSUEL  \n",
"freq            NaN                 297                            398  \n",
"mean    2018.384709                 NaN                            NaN  \n",
"std        1.515833                 NaN                            NaN  \n",
"min     2016.000000                 NaN                            NaN  \n",
"25%     2017.000000                 NaN                            NaN  \n",
"50%     2018.000000                 NaN                            NaN  \n",
"75%     2020.000000                 NaN                            NaN  \n",
"max     2021.000000                 NaN                            NaN  \n",
"\n",
"        GROUPE_KM  ZONE_RISQUE  AGE_ASSURE_PRINCIPAL  GENRE  DEUXIEME_CONDUCTEUR  \\\n",
"count         824          824            824.000000    824                  824  \n",
"unique          4           14                   NaN      2                    2  \n",
"top     [0;20000[            C                   NaN      M                False  \n",
"freq          391          269                   NaN    483                  663  \n",
"mean          NaN          NaN             44.383495    NaN                  NaN  \n",
"std           NaN          NaN             13.808217    NaN                  NaN  \n",
"min           NaN          NaN             19.000000    NaN                  NaN  \n",
"25%           NaN          NaN             34.000000    NaN                  NaN  \n",
"50%           NaN          NaN             43.000000    NaN                  NaN  \n",
"75%           NaN          NaN             53.000000    NaN                  NaN  \n",
"max           NaN          NaN             94.000000    NaN                  NaN  \n",
"\n",
"        ANCIENNETE_PERMIS  ANNEE_CONSTRUCTION  ENERGIE  EQUIPEMENT_SECURITE  \\\n",
"count          824.000000          824.000000      824                  824  \n",
"unique                NaN                 NaN        3                    2  \n",
"top                   NaN                 NaN  ESSENCE                 FAUX  \n",
"freq                  NaN                 NaN      413                  517  \n",
"mean            35.688107         2015.212379      NaN                  NaN  \n",
"std             19.370621            3.163782      NaN                  NaN  \n",
"min              1.000000         1998.000000      NaN                  NaN  \n",
"25%             18.000000         2014.000000      NaN                  NaN  \n",
"50%             35.000000         2016.000000      NaN                  NaN  \n",
"75%             53.000000         2017.000000      NaN                  NaN  \n",
"max             70.000000         2021.000000      NaN                  NaN  \n",
"\n",
"        VALEUR_DU_BIEN            CM  \n",
"count              824    824.000000  \n",
"unique               6           NaN  \n",
"top      [10000;15000[           NaN  \n",
"freq               213           NaN  \n",
"mean               NaN   4246.016978  \n",
"std                NaN   6869.616917  \n",
"min                NaN      7.500000  \n",
"25%                NaN   1159.961250  \n",
"50%                NaN   2541.650000  \n",
"75%                NaN   4193.797500  \n",
"max                NaN  83421.850000  "
]
},
"execution_count": 94,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_model.describe(include='all')"
]
},
{
"cell_type": "markdown",
"id": "92d6156a",
"metadata": {},
"source": [
"#### Correlations among the explanatory variables"
]
},
{
"cell_type": "markdown",
"id": "d7327570",
"metadata": {},
"source": [
"**Question:** In your view, why should we look at the correlation between variables?"
]
},
{
"cell_type": "markdown",
"id": "475e141b",
"metadata": {},
"source": [
"*Answer*: To obtain a model that fits better, to flag potential causal links between features and target (correlation alone does not establish causation), and to select variables, for example by dropping redundant ones."
]
},
{
"cell_type": "code",
"execution_count": 95,
"id": "1b156435",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(824, 13)"
]
},
"execution_count": 95,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_set = data_model.drop(\"CM\", axis=1)\n",
"data_set.shape"
]
},
{
"cell_type": "code",
"execution_count": 96,
"id": "0ef0fcc0",
"metadata": {},
"outputs": [],
"source": [
"# Split the columns into numerical and categorical variables\n",
"variables_na = []\n",
"variables_numeriques = []\n",
"variables_01 = []  # declared in the original cell, not filled here\n",
"variables_categorielles = []\n",
"for colu in data_set.columns:\n",
"    if data_set[colu].isna().any():\n",
"        # Columns with missing values are set aside\n",
"        variables_na.append(data_set[colu])\n",
"    elif (\n",
"        str(data_set[colu].dtypes) in [\"int32\", \"int64\", \"float64\"]\n",
"        and data_set[colu].nunique() != 2\n",
"    ):\n",
"        variables_numeriques.append(data_set[colu])\n",
"    else:\n",
"        # Non-numeric columns and binary numeric columns are treated as categorical\n",
"        variables_categorielles.append(data_set[colu])"
]
},
{
"cell_type": "markdown",
"id": "e82fcade",
"metadata": {},
"source": [
"##### Correlation of the categorical variables:"
]
},
{
"cell_type": "code",
"execution_count": 97,
"id": "e130aae5",
"metadata": {},
"outputs": [],
"source": [
"vars_categorielles = pd.DataFrame(variables_categorielles).transpose()"
]
},
{
"cell_type": "code",
"execution_count": 123,
"id": "c39e2ad0",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.plotly.v1+json": {
"config": {
"plotlyServerURL": "https://plot.ly"
},
"data": [
{
"coloraxis": "coloraxis",
"hovertemplate": "x: %{x}<br>y: %{y}<br>color: %{z}<extra></extra>",
"name": "0",
"texttemplate": "%{z:.2f}",
"type": "heatmap",
"x": [
"CONTRAT_ANCIENNETE",
"FREQUENCE_PAIEMENT_COTISATION",
"GROUPE_KM",
"ZONE_RISQUE",
"GENRE",
"DEUXIEME_CONDUCTEUR",
"ENERGIE",
"EQUIPEMENT_SECURITE",
"VALEUR_DU_BIEN"
],
"xaxis": "x",
"y": [
"CONTRAT_ANCIENNETE",
"FREQUENCE_PAIEMENT_COTISATION",
"GROUPE_KM",
"ZONE_RISQUE",
"GENRE",
"DEUXIEME_CONDUCTEUR",
"ENERGIE",
"EQUIPEMENT_SECURITE",
"VALEUR_DU_BIEN"
],
"yaxis": "y",
"z": {
"bdata": "AAAAAAAA8D8AAAAAAAAAACoCGzzITrA/jS6+t390sj/aAKYMJa2eP5RMqUS3uZs/ytNpsBVXkz8AAAAAAAAAAJsekiMPM4I/AAAAAAAAAAAAAAAAAADwPwAAAAAAAAAAAAAAAAAAAABgNwyfFOK3Px3tLvtk1qI/VTS7w965nj/DbHQwNU6sP6xOyIjBVMQ/KwIbPMhOsD8AAAAAAAAAAAAAAAAAAPA/JGwWgOwjwz/Y12crRVC2P1AU8aUpk3Y/tZ25v8HgyT9++YWBDBq6PxMKBP1KAMk/ki6+t390sj8AAAAAAAAAACNsFoDsI8M/AAAAAAAA8D8AAAAAAAAAAOzpAHMW1bU/OToUIB5twT+gpoD1ZjrEP/5ATjN+vpg/0gCmDCWtnj9gNwyfFOK3P9jXZytFULY/AAAAAAAAAAAAAAAAAADwPwAAAAAAAAAA2p0N4q1bwz/UsLoqS0u5PxFqf8IHB9E/lEypRLe5mz8d7S77ZNaiP1AU8aUpk3Y/7OkAcxbVtT8AAAAAAAAAAAAAAAAAAPA/AAAAAAAAAAAAAAAAAAAAAOYlMsJ0brs/ytNpsBVXkz9RNLvD3rmeP7edub/B4Mk/OjoUIB5twT/anQ3irVvDPwAAAAAAAAAAAAAAAAAA8D8nEbUEUmnAP+SA2g/TvNE/AAAAAAAAAADDbHQwNU6sP335hYEMGro/oKaA9WY6xD/UsLoqS0u5PwAAAAAAAAAAJxG1BFJpwD8AAAAAAADwP+fmCf6XRco/mx6SIw8zgj+rTsiIwVTEPxIKBP1KAMk//kBOM36+mD8Ran/CBwfRP+YlMsJ0brs/5YDaD9O80T/n5gn+l0XKPwAAAAAAAPA/",
"dtype": "f8",
"shape": "9, 9"
}
}
],
"layout": {
"coloraxis": {
"colorscale": [
[
0,
"rgb(5,48,97)"
],
[
0.1,
"rgb(33,102,172)"
],
[
0.2,
"rgb(67,147,195)"
],
[
0.3,
"rgb(146,197,222)"
],
[
0.4,
"rgb(209,229,240)"
],
[
0.5,
"rgb(247,247,247)"
],
[
0.6,
"rgb(253,219,199)"
],
[
0.7,
"rgb(244,165,130)"
],
[
0.8,
"rgb(214,96,77)"
],
[
0.9,
"rgb(178,24,43)"
],
[
1,
"rgb(103,0,31)"
]
]
},
"template": {
"data": {
"bar": [
{
"error_x": {
"color": "#2a3f5f"
},
"error_y": {
"color": "#2a3f5f"
},
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
},
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "bar"
}
],
"barpolar": [
{
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
},
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "barpolar"
}
],
"carpet": [
{
"aaxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"baxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"type": "carpet"
}
],
"choropleth": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "choropleth"
}
],
"contour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "contour"
}
],
"contourcarpet": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "contourcarpet"
}
],
"heatmap": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "heatmap"
}
],
"histogram": [
{
"marker": {
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "histogram"
}
],
"histogram2d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2d"
}
],
"histogram2dcontour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2dcontour"
}
],
"mesh3d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "mesh3d"
}
],
"parcoords": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "parcoords"
}
],
"pie": [
{
"automargin": true,
"type": "pie"
}
],
"scatter": [
{
"fillpattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
},
"type": "scatter"
}
],
"scatter3d": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatter3d"
}
],
"scattercarpet": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattercarpet"
}
],
"scattergeo": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergeo"
}
],
"scattergl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergl"
}
],
"scattermap": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattermap"
}
],
"scattermapbox": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattermapbox"
}
],
"scatterpolar": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolar"
}
],
"scatterpolargl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolargl"
}
],
"scatterternary": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterternary"
}
],
"surface": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "surface"
}
],
"table": [
{
"cells": {
"fill": {
"color": "#EBF0F8"
},
"line": {
"color": "white"
}
},
"header": {
"fill": {
"color": "#C8D4E3"
},
"line": {
"color": "white"
}
},
"type": "table"
}
]
},
"layout": {
"annotationdefaults": {
"arrowcolor": "#2a3f5f",
"arrowhead": 0,
"arrowwidth": 1
},
"autotypenumbers": "strict",
"coloraxis": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"colorscale": {
"diverging": [
[
0,
"#8e0152"
],
[
0.1,
"#c51b7d"
],
[
0.2,
"#de77ae"
],
[
0.3,
"#f1b6da"
],
[
0.4,
"#fde0ef"
],
[
0.5,
"#f7f7f7"
],
[
0.6,
"#e6f5d0"
],
[
0.7,
"#b8e186"
],
[
0.8,
"#7fbc41"
],
[
0.9,
"#4d9221"
],
[
1,
"#276419"
]
],
"sequential": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"sequentialminus": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
]
},
"colorway": [
"#636efa",
"#EF553B",
"#00cc96",
"#ab63fa",
"#FFA15A",
"#19d3f3",
"#FF6692",
"#B6E880",
"#FF97FF",
"#FECB52"
],
"font": {
"color": "#2a3f5f"
},
"geo": {
"bgcolor": "white",
"lakecolor": "white",
"landcolor": "#E5ECF6",
"showlakes": true,
"showland": true,
"subunitcolor": "white"
},
"hoverlabel": {
"align": "left"
},
"hovermode": "closest",
"mapbox": {
"style": "light"
},
"paper_bgcolor": "white",
"plot_bgcolor": "#E5ECF6",
"polar": {
"angularaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"radialaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"scene": {
"xaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"yaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"zaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
}
},
"shapedefaults": {
"line": {
"color": "#2a3f5f"
}
},
"ternary": {
"aaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"baxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"caxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"title": {
"x": 0.05
},
"xaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
},
"yaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
}
}
},
"title": {
"text": "Matrice de corrélation des variables catégorielles (V de Cramér)"
},
"xaxis": {
"anchor": "y",
"domain": [
0,
1
]
},
"yaxis": {
"anchor": "x",
"autorange": "reversed",
"domain": [
0,
1
]
}
}
}
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Correlation matrix for the categorical variables (Cramér's V)\n",
"def cramers_v(confusion_matrix):\n",
"    \"\"\"Compute the bias-corrected Cramér's V from a contingency table.\"\"\"\n",
"    chi2 = chi2_contingency(confusion_matrix)[0]\n",
"    n = confusion_matrix.sum().sum()\n",
"    phi2 = chi2 / n\n",
"    r, k = confusion_matrix.shape\n",
"    phi2corr = max(0, phi2 - ((k-1)*(r-1))/(n-1))\n",
"    rcorr = r - ((r-1)**2)/(n-1)\n",
"    kcorr = k - ((k-1)**2)/(n-1)\n",
"    return np.sqrt(phi2corr / min((kcorr-1), (rcorr-1)))\n",
"\n",
"# Build the correlation matrix\n",
"categorical_cols = vars_categorielles.columns\n",
"n_vars = len(categorical_cols)\n",
"cramers_matrix = np.zeros((n_vars, n_vars))\n",
"\n",
"for i, col1 in enumerate(categorical_cols):\n",
"    for j, col2 in enumerate(categorical_cols):\n",
"        if i == j:\n",
"            cramers_matrix[i, j] = 1.0\n",
"        else:\n",
"            confusion_matrix = pd.crosstab(vars_categorielles[col1], vars_categorielles[col2])\n",
"            cramers_matrix[i, j] = cramers_v(confusion_matrix)\n",
"\n",
"# Build the correlation DataFrame\n",
"correlation_cat = pd.DataFrame(cramers_matrix,\n",
"                               index=categorical_cols,\n",
"                               columns=categorical_cols)\n",
"\n",
"# Visualise with Plotly\n",
"fig = px.imshow(correlation_cat,\n",
"                text_auto='.2f',  # type: ignore\n",
"                aspect=\"auto\",\n",
"                color_continuous_scale='RdBu_r',\n",
"                title='Matrice de corrélation des variables catégorielles (V de Cramér)')\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"id": "8f615121",
"metadata": {},
"source": [
"##### Correlation of the numerical variables:"
]
},
{
"cell_type": "code",
"execution_count": 99,
"id": "a16215ab",
"metadata": {},
"outputs": [],
"source": [
"vars_numeriques = pd.DataFrame(variables_numeriques).transpose()"
]
},
{
"cell_type": "code",
"execution_count": 100,
"id": "532ca6c4",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.plotly.v1+json": {
"config": {
"plotlyServerURL": "https://plot.ly"
},
"data": [
{
"coloraxis": "coloraxis",
"hovertemplate": "x: %{x}<br>y: %{y}<br>color: %{z}<extra></extra>",
"name": "0",
"texttemplate": "%{z}",
"type": "heatmap",
"x": [
"ANNEE_CTR",
"AGE_ASSURE_PRINCIPAL",
"ANCIENNETE_PERMIS",
"ANNEE_CONSTRUCTION"
],
"xaxis": "x",
"y": [
"ANNEE_CTR",
"AGE_ASSURE_PRINCIPAL",
"ANCIENNETE_PERMIS",
"ANNEE_CONSTRUCTION"
],
"yaxis": "y",
"z": {
"bdata": "AAAAAAAA8D+ybZcEUUCbP/CBLCtO46Q/qr2Q49LN2D+ybZcEUUCbPwAAAAAAAPA/slV7SAtP4T84L73yETWgv/CBLCtO46Q/slV7SAtP4T8AAAAAAADwP0I6y25dD6E/qr2Q49LN2D84L73yETWgv0I6y25dD6E/AAAAAAAA8D8=",
"dtype": "f8",
"shape": "4, 4"
}
}
],
"layout": {
"coloraxis": {
"colorscale": [
[
0,
"rgb(5,48,97)"
],
[
0.1,
"rgb(33,102,172)"
],
[
0.2,
"rgb(67,147,195)"
],
[
0.3,
"rgb(146,197,222)"
],
[
0.4,
"rgb(209,229,240)"
],
[
0.5,
"rgb(247,247,247)"
],
[
0.6,
"rgb(253,219,199)"
],
[
0.7,
"rgb(244,165,130)"
],
[
0.8,
"rgb(214,96,77)"
],
[
0.9,
"rgb(178,24,43)"
],
[
1,
"rgb(103,0,31)"
]
]
},
"template": {
"data": {
"bar": [
{
"error_x": {
"color": "#2a3f5f"
},
"error_y": {
"color": "#2a3f5f"
},
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
},
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "bar"
}
],
"barpolar": [
{
"marker": {
"line": {
"color": "#E5ECF6",
"width": 0.5
},
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "barpolar"
}
],
"carpet": [
{
"aaxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"baxis": {
"endlinecolor": "#2a3f5f",
"gridcolor": "white",
"linecolor": "white",
"minorgridcolor": "white",
"startlinecolor": "#2a3f5f"
},
"type": "carpet"
}
],
"choropleth": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "choropleth"
}
],
"contour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "contour"
}
],
"contourcarpet": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "contourcarpet"
}
],
"heatmap": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "heatmap"
}
],
"histogram": [
{
"marker": {
"pattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
}
},
"type": "histogram"
}
],
"histogram2d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2d"
}
],
"histogram2dcontour": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "histogram2dcontour"
}
],
"mesh3d": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"type": "mesh3d"
}
],
"parcoords": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "parcoords"
}
],
"pie": [
{
"automargin": true,
"type": "pie"
}
],
"scatter": [
{
"fillpattern": {
"fillmode": "overlay",
"size": 10,
"solidity": 0.2
},
"type": "scatter"
}
],
"scatter3d": [
{
"line": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatter3d"
}
],
"scattercarpet": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattercarpet"
}
],
"scattergeo": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergeo"
}
],
"scattergl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattergl"
}
],
"scattermap": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattermap"
}
],
"scattermapbox": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scattermapbox"
}
],
"scatterpolar": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolar"
}
],
"scatterpolargl": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterpolargl"
}
],
"scatterternary": [
{
"marker": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"type": "scatterternary"
}
],
"surface": [
{
"colorbar": {
"outlinewidth": 0,
"ticks": ""
},
"colorscale": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"type": "surface"
}
],
"table": [
{
"cells": {
"fill": {
"color": "#EBF0F8"
},
"line": {
"color": "white"
}
},
"header": {
"fill": {
"color": "#C8D4E3"
},
"line": {
"color": "white"
}
},
"type": "table"
}
]
},
"layout": {
"annotationdefaults": {
"arrowcolor": "#2a3f5f",
"arrowhead": 0,
"arrowwidth": 1
},
"autotypenumbers": "strict",
"coloraxis": {
"colorbar": {
"outlinewidth": 0,
"ticks": ""
}
},
"colorscale": {
"diverging": [
[
0,
"#8e0152"
],
[
0.1,
"#c51b7d"
],
[
0.2,
"#de77ae"
],
[
0.3,
"#f1b6da"
],
[
0.4,
"#fde0ef"
],
[
0.5,
"#f7f7f7"
],
[
0.6,
"#e6f5d0"
],
[
0.7,
"#b8e186"
],
[
0.8,
"#7fbc41"
],
[
0.9,
"#4d9221"
],
[
1,
"#276419"
]
],
"sequential": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
],
"sequentialminus": [
[
0,
"#0d0887"
],
[
0.1111111111111111,
"#46039f"
],
[
0.2222222222222222,
"#7201a8"
],
[
0.3333333333333333,
"#9c179e"
],
[
0.4444444444444444,
"#bd3786"
],
[
0.5555555555555556,
"#d8576b"
],
[
0.6666666666666666,
"#ed7953"
],
[
0.7777777777777778,
"#fb9f3a"
],
[
0.8888888888888888,
"#fdca26"
],
[
1,
"#f0f921"
]
]
},
"colorway": [
"#636efa",
"#EF553B",
"#00cc96",
"#ab63fa",
"#FFA15A",
"#19d3f3",
"#FF6692",
"#B6E880",
"#FF97FF",
"#FECB52"
],
"font": {
"color": "#2a3f5f"
},
"geo": {
"bgcolor": "white",
"lakecolor": "white",
"landcolor": "#E5ECF6",
"showlakes": true,
"showland": true,
"subunitcolor": "white"
},
"hoverlabel": {
"align": "left"
},
"hovermode": "closest",
"mapbox": {
"style": "light"
},
"paper_bgcolor": "white",
"plot_bgcolor": "#E5ECF6",
"polar": {
"angularaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"radialaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"scene": {
"xaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"yaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
},
"zaxis": {
"backgroundcolor": "#E5ECF6",
"gridcolor": "white",
"gridwidth": 2,
"linecolor": "white",
"showbackground": true,
"ticks": "",
"zerolinecolor": "white"
}
},
"shapedefaults": {
"line": {
"color": "#2a3f5f"
}
},
"ternary": {
"aaxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"baxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
},
"bgcolor": "#E5ECF6",
"caxis": {
"gridcolor": "white",
"linecolor": "white",
"ticks": ""
}
},
"title": {
"x": 0.05
},
"xaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
},
"yaxis": {
"automargin": true,
"gridcolor": "white",
"linecolor": "white",
"ticks": "",
"title": {
"standoff": 15
},
"zerolinecolor": "white",
"zerolinewidth": 2
}
}
},
"title": {
"text": "Matrice de corrélation des variables numériques"
},
"xaxis": {
"anchor": "y",
"domain": [
0,
1
]
},
"yaxis": {
"anchor": "x",
"autorange": "reversed",
"domain": [
0,
1
]
}
}
}
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"vars_numeriques.corr()\n",
"fig = px.imshow(vars_numeriques.corr(),\n",
" text_auto=True,\n",
" aspect=\"auto\",\n",
" color_continuous_scale='RdBu_r',\n",
" title='Matrice de corrélation des variables numériques')\n",
"fig.show()"
]
},
{
"cell_type": "markdown",
"id": "98c7dba6",
"metadata": {},
"source": [
"**Question :** quels sont vos commentaires ?"
]
},
{
"cell_type": "markdown",
"id": "67406b54",
"metadata": {},
"source": [
"*Réponse*: Aucune des variables ne semblent corrélées."
]
},
{
"cell_type": "markdown",
"id": "212209ec",
"metadata": {},
"source": [
"#### Preprocessing"
]
},
{
"cell_type": "markdown",
"id": "65aca700",
"metadata": {},
"source": [
"Deux étapes sont nécessaires avant de lancer l'apprentissage d'un modèle, c'est ce qu'on connait comme le *Preprocessing* :\n",
"\n",
"* Les modèles proposés par la librairie \"sklearn\" ne gèrent que des variables numériques. Il est donc nécessaire de transformer les variables catégorielles en variables numériques : ce processus s'appelle le *One Hot Encoding*.\n",
"* Normaliser les données numériques"
]
},
{
"cell_type": "markdown",
"id": "95f5cc9f",
"metadata": {},
"source": [
"**Exercice :** proposez un bout de code permettant de réaliser le One Hot Encoding des variables catégorielles. Vous pourrez utiliser la fonction \"preproc.OneHotEncoder\" de la librairie sklearn"
]
},
{
"cell_type": "code",
"execution_count": 101,
"id": "b8530717",
"metadata": {},
"outputs": [],
"source": [
"encoder = preproc.OneHotEncoder()\n",
"encoder.fit(vars_categorielles)\n",
"vars_categorielles_enc = encoder.transform(vars_categorielles)\n",
"vars_categorielles_enc = pd.DataFrame(vars_categorielles_enc.toarray(), columns=encoder.get_feature_names_out(vars_categorielles.columns)) # type: ignore"
]
},
{
"cell_type": "markdown",
"id": "b70abc5c",
"metadata": {},
"source": [
"**Exercice :** proposez un bout de code permettant normaliser les variables numériques présentes dans la base. Vous pourrez utiliser la fonction \"preproc.StandardScaler\" de la librairie sklearn"
]
},
{
"cell_type": "code",
"execution_count": 102,
"id": "4ff3847d",
"metadata": {},
"outputs": [],
"source": [
"scaler = preproc.StandardScaler()\n",
"scaler.fit(vars_numeriques)\n",
"vars_numeriques_scaled = scaler.transform(vars_numeriques)\n",
"vars_numeriques_scaled = pd.DataFrame(vars_numeriques_scaled, columns=vars_numeriques.columns)"
]
},
{
"cell_type": "code",
"execution_count": 117,
"id": "128d4a36",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(824, 46)\n"
]
},
{
"data": {
"application/vnd.microsoft.datawrangler.viewer.v0+json": {
"columns": [
{
"name": "index",
"rawType": "int64",
"type": "integer"
},
{
"name": "ANNEE_CTR",
"rawType": "float64",
"type": "float"
},
{
"name": "AGE_ASSURE_PRINCIPAL",
"rawType": "float64",
"type": "float"
},
{
"name": "ANCIENNETE_PERMIS",
"rawType": "float64",
"type": "float"
},
{
"name": "ANNEE_CONSTRUCTION",
"rawType": "float64",
"type": "float"
},
{
"name": "CONTRAT_ANCIENNETE_(-1,0]",
"rawType": "float64",
"type": "float"
},
{
"name": "CONTRAT_ANCIENNETE_(0,1]",
"rawType": "float64",
"type": "float"
},
{
"name": "CONTRAT_ANCIENNETE_(1,2]",
"rawType": "float64",
"type": "float"
},
{
"name": "CONTRAT_ANCIENNETE_(2,5]",
"rawType": "float64",
"type": "float"
},
{
"name": "CONTRAT_ANCIENNETE_(5,10]",
"rawType": "float64",
"type": "float"
},
{
"name": "FREQUENCE_PAIEMENT_COTISATION_ANNUEL",
"rawType": "float64",
"type": "float"
},
{
"name": "FREQUENCE_PAIEMENT_COTISATION_MENSUEL",
"rawType": "float64",
"type": "float"
},
{
"name": "FREQUENCE_PAIEMENT_COTISATION_TRIMESTRIEL",
"rawType": "float64",
"type": "float"
},
{
"name": "GROUPE_KM_[0;20000[",
"rawType": "float64",
"type": "float"
},
{
"name": "GROUPE_KM_[20000;40000[",
"rawType": "float64",
"type": "float"
},
{
"name": "GROUPE_KM_[40000;60000[",
"rawType": "float64",
"type": "float"
},
{
"name": "GROUPE_KM_[60000;99999[",
"rawType": "float64",
"type": "float"
},
{
"name": "ZONE_RISQUE_A",
"rawType": "float64",
"type": "float"
},
{
"name": "ZONE_RISQUE_B",
"rawType": "float64",
"type": "float"
},
{
"name": "ZONE_RISQUE_C",
"rawType": "float64",
"type": "float"
},
{
"name": "ZONE_RISQUE_D",
"rawType": "float64",
"type": "float"
},
{
"name": "ZONE_RISQUE_E",
"rawType": "float64",
"type": "float"
},
{
"name": "ZONE_RISQUE_F",
"rawType": "float64",
"type": "float"
},
{
"name": "ZONE_RISQUE_G",
"rawType": "float64",
"type": "float"
},
{
"name": "ZONE_RISQUE_H",
"rawType": "float64",
"type": "float"
},
{
"name": "ZONE_RISQUE_I",
"rawType": "float64",
"type": "float"
},
{
"name": "ZONE_RISQUE_J",
"rawType": "float64",
"type": "float"
},
{
"name": "ZONE_RISQUE_K",
"rawType": "float64",
"type": "float"
},
{
"name": "ZONE_RISQUE_L",
"rawType": "float64",
"type": "float"
},
{
"name": "ZONE_RISQUE_M",
"rawType": "float64",
"type": "float"
},
{
"name": "ZONE_RISQUE_T",
"rawType": "float64",
"type": "float"
},
{
"name": "GENRE_F",
"rawType": "float64",
"type": "float"
},
{
"name": "GENRE_M",
"rawType": "float64",
"type": "float"
},
{
"name": "DEUXIEME_CONDUCTEUR_False",
"rawType": "float64",
"type": "float"
},
{
"name": "DEUXIEME_CONDUCTEUR_True",
"rawType": "float64",
"type": "float"
},
{
"name": "ENERGIE_AUTRE",
"rawType": "float64",
"type": "float"
},
{
"name": "ENERGIE_DIESEL",
"rawType": "float64",
"type": "float"
},
{
"name": "ENERGIE_ESSENCE",
"rawType": "float64",
"type": "float"
},
{
"name": "EQUIPEMENT_SECURITE_FAUX",
"rawType": "float64",
"type": "float"
},
{
"name": "EQUIPEMENT_SECURITE_VRAI",
"rawType": "float64",
"type": "float"
},
{
"name": "VALEUR_DU_BIEN_[0;10000[",
"rawType": "float64",
"type": "float"
},
{
"name": "VALEUR_DU_BIEN_[10000;15000[",
"rawType": "float64",
"type": "float"
},
{
"name": "VALEUR_DU_BIEN_[15000;20000[",
"rawType": "float64",
"type": "float"
},
{
"name": "VALEUR_DU_BIEN_[20000;25000[",
"rawType": "float64",
"type": "float"
},
{
"name": "VALEUR_DU_BIEN_[25000;35000[",
"rawType": "float64",
"type": "float"
},
{
"name": "VALEUR_DU_BIEN_[35000;99999[",
"rawType": "float64",
"type": "float"
},
{
"name": "CM",
"rawType": "float64",
"type": "float"
}
],
"ref": "85e30838-5a51-4c2c-8483-c3033e7d9195",
"rows": [
[
"0",
"0.40615626262983295",
"-0.31764836563527515",
"0.067767057718506",
"0.5653698304986595",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1072.98"
],
[
"1",
"1.06626032654885",
"-1.2596885906311412",
"-1.1719751563806404",
"0.8816391722032739",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"3750.0"
],
[
"2",
"0.40615626262983295",
"-1.839405652167059",
"-1.740190337842749",
"0.5653698304986595",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1838.49"
],
[
"3",
"0.40615626262983295",
"-0.31764836563527515",
"0.48101446241822143",
"0.8816391722032739",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"4892.74"
],
[
"4",
"-0.25394780128918387",
"-1.7669410194750692",
"-1.2752870075555691",
"-0.38343819461518397",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"166.73"
],
[
"5",
"-0.9140518652082007",
"-1.332153223323131",
"-1.5335666354928914",
"-0.6997075363197984",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"4859.58"
],
[
"6",
"-0.25394780128918387",
"-0.31764836563527515",
"-0.7587277516809249",
"-0.38343819461518397",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"2160.98"
],
[
"7",
"-0.25394780128918387",
"0.4069979612846219",
"-0.34548034698120944",
"-1.015976878024413",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"2316.165"
],
[
"8",
"-1.5741559291272176",
"-0.8249007944792031",
"-0.8103836772683893",
"-0.38343819461518397",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1603.99"
],
[
"9",
"0.40615626262983295",
"1.856290615124416",
"0.7392940903555436",
"0.8816391722032739",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1653.21"
],
[
"10",
"-0.25394780128918387",
"1.7838259824324263",
"1.4624770485800456",
"0.5653698304986595",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"3537.32"
],
[
"11",
"0.40615626262983295",
"-0.17271910025129572",
"-0.34548034698120944",
"0.24910048879404498",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1531.35"
],
[
"12",
"-0.9140518652082007",
"0.2620686959006425",
"0.6876381647680792",
"-0.6997075363197984",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"26196.5"
],
[
"13",
"1.7263643904678667",
"-1.0422946925551722",
"-1.2236310819681047",
"0.5653698304986595",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"8130.34"
],
[
"14",
"0.40615626262983295",
"0.8417857574365601",
"1.3075092718176524",
"0.5653698304986595",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"7281.26"
],
[
"15",
"0.40615626262983295",
"0.2620686959006425",
"0.48101446241822143",
"0.8816391722032739",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"835.17"
],
[
"16",
"1.06626032654885",
"2.0736845132003854",
"1.617444825342439",
"0.8816391722032739",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"7598.7"
],
[
"17",
"-1.5741559291272176",
"-0.24518373294328544",
"-0.39713627256867384",
"-1.9647849031382563",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"6518.33"
],
[
"18",
"-0.9140518652082007",
"3.3780479016562",
"0.9975737182928658",
"-5.127478320184401",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"881.52"
],
[
"19",
"1.06626032654885",
"-0.7524361617872134",
"0.3260466856558282",
"0.8816391722032739",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"3955.825"
],
[
"20",
"-1.5741559291272176",
"0.9867150228205396",
"0.2743907600683637",
"0.5653698304986595",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"3309.14"
],
[
"21",
"1.06626032654885",
"0.8417857574365601",
"-0.13885664463135172",
"0.8816391722032739",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"157.95"
],
[
"22",
"-1.5741559291272176",
"0.9142503901285499",
"1.255853346230188",
"-0.38343819461518397",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"3073.62"
],
[
"23",
"0.40615626262983295",
"2.7258662074282927",
"1.51413297416751",
"-0.38343819461518397",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"4719.99"
],
[
"24",
"0.40615626262983295",
"0.2620686959006425",
"0.3260466856558282",
"0.5653698304986595",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"8519.2"
],
[
"25",
"0.40615626262983295",
"-1.6220117540910899",
"-1.2236310819681047",
"-1.3322462197290275",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"3750.0"
],
[
"26",
"1.7263643904678667",
"-0.24518373294328544",
"-0.035544793456422856",
"0.24910048879404498",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"819.0"
],
[
"27",
"-1.5741559291272176",
"0.11713943051666309",
"-0.39713627256867384",
"-0.38343819461518397",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"5141.66"
],
[
"28",
"0.40615626262983295",
"-0.6799715290952236",
"-1.4302547843179625",
"1.1979085139078884",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"8087.1"
],
[
"29",
"-0.25394780128918387",
"-0.31764836563527515",
"0.3260466856558282",
"-1.015976878024413",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1985.24"
],
[
"30",
"-0.25394780128918387",
"-1.2596885906311412",
"-1.2236310819681047",
"0.5653698304986595",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"166.73"
],
[
"31",
"-0.9140518652082007",
"-1.4046178560151208",
"-1.0686633052057115",
"-0.38343819461518397",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1360.63"
],
[
"32",
"-0.25394780128918387",
"-1.6220117540910899",
"-1.3269429331430336",
"-1.015976878024413",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1045.92"
],
[
"33",
"-0.9140518652082007",
"-0.8973654271711928",
"-1.2236310819681047",
"-0.6997075363197984",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"3168.47"
],
[
"34",
"-1.5741559291272176",
"-1.2596885906311412",
"-1.3269429331430336",
"-1.3322462197290275",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"3064.59"
],
[
"35",
"-0.9140518652082007",
"0.8417857574365601",
"-0.19051257021881615",
"-0.38343819461518397",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1797.13"
],
[
"36",
"0.40615626262983295",
"-0.46257763101925453",
"0.48101446241822143",
"0.5653698304986595",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"6445.05"
],
[
"37",
"0.40615626262983295",
"0.33453332859263224",
"-1.2752870075555691",
"0.5653698304986595",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"6134.28"
],
[
"38",
"-1.5741559291272176",
"-0.9698300598631825",
"-1.0686633052057115",
"-0.0671688529105695",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"16466.86"
],
[
"39",
"1.7263643904678667",
"-0.8249007944792031",
"-0.9136955284433181",
"1.1979085139078884",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"3750.0"
],
[
"40",
"-0.9140518652082007",
"-1.0422946925551722",
"-1.120319230793176",
"0.24910048879404498",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"8269.76"
],
[
"41",
"1.06626032654885",
"0.5519272266686014",
"-0.6554159005059961",
"-0.38343819461518397",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"5018.84"
],
[
"42",
"0.40615626262983295",
"-0.027789834867316315",
"0.6876381647680792",
"-1.015976878024413",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"3750.0"
],
[
"43",
"-0.25394780128918387",
"-0.027789834867316315",
"1.152541495055259",
"0.24910048879404498",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1175.34"
],
[
"44",
"-0.9140518652082007",
"2.2910784112763545",
"1.6691007509299034",
"-0.38343819461518397",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"759.22"
],
[
"45",
"-1.5741559291272176",
"0.4069979612846219",
"1.2041974206427235",
"-0.0671688529105695",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"9043.6"
],
[
"46",
"1.06626032654885",
"1.2765735535884983",
"1.255853346230188",
"0.24910048879404498",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"3750.0"
],
[
"47",
"-0.9140518652082007",
"1.349038186280488",
"-0.34548034698120944",
"-3.5461316116613286",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1679.02"
],
[
"48",
"0.40615626262983295",
"0.2620686959006425",
"1.4624770485800456",
"0.8816391722032739",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"1.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"6275.67"
],
[
"49",
"-0.25394780128918387",
"0.04467479782467339",
"1.4624770485800456",
"0.5653698304986595",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"1.0",
"0.0",
"0.0",
"0.0",
"0.0",
"1.0",
"0.0",
"7.5"
]
],
"shape": {
"columns": 46,
"rows": 824
}
},
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ANNEE_CTR</th>\n",
" <th>AGE_ASSURE_PRINCIPAL</th>\n",
" <th>ANCIENNETE_PERMIS</th>\n",
" <th>ANNEE_CONSTRUCTION</th>\n",
" <th>CONTRAT_ANCIENNETE_(-1,0]</th>\n",
" <th>CONTRAT_ANCIENNETE_(0,1]</th>\n",
" <th>CONTRAT_ANCIENNETE_(1,2]</th>\n",
" <th>CONTRAT_ANCIENNETE_(2,5]</th>\n",
" <th>CONTRAT_ANCIENNETE_(5,10]</th>\n",
" <th>FREQUENCE_PAIEMENT_COTISATION_ANNUEL</th>\n",
" <th>...</th>\n",
" <th>ENERGIE_ESSENCE</th>\n",
" <th>EQUIPEMENT_SECURITE_FAUX</th>\n",
" <th>EQUIPEMENT_SECURITE_VRAI</th>\n",
" <th>VALEUR_DU_BIEN_[0;10000[</th>\n",
" <th>VALEUR_DU_BIEN_[10000;15000[</th>\n",
" <th>VALEUR_DU_BIEN_[15000;20000[</th>\n",
" <th>VALEUR_DU_BIEN_[20000;25000[</th>\n",
" <th>VALEUR_DU_BIEN_[25000;35000[</th>\n",
" <th>VALEUR_DU_BIEN_[35000;99999[</th>\n",
" <th>CM</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.406156</td>\n",
" <td>-0.317648</td>\n",
" <td>0.067767</td>\n",
" <td>0.565370</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1072.980</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1.066260</td>\n",
" <td>-1.259689</td>\n",
" <td>-1.171975</td>\n",
" <td>0.881639</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>3750.000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.406156</td>\n",
" <td>-1.839406</td>\n",
" <td>-1.740190</td>\n",
" <td>0.565370</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1838.490</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.406156</td>\n",
" <td>-0.317648</td>\n",
" <td>0.481014</td>\n",
" <td>0.881639</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>4892.740</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>-0.253948</td>\n",
" <td>-1.766941</td>\n",
" <td>-1.275287</td>\n",
" <td>-0.383438</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>166.730</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>819</th>\n",
" <td>-0.914052</td>\n",
" <td>0.406998</td>\n",
" <td>0.894262</td>\n",
" <td>-2.597324</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1216.755</td>\n",
" </tr>\n",
" <tr>\n",
" <th>820</th>\n",
" <td>-0.253948</td>\n",
" <td>0.406998</td>\n",
" <td>1.565789</td>\n",
" <td>0.249100</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>2071.560</td>\n",
" </tr>\n",
" <tr>\n",
" <th>821</th>\n",
" <td>0.406156</td>\n",
" <td>-1.766941</td>\n",
" <td>-1.533567</td>\n",
" <td>0.565370</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>5077.640</td>\n",
" </tr>\n",
" <tr>\n",
" <th>822</th>\n",
" <td>-0.253948</td>\n",
" <td>-1.766941</td>\n",
" <td>-1.275287</td>\n",
" <td>-1.648516</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>5228.550</td>\n",
" </tr>\n",
" <tr>\n",
" <th>823</th>\n",
" <td>1.066260</td>\n",
" <td>0.406998</td>\n",
" <td>0.067767</td>\n",
" <td>0.565370</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>...</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>5880.340</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>824 rows × 46 columns</p>\n",
"</div>"
],
"text/plain": [
" ANNEE_CTR AGE_ASSURE_PRINCIPAL ANCIENNETE_PERMIS ANNEE_CONSTRUCTION \\\n",
"0 0.406156 -0.317648 0.067767 0.565370 \n",
"1 1.066260 -1.259689 -1.171975 0.881639 \n",
"2 0.406156 -1.839406 -1.740190 0.565370 \n",
"3 0.406156 -0.317648 0.481014 0.881639 \n",
"4 -0.253948 -1.766941 -1.275287 -0.383438 \n",
".. ... ... ... ... \n",
"819 -0.914052 0.406998 0.894262 -2.597324 \n",
"820 -0.253948 0.406998 1.565789 0.249100 \n",
"821 0.406156 -1.766941 -1.533567 0.565370 \n",
"822 -0.253948 -1.766941 -1.275287 -1.648516 \n",
"823 1.066260 0.406998 0.067767 0.565370 \n",
"\n",
" CONTRAT_ANCIENNETE_(-1,0] CONTRAT_ANCIENNETE_(0,1] \\\n",
"0 0.0 1.0 \n",
"1 1.0 0.0 \n",
"2 1.0 0.0 \n",
"3 1.0 0.0 \n",
"4 0.0 0.0 \n",
".. ... ... \n",
"819 0.0 0.0 \n",
"820 0.0 1.0 \n",
"821 0.0 0.0 \n",
"822 0.0 1.0 \n",
"823 0.0 0.0 \n",
"\n",
" CONTRAT_ANCIENNETE_(1,2] CONTRAT_ANCIENNETE_(2,5] \\\n",
"0 0.0 0.0 \n",
"1 0.0 0.0 \n",
"2 0.0 0.0 \n",
"3 0.0 0.0 \n",
"4 1.0 0.0 \n",
".. ... ... \n",
"819 1.0 0.0 \n",
"820 0.0 0.0 \n",
"821 1.0 0.0 \n",
"822 0.0 0.0 \n",
"823 0.0 1.0 \n",
"\n",
" CONTRAT_ANCIENNETE_(5,10] FREQUENCE_PAIEMENT_COTISATION_ANNUEL ... \\\n",
"0 0.0 0.0 ... \n",
"1 0.0 0.0 ... \n",
"2 0.0 0.0 ... \n",
"3 0.0 0.0 ... \n",
"4 0.0 0.0 ... \n",
".. ... ... ... \n",
"819 0.0 0.0 ... \n",
"820 0.0 0.0 ... \n",
"821 0.0 0.0 ... \n",
"822 0.0 0.0 ... \n",
"823 0.0 0.0 ... \n",
"\n",
" ENERGIE_ESSENCE EQUIPEMENT_SECURITE_FAUX EQUIPEMENT_SECURITE_VRAI \\\n",
"0 1.0 0.0 1.0 \n",
"1 0.0 1.0 0.0 \n",
"2 1.0 0.0 1.0 \n",
"3 0.0 1.0 0.0 \n",
"4 1.0 1.0 0.0 \n",
".. ... ... ... \n",
"819 0.0 0.0 1.0 \n",
"820 1.0 1.0 0.0 \n",
"821 0.0 0.0 1.0 \n",
"822 0.0 1.0 0.0 \n",
"823 0.0 1.0 0.0 \n",
"\n",
" VALEUR_DU_BIEN_[0;10000[ VALEUR_DU_BIEN_[10000;15000[ \\\n",
"0 0.0 0.0 \n",
"1 0.0 0.0 \n",
"2 1.0 0.0 \n",
"3 0.0 0.0 \n",
"4 0.0 0.0 \n",
".. ... ... \n",
"819 0.0 0.0 \n",
"820 0.0 1.0 \n",
"821 0.0 0.0 \n",
"822 0.0 1.0 \n",
"823 0.0 0.0 \n",
"\n",
" VALEUR_DU_BIEN_[15000;20000[ VALEUR_DU_BIEN_[20000;25000[ \\\n",
"0 1.0 0.0 \n",
"1 0.0 0.0 \n",
"2 0.0 0.0 \n",
"3 1.0 0.0 \n",
"4 0.0 0.0 \n",
".. ... ... \n",
"819 0.0 1.0 \n",
"820 0.0 0.0 \n",
"821 1.0 0.0 \n",
"822 0.0 0.0 \n",
"823 1.0 0.0 \n",
"\n",
" VALEUR_DU_BIEN_[25000;35000[ VALEUR_DU_BIEN_[35000;99999[ CM \n",
"0 0.0 0.0 1072.980 \n",
"1 0.0 1.0 3750.000 \n",
"2 0.0 0.0 1838.490 \n",
"3 0.0 0.0 4892.740 \n",
"4 1.0 0.0 166.730 \n",
".. ... ... ... \n",
"819 0.0 0.0 1216.755 \n",
"820 0.0 0.0 2071.560 \n",
"821 0.0 0.0 5077.640 \n",
"822 0.0 0.0 5228.550 \n",
"823 0.0 0.0 5880.340 \n",
"\n",
"[824 rows x 46 columns]"
]
},
"execution_count": 117,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Concatenate the transformed variables\n",
"data_model_preprocessed = pd.concat([vars_numeriques_scaled, vars_categorielles_enc], axis=1) # type: ignore\n",
"\n",
"# Add the CM column (target variable) to get 824x46 shape\n",
"data_model_preprocessed['CM'] = data_model['CM'].values\n",
"\n",
"print(data_model_preprocessed.shape)\n",
"data_model_preprocessed"
]
},
{
"cell_type": "markdown",
"id": "62d49546",
"metadata": {},
"source": [
"#### Sampling"
]
},
{
"cell_type": "markdown",
"id": "64d229f4",
"metadata": {},
"source": [
"**Exercice :** proposez un bout de code permettant construire la base d'apprentissage (80% des données) et la base de test (20%)."
]
},
{
"cell_type": "code",
"execution_count": 118,
"id": "6a1c7907",
"metadata": {},
"outputs": [],
"source": [
"train, test = train_test_split(data_model_preprocessed, test_size=0.2, random_state=42)"
]
},
{
"cell_type": "markdown",
"id": "84dc7a07",
"metadata": {},
"source": [
"#### Fitting"
]
},
{
"cell_type": "markdown",
"id": "97c7b783",
"metadata": {},
"source": [
"**Exercice :** proposez un bout de code permettant construire le modèle"
]
},
{
"cell_type": "code",
"execution_count": 121,
"id": "053e013c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<style>#sk-container-id-2 {\n",
" /* Definition of color scheme common for light and dark mode */\n",
" --sklearn-color-text: #000;\n",
" --sklearn-color-text-muted: #666;\n",
" --sklearn-color-line: gray;\n",
" /* Definition of color scheme for unfitted estimators */\n",
" --sklearn-color-unfitted-level-0: #fff5e6;\n",
" --sklearn-color-unfitted-level-1: #f6e4d2;\n",
" --sklearn-color-unfitted-level-2: #ffe0b3;\n",
" --sklearn-color-unfitted-level-3: chocolate;\n",
" /* Definition of color scheme for fitted estimators */\n",
" --sklearn-color-fitted-level-0: #f0f8ff;\n",
" --sklearn-color-fitted-level-1: #d4ebff;\n",
" --sklearn-color-fitted-level-2: #b3dbfd;\n",
" --sklearn-color-fitted-level-3: cornflowerblue;\n",
"\n",
" /* Specific color for light theme */\n",
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, white)));\n",
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, black)));\n",
" --sklearn-color-icon: #696969;\n",
"\n",
" @media (prefers-color-scheme: dark) {\n",
" /* Redefinition of color scheme for dark theme */\n",
" --sklearn-color-text-on-default-background: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
" --sklearn-color-background: var(--sg-background-color, var(--theme-background, var(--jp-layout-color0, #111)));\n",
" --sklearn-color-border-box: var(--sg-text-color, var(--theme-code-foreground, var(--jp-content-font-color1, white)));\n",
" --sklearn-color-icon: #878787;\n",
" }\n",
"}\n",
"\n",
"#sk-container-id-2 {\n",
" color: var(--sklearn-color-text);\n",
"}\n",
"\n",
"#sk-container-id-2 pre {\n",
" padding: 0;\n",
"}\n",
"\n",
"#sk-container-id-2 input.sk-hidden--visually {\n",
" border: 0;\n",
" clip: rect(1px 1px 1px 1px);\n",
" clip: rect(1px, 1px, 1px, 1px);\n",
" height: 1px;\n",
" margin: -1px;\n",
" overflow: hidden;\n",
" padding: 0;\n",
" position: absolute;\n",
" width: 1px;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-dashed-wrapped {\n",
" border: 1px dashed var(--sklearn-color-line);\n",
" margin: 0 0.4em 0.5em 0.4em;\n",
" box-sizing: border-box;\n",
" padding-bottom: 0.4em;\n",
" background-color: var(--sklearn-color-background);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-container {\n",
" /* jupyter's `normalize.less` sets `[hidden] { display: none; }`\n",
" but bootstrap.min.css set `[hidden] { display: none !important; }`\n",
" so we also need the `!important` here to be able to override the\n",
" default hidden behavior on the sphinx rendered scikit-learn.org.\n",
" See: https://github.com/scikit-learn/scikit-learn/issues/21755 */\n",
" display: inline-block !important;\n",
" position: relative;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-text-repr-fallback {\n",
" display: none;\n",
"}\n",
"\n",
"div.sk-parallel-item,\n",
"div.sk-serial,\n",
"div.sk-item {\n",
" /* draw centered vertical line to link estimators */\n",
" background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));\n",
" background-size: 2px 100%;\n",
" background-repeat: no-repeat;\n",
" background-position: center center;\n",
"}\n",
"\n",
"/* Parallel-specific style estimator block */\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item::after {\n",
" content: \"\";\n",
" width: 100%;\n",
" border-bottom: 2px solid var(--sklearn-color-text-on-default-background);\n",
" flex-grow: 1;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel {\n",
" display: flex;\n",
" align-items: stretch;\n",
" justify-content: center;\n",
" background-color: var(--sklearn-color-background);\n",
" position: relative;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item {\n",
" display: flex;\n",
" flex-direction: column;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item:first-child::after {\n",
" align-self: flex-end;\n",
" width: 50%;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item:last-child::after {\n",
" align-self: flex-start;\n",
" width: 50%;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-parallel-item:only-child::after {\n",
" width: 0;\n",
"}\n",
"\n",
"/* Serial-specific style estimator block */\n",
"\n",
"#sk-container-id-2 div.sk-serial {\n",
" display: flex;\n",
" flex-direction: column;\n",
" align-items: center;\n",
" background-color: var(--sklearn-color-background);\n",
" padding-right: 1em;\n",
" padding-left: 1em;\n",
"}\n",
"\n",
"\n",
"/* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is\n",
"clickable and can be expanded/collapsed.\n",
"- Pipeline and ColumnTransformer use this feature and define the default style\n",
"- Estimators will overwrite some part of the style using the `sk-estimator` class\n",
"*/\n",
"\n",
"/* Pipeline and ColumnTransformer style (default) */\n",
"\n",
"#sk-container-id-2 div.sk-toggleable {\n",
" /* Default theme specific background. It is overwritten whether we have a\n",
" specific estimator or a Pipeline/ColumnTransformer */\n",
" background-color: var(--sklearn-color-background);\n",
"}\n",
"\n",
"/* Toggleable label */\n",
"#sk-container-id-2 label.sk-toggleable__label {\n",
" cursor: pointer;\n",
" display: flex;\n",
" width: 100%;\n",
" margin-bottom: 0;\n",
" padding: 0.5em;\n",
" box-sizing: border-box;\n",
" text-align: center;\n",
" align-items: start;\n",
" justify-content: space-between;\n",
" gap: 0.5em;\n",
"}\n",
"\n",
"#sk-container-id-2 label.sk-toggleable__label .caption {\n",
" font-size: 0.6rem;\n",
" font-weight: lighter;\n",
" color: var(--sklearn-color-text-muted);\n",
"}\n",
"\n",
"#sk-container-id-2 label.sk-toggleable__label-arrow:before {\n",
" /* Arrow on the left of the label */\n",
" content: \"▸\";\n",
" float: left;\n",
" margin-right: 0.25em;\n",
" color: var(--sklearn-color-icon);\n",
"}\n",
"\n",
"#sk-container-id-2 label.sk-toggleable__label-arrow:hover:before {\n",
" color: var(--sklearn-color-text);\n",
"}\n",
"\n",
"/* Toggleable content - dropdown */\n",
"\n",
"#sk-container-id-2 div.sk-toggleable__content {\n",
" max-height: 0;\n",
" max-width: 0;\n",
" overflow: hidden;\n",
" text-align: left;\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-toggleable__content.fitted {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-toggleable__content pre {\n",
" margin: 0.2em;\n",
" border-radius: 0.25em;\n",
" color: var(--sklearn-color-text);\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-toggleable__content.fitted pre {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 input.sk-toggleable__control:checked~div.sk-toggleable__content {\n",
" /* Expand drop-down */\n",
" max-height: 200px;\n",
" max-width: 100%;\n",
" overflow: auto;\n",
"}\n",
"\n",
"#sk-container-id-2 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {\n",
" content: \"▾\";\n",
"}\n",
"\n",
"/* Pipeline/ColumnTransformer-specific style */\n",
"\n",
"#sk-container-id-2 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Estimator-specific style */\n",
"\n",
"/* Colorize estimator box */\n",
"#sk-container-id-2 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-label label.sk-toggleable__label,\n",
"#sk-container-id-2 div.sk-label label {\n",
" /* The background is the default theme color */\n",
" color: var(--sklearn-color-text-on-default-background);\n",
"}\n",
"\n",
"/* On hover, darken the color of the background */\n",
"#sk-container-id-2 div.sk-label:hover label.sk-toggleable__label {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"/* Label box, darken color on hover, fitted */\n",
"#sk-container-id-2 div.sk-label.fitted:hover label.sk-toggleable__label.fitted {\n",
" color: var(--sklearn-color-text);\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Estimator label */\n",
"\n",
"#sk-container-id-2 div.sk-label label {\n",
" font-family: monospace;\n",
" font-weight: bold;\n",
" display: inline-block;\n",
" line-height: 1.2em;\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-label-container {\n",
" text-align: center;\n",
"}\n",
"\n",
"/* Estimator-specific */\n",
"#sk-container-id-2 div.sk-estimator {\n",
" font-family: monospace;\n",
" border: 1px dotted var(--sklearn-color-border-box);\n",
" border-radius: 0.25em;\n",
" box-sizing: border-box;\n",
" margin-bottom: 0.5em;\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-0);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-estimator.fitted {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-0);\n",
"}\n",
"\n",
"/* on hover */\n",
"#sk-container-id-2 div.sk-estimator:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-2);\n",
"}\n",
"\n",
"#sk-container-id-2 div.sk-estimator.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-2);\n",
"}\n",
"\n",
"/* Specification for estimator info (e.g. \"i\" and \"?\") */\n",
"\n",
"/* Common style for \"i\" and \"?\" */\n",
"\n",
".sk-estimator-doc-link,\n",
"a:link.sk-estimator-doc-link,\n",
"a:visited.sk-estimator-doc-link {\n",
" float: right;\n",
" font-size: smaller;\n",
" line-height: 1em;\n",
" font-family: monospace;\n",
" background-color: var(--sklearn-color-background);\n",
" border-radius: 1em;\n",
" height: 1em;\n",
" width: 1em;\n",
" text-decoration: none !important;\n",
" margin-left: 0.5em;\n",
" text-align: center;\n",
" /* unfitted */\n",
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-unfitted-level-1);\n",
"}\n",
"\n",
".sk-estimator-doc-link.fitted,\n",
"a:link.sk-estimator-doc-link.fitted,\n",
"a:visited.sk-estimator-doc-link.fitted {\n",
" /* fitted */\n",
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-fitted-level-1);\n",
"}\n",
"\n",
"/* On hover */\n",
"div.sk-estimator:hover .sk-estimator-doc-link:hover,\n",
".sk-estimator-doc-link:hover,\n",
"div.sk-label-container:hover .sk-estimator-doc-link:hover,\n",
".sk-estimator-doc-link:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,\n",
".sk-estimator-doc-link.fitted:hover,\n",
"div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,\n",
".sk-estimator-doc-link.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"/* Span, style for the box shown on hovering the info icon */\n",
".sk-estimator-doc-link span {\n",
" display: none;\n",
" z-index: 9999;\n",
" position: relative;\n",
" font-weight: normal;\n",
" right: .2ex;\n",
" padding: .5ex;\n",
" margin: .5ex;\n",
" width: min-content;\n",
" min-width: 20ex;\n",
" max-width: 50ex;\n",
" color: var(--sklearn-color-text);\n",
" box-shadow: 2pt 2pt 4pt #999;\n",
" /* unfitted */\n",
" background: var(--sklearn-color-unfitted-level-0);\n",
" border: .5pt solid var(--sklearn-color-unfitted-level-3);\n",
"}\n",
"\n",
".sk-estimator-doc-link.fitted span {\n",
" /* fitted */\n",
" background: var(--sklearn-color-fitted-level-0);\n",
" border: var(--sklearn-color-fitted-level-3);\n",
"}\n",
"\n",
".sk-estimator-doc-link:hover span {\n",
" display: block;\n",
"}\n",
"\n",
"/* \"?\"-specific style due to the `<a>` HTML tag */\n",
"\n",
"#sk-container-id-2 a.estimator_doc_link {\n",
" float: right;\n",
" font-size: 1rem;\n",
" line-height: 1em;\n",
" font-family: monospace;\n",
" background-color: var(--sklearn-color-background);\n",
" border-radius: 1rem;\n",
" height: 1rem;\n",
" width: 1rem;\n",
" text-decoration: none;\n",
" /* unfitted */\n",
" color: var(--sklearn-color-unfitted-level-1);\n",
" border: var(--sklearn-color-unfitted-level-1) 1pt solid;\n",
"}\n",
"\n",
"#sk-container-id-2 a.estimator_doc_link.fitted {\n",
" /* fitted */\n",
" border: var(--sklearn-color-fitted-level-1) 1pt solid;\n",
" color: var(--sklearn-color-fitted-level-1);\n",
"}\n",
"\n",
"/* On hover */\n",
"#sk-container-id-2 a.estimator_doc_link:hover {\n",
" /* unfitted */\n",
" background-color: var(--sklearn-color-unfitted-level-3);\n",
" color: var(--sklearn-color-background);\n",
" text-decoration: none;\n",
"}\n",
"\n",
"#sk-container-id-2 a.estimator_doc_link.fitted:hover {\n",
" /* fitted */\n",
" background-color: var(--sklearn-color-fitted-level-3);\n",
"}\n",
"</style><div id=\"sk-container-id-2\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>DecisionTreeRegressor(max_depth=5, random_state=42)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator fitted sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-2\" type=\"checkbox\" checked><label for=\"sk-estimator-id-2\" class=\"sk-toggleable__label fitted sk-toggleable__label-arrow\"><div><div>DecisionTreeRegressor</div></div><div><a class=\"sk-estimator-doc-link fitted\" rel=\"noreferrer\" target=\"_blank\" href=\"https://scikit-learn.org/1.6/modules/generated/sklearn.tree.DecisionTreeRegressor.html\">?<span>Documentation for DecisionTreeRegressor</span></a><span class=\"sk-estimator-doc-link fitted\">i<span>Fitted</span></span></div></label><div class=\"sk-toggleable__content fitted\"><pre>DecisionTreeRegressor(max_depth=5, random_state=42)</pre></div> </div></div></div></div>"
],
"text/plain": [
"DecisionTreeRegressor(max_depth=5, random_state=42)"
]
},
"execution_count": 121,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tree = DecisionTreeRegressor(max_depth=5, random_state=42)\n",
"tree.fit(train.drop(\"CM\", axis=1), train[\"CM\"])"
]
},
{
"cell_type": "markdown",
"id": "8d624704",
"metadata": {},
"source": [
"**Exercice :** proposez un bout de code permettant d'évaluer les performances du modèle (MAE, MSE et RMSE)"
]
},
{
"cell_type": "code",
"execution_count": 125,
"id": "c4ca2cf9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MAE: 3683.84\n",
"MSE: 55216550.75\n",
"RMSE: 7430.78\n"
]
}
],
"source": [
"y_pred = tree.predict(test.drop(\"CM\", axis=1))\n",
"\n",
"mae = metrics.mean_absolute_error(test[\"CM\"], y_pred)\n",
"mse = metrics.mean_squared_error(test[\"CM\"], y_pred)\n",
"rmse = metrics.root_mean_squared_error(test[\"CM\"], y_pred)\n",
"\n",
"print(f\"MAE: {mae:.2f}\")\n",
"print(f\"MSE: {mse:.2f}\")\n",
"print(f\"RMSE: {rmse:.2f}\")"
]
},
{
"cell_type": "markdown",
"id": "fb2fe98c",
"metadata": {},
"source": [
"**Question :** que pensez-vous des performances de ce modèle ?"
]
},
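{
"cell_type": "markdown",
"id": "c3a91f04",
"metadata": {},
"source": [
"One way to approach this question is to compare the errors with a naive baseline that always predicts the mean of the training target: an RMSE close to the baseline's suggests the model has learned very little. The block below is a minimal sketch of that comparison, not part of the original lab. It assumes synthetic data (the arrays `X` and `y` are made up and merely stand in for the preprocessed features and the `CM` target), and the names `report`, `baseline` and `tree_model` are introduced here for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.dummy import DummyRegressor\n",
"from sklearn.metrics import mean_absolute_error, mean_squared_error\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.tree import DecisionTreeRegressor\n",
"\n",
"# Synthetic stand-in for the notebook's data (assumption, for illustration only)\n",
"rng = np.random.default_rng(42)\n",
"X = rng.normal(size=(800, 5))\n",
"y = 1000 + 500 * X[:, 0] + rng.normal(scale=200, size=800)\n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n",
"\n",
"# Baseline: always predicts the mean of the training target\n",
"baseline = DummyRegressor(strategy='mean').fit(X_train, y_train)\n",
"# Same kind of model as in the lab: a depth-limited regression tree\n",
"tree_model = DecisionTreeRegressor(max_depth=5, random_state=42).fit(X_train, y_train)\n",
"\n",
"\n",
"def report(name, model):\n",
"    # Compute and print MAE and RMSE of a fitted model on the test set\n",
"    y_pred = model.predict(X_test)\n",
"    mae = mean_absolute_error(y_test, y_pred)\n",
"    rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))\n",
"    print(f'{name}: MAE={mae:.2f} RMSE={rmse:.2f}')\n",
"    return mae, rmse\n",
"\n",
"\n",
"baseline_mae, baseline_rmse = report('baseline', baseline)\n",
"tree_mae, tree_rmse = report('tree', tree_model)\n",
"```\n",
"\n",
"On the lab's data, the same comparison could be made by fitting the baseline on `train.drop(\"CM\", axis=1)` and `train[\"CM\"]`. It can also help to set the RMSE against the typical magnitude of `CM` (its mean or median) before drawing a conclusion."
]
},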
{
"cell_type": "markdown",
"id": "7ecba832",
"metadata": {},
"source": [
"## Algorithme supervisé : Random Forest "
]
},
{
"cell_type": "markdown",
"id": "efcb8987",
"metadata": {},
"source": [
"A ce stade, nous avons vu les différentes étapes pour lancer un algorithme de Machine Learning. Néanmoins, ces étapes ne sont pas suffisantes pour construire un modèle performant. \n",
"En effet, afin de construire un modèle performant le Data Scientist doit agir sur l'apprentissage du modèle. Dans ce qui suit nous :\n",
"* Changerons d'algorithme pour utiliser un algorithme plus performant (Random Forest)\n",
"* Raliserons un *grid search* sur les paramètres du modèle\n",
"* Appliquerons l'apprentissage par validation croisée\n"
]
},
{
"cell_type": "markdown",
"id": "d6723a2f",
"metadata": {},
"source": [
"### Modèle avec Validation Croisée"
]
},
{
"cell_type": "markdown",
"id": "3716b09f",
"metadata": {},
"source": [
"#### Sampling"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ab1e1367",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "3f5d735e",
"metadata": {},
"source": [
"#### Fitting avec Cross-Validation"
]
},
{
"cell_type": "markdown",
"id": "bc819f8f",
"metadata": {},
"source": [
"**Exercice :** construisez un modèle RF (RandomForestRegressor) en implémentant la technique de validation croisée. Pensez à enregistrer au sein d'une variable/liste les performances (MAE, MSE & RMSE) du modèle au sein de chaque fold."
]
},
{
"cell_type": "code",
"execution_count": 106,
"id": "b515460e",
"metadata": {},
"outputs": [],
"source": [
"#Initialisation\n",
"# Nombre de sous-échantillons pour la cross-validation\n",
"num_splits = 5\n",
"\n",
"# Random Forest regressor\n",
"rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)\n",
"\n",
"# Initialisation du KFold cross-validation splitter\n",
"kf = KFold(n_splits=num_splits)\n",
"\n",
"# Listes pour enregistrer les performances du modèle\n",
"MAE_scores = []\n",
"MSE_scores = []\n",
"RMSE_scores = []"
]
},
{
"cell_type": "code",
"execution_count": 107,
"id": "eebb394f",
"metadata": {},
"outputs": [],
"source": [
"# Entrainement avec cross-validation\n"
]
},
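{
"cell_type": "markdown",
"id": "e58b2d17",
"metadata": {},
"source": [
"The training cell above is left for the exercise. As a hint, the block below is a minimal sketch of a K-fold loop that fills the three score lists. It is not the expected solution: it runs on synthetic arrays `X` and `y` (made up for illustration), uses fewer trees than the lab's `rf_regressor` to stay fast, and shuffles the folds, which is a choice of this sketch rather than something the notebook sets.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.ensemble import RandomForestRegressor\n",
"from sklearn.metrics import mean_absolute_error, mean_squared_error\n",
"from sklearn.model_selection import KFold\n",
"\n",
"# Synthetic stand-in for the features and the target (assumption, for illustration only)\n",
"rng = np.random.default_rng(0)\n",
"X = rng.normal(size=(200, 4))\n",
"y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)\n",
"\n",
"kf = KFold(n_splits=5, shuffle=True, random_state=42)\n",
"MAE_scores, MSE_scores, RMSE_scores = [], [], []\n",
"\n",
"for train_idx, val_idx in kf.split(X):\n",
"    # New estimator for each fold, trained only on that fold's training part\n",
"    model = RandomForestRegressor(n_estimators=50, random_state=42)\n",
"    model.fit(X[train_idx], y[train_idx])\n",
"\n",
"    # Evaluate on the held-out part of the fold\n",
"    y_pred = model.predict(X[val_idx])\n",
"    mse = mean_squared_error(y[val_idx], y_pred)\n",
"    MAE_scores.append(mean_absolute_error(y[val_idx], y_pred))\n",
"    MSE_scores.append(mse)\n",
"    RMSE_scores.append(float(np.sqrt(mse)))\n",
"\n",
"print('Mean RMSE over folds:', np.mean(RMSE_scores))\n",
"```\n",
"\n",
"With a pandas DataFrame such as `train`, the index arrays returned by `kf.split` are positional, so rows would be selected with `.iloc[train_idx]` and `.iloc[val_idx]`."
]
},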
{
"cell_type": "code",
"execution_count": 108,
"id": "b067126c",
"metadata": {},
"outputs": [],
"source": [
"# Métriques sur tous les folds\n",
"\n",
"#MAE\n",
"for fold, mae in enumerate(MAE_scores, start=1):\n",
" print(f\"Fold {fold} MAE:\", mae)"
]
},
{
"cell_type": "code",
"execution_count": 109,
"id": "6597152c",
"metadata": {},
"outputs": [],
"source": [
"#MSE\n",
"for fold, mse in enumerate(MSE_scores, start=1):\n",
" print(f\"Fold {fold} MSE:\", mse)"
]
},
{
"cell_type": "code",
"execution_count": 110,
"id": "63ff1c9d",
"metadata": {},
"outputs": [],
"source": [
"#RMSE\n",
"for fold, rmse in enumerate(RMSE_scores, start=1):\n",
" print(f\"Fold {fold} RMSE:\", rmse)"
]
},
{
"cell_type": "markdown",
"id": "ec1961c2",
"metadata": {},
"source": [
"**Question :** Commentez les résultats."
]
},
{
"cell_type": "markdown",
"id": "5a8163ef",
"metadata": {},
"source": [
"### Ajout d'un Grid Search pour les hyper paramètres"
]
},
{
"cell_type": "markdown",
"id": "5a6adbfe",
"metadata": {},
"source": [
"#### Sampling"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d9342ad6",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "dce52b11",
"metadata": {},
"source": [
"#### Fitting avec Cross-Validation et *Grid Search*"
]
},
{
"cell_type": "markdown",
"id": "7e3a9dd0",
"metadata": {},
"source": [
"**Exercice :** Intégrez la technique de Grid Search pour rechercher les paramètres optimaux du modèle."
]
},
{
"cell_type": "code",
"execution_count": 111,
"id": "6d58dbc2",
"metadata": {},
"outputs": [],
"source": [
"#Initialisation\n",
"# Nombre de sous-échantillons pour la cross-validation\n",
"num_splits = 5\n",
"\n",
"# Initialisation du KFold cross-validation splitter\n",
"kf = KFold(n_splits=num_splits)\n",
"\n",
"# Listes pour enregistrer les performances du modèle\n",
"MAE_scores = []\n",
"MSE_scores = []\n",
"RMSE_scores = []\n",
"\n",
"# Hyperparamètres à tester\n",
"n_estimators_values = [] #Complétez ici par les paramètres à tester\n",
"max_depth_values = [] #Complétez ici par les paramètres à tester\n",
"min_samples_split_values = [] #Complétez ici par les paramètres à tester\n",
"\n",
"# Liste pour sauveagrder les meilleurs résultats\n",
"best_score = np.inf\n",
"best_params = {}\n",
"\n",
"MAE_best_score = []\n",
"MSE_best_score = []\n",
"RMSE_best_score = []"
]
},
{
"cell_type": "code",
"execution_count": 112,
"id": "47da5172",
"metadata": {},
"outputs": [],
"source": [
"#Complétez ici avec votre code"
]
},
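{
"cell_type": "markdown",
"id": "9f04a6bc",
"metadata": {},
"source": [
"The cell above is left for the exercise. As a hint, the block below is a minimal sketch of a hand-written grid search: every combination of hyperparameters is scored by its mean RMSE over the folds, and the best combination is kept together with its per-fold scores. It is not the expected solution. The synthetic arrays `X` and `y` and the candidate values in the three lists are assumptions made for illustration, chosen small so the loop runs quickly.\n",
"\n",
"```python\n",
"from itertools import product\n",
"\n",
"import numpy as np\n",
"from sklearn.ensemble import RandomForestRegressor\n",
"from sklearn.metrics import mean_squared_error\n",
"from sklearn.model_selection import KFold\n",
"\n",
"# Synthetic stand-in for the features and the target (assumption, for illustration only)\n",
"rng = np.random.default_rng(0)\n",
"X = rng.normal(size=(150, 4))\n",
"y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=150)\n",
"\n",
"# Hypothetical candidate values, to be adapted to the real data\n",
"n_estimators_values = [20, 50]\n",
"max_depth_values = [3, None]\n",
"min_samples_split_values = [2, 10]\n",
"\n",
"kf = KFold(n_splits=3, shuffle=True, random_state=42)\n",
"best_score = np.inf\n",
"best_params = {}\n",
"RMSE_best_score = []\n",
"\n",
"grid = product(n_estimators_values, max_depth_values, min_samples_split_values)\n",
"for n_est, depth, min_split in grid:\n",
"    fold_rmse = []\n",
"    for train_idx, val_idx in kf.split(X):\n",
"        model = RandomForestRegressor(\n",
"            n_estimators=n_est,\n",
"            max_depth=depth,\n",
"            min_samples_split=min_split,\n",
"            random_state=42,\n",
"        )\n",
"        model.fit(X[train_idx], y[train_idx])\n",
"        mse = mean_squared_error(y[val_idx], model.predict(X[val_idx]))\n",
"        fold_rmse.append(float(np.sqrt(mse)))\n",
"\n",
"    # Keep the combination with the lowest mean RMSE across folds\n",
"    mean_rmse = float(np.mean(fold_rmse))\n",
"    if mean_rmse < best_score:\n",
"        best_score = mean_rmse\n",
"        best_params = {'n_estimators': n_est, 'max_depth': depth, 'min_samples_split': min_split}\n",
"        RMSE_best_score = fold_rmse\n",
"\n",
"print('Best parameters:', best_params)\n",
"print('Best mean RMSE:', best_score)\n",
"```\n",
"\n",
"scikit-learn also provides `GridSearchCV` in `sklearn.model_selection`, which automates the same loop. Writing it by hand, as here, mainly serves to make each step of the search visible."
]
},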
{
"cell_type": "code",
"execution_count": 113,
"id": "d4936c46",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Meilleurs paramètres: {}\n",
"Meilleure RMSE : inf\n"
]
}
],
"source": [
"# Meilleurs résultats\n",
"print(\"Meilleurs paramètres:\", best_params)\n",
"print(\"Meilleure RMSE :\", best_score)"
]
},
{
"cell_type": "code",
"execution_count": 114,
"id": "3215c463",
"metadata": {},
"outputs": [],
"source": [
"# Métriques sur tous les folds\n",
"\n",
"#RMSE\n",
"for fold, rmse in enumerate(RMSE_best_score, start=1):\n",
" print(f\"Fold {fold} RMSE:\", rmse)\n"
]
},
{
"cell_type": "code",
"execution_count": 115,
"id": "bb9a5c9b",
"metadata": {},
"outputs": [],
"source": [
"#MAE\n",
"for fold, mse in enumerate(MSE_best_score, start=1):\n",
" print(f\"Fold {fold} MSE:\", mse)"
]
},
{
"cell_type": "code",
"execution_count": 116,
"id": "0f0768ad",
"metadata": {},
"outputs": [],
"source": [
"#MSE\n",
"for fold, mae in enumerate(MAE_best_score, start=1):\n",
" print(f\"Fold {fold} MAE:\", mae)"
]
},
{
"cell_type": "markdown",
"id": "802a625f",
"metadata": {},
"source": [
"**Question :** Commentez les résultats"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "studies",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}