mirror of
https://github.com/ArthurDanjou/ArtStudies.git
synced 2026-01-14 15:54:13 +01:00
3478 lines
84 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "8750d15b",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Cours 3 : Machine Learning - Algorithmes supervisés (1/2)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "f7c08ae5",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Préambule"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ec7ecb4b",
|
|
"metadata": {},
|
|
"source": [
|
|
"Les objectifs de cette séance (3h) sont :\n",
|
|
"* Préparation des bases de modélisation (sampling)\n",
|
|
"* Mettre en application un modèle supervisé simple.\n",
|
|
"* Construire un modèle de Machine Learning (cross-validation et hyperparamétrage) pour résoudre un problème de régression\n",
|
|
"* Analyser les performances du modèle"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "4e99c600",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Préparation du workspace"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "c1b01045",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Import de librairies "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "97d58527",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Données\n",
|
|
"import numpy as np\n",
|
|
"import pandas as pd\n",
|
|
"\n",
|
|
"#Graphiques\n",
|
|
"import seaborn as sns\n",
|
|
"\n",
|
|
"sns.set()\n",
|
|
"import plotly.express as px\n",
|
|
"import plotly.graph_objects as gp\n",
|
|
"import sklearn.preprocessing as preproc\n",
|
|
"\n",
|
|
"#Statistiques\n",
|
|
"from scipy.stats import chi2_contingency\n",
|
|
"from sklearn import metrics\n",
|
|
"\n",
|
|
"# Machine Learning\n",
|
|
"from sklearn.cluster import KMeans\n",
|
|
"from sklearn.ensemble import RandomForestRegressor\n",
|
|
"from sklearn.model_selection import KFold, train_test_split\n",
|
|
"from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "06153286",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Définition des fonctions "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "c67db932",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "985e4e97",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Constantes"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 24,
|
|
"id": "c9597b48",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"input_path = \"./1_inputs\"\n",
|
|
"output_path = \"./2_outputs\""
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "b2b035d2",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Import des données"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 25,
|
|
"id": "8051b5f4",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"path =input_path + '/base_retraitee.csv'\n",
|
|
"data_retraitee = pd.read_csv(path,sep=\",\",decimal=\".\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "a2578ba1",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Algorithme supervisé : CART "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "aaa0b27d",
|
|
"metadata": {},
|
|
"source": [
|
|
"Dans cette partie l'objectif est de construire un modèle simple (algorithme CART) afin de voir les différentes étapes nécessaire au lancement d'un modèle\n",
|
|
"Nous modéliserons directement le coût des sinistres. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "a0458a05",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Construction du modèle"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "b3715c37",
|
|
"metadata": {},
|
|
"source": [
|
|
"La première étape est de calculer les côut moyen de chaque sinistre (target ou variable réponse). Cette variable sera la variable à prédire en fonction des variables explicatives."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 26,
|
|
"id": "c427a4b8",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"application/vnd.microsoft.datawrangler.viewer.v0+json": {
|
|
"columns": [
|
|
{
|
|
"name": "index",
|
|
"rawType": "int64",
|
|
"type": "integer"
|
|
},
|
|
{
|
|
"name": "ANNEE_CTR",
|
|
"rawType": "int64",
|
|
"type": "integer"
|
|
},
|
|
{
|
|
"name": "CONTRAT_ANCIENNETE",
|
|
"rawType": "object",
|
|
"type": "string"
|
|
},
|
|
{
|
|
"name": "FREQUENCE_PAIEMENT_COTISATION",
|
|
"rawType": "object",
|
|
"type": "string"
|
|
},
|
|
{
|
|
"name": "GROUPE_KM",
|
|
"rawType": "object",
|
|
"type": "string"
|
|
},
|
|
{
|
|
"name": "ZONE_RISQUE",
|
|
"rawType": "object",
|
|
"type": "string"
|
|
},
|
|
{
|
|
"name": "AGE_ASSURE_PRINCIPAL",
|
|
"rawType": "int64",
|
|
"type": "integer"
|
|
},
|
|
{
|
|
"name": "GENRE",
|
|
"rawType": "object",
|
|
"type": "string"
|
|
},
|
|
{
|
|
"name": "DEUXIEME_CONDUCTEUR",
|
|
"rawType": "bool",
|
|
"type": "boolean"
|
|
},
|
|
{
|
|
"name": "ANCIENNETE_PERMIS",
|
|
"rawType": "int64",
|
|
"type": "integer"
|
|
},
|
|
{
|
|
"name": "ANNEE_CONSTRUCTION",
|
|
"rawType": "float64",
|
|
"type": "float"
|
|
},
|
|
{
|
|
"name": "ENERGIE",
|
|
"rawType": "object",
|
|
"type": "string"
|
|
},
|
|
{
|
|
"name": "EQUIPEMENT_SECURITE",
|
|
"rawType": "object",
|
|
"type": "string"
|
|
},
|
|
{
|
|
"name": "VALEUR_DU_BIEN",
|
|
"rawType": "object",
|
|
"type": "string"
|
|
},
|
|
{
|
|
"name": "CM",
|
|
"rawType": "float64",
|
|
"type": "float"
|
|
}
|
|
],
|
|
"ref": "e76df045-0c83-40e9-a027-c48f278ec1d6",
|
|
"rows": [
|
|
[
|
|
"10",
|
|
"2019",
|
|
"(0,1]",
|
|
"MENSUEL",
|
|
"[0;20000[",
|
|
"C",
|
|
"40",
|
|
"M",
|
|
"False",
|
|
"37",
|
|
"2017.0",
|
|
"ESSENCE",
|
|
"VRAI",
|
|
"[15000;20000[",
|
|
"1072.98"
|
|
],
|
|
[
|
|
"34",
|
|
"2020",
|
|
"(-1,0]",
|
|
"MENSUEL",
|
|
"[20000;40000[",
|
|
"C",
|
|
"27",
|
|
"M",
|
|
"True",
|
|
"13",
|
|
"2018.0",
|
|
"AUTRE",
|
|
"FAUX",
|
|
"[35000;99999[",
|
|
"3750.0"
|
|
],
|
|
[
|
|
"36",
|
|
"2019",
|
|
"(-1,0]",
|
|
"MENSUEL",
|
|
"[20000;40000[",
|
|
"L",
|
|
"19",
|
|
"M",
|
|
"False",
|
|
"2",
|
|
"2017.0",
|
|
"ESSENCE",
|
|
"VRAI",
|
|
"[0;10000[",
|
|
"1838.49"
|
|
],
|
|
[
|
|
"78",
|
|
"2019",
|
|
"(-1,0]",
|
|
"MENSUEL",
|
|
"[20000;40000[",
|
|
"B",
|
|
"40",
|
|
"M",
|
|
"False",
|
|
"45",
|
|
"2018.0",
|
|
"DIESEL",
|
|
"FAUX",
|
|
"[15000;20000[",
|
|
"4892.74"
|
|
],
|
|
[
|
|
"89",
|
|
"2018",
|
|
"(1,2]",
|
|
"MENSUEL",
|
|
"[20000;40000[",
|
|
"C",
|
|
"20",
|
|
"M",
|
|
"False",
|
|
"11",
|
|
"2014.0",
|
|
"ESSENCE",
|
|
"FAUX",
|
|
"[25000;35000[",
|
|
"166.73"
|
|
]
|
|
],
|
|
"shape": {
|
|
"columns": 14,
|
|
"rows": 5
|
|
}
|
|
},
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>ANNEE_CTR</th>\n",
|
|
" <th>CONTRAT_ANCIENNETE</th>\n",
|
|
" <th>FREQUENCE_PAIEMENT_COTISATION</th>\n",
|
|
" <th>GROUPE_KM</th>\n",
|
|
" <th>ZONE_RISQUE</th>\n",
|
|
" <th>AGE_ASSURE_PRINCIPAL</th>\n",
|
|
" <th>GENRE</th>\n",
|
|
" <th>DEUXIEME_CONDUCTEUR</th>\n",
|
|
" <th>ANCIENNETE_PERMIS</th>\n",
|
|
" <th>ANNEE_CONSTRUCTION</th>\n",
|
|
" <th>ENERGIE</th>\n",
|
|
" <th>EQUIPEMENT_SECURITE</th>\n",
|
|
" <th>VALEUR_DU_BIEN</th>\n",
|
|
" <th>CM</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>10</th>\n",
|
|
" <td>2019</td>\n",
|
|
" <td>(0,1]</td>\n",
|
|
" <td>MENSUEL</td>\n",
|
|
" <td>[0;20000[</td>\n",
|
|
" <td>C</td>\n",
|
|
" <td>40</td>\n",
|
|
" <td>M</td>\n",
|
|
" <td>False</td>\n",
|
|
" <td>37</td>\n",
|
|
" <td>2017.0</td>\n",
|
|
" <td>ESSENCE</td>\n",
|
|
" <td>VRAI</td>\n",
|
|
" <td>[15000;20000[</td>\n",
|
|
" <td>1072.98</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>34</th>\n",
|
|
" <td>2020</td>\n",
|
|
" <td>(-1,0]</td>\n",
|
|
" <td>MENSUEL</td>\n",
|
|
" <td>[20000;40000[</td>\n",
|
|
" <td>C</td>\n",
|
|
" <td>27</td>\n",
|
|
" <td>M</td>\n",
|
|
" <td>True</td>\n",
|
|
" <td>13</td>\n",
|
|
" <td>2018.0</td>\n",
|
|
" <td>AUTRE</td>\n",
|
|
" <td>FAUX</td>\n",
|
|
" <td>[35000;99999[</td>\n",
|
|
" <td>3750.00</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>36</th>\n",
|
|
" <td>2019</td>\n",
|
|
" <td>(-1,0]</td>\n",
|
|
" <td>MENSUEL</td>\n",
|
|
" <td>[20000;40000[</td>\n",
|
|
" <td>L</td>\n",
|
|
" <td>19</td>\n",
|
|
" <td>M</td>\n",
|
|
" <td>False</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>2017.0</td>\n",
|
|
" <td>ESSENCE</td>\n",
|
|
" <td>VRAI</td>\n",
|
|
" <td>[0;10000[</td>\n",
|
|
" <td>1838.49</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>78</th>\n",
|
|
" <td>2019</td>\n",
|
|
" <td>(-1,0]</td>\n",
|
|
" <td>MENSUEL</td>\n",
|
|
" <td>[20000;40000[</td>\n",
|
|
" <td>B</td>\n",
|
|
" <td>40</td>\n",
|
|
" <td>M</td>\n",
|
|
" <td>False</td>\n",
|
|
" <td>45</td>\n",
|
|
" <td>2018.0</td>\n",
|
|
" <td>DIESEL</td>\n",
|
|
" <td>FAUX</td>\n",
|
|
" <td>[15000;20000[</td>\n",
|
|
" <td>4892.74</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>89</th>\n",
|
|
" <td>2018</td>\n",
|
|
" <td>(1,2]</td>\n",
|
|
" <td>MENSUEL</td>\n",
|
|
" <td>[20000;40000[</td>\n",
|
|
" <td>C</td>\n",
|
|
" <td>20</td>\n",
|
|
" <td>M</td>\n",
|
|
" <td>False</td>\n",
|
|
" <td>11</td>\n",
|
|
" <td>2014.0</td>\n",
|
|
" <td>ESSENCE</td>\n",
|
|
" <td>FAUX</td>\n",
|
|
" <td>[25000;35000[</td>\n",
|
|
" <td>166.73</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" ANNEE_CTR CONTRAT_ANCIENNETE FREQUENCE_PAIEMENT_COTISATION GROUPE_KM \\\n",
|
|
"10 2019 (0,1] MENSUEL [0;20000[ \n",
|
|
"34 2020 (-1,0] MENSUEL [20000;40000[ \n",
|
|
"36 2019 (-1,0] MENSUEL [20000;40000[ \n",
|
|
"78 2019 (-1,0] MENSUEL [20000;40000[ \n",
|
|
"89 2018 (1,2] MENSUEL [20000;40000[ \n",
|
|
"\n",
|
|
" ZONE_RISQUE AGE_ASSURE_PRINCIPAL GENRE DEUXIEME_CONDUCTEUR \\\n",
|
|
"10 C 40 M False \n",
|
|
"34 C 27 M True \n",
|
|
"36 L 19 M False \n",
|
|
"78 B 40 M False \n",
|
|
"89 C 20 M False \n",
|
|
"\n",
|
|
" ANCIENNETE_PERMIS ANNEE_CONSTRUCTION ENERGIE EQUIPEMENT_SECURITE \\\n",
|
|
"10 37 2017.0 ESSENCE VRAI \n",
|
|
"34 13 2018.0 AUTRE FAUX \n",
|
|
"36 2 2017.0 ESSENCE VRAI \n",
|
|
"78 45 2018.0 DIESEL FAUX \n",
|
|
"89 11 2014.0 ESSENCE FAUX \n",
|
|
"\n",
|
|
" VALEUR_DU_BIEN CM \n",
|
|
"10 [15000;20000[ 1072.98 \n",
|
|
"34 [35000;99999[ 3750.00 \n",
|
|
"36 [0;10000[ 1838.49 \n",
|
|
"78 [15000;20000[ 4892.74 \n",
|
|
"89 [25000;35000[ 166.73 "
|
|
]
|
|
},
|
|
"execution_count": 26,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"data_model = data_retraitee.copy()\n",
|
|
"\n",
|
|
"# Filtre pour ne garder que les lignes qui ont un sinistre (NB > 0)\n",
|
|
"data_model = data_model[data_model['NB'] > 0]\n",
|
|
"\n",
|
|
"# Calcul du cout moyen \"théorique\" des sinistres\n",
|
|
"data_model[\"CM\"] = (data_model[\"CHARGE\"] / data_model[\"NB\"])\n",
|
|
"data_model = data_model.drop(['CHARGE', 'NB', \"EXPO\"], axis=1)\n",
|
|
"data_model.head()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "e3e85088",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Exercice :** construisez les statistiques descriptives de la base utilisée."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 27,
|
|
"id": "c8fd3ee1",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"application/vnd.microsoft.datawrangler.viewer.v0+json": {
|
|
"columns": [
|
|
{
|
|
"name": "index",
|
|
"rawType": "object",
|
|
"type": "string"
|
|
},
|
|
{
|
|
"name": "ANNEE_CTR",
|
|
"rawType": "float64",
|
|
"type": "float"
|
|
},
|
|
{
|
|
"name": "CONTRAT_ANCIENNETE",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "FREQUENCE_PAIEMENT_COTISATION",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "GROUPE_KM",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "ZONE_RISQUE",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "AGE_ASSURE_PRINCIPAL",
|
|
"rawType": "float64",
|
|
"type": "float"
|
|
},
|
|
{
|
|
"name": "GENRE",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "DEUXIEME_CONDUCTEUR",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "ANCIENNETE_PERMIS",
|
|
"rawType": "float64",
|
|
"type": "float"
|
|
},
|
|
{
|
|
"name": "ANNEE_CONSTRUCTION",
|
|
"rawType": "float64",
|
|
"type": "float"
|
|
},
|
|
{
|
|
"name": "ENERGIE",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "EQUIPEMENT_SECURITE",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "VALEUR_DU_BIEN",
|
|
"rawType": "object",
|
|
"type": "unknown"
|
|
},
|
|
{
|
|
"name": "CM",
|
|
"rawType": "float64",
|
|
"type": "float"
|
|
}
|
|
],
|
|
"ref": "b2f9efdd-d035-4c51-9797-2e202b404c15",
|
|
"rows": [
|
|
[
|
|
"count",
|
|
"824.0",
|
|
"824",
|
|
"824",
|
|
"824",
|
|
"824",
|
|
"824.0",
|
|
"824",
|
|
"824",
|
|
"824.0",
|
|
"824.0",
|
|
"824",
|
|
"824",
|
|
"824",
|
|
"824.0"
|
|
],
|
|
[
|
|
"unique",
|
|
null,
|
|
"5",
|
|
"3",
|
|
"4",
|
|
"14",
|
|
null,
|
|
"2",
|
|
"2",
|
|
null,
|
|
null,
|
|
"3",
|
|
"2",
|
|
"6",
|
|
null
|
|
],
|
|
[
|
|
"top",
|
|
null,
|
|
"(0,1]",
|
|
"MENSUEL",
|
|
"[0;20000[",
|
|
"C",
|
|
null,
|
|
"M",
|
|
"False",
|
|
null,
|
|
null,
|
|
"ESSENCE",
|
|
"FAUX",
|
|
"[10000;15000[",
|
|
null
|
|
],
|
|
[
|
|
"freq",
|
|
null,
|
|
"297",
|
|
"398",
|
|
"391",
|
|
"269",
|
|
null,
|
|
"483",
|
|
"663",
|
|
null,
|
|
null,
|
|
"413",
|
|
"517",
|
|
"213",
|
|
null
|
|
],
|
|
[
|
|
"mean",
|
|
"2018.384708737864",
|
|
null,
|
|
null,
|
|
null,
|
|
null,
|
|
"44.383495145631066",
|
|
null,
|
|
null,
|
|
"35.68810679611651",
|
|
"2015.2123786407767",
|
|
null,
|
|
null,
|
|
null,
|
|
"4246.01697815534"
|
|
],
|
|
[
|
|
"std",
|
|
"1.515832735580178",
|
|
null,
|
|
null,
|
|
null,
|
|
null,
|
|
"13.808216667998865",
|
|
null,
|
|
null,
|
|
"19.370620845496358",
|
|
"3.1637823115731556",
|
|
null,
|
|
null,
|
|
null,
|
|
"6869.61691660173"
|
|
],
|
|
[
|
|
"min",
|
|
"2016.0",
|
|
null,
|
|
null,
|
|
null,
|
|
null,
|
|
"19.0",
|
|
null,
|
|
null,
|
|
"1.0",
|
|
"1998.0",
|
|
null,
|
|
null,
|
|
null,
|
|
"7.5"
|
|
],
|
|
[
|
|
"25%",
|
|
"2017.0",
|
|
null,
|
|
null,
|
|
null,
|
|
null,
|
|
"34.0",
|
|
null,
|
|
null,
|
|
"18.0",
|
|
"2014.0",
|
|
null,
|
|
null,
|
|
null,
|
|
"1159.96125"
|
|
],
|
|
[
|
|
"50%",
|
|
"2018.0",
|
|
null,
|
|
null,
|
|
null,
|
|
null,
|
|
"43.0",
|
|
null,
|
|
null,
|
|
"35.0",
|
|
"2016.0",
|
|
null,
|
|
null,
|
|
null,
|
|
"2541.6499999999996"
|
|
],
|
|
[
|
|
"75%",
|
|
"2020.0",
|
|
null,
|
|
null,
|
|
null,
|
|
null,
|
|
"53.0",
|
|
null,
|
|
null,
|
|
"53.0",
|
|
"2017.0",
|
|
null,
|
|
null,
|
|
null,
|
|
"4193.797500000001"
|
|
],
|
|
[
|
|
"max",
|
|
"2021.0",
|
|
null,
|
|
null,
|
|
null,
|
|
null,
|
|
"94.0",
|
|
null,
|
|
null,
|
|
"70.0",
|
|
"2021.0",
|
|
null,
|
|
null,
|
|
null,
|
|
"83421.85"
|
|
]
|
|
],
|
|
"shape": {
|
|
"columns": 14,
|
|
"rows": 11
|
|
}
|
|
},
|
|
"text/html": [
|
|
"<div>\n",
|
|
"<style scoped>\n",
|
|
" .dataframe tbody tr th:only-of-type {\n",
|
|
" vertical-align: middle;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe tbody tr th {\n",
|
|
" vertical-align: top;\n",
|
|
" }\n",
|
|
"\n",
|
|
" .dataframe thead th {\n",
|
|
" text-align: right;\n",
|
|
" }\n",
|
|
"</style>\n",
|
|
"<table border=\"1\" class=\"dataframe\">\n",
|
|
" <thead>\n",
|
|
" <tr style=\"text-align: right;\">\n",
|
|
" <th></th>\n",
|
|
" <th>ANNEE_CTR</th>\n",
|
|
" <th>CONTRAT_ANCIENNETE</th>\n",
|
|
" <th>FREQUENCE_PAIEMENT_COTISATION</th>\n",
|
|
" <th>GROUPE_KM</th>\n",
|
|
" <th>ZONE_RISQUE</th>\n",
|
|
" <th>AGE_ASSURE_PRINCIPAL</th>\n",
|
|
" <th>GENRE</th>\n",
|
|
" <th>DEUXIEME_CONDUCTEUR</th>\n",
|
|
" <th>ANCIENNETE_PERMIS</th>\n",
|
|
" <th>ANNEE_CONSTRUCTION</th>\n",
|
|
" <th>ENERGIE</th>\n",
|
|
" <th>EQUIPEMENT_SECURITE</th>\n",
|
|
" <th>VALEUR_DU_BIEN</th>\n",
|
|
" <th>CM</th>\n",
|
|
" </tr>\n",
|
|
" </thead>\n",
|
|
" <tbody>\n",
|
|
" <tr>\n",
|
|
" <th>count</th>\n",
|
|
" <td>824.000000</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824.000000</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824.000000</td>\n",
|
|
" <td>824.000000</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824</td>\n",
|
|
" <td>824.000000</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>unique</th>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>5</td>\n",
|
|
" <td>3</td>\n",
|
|
" <td>4</td>\n",
|
|
" <td>14</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>3</td>\n",
|
|
" <td>2</td>\n",
|
|
" <td>6</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>top</th>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>(0,1]</td>\n",
|
|
" <td>MENSUEL</td>\n",
|
|
" <td>[0;20000[</td>\n",
|
|
" <td>C</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>M</td>\n",
|
|
" <td>False</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>ESSENCE</td>\n",
|
|
" <td>FAUX</td>\n",
|
|
" <td>[10000;15000[</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>freq</th>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>297</td>\n",
|
|
" <td>398</td>\n",
|
|
" <td>391</td>\n",
|
|
" <td>269</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>483</td>\n",
|
|
" <td>663</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>413</td>\n",
|
|
" <td>517</td>\n",
|
|
" <td>213</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>mean</th>\n",
|
|
" <td>2018.384709</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>44.383495</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>35.688107</td>\n",
|
|
" <td>2015.212379</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>4246.016978</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>std</th>\n",
|
|
" <td>1.515833</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>13.808217</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>19.370621</td>\n",
|
|
" <td>3.163782</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>6869.616917</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>min</th>\n",
|
|
" <td>2016.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>19.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>1.000000</td>\n",
|
|
" <td>1998.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>7.500000</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>25%</th>\n",
|
|
" <td>2017.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>34.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>18.000000</td>\n",
|
|
" <td>2014.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>1159.961250</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>50%</th>\n",
|
|
" <td>2018.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>43.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>35.000000</td>\n",
|
|
" <td>2016.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>2541.650000</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>75%</th>\n",
|
|
" <td>2020.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>53.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>53.000000</td>\n",
|
|
" <td>2017.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>4193.797500</td>\n",
|
|
" </tr>\n",
|
|
" <tr>\n",
|
|
" <th>max</th>\n",
|
|
" <td>2021.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>94.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>70.000000</td>\n",
|
|
" <td>2021.000000</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>NaN</td>\n",
|
|
" <td>83421.850000</td>\n",
|
|
" </tr>\n",
|
|
" </tbody>\n",
|
|
"</table>\n",
|
|
"</div>"
|
|
],
|
|
"text/plain": [
|
|
" ANNEE_CTR CONTRAT_ANCIENNETE FREQUENCE_PAIEMENT_COTISATION \\\n",
|
|
"count 824.000000 824 824 \n",
|
|
"unique NaN 5 3 \n",
|
|
"top NaN (0,1] MENSUEL \n",
|
|
"freq NaN 297 398 \n",
|
|
"mean 2018.384709 NaN NaN \n",
|
|
"std 1.515833 NaN NaN \n",
|
|
"min 2016.000000 NaN NaN \n",
|
|
"25% 2017.000000 NaN NaN \n",
|
|
"50% 2018.000000 NaN NaN \n",
|
|
"75% 2020.000000 NaN NaN \n",
|
|
"max 2021.000000 NaN NaN \n",
|
|
"\n",
|
|
" GROUPE_KM ZONE_RISQUE AGE_ASSURE_PRINCIPAL GENRE DEUXIEME_CONDUCTEUR \\\n",
|
|
"count 824 824 824.000000 824 824 \n",
|
|
"unique 4 14 NaN 2 2 \n",
|
|
"top [0;20000[ C NaN M False \n",
|
|
"freq 391 269 NaN 483 663 \n",
|
|
"mean NaN NaN 44.383495 NaN NaN \n",
|
|
"std NaN NaN 13.808217 NaN NaN \n",
|
|
"min NaN NaN 19.000000 NaN NaN \n",
|
|
"25% NaN NaN 34.000000 NaN NaN \n",
|
|
"50% NaN NaN 43.000000 NaN NaN \n",
|
|
"75% NaN NaN 53.000000 NaN NaN \n",
|
|
"max NaN NaN 94.000000 NaN NaN \n",
|
|
"\n",
|
|
" ANCIENNETE_PERMIS ANNEE_CONSTRUCTION ENERGIE EQUIPEMENT_SECURITE \\\n",
|
|
"count 824.000000 824.000000 824 824 \n",
|
|
"unique NaN NaN 3 2 \n",
|
|
"top NaN NaN ESSENCE FAUX \n",
|
|
"freq NaN NaN 413 517 \n",
|
|
"mean 35.688107 2015.212379 NaN NaN \n",
|
|
"std 19.370621 3.163782 NaN NaN \n",
|
|
"min 1.000000 1998.000000 NaN NaN \n",
|
|
"25% 18.000000 2014.000000 NaN NaN \n",
|
|
"50% 35.000000 2016.000000 NaN NaN \n",
|
|
"75% 53.000000 2017.000000 NaN NaN \n",
|
|
"max 70.000000 2021.000000 NaN NaN \n",
|
|
"\n",
|
|
" VALEUR_DU_BIEN CM \n",
|
|
"count 824 824.000000 \n",
|
|
"unique 6 NaN \n",
|
|
"top [10000;15000[ NaN \n",
|
|
"freq 213 NaN \n",
|
|
"mean NaN 4246.016978 \n",
|
|
"std NaN 6869.616917 \n",
|
|
"min NaN 7.500000 \n",
|
|
"25% NaN 1159.961250 \n",
|
|
"50% NaN 2541.650000 \n",
|
|
"75% NaN 4193.797500 \n",
|
|
"max NaN 83421.850000 "
|
|
]
|
|
},
|
|
"execution_count": 27,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
}
|
|
],
|
|
"source": [
|
|
"data_model.describe(include='all')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "92d6156a",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Etude des corrélations parmi les variables explicatives"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d7327570",
|
|
"metadata": {},
|
|
"source": [
|
|
"**Question :** Selon vous, pourquoi faut-il s'intéresser à la corrélation des variables ? "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "475e141b",
|
|
"metadata": {},
|
|
"source": [
|
|
"*Réponse*: Pour avoir un modèle qui fit mieux + déterminer un potentiel effet de causalité entre features et target + sélectionner certaines variables."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 28,
|
|
"id": "1b156435",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"data_set = data_model.drop(\"CM\", axis=1)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 29,
|
|
"id": "0ef0fcc0",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"#Séparation en variables qualitatives ou catégorielles\n",
|
|
"variables_na = []\n",
|
|
"variables_numeriques = []\n",
|
|
"variables_01 = []\n",
|
|
"variables_categorielles = []\n",
|
|
"for colu in data_set.columns:\n",
|
|
" if True in data_set[colu].isna().unique() :\n",
|
|
" variables_na.append(data_set[colu])\n",
|
|
" else :\n",
|
|
" if str(data_set[colu].dtypes) in [\"int32\",\"int64\",\"float64\"]:\n",
|
|
" if len(data_set[colu].unique())==2 :\n",
|
|
" variables_categorielles.append(data_set[colu])\n",
|
|
" else :\n",
|
|
" variables_numeriques.append(data_set[colu])\n",
|
|
" else :\n",
|
|
" if len(data_set[colu].unique())==2 :\n",
|
|
" variables_categorielles.append(data_set[colu])\n",
|
|
" else :\n",
|
|
" variables_categorielles.append(data_set[colu])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "e82fcade",
|
|
"metadata": {},
|
|
"source": [
|
|
"##### Corrélation des variables catégorielles :"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 16,
|
|
"id": "e130aae5",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"vars_categorielles = pd.DataFrame(variables_categorielles).transpose()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "c39e2ad0",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"application/vnd.plotly.v1+json": {
|
|
"config": {
|
|
"plotlyServerURL": "https://plot.ly"
|
|
},
|
|
"data": [
|
|
{
|
|
"coloraxis": "coloraxis",
|
|
"hovertemplate": "x: %{x}<br>y: %{y}<br>color: %{z}<extra></extra>",
|
|
"name": "0",
|
|
"texttemplate": "%{z:.2f}",
|
|
"type": "heatmap",
|
|
"x": [
|
|
"CONTRAT_ANCIENNETE",
|
|
"FREQUENCE_PAIEMENT_COTISATION",
|
|
"GROUPE_KM",
|
|
"ZONE_RISQUE",
|
|
"GENRE",
|
|
"DEUXIEME_CONDUCTEUR",
|
|
"ENERGIE",
|
|
"EQUIPEMENT_SECURITE",
|
|
"VALEUR_DU_BIEN"
|
|
],
|
|
"xaxis": "x",
|
|
"y": [
|
|
"CONTRAT_ANCIENNETE",
|
|
"FREQUENCE_PAIEMENT_COTISATION",
|
|
"GROUPE_KM",
|
|
"ZONE_RISQUE",
|
|
"GENRE",
|
|
"DEUXIEME_CONDUCTEUR",
|
|
"ENERGIE",
|
|
"EQUIPEMENT_SECURITE",
|
|
"VALEUR_DU_BIEN"
|
|
],
|
|
"yaxis": "y",
|
|
"z": {
|
|
"bdata": "AAAAAAAA8D8AAAAAAAAAACoCGzzITrA/jS6+t390sj/aAKYMJa2eP5RMqUS3uZs/ytNpsBVXkz8AAAAAAAAAAJsekiMPM4I/AAAAAAAAAAAAAAAAAADwPwAAAAAAAAAAAAAAAAAAAABgNwyfFOK3Px3tLvtk1qI/VTS7w965nj/DbHQwNU6sP6xOyIjBVMQ/KwIbPMhOsD8AAAAAAAAAAAAAAAAAAPA/JGwWgOwjwz/Y12crRVC2P1AU8aUpk3Y/tZ25v8HgyT9++YWBDBq6PxMKBP1KAMk/ki6+t390sj8AAAAAAAAAACNsFoDsI8M/AAAAAAAA8D8AAAAAAAAAAOzpAHMW1bU/OToUIB5twT+gpoD1ZjrEP/5ATjN+vpg/0gCmDCWtnj9gNwyfFOK3P9jXZytFULY/AAAAAAAAAAAAAAAAAADwPwAAAAAAAAAA2p0N4q1bwz/UsLoqS0u5PxFqf8IHB9E/lEypRLe5mz8d7S77ZNaiP1AU8aUpk3Y/7OkAcxbVtT8AAAAAAAAAAAAAAAAAAPA/AAAAAAAAAAAAAAAAAAAAAOYlMsJ0brs/ytNpsBVXkz9RNLvD3rmeP7edub/B4Mk/OjoUIB5twT/anQ3irVvDPwAAAAAAAAAAAAAAAAAA8D8nEbUEUmnAP+SA2g/TvNE/AAAAAAAAAADDbHQwNU6sP335hYEMGro/oKaA9WY6xD/UsLoqS0u5PwAAAAAAAAAAJxG1BFJpwD8AAAAAAADwP+fmCf6XRco/mx6SIw8zgj+rTsiIwVTEPxIKBP1KAMk//kBOM36+mD8Ran/CBwfRP+YlMsJ0brs/5YDaD9O80T/n5gn+l0XKPwAAAAAAAPA/",
|
|
"dtype": "f8",
|
|
"shape": "9, 9"
|
|
}
|
|
}
|
|
],
|
|
"layout": {
|
|
"coloraxis": {
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"rgb(5,48,97)"
|
|
],
|
|
[
|
|
0.1,
|
|
"rgb(33,102,172)"
|
|
],
|
|
[
|
|
0.2,
|
|
"rgb(67,147,195)"
|
|
],
|
|
[
|
|
0.3,
|
|
"rgb(146,197,222)"
|
|
],
|
|
[
|
|
0.4,
|
|
"rgb(209,229,240)"
|
|
],
|
|
[
|
|
0.5,
|
|
"rgb(247,247,247)"
|
|
],
|
|
[
|
|
0.6,
|
|
"rgb(253,219,199)"
|
|
],
|
|
[
|
|
0.7,
|
|
"rgb(244,165,130)"
|
|
],
|
|
[
|
|
0.8,
|
|
"rgb(214,96,77)"
|
|
],
|
|
[
|
|
0.9,
|
|
"rgb(178,24,43)"
|
|
],
|
|
[
|
|
1,
|
|
"rgb(103,0,31)"
|
|
]
|
|
]
|
|
},
|
|
"template": {
|
|
"data": {
|
|
"bar": [
|
|
{
|
|
"error_x": {
|
|
"color": "#2a3f5f"
|
|
},
|
|
"error_y": {
|
|
"color": "#2a3f5f"
|
|
},
|
|
"marker": {
|
|
"line": {
|
|
"color": "#E5ECF6",
|
|
"width": 0.5
|
|
},
|
|
"pattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
}
|
|
},
|
|
"type": "bar"
|
|
}
|
|
],
|
|
"barpolar": [
|
|
{
|
|
"marker": {
|
|
"line": {
|
|
"color": "#E5ECF6",
|
|
"width": 0.5
|
|
},
|
|
"pattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
}
|
|
},
|
|
"type": "barpolar"
|
|
}
|
|
],
|
|
"carpet": [
|
|
{
|
|
"aaxis": {
|
|
"endlinecolor": "#2a3f5f",
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"minorgridcolor": "white",
|
|
"startlinecolor": "#2a3f5f"
|
|
},
|
|
"baxis": {
|
|
"endlinecolor": "#2a3f5f",
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"minorgridcolor": "white",
|
|
"startlinecolor": "#2a3f5f"
|
|
},
|
|
"type": "carpet"
|
|
}
|
|
],
|
|
"choropleth": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"type": "choropleth"
|
|
}
|
|
],
|
|
"contour": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "contour"
|
|
}
|
|
],
|
|
"contourcarpet": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"type": "contourcarpet"
|
|
}
|
|
],
|
|
"heatmap": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "heatmap"
|
|
}
|
|
],
|
|
"histogram": [
|
|
{
|
|
"marker": {
|
|
"pattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
}
|
|
},
|
|
"type": "histogram"
|
|
}
|
|
],
|
|
"histogram2d": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "histogram2d"
|
|
}
|
|
],
|
|
"histogram2dcontour": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "histogram2dcontour"
|
|
}
|
|
],
|
|
"mesh3d": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"type": "mesh3d"
|
|
}
|
|
],
|
|
"parcoords": [
|
|
{
|
|
"line": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "parcoords"
|
|
}
|
|
],
|
|
"pie": [
|
|
{
|
|
"automargin": true,
|
|
"type": "pie"
|
|
}
|
|
],
|
|
"scatter": [
|
|
{
|
|
"fillpattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
},
|
|
"type": "scatter"
|
|
}
|
|
],
|
|
"scatter3d": [
|
|
{
|
|
"line": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatter3d"
|
|
}
|
|
],
|
|
"scattercarpet": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattercarpet"
|
|
}
|
|
],
|
|
"scattergeo": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattergeo"
|
|
}
|
|
],
|
|
"scattergl": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattergl"
|
|
}
|
|
],
|
|
"scattermap": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattermap"
|
|
}
|
|
],
|
|
"scattermapbox": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattermapbox"
|
|
}
|
|
],
|
|
"scatterpolar": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatterpolar"
|
|
}
|
|
],
|
|
"scatterpolargl": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatterpolargl"
|
|
}
|
|
],
|
|
"scatterternary": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatterternary"
|
|
}
|
|
],
|
|
"surface": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "surface"
|
|
}
|
|
],
|
|
"table": [
|
|
{
|
|
"cells": {
|
|
"fill": {
|
|
"color": "#EBF0F8"
|
|
},
|
|
"line": {
|
|
"color": "white"
|
|
}
|
|
},
|
|
"header": {
|
|
"fill": {
|
|
"color": "#C8D4E3"
|
|
},
|
|
"line": {
|
|
"color": "white"
|
|
}
|
|
},
|
|
"type": "table"
|
|
}
|
|
]
|
|
},
|
|
"layout": {
|
|
"annotationdefaults": {
|
|
"arrowcolor": "#2a3f5f",
|
|
"arrowhead": 0,
|
|
"arrowwidth": 1
|
|
},
|
|
"autotypenumbers": "strict",
|
|
"coloraxis": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"colorscale": {
|
|
"diverging": [
|
|
[
|
|
0,
|
|
"#8e0152"
|
|
],
|
|
[
|
|
0.1,
|
|
"#c51b7d"
|
|
],
|
|
[
|
|
0.2,
|
|
"#de77ae"
|
|
],
|
|
[
|
|
0.3,
|
|
"#f1b6da"
|
|
],
|
|
[
|
|
0.4,
|
|
"#fde0ef"
|
|
],
|
|
[
|
|
0.5,
|
|
"#f7f7f7"
|
|
],
|
|
[
|
|
0.6,
|
|
"#e6f5d0"
|
|
],
|
|
[
|
|
0.7,
|
|
"#b8e186"
|
|
],
|
|
[
|
|
0.8,
|
|
"#7fbc41"
|
|
],
|
|
[
|
|
0.9,
|
|
"#4d9221"
|
|
],
|
|
[
|
|
1,
|
|
"#276419"
|
|
]
|
|
],
|
|
"sequential": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"sequentialminus": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
]
|
|
},
|
|
"colorway": [
|
|
"#636efa",
|
|
"#EF553B",
|
|
"#00cc96",
|
|
"#ab63fa",
|
|
"#FFA15A",
|
|
"#19d3f3",
|
|
"#FF6692",
|
|
"#B6E880",
|
|
"#FF97FF",
|
|
"#FECB52"
|
|
],
|
|
"font": {
|
|
"color": "#2a3f5f"
|
|
},
|
|
"geo": {
|
|
"bgcolor": "white",
|
|
"lakecolor": "white",
|
|
"landcolor": "#E5ECF6",
|
|
"showlakes": true,
|
|
"showland": true,
|
|
"subunitcolor": "white"
|
|
},
|
|
"hoverlabel": {
|
|
"align": "left"
|
|
},
|
|
"hovermode": "closest",
|
|
"mapbox": {
|
|
"style": "light"
|
|
},
|
|
"paper_bgcolor": "white",
|
|
"plot_bgcolor": "#E5ECF6",
|
|
"polar": {
|
|
"angularaxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
},
|
|
"bgcolor": "#E5ECF6",
|
|
"radialaxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"scene": {
|
|
"xaxis": {
|
|
"backgroundcolor": "#E5ECF6",
|
|
"gridcolor": "white",
|
|
"gridwidth": 2,
|
|
"linecolor": "white",
|
|
"showbackground": true,
|
|
"ticks": "",
|
|
"zerolinecolor": "white"
|
|
},
|
|
"yaxis": {
|
|
"backgroundcolor": "#E5ECF6",
|
|
"gridcolor": "white",
|
|
"gridwidth": 2,
|
|
"linecolor": "white",
|
|
"showbackground": true,
|
|
"ticks": "",
|
|
"zerolinecolor": "white"
|
|
},
|
|
"zaxis": {
|
|
"backgroundcolor": "#E5ECF6",
|
|
"gridcolor": "white",
|
|
"gridwidth": 2,
|
|
"linecolor": "white",
|
|
"showbackground": true,
|
|
"ticks": "",
|
|
"zerolinecolor": "white"
|
|
}
|
|
},
|
|
"shapedefaults": {
|
|
"line": {
|
|
"color": "#2a3f5f"
|
|
}
|
|
},
|
|
"ternary": {
|
|
"aaxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
},
|
|
"baxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
},
|
|
"bgcolor": "#E5ECF6",
|
|
"caxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"title": {
|
|
"x": 0.05
|
|
},
|
|
"xaxis": {
|
|
"automargin": true,
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": "",
|
|
"title": {
|
|
"standoff": 15
|
|
},
|
|
"zerolinecolor": "white",
|
|
"zerolinewidth": 2
|
|
},
|
|
"yaxis": {
|
|
"automargin": true,
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": "",
|
|
"title": {
|
|
"standoff": 15
|
|
},
|
|
"zerolinecolor": "white",
|
|
"zerolinewidth": 2
|
|
}
|
|
}
|
|
},
|
|
"title": {
|
|
"text": "Matrice de corrélation des variables catégorielles (V de Cramér)"
|
|
},
|
|
"xaxis": {
|
|
"anchor": "y",
|
|
"domain": [
|
|
0,
|
|
1
|
|
]
|
|
},
|
|
"yaxis": {
|
|
"anchor": "x",
|
|
"autorange": "reversed",
|
|
"domain": [
|
|
0,
|
|
1
|
|
]
|
|
}
|
|
}
|
|
}
|
|
},
|
|
"metadata": {},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
"# Correlation matrix for the categorical variables (Cramér's V)\n",
"def cramers_v(confusion_matrix):\n",
"    \"\"\"Compute the bias-corrected Cramér's V from a contingency table.\"\"\"\n",
"    chi2 = chi2_contingency(confusion_matrix)[0]\n",
"    n = confusion_matrix.sum().sum()\n",
"    phi2 = chi2 / n\n",
"    r, k = confusion_matrix.shape\n",
"    phi2corr = max(0, phi2 - ((k-1)*(r-1))/(n-1))\n",
"    rcorr = r - ((r-1)**2)/(n-1)\n",
"    kcorr = k - ((k-1)**2)/(n-1)\n",
"    return np.sqrt(phi2corr / min((kcorr-1), (rcorr-1)))\n",
"\n",
"# Build the correlation matrix\n",
"categorical_cols = vars_categorielles.columns\n",
"n_vars = len(categorical_cols)\n",
"cramers_matrix = np.zeros((n_vars, n_vars))\n",
"\n",
"for i, col1 in enumerate(categorical_cols):\n",
"    for j, col2 in enumerate(categorical_cols):\n",
"        if i == j:\n",
"            cramers_matrix[i, j] = 1.0\n",
"        else:\n",
"            confusion_matrix = pd.crosstab(vars_categorielles[col1], vars_categorielles[col2])\n",
"            cramers_matrix[i, j] = cramers_v(confusion_matrix)\n",
"\n",
"# Build the correlation DataFrame\n",
"correlation_cat = pd.DataFrame(cramers_matrix,\n",
"                               index=categorical_cols,\n",
"                               columns=categorical_cols)\n",
"\n",
"# Visualize with Plotly\n",
"fig = px.imshow(correlation_cat,\n",
"                text_auto='.2f',  # type: ignore\n",
"                aspect=\"auto\",\n",
"                color_continuous_scale='RdBu_r',\n",
"                title='Matrice de corrélation des variables catégorielles (V de Cramér)')\n",
"fig.show()"
|
|
]
|
|
},
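For reference, the `cramers_v` function in the cell above computes the bias-corrected form of Cramér's V. The block below writes out the formulas term by term, with $\chi^2$ the chi-square statistic of the contingency table, $n$ the total number of observations, and $r$, $k$ its numbers of rows and columns. The tilde notation is introduced here for readability and does not appear in the notebook.

```latex
\varphi^2 = \frac{\chi^2}{n}, \qquad
\tilde{\varphi}^2 = \max\!\left(0,\ \varphi^2 - \frac{(k-1)(r-1)}{n-1}\right)

\tilde{r} = r - \frac{(r-1)^2}{n-1}, \qquad
\tilde{k} = k - \frac{(k-1)^2}{n-1}

\tilde{V} = \sqrt{\frac{\tilde{\varphi}^2}{\min(\tilde{k}-1,\ \tilde{r}-1)}}
```

Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction by default on 2x2 tables, so $\chi^2$ for such tables is the corrected statistic.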
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "8f615121",
|
|
"metadata": {},
|
|
"source": [
|
"##### Correlation of the numeric variables:"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 22,
|
|
"id": "a16215ab",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"vars_numeriques = pd.DataFrame(variables_numeriques).transpose()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 34,
|
|
"id": "532ca6c4",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"application/vnd.plotly.v1+json": {
|
|
"config": {
|
|
"plotlyServerURL": "https://plot.ly"
|
|
},
|
|
"data": [
|
|
{
|
|
"coloraxis": "coloraxis",
|
|
"hovertemplate": "x: %{x}<br>y: %{y}<br>color: %{z}<extra></extra>",
|
|
"name": "0",
|
|
"texttemplate": "%{z}",
|
|
"type": "heatmap",
|
|
"x": [
|
|
"ANNEE_CTR",
|
|
"AGE_ASSURE_PRINCIPAL",
|
|
"ANCIENNETE_PERMIS",
|
|
"ANNEE_CONSTRUCTION"
|
|
],
|
|
"xaxis": "x",
|
|
"y": [
|
|
"ANNEE_CTR",
|
|
"AGE_ASSURE_PRINCIPAL",
|
|
"ANCIENNETE_PERMIS",
|
|
"ANNEE_CONSTRUCTION"
|
|
],
|
|
"yaxis": "y",
|
|
"z": {
|
|
"bdata": "AAAAAAAA8D+ybZcEUUCbP/CBLCtO46Q/qr2Q49LN2D+ybZcEUUCbPwAAAAAAAPA/slV7SAtP4T84L73yETWgv/CBLCtO46Q/slV7SAtP4T8AAAAAAADwP0I6y25dD6E/qr2Q49LN2D84L73yETWgv0I6y25dD6E/AAAAAAAA8D8=",
|
|
"dtype": "f8",
|
|
"shape": "4, 4"
|
|
}
|
|
}
|
|
],
|
|
"layout": {
|
|
"coloraxis": {
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"rgb(5,48,97)"
|
|
],
|
|
[
|
|
0.1,
|
|
"rgb(33,102,172)"
|
|
],
|
|
[
|
|
0.2,
|
|
"rgb(67,147,195)"
|
|
],
|
|
[
|
|
0.3,
|
|
"rgb(146,197,222)"
|
|
],
|
|
[
|
|
0.4,
|
|
"rgb(209,229,240)"
|
|
],
|
|
[
|
|
0.5,
|
|
"rgb(247,247,247)"
|
|
],
|
|
[
|
|
0.6,
|
|
"rgb(253,219,199)"
|
|
],
|
|
[
|
|
0.7,
|
|
"rgb(244,165,130)"
|
|
],
|
|
[
|
|
0.8,
|
|
"rgb(214,96,77)"
|
|
],
|
|
[
|
|
0.9,
|
|
"rgb(178,24,43)"
|
|
],
|
|
[
|
|
1,
|
|
"rgb(103,0,31)"
|
|
]
|
|
]
|
|
},
|
|
"template": {
|
|
"data": {
|
|
"bar": [
|
|
{
|
|
"error_x": {
|
|
"color": "#2a3f5f"
|
|
},
|
|
"error_y": {
|
|
"color": "#2a3f5f"
|
|
},
|
|
"marker": {
|
|
"line": {
|
|
"color": "#E5ECF6",
|
|
"width": 0.5
|
|
},
|
|
"pattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
}
|
|
},
|
|
"type": "bar"
|
|
}
|
|
],
|
|
"barpolar": [
|
|
{
|
|
"marker": {
|
|
"line": {
|
|
"color": "#E5ECF6",
|
|
"width": 0.5
|
|
},
|
|
"pattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
}
|
|
},
|
|
"type": "barpolar"
|
|
}
|
|
],
|
|
"carpet": [
|
|
{
|
|
"aaxis": {
|
|
"endlinecolor": "#2a3f5f",
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"minorgridcolor": "white",
|
|
"startlinecolor": "#2a3f5f"
|
|
},
|
|
"baxis": {
|
|
"endlinecolor": "#2a3f5f",
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"minorgridcolor": "white",
|
|
"startlinecolor": "#2a3f5f"
|
|
},
|
|
"type": "carpet"
|
|
}
|
|
],
|
|
"choropleth": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"type": "choropleth"
|
|
}
|
|
],
|
|
"contour": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "contour"
|
|
}
|
|
],
|
|
"contourcarpet": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"type": "contourcarpet"
|
|
}
|
|
],
|
|
"heatmap": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "heatmap"
|
|
}
|
|
],
|
|
"histogram": [
|
|
{
|
|
"marker": {
|
|
"pattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
}
|
|
},
|
|
"type": "histogram"
|
|
}
|
|
],
|
|
"histogram2d": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "histogram2d"
|
|
}
|
|
],
|
|
"histogram2dcontour": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "histogram2dcontour"
|
|
}
|
|
],
|
|
"mesh3d": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"type": "mesh3d"
|
|
}
|
|
],
|
|
"parcoords": [
|
|
{
|
|
"line": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "parcoords"
|
|
}
|
|
],
|
|
"pie": [
|
|
{
|
|
"automargin": true,
|
|
"type": "pie"
|
|
}
|
|
],
|
|
"scatter": [
|
|
{
|
|
"fillpattern": {
|
|
"fillmode": "overlay",
|
|
"size": 10,
|
|
"solidity": 0.2
|
|
},
|
|
"type": "scatter"
|
|
}
|
|
],
|
|
"scatter3d": [
|
|
{
|
|
"line": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatter3d"
|
|
}
|
|
],
|
|
"scattercarpet": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattercarpet"
|
|
}
|
|
],
|
|
"scattergeo": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattergeo"
|
|
}
|
|
],
|
|
"scattergl": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattergl"
|
|
}
|
|
],
|
|
"scattermap": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattermap"
|
|
}
|
|
],
|
|
"scattermapbox": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scattermapbox"
|
|
}
|
|
],
|
|
"scatterpolar": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatterpolar"
|
|
}
|
|
],
|
|
"scatterpolargl": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatterpolargl"
|
|
}
|
|
],
|
|
"scatterternary": [
|
|
{
|
|
"marker": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"type": "scatterternary"
|
|
}
|
|
],
|
|
"surface": [
|
|
{
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
},
|
|
"colorscale": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"type": "surface"
|
|
}
|
|
],
|
|
"table": [
|
|
{
|
|
"cells": {
|
|
"fill": {
|
|
"color": "#EBF0F8"
|
|
},
|
|
"line": {
|
|
"color": "white"
|
|
}
|
|
},
|
|
"header": {
|
|
"fill": {
|
|
"color": "#C8D4E3"
|
|
},
|
|
"line": {
|
|
"color": "white"
|
|
}
|
|
},
|
|
"type": "table"
|
|
}
|
|
]
|
|
},
|
|
"layout": {
|
|
"annotationdefaults": {
|
|
"arrowcolor": "#2a3f5f",
|
|
"arrowhead": 0,
|
|
"arrowwidth": 1
|
|
},
|
|
"autotypenumbers": "strict",
|
|
"coloraxis": {
|
|
"colorbar": {
|
|
"outlinewidth": 0,
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"colorscale": {
|
|
"diverging": [
|
|
[
|
|
0,
|
|
"#8e0152"
|
|
],
|
|
[
|
|
0.1,
|
|
"#c51b7d"
|
|
],
|
|
[
|
|
0.2,
|
|
"#de77ae"
|
|
],
|
|
[
|
|
0.3,
|
|
"#f1b6da"
|
|
],
|
|
[
|
|
0.4,
|
|
"#fde0ef"
|
|
],
|
|
[
|
|
0.5,
|
|
"#f7f7f7"
|
|
],
|
|
[
|
|
0.6,
|
|
"#e6f5d0"
|
|
],
|
|
[
|
|
0.7,
|
|
"#b8e186"
|
|
],
|
|
[
|
|
0.8,
|
|
"#7fbc41"
|
|
],
|
|
[
|
|
0.9,
|
|
"#4d9221"
|
|
],
|
|
[
|
|
1,
|
|
"#276419"
|
|
]
|
|
],
|
|
"sequential": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
],
|
|
"sequentialminus": [
|
|
[
|
|
0,
|
|
"#0d0887"
|
|
],
|
|
[
|
|
0.1111111111111111,
|
|
"#46039f"
|
|
],
|
|
[
|
|
0.2222222222222222,
|
|
"#7201a8"
|
|
],
|
|
[
|
|
0.3333333333333333,
|
|
"#9c179e"
|
|
],
|
|
[
|
|
0.4444444444444444,
|
|
"#bd3786"
|
|
],
|
|
[
|
|
0.5555555555555556,
|
|
"#d8576b"
|
|
],
|
|
[
|
|
0.6666666666666666,
|
|
"#ed7953"
|
|
],
|
|
[
|
|
0.7777777777777778,
|
|
"#fb9f3a"
|
|
],
|
|
[
|
|
0.8888888888888888,
|
|
"#fdca26"
|
|
],
|
|
[
|
|
1,
|
|
"#f0f921"
|
|
]
|
|
]
|
|
},
|
|
"colorway": [
|
|
"#636efa",
|
|
"#EF553B",
|
|
"#00cc96",
|
|
"#ab63fa",
|
|
"#FFA15A",
|
|
"#19d3f3",
|
|
"#FF6692",
|
|
"#B6E880",
|
|
"#FF97FF",
|
|
"#FECB52"
|
|
],
|
|
"font": {
|
|
"color": "#2a3f5f"
|
|
},
|
|
"geo": {
|
|
"bgcolor": "white",
|
|
"lakecolor": "white",
|
|
"landcolor": "#E5ECF6",
|
|
"showlakes": true,
|
|
"showland": true,
|
|
"subunitcolor": "white"
|
|
},
|
|
"hoverlabel": {
|
|
"align": "left"
|
|
},
|
|
"hovermode": "closest",
|
|
"mapbox": {
|
|
"style": "light"
|
|
},
|
|
"paper_bgcolor": "white",
|
|
"plot_bgcolor": "#E5ECF6",
|
|
"polar": {
|
|
"angularaxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
},
|
|
"bgcolor": "#E5ECF6",
|
|
"radialaxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"scene": {
|
|
"xaxis": {
|
|
"backgroundcolor": "#E5ECF6",
|
|
"gridcolor": "white",
|
|
"gridwidth": 2,
|
|
"linecolor": "white",
|
|
"showbackground": true,
|
|
"ticks": "",
|
|
"zerolinecolor": "white"
|
|
},
|
|
"yaxis": {
|
|
"backgroundcolor": "#E5ECF6",
|
|
"gridcolor": "white",
|
|
"gridwidth": 2,
|
|
"linecolor": "white",
|
|
"showbackground": true,
|
|
"ticks": "",
|
|
"zerolinecolor": "white"
|
|
},
|
|
"zaxis": {
|
|
"backgroundcolor": "#E5ECF6",
|
|
"gridcolor": "white",
|
|
"gridwidth": 2,
|
|
"linecolor": "white",
|
|
"showbackground": true,
|
|
"ticks": "",
|
|
"zerolinecolor": "white"
|
|
}
|
|
},
|
|
"shapedefaults": {
|
|
"line": {
|
|
"color": "#2a3f5f"
|
|
}
|
|
},
|
|
"ternary": {
|
|
"aaxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
},
|
|
"baxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
},
|
|
"bgcolor": "#E5ECF6",
|
|
"caxis": {
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": ""
|
|
}
|
|
},
|
|
"title": {
|
|
"x": 0.05
|
|
},
|
|
"xaxis": {
|
|
"automargin": true,
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": "",
|
|
"title": {
|
|
"standoff": 15
|
|
},
|
|
"zerolinecolor": "white",
|
|
"zerolinewidth": 2
|
|
},
|
|
"yaxis": {
|
|
"automargin": true,
|
|
"gridcolor": "white",
|
|
"linecolor": "white",
|
|
"ticks": "",
|
|
"title": {
|
|
"standoff": 15
|
|
},
|
|
"zerolinecolor": "white",
|
|
"zerolinewidth": 2
|
|
}
|
|
}
|
|
},
|
|
"title": {
|
|
"text": "Matrice de corrélation des variables numériques"
|
|
},
|
|
"xaxis": {
|
|
"anchor": "y",
|
|
"domain": [
|
|
0,
|
|
1
|
|
]
|
|
},
|
|
"yaxis": {
|
|
"anchor": "x",
|
|
"autorange": "reversed",
|
|
"domain": [
|
|
0,
|
|
1
|
|
]
|
|
}
|
|
}
|
|
}
|
|
},
|
|
"metadata": {},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
"corr_num = vars_numeriques.corr()\n",
"fig = px.imshow(corr_num,\n",
"                text_auto=True,\n",
"                aspect=\"auto\",\n",
"                color_continuous_scale='RdBu_r',\n",
"                title='Matrice de corrélation des variables numériques')\n",
"fig.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "98c7dba6",
|
|
"metadata": {},
|
|
"source": [
|
"**Question:** what are your observations?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "212209ec",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Preprocessing"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "65aca700",
|
|
"metadata": {},
|
|
"source": [
|
"Two steps are required before training a model; together they are known as *preprocessing*:\n",
"\n",
"* The models provided by the \"sklearn\" library only handle numeric variables, so categorical variables must be converted to numeric ones. This process is called *One Hot Encoding*.\n",
"* Normalize the numeric data."
|
|
]
|
|
},
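To make the two preprocessing steps concrete, here is a minimal pure-Python sketch of what they compute. It mirrors the behaviour of `OneHotEncoder(drop='first')` (categories sorted, first one dropped) and of `StandardScaler` (mean 0, population standard deviation 1). The helper names, the `fuel` and `age` columns and their values are hypothetical and only serve the illustration.

```python
import math


def one_hot_drop_first(values):
    """One-hot encode a list of labels, dropping the first (sorted) category."""
    categories = sorted(set(values))
    kept = categories[1:]  # the dropped category is encoded as all zeros
    columns = [f"x_{c}" for c in kept]
    rows = [[1.0 if v == c else 0.0 for c in kept] for v in values]
    return columns, rows


def standardize(values):
    """Center to mean 0 and scale to unit variance (population std, ddof=0)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


# Hypothetical toy columns (not taken from the course dataset)
fuel = ["diesel", "essence", "diesel", "electrique"]
age = [20.0, 30.0, 40.0, 50.0]

columns, encoded = one_hot_drop_first(fuel)
scaled = standardize(age)

print(columns)  # ['x_electrique', 'x_essence']
print(encoded)  # [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
print([round(s, 3) for s in scaled])
```

Dropping one category per variable avoids perfectly collinear columns, which is why "diesel" appears as a row of zeros.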
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "95f5cc9f",
|
|
"metadata": {},
|
|
"source": [
|
"**Exercise:** write a snippet that one-hot encodes the categorical variables. You can use the \"preproc.OneHotEncoder\" class from the sklearn library."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 38,
|
|
"id": "b8530717",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"encoder = preproc.OneHotEncoder(sparse_output=False, drop='first')\n",
|
|
"encoder.fit(vars_categorielles)\n",
|
|
"vars_categorielles_enc = encoder.transform(vars_categorielles)\n",
|
|
"vars_categorielles_enc = pd.DataFrame(vars_categorielles_enc, columns=encoder.get_feature_names_out()) # type: ignore"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "b70abc5c",
|
|
"metadata": {},
|
|
"source": [
|
"**Exercise:** write a snippet that normalizes (standardizes) the numeric variables in the dataset. You can use the \"preproc.StandardScaler\" class from the sklearn library."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 39,
|
|
"id": "4ff3847d",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"scaler = preproc.StandardScaler()\n",
|
|
"scaler.fit(vars_numeriques)\n",
|
|
"vars_numeriques_scaled = scaler.transform(vars_numeriques)\n",
|
|
"vars_numeriques_scaled = pd.DataFrame(vars_numeriques_scaled, columns=vars_numeriques.columns)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "62d49546",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Sampling"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "64d229f4",
|
|
"metadata": {},
|
|
"source": [
|
"**Exercise:** write a snippet that builds the training set (80% of the data) and the test set (20%)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 40,
|
|
"id": "6a1c7907",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"train, test = train_test_split(data_model, test_size=0.2, random_state=42)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "84dc7a07",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Fitting"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "97c7b783",
|
|
"metadata": {},
|
|
"source": [
|
"**Exercise:** write a snippet that builds the model."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "bd26339b",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
},
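One possible shape for the fitting step is sketched below. It is not the course solution: the data is synthetic (generated with `make_regression`), and the choice of a `DecisionTreeRegressor` with `max_depth=4` is an assumption standing in for the "simple supervised model" of this section.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the notebook's data (assumption, not the course dataset)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# 80% training set, 20% test set, as in the sampling step
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a shallow regression tree on the training set only
model = DecisionTreeRegressor(max_depth=4, random_state=42)
model.fit(X_train, y_train)

# Predict on the held-out test set
y_pred = model.predict(X_test)
print(y_pred[:5])
```

With the real data, `X` would be the encoded and scaled features and `y` the target column.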
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "8d624704",
|
|
"metadata": {},
|
|
"source": [
|
"**Exercise:** write a snippet that evaluates the model's performance (MAE, MSE and RMSE)."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "c4ca2cf9",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
},
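The three metrics can be written out explicitly, which makes their relationship clear: RMSE is the square root of MSE. The sketch below uses plain Python on small hand-made vectors; `regression_metrics` is a hypothetical helper, and in the notebook the same values would typically come from `metrics.mean_absolute_error` and `metrics.mean_squared_error`.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)   # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)    # mean squared error
    rmse = math.sqrt(mse)                             # same unit as the target
    return mae, mse, rmse


# Hand-made example values (not from the course dataset)
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mae, mse, rmse = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.3f} MSE={mse:.3f} RMSE={rmse:.3f}")
# prints: MAE=1.000 MSE=1.500 RMSE=1.225
```

Because MSE squares the errors, the single error of 2 weighs more in MSE and RMSE than in MAE.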
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "fb2fe98c",
|
|
"metadata": {},
|
|
"source": [
|
"**Question:** what do you think of this model's performance?"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "7ecba832",
|
|
"metadata": {},
|
|
"source": [
|
"## Supervised algorithm: Random Forest"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "efcb8987",
|
|
"metadata": {},
|
|
"source": [
|
"So far, we have covered the steps needed to run a Machine Learning algorithm. These steps alone, however, are not enough to build a high-performing model.\n",
"To get there, the Data Scientist has to act on how the model is trained. In what follows, we will:\n",
"* Switch to a more powerful algorithm (Random Forest)\n",
"* Run a *grid search* over the model's hyperparameters\n",
"* Train with cross-validation\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d6723a2f",
|
|
"metadata": {},
|
|
"source": [
|
"### Model with Cross-Validation"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "3716b09f",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Sampling"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "ab1e1367",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "3f5d735e",
|
|
"metadata": {},
|
|
"source": [
|
"#### Fitting with Cross-Validation"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "bc819f8f",
|
|
"metadata": {},
|
|
"source": [
|
"**Exercise:** build an RF model (RandomForestRegressor) using cross-validation. Remember to store the model's performance (MAE, MSE & RMSE) for each fold in a variable/list."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "b515460e",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
"# Initialization\n",
"# Number of folds for cross-validation\n",
"num_splits = 5\n",
"\n",
"# Random Forest regressor\n",
"rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)\n",
"\n",
"# Initialize the KFold cross-validation splitter\n",
"kf = KFold(n_splits=num_splits)\n",
"\n",
"# Lists to store the model's performance\n",
"MAE_scores = []\n",
"MSE_scores = []\n",
"RMSE_scores = []"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "eebb394f",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
"# Training with cross-validation\n"
|
|
]
|
|
},
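A cross-validation loop of the kind this exercise asks for could look like the sketch below. It is an illustration under assumptions, not the course solution: the data is synthetic, and `n_estimators=20` is chosen only to keep the run short (the notebook initializes the regressor with 100 trees). A fresh regressor is created per fold so that no state is shared between folds.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import KFold

# Synthetic stand-in for the notebook's data (assumption, not the course dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

kf = KFold(n_splits=5)
MAE_scores, MSE_scores, RMSE_scores = [], [], []

for train_idx, val_idx in kf.split(X):
    # Train on 4 folds, validate on the remaining one
    rf = RandomForestRegressor(n_estimators=20, random_state=42)
    rf.fit(X[train_idx], y[train_idx])
    y_pred = rf.predict(X[val_idx])

    # Store the three metrics for this fold
    mse = mean_squared_error(y[val_idx], y_pred)
    MAE_scores.append(mean_absolute_error(y[val_idx], y_pred))
    MSE_scores.append(mse)
    RMSE_scores.append(float(np.sqrt(mse)))

print("Mean RMSE over folds:", np.mean(RMSE_scores))
```

Comparing the spread of the per-fold scores with their mean gives a first idea of how stable the model is across subsets of the data.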
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "b067126c",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
"# Metrics for each fold\n",
"\n",
"# MAE\n",
"for fold, mae in enumerate(MAE_scores, start=1):\n",
"    print(f\"Fold {fold} MAE:\", mae)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "6597152c",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
"# MSE\n",
"for fold, mse in enumerate(MSE_scores, start=1):\n",
"    print(f\"Fold {fold} MSE:\", mse)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "63ff1c9d",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
"# RMSE\n",
"for fold, rmse in enumerate(RMSE_scores, start=1):\n",
"    print(f\"Fold {fold} RMSE:\", rmse)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "ec1961c2",
|
|
"metadata": {},
|
|
"source": [
|
"**Question:** comment on the results."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5a8163ef",
|
|
"metadata": {},
|
|
"source": [
|
"### Adding a Grid Search for the hyperparameters"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "5a6adbfe",
|
|
"metadata": {},
|
|
"source": [
|
|
"#### Sampling"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "d9342ad6",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "dce52b11",
|
|
"metadata": {},
|
|
"source": [
|
"#### Fitting with Cross-Validation and *Grid Search*"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "7e3a9dd0",
|
|
"metadata": {},
|
|
"source": [
|
"**Exercise:** add Grid Search to find the model's optimal hyperparameters."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "6d58dbc2",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
"# Initialization\n",
"# Number of folds for cross-validation\n",
"num_splits = 5\n",
"\n",
"# Initialize the KFold cross-validation splitter\n",
"kf = KFold(n_splits=num_splits)\n",
"\n",
"# Lists to store the model's performance\n",
"MAE_scores = []\n",
"MSE_scores = []\n",
"RMSE_scores = []\n",
"\n",
"# Hyperparameters to test\n",
"n_estimators_values = []  # Fill in the values to test\n",
"max_depth_values = []  # Fill in the values to test\n",
"min_samples_split_values = []  # Fill in the values to test\n",
"\n",
"# Variables to store the best results\n",
"best_score = np.inf\n",
"best_params = {}\n",
"\n",
"MAE_best_score = []\n",
"MSE_best_score = []\n",
"RMSE_best_score = []"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "47da5172",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
"# Fill in your code here"
|
|
]
|
|
},
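A manual grid search wrapped around the cross-validation loop could be sketched as follows. The candidate values, the synthetic data and the selection criterion (lowest mean RMSE over the folds) are assumptions made for illustration; the variable names follow the initialization cell above. scikit-learn also ships `GridSearchCV`, which automates the same idea.

```python
from itertools import product

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic stand-in for the notebook's data (assumption, not the course dataset)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
kf = KFold(n_splits=5)

# Illustrative candidate values (assumptions, to be adapted to the real problem)
n_estimators_values = [10, 30]
max_depth_values = [3, None]
min_samples_split_values = [2, 10]

best_score = np.inf
best_params = {}
RMSE_best_score = []

# Try every combination of the three hyperparameters
for n_est, depth, min_split in product(
    n_estimators_values, max_depth_values, min_samples_split_values
):
    fold_rmse = []
    for train_idx, val_idx in kf.split(X):
        rf = RandomForestRegressor(
            n_estimators=n_est,
            max_depth=depth,
            min_samples_split=min_split,
            random_state=42,
        )
        rf.fit(X[train_idx], y[train_idx])
        mse = mean_squared_error(y[val_idx], rf.predict(X[val_idx]))
        fold_rmse.append(float(np.sqrt(mse)))

    # Keep the combination with the lowest mean RMSE across folds
    mean_rmse = float(np.mean(fold_rmse))
    if mean_rmse < best_score:
        best_score = mean_rmse
        best_params = {
            "n_estimators": n_est,
            "max_depth": depth,
            "min_samples_split": min_split,
        }
        RMSE_best_score = fold_rmse

print("Best parameters:", best_params)
print("Best mean RMSE:", best_score)
```

The per-fold MAE and MSE of the best combination can be tracked the same way by keeping two more lists next to `fold_rmse`.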
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "d4936c46",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
"# Best results\n",
"print(\"Best parameters:\", best_params)\n",
"print(\"Best RMSE:\", best_score)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "3215c463",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
"# Metrics for each fold\n",
"\n",
"# RMSE\n",
"for fold, rmse in enumerate(RMSE_best_score, start=1):\n",
"    print(f\"Fold {fold} RMSE:\", rmse)\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "bb9a5c9b",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
"# MSE\n",
"for fold, mse in enumerate(MSE_best_score, start=1):\n",
"    print(f\"Fold {fold} MSE:\", mse)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "0f0768ad",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
"# MAE\n",
"for fold, mae in enumerate(MAE_best_score, start=1):\n",
"    print(f\"Fold {fold} MAE:\", mae)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "802a625f",
|
|
"metadata": {},
|
|
"source": [
|
"**Question:** comment on the results."
|
|
]
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "studies",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.13.3"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|