{ "cells": [ { "cell_type": "markdown", "id": "8750d15b", "metadata": {}, "source": [ "# Course 3: Machine Learning - Supervised algorithms (1/2)" ] }, { "cell_type": "markdown", "id": "f7c08ae5", "metadata": {}, "source": [ "## Preamble" ] }, { "cell_type": "markdown", "id": "ec7ecb4b", "metadata": {}, "source": [ "The objectives of this session (3h) are to:\n", "* Prepare the modelling datasets (sampling)\n", "* Apply a simple supervised model\n", "* Build a Machine Learning model (cross-validation and hyperparameter tuning) to solve a regression problem\n", "* Analyse the model's performance" ] }, { "cell_type": "markdown", "id": "4e99c600", "metadata": {}, "source": [ "## Workspace setup" ] }, { "cell_type": "markdown", "id": "c1b01045", "metadata": {}, "source": [ "### Library imports" ] }, { "cell_type": "code", "execution_count": null, "id": "97d58527", "metadata": {}, "outputs": [], "source": [ "# Data\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# Plotting\n", "import seaborn as sns\n", "import plotly.express as px\n", "import plotly.graph_objects as gp\n", "\n", "sns.set()\n", "\n", "# Statistics\n", "from scipy.stats import chi2_contingency\n", "\n", "# Machine Learning\n", "import sklearn.preprocessing as preproc\n", "from sklearn import metrics\n", "from sklearn.cluster import KMeans\n", "from sklearn.ensemble import RandomForestRegressor\n", "from sklearn.model_selection import KFold, train_test_split\n", "from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor" ] }, { "cell_type": "markdown", "id": "06153286", "metadata": {}, "source": [ "### Function definitions" ] }, { "cell_type": "code", "execution_count": null, "id": "c67db932", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "985e4e97", "metadata": {}, "source": [ "### Constants" ] }, { "cell_type": "code", "execution_count": 
91, "id": "c9597b48", "metadata": {}, "outputs": [], "source": [ "input_path = \"./1_inputs\"\n", "output_path = \"./2_outputs\"" ] }, { "cell_type": "markdown", "id": "b2b035d2", "metadata": {}, "source": [ "### Data import" ] }, { "cell_type": "code", "execution_count": 92, "id": "8051b5f4", "metadata": {}, "outputs": [], "source": [ "path = input_path + '/base_retraitee.csv'\n", "data_retraitee = pd.read_csv(path, sep=\",\", decimal=\".\")" ] }, { "cell_type": "markdown", "id": "a2578ba1", "metadata": {}, "source": [ "## Supervised algorithm: CART" ] }, { "cell_type": "markdown", "id": "aaa0b27d", "metadata": {}, "source": [ "In this part, the goal is to build a simple model (CART algorithm) in order to walk through the steps needed to run a model.\n", "We will model the claim cost directly." ] }, { "cell_type": "markdown", "id": "a0458a05", "metadata": {}, "source": [ "### Building the model" ] }, { "cell_type": "markdown", "id": "b3715c37", "metadata": {}, "source": [ "The first step is to compute the average claim cost (the target, or response variable). This is the variable to predict from the explanatory variables." 
] }, { "cell_type": "code", "execution_count": 93, "id": "c427a4b8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(824, 14)" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Keep only the rows with at least one claim (NB > 0)\n", "data_model = data_retraitee[data_retraitee['NB'] > 0].copy()\n", "\n", "# Compute the \"theoretical\" average claim cost\n", "data_model[\"CM\"] = data_model[\"CHARGE\"] / data_model[\"NB\"]\n", "data_model = data_model.drop(['CHARGE', 'NB', \"EXPO\"], axis=1)\n", "data_model.shape" ] }, { "cell_type": "markdown", "id": "e3e85088", "metadata": {}, "source": [ "**Exercise:** compute the descriptive statistics of the dataset used." ] }, { "cell_type": "code", "execution_count": 94, "id": "c8fd3ee1", "metadata": {}, "outputs": [ { "data": { "application/vnd.microsoft.datawrangler.viewer.v0+json": { "columns": [ { "name": "index", "rawType": "object", "type": "string" }, { "name": "ANNEE_CTR", "rawType": "float64", "type": "float" }, { "name": "CONTRAT_ANCIENNETE", "rawType": "object", "type": "unknown" }, { "name": "FREQUENCE_PAIEMENT_COTISATION", "rawType": "object", "type": "unknown" }, { "name": "GROUPE_KM", "rawType": "object", "type": "unknown" }, { "name": "ZONE_RISQUE", "rawType": "object", "type": "unknown" }, { "name": "AGE_ASSURE_PRINCIPAL", "rawType": "float64", "type": "float" }, { "name": "GENRE", "rawType": "object", "type": "unknown" }, { "name": "DEUXIEME_CONDUCTEUR", "rawType": "object", "type": "unknown" }, { "name": "ANCIENNETE_PERMIS", "rawType": "float64", "type": "float" }, { "name": "ANNEE_CONSTRUCTION", "rawType": "float64", "type": "float" }, { "name": "ENERGIE", "rawType": "object", "type": "unknown" }, { "name": "EQUIPEMENT_SECURITE", "rawType": "object", "type": "unknown" }, { "name": "VALEUR_DU_BIEN", "rawType": "object", "type": "unknown" }, { "name": "CM", "rawType": "float64", "type": "float" } 
], "ref": "8d8166c3-6828-4361-92de-ebce2dadb512", "rows": [ [ "count", "824.0", "824", "824", "824", "824", "824.0", "824", "824", "824.0", "824.0", "824", "824", "824", "824.0" ], [ "unique", null, "5", "3", "4", "14", null, "2", "2", null, null, "3", "2", "6", null ], [ "top", null, "(0,1]", "MENSUEL", "[0;20000[", "C", null, "M", "False", null, null, "ESSENCE", "FAUX", "[10000;15000[", null ], [ "freq", null, "297", "398", "391", "269", null, "483", "663", null, null, "413", "517", "213", null ], [ "mean", "2018.384708737864", null, null, null, null, "44.383495145631066", null, null, "35.68810679611651", "2015.2123786407767", null, null, null, "4246.01697815534" ], [ "std", "1.515832735580178", null, null, null, null, "13.808216667998865", null, null, "19.370620845496358", "3.1637823115731556", null, null, null, "6869.61691660173" ], [ "min", "2016.0", null, null, null, null, "19.0", null, null, "1.0", "1998.0", null, null, null, "7.5" ], [ "25%", "2017.0", null, null, null, null, "34.0", null, null, "18.0", "2014.0", null, null, null, "1159.96125" ], [ "50%", "2018.0", null, null, null, null, "43.0", null, null, "35.0", "2016.0", null, null, null, "2541.6499999999996" ], [ "75%", "2020.0", null, null, null, null, "53.0", null, null, "53.0", "2017.0", null, null, null, "4193.797500000001" ], [ "max", "2021.0", null, null, null, null, "94.0", null, null, "70.0", "2021.0", null, null, null, "83421.85" ] ], "shape": { "columns": 14, "rows": 11 } }, "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>ANNEE_CTR</th><th>CONTRAT_ANCIENNETE</th><th>FREQUENCE_PAIEMENT_COTISATION</th><th>GROUPE_KM</th><th>ZONE_RISQUE</th><th>AGE_ASSURE_PRINCIPAL</th><th>GENRE</th><th>DEUXIEME_CONDUCTEUR</th><th>ANCIENNETE_PERMIS</th><th>ANNEE_CONSTRUCTION</th><th>ENERGIE</th><th>EQUIPEMENT_SECURITE</th><th>VALEUR_DU_BIEN</th><th>CM</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>count</th><td>824.000000</td><td>824</td><td>824</td><td>824</td><td>824</td><td>824.000000</td><td>824</td><td>824</td><td>824.000000</td><td>824.000000</td><td>824</td><td>824</td><td>824</td><td>824.000000</td></tr>\n",
"<tr><th>unique</th><td>NaN</td><td>5</td><td>3</td><td>4</td><td>14</td><td>NaN</td><td>2</td><td>2</td><td>NaN</td><td>NaN</td><td>3</td><td>2</td><td>6</td><td>NaN</td></tr>\n",
"<tr><th>top</th><td>NaN</td><td>(0,1]</td><td>MENSUEL</td><td>[0;20000[</td><td>C</td><td>NaN</td><td>M</td><td>False</td><td>NaN</td><td>NaN</td><td>ESSENCE</td><td>FAUX</td><td>[10000;15000[</td><td>NaN</td></tr>\n",
"<tr><th>freq</th><td>NaN</td><td>297</td><td>398</td><td>391</td><td>269</td><td>NaN</td><td>483</td><td>663</td><td>NaN</td><td>NaN</td><td>413</td><td>517</td><td>213</td><td>NaN</td></tr>\n",
"<tr><th>mean</th><td>2018.384709</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>44.383495</td><td>NaN</td><td>NaN</td><td>35.688107</td><td>2015.212379</td><td>NaN</td><td>NaN</td><td>NaN</td><td>4246.016978</td></tr>\n",
"<tr><th>std</th><td>1.515833</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>13.808217</td><td>NaN</td><td>NaN</td><td>19.370621</td><td>3.163782</td><td>NaN</td><td>NaN</td><td>NaN</td><td>6869.616917</td></tr>\n",
"<tr><th>min</th><td>2016.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>19.000000</td><td>NaN</td><td>NaN</td><td>1.000000</td><td>1998.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>7.500000</td></tr>\n",
"<tr><th>25%</th><td>2017.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>34.000000</td><td>NaN</td><td>NaN</td><td>18.000000</td><td>2014.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>1159.961250</td></tr>\n",
"<tr><th>50%</th><td>2018.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>43.000000</td><td>NaN</td><td>NaN</td><td>35.000000</td><td>2016.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>2541.650000</td></tr>\n",
"<tr><th>75%</th><td>2020.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>53.000000</td><td>NaN</td><td>NaN</td><td>53.000000</td><td>2017.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>4193.797500</td></tr>\n",
"<tr><th>max</th><td>2021.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>94.000000</td><td>NaN</td><td>NaN</td><td>70.000000</td><td>2021.000000</td><td>NaN</td><td>NaN</td><td>NaN</td><td>83421.850000</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " ANNEE_CTR CONTRAT_ANCIENNETE FREQUENCE_PAIEMENT_COTISATION \\\n", "count 824.000000 824 824 \n", "unique NaN 5 3 \n", "top NaN (0,1] MENSUEL \n", "freq NaN 297 398 \n", "mean 2018.384709 NaN NaN \n", "std 1.515833 NaN NaN \n", "min 2016.000000 NaN NaN \n", "25% 2017.000000 NaN NaN \n", "50% 2018.000000 NaN NaN \n", "75% 2020.000000 NaN NaN \n", "max 2021.000000 NaN NaN \n", "\n", " GROUPE_KM ZONE_RISQUE AGE_ASSURE_PRINCIPAL GENRE DEUXIEME_CONDUCTEUR \\\n", "count 824 824 824.000000 824 824 \n", "unique 4 14 NaN 2 2 \n", "top [0;20000[ C NaN M False \n", "freq 391 269 NaN 483 663 \n", "mean NaN NaN 44.383495 NaN NaN \n", "std NaN NaN 13.808217 NaN NaN \n", "min NaN NaN 19.000000 NaN NaN \n", "25% NaN NaN 34.000000 NaN NaN \n", "50% NaN NaN 43.000000 NaN NaN \n", "75% NaN NaN 53.000000 NaN NaN \n", "max NaN NaN 94.000000 NaN NaN \n", "\n", " ANCIENNETE_PERMIS ANNEE_CONSTRUCTION ENERGIE EQUIPEMENT_SECURITE \\\n", "count 824.000000 824.000000 824 824 \n", "unique NaN NaN 3 2 \n", "top NaN NaN ESSENCE FAUX \n", "freq NaN NaN 413 517 \n", "mean 35.688107 2015.212379 NaN NaN \n", "std 19.370621 3.163782 NaN NaN \n", "min 1.000000 1998.000000 NaN NaN \n", "25% 18.000000 2014.000000 NaN NaN \n", "50% 35.000000 2016.000000 NaN NaN \n", "75% 53.000000 2017.000000 NaN NaN \n", "max 70.000000 2021.000000 NaN NaN \n", "\n", " VALEUR_DU_BIEN CM \n", "count 824 824.000000 \n", "unique 6 NaN \n", "top [10000;15000[ NaN \n", "freq 213 NaN \n", "mean NaN 4246.016978 \n", "std NaN 6869.616917 \n", "min NaN 7.500000 \n", "25% NaN 1159.961250 \n", "50% NaN 2541.650000 \n", "75% NaN 4193.797500 \n", "max NaN 83421.850000 " ] }, "execution_count": 94, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_model.describe(include='all')" ] }, { "cell_type": "markdown", "id": "92d6156a", "metadata": {}, "source": [ "#### Etude des corrélations parmi les variables explicatives" ] }, { "cell_type": "markdown", "id": "d7327570", "metadata": {}, "source": [ 
"**Question :** Selon vous, pourquoi faut-il s'intéresser à la corrélation des variables ? " ] }, { "cell_type": "markdown", "id": "475e141b", "metadata": {}, "source": [ "*Réponse*: Pour avoir un modèle qui fit mieux + déterminer un potentiel effet de causalité entre features et target + sélectionner certaines variables." ] }, { "cell_type": "code", "execution_count": 95, "id": "1b156435", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(824, 13)" ] }, "execution_count": 95, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_set = data_model.drop(\"CM\", axis=1)\n", "data_set.shape" ] }, { "cell_type": "code", "execution_count": 96, "id": "0ef0fcc0", "metadata": {}, "outputs": [], "source": [ "#Séparation en variables qualitatives ou catégorielles\n", "variables_na = []\n", "variables_numeriques = []\n", "variables_01 = []\n", "variables_categorielles = []\n", "for colu in data_set.columns:\n", " if True in data_set[colu].isna().unique() :\n", " variables_na.append(data_set[colu])\n", " else :\n", " if str(data_set[colu].dtypes) in [\"int32\",\"int64\",\"float64\"]:\n", " if len(data_set[colu].unique())==2 :\n", " variables_categorielles.append(data_set[colu])\n", " else :\n", " variables_numeriques.append(data_set[colu])\n", " else :\n", " if len(data_set[colu].unique())==2 :\n", " variables_categorielles.append(data_set[colu])\n", " else :\n", " variables_categorielles.append(data_set[colu])" ] }, { "cell_type": "markdown", "id": "e82fcade", "metadata": {}, "source": [ "##### Corrélation des variables catégorielles :" ] }, { "cell_type": "code", "execution_count": 97, "id": "e130aae5", "metadata": {}, "outputs": [], "source": [ "vars_categorielles = pd.DataFrame(variables_categorielles).transpose()" ] }, { "cell_type": "code", "execution_count": 123, "id": "c39e2ad0", "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "coloraxis": "coloraxis", 
"hovertemplate": "x: %{x}
y: %{y}
color: %{z}", "name": "0", "texttemplate": "%{z:.2f}", "type": "heatmap", "x": [ "CONTRAT_ANCIENNETE", "FREQUENCE_PAIEMENT_COTISATION", "GROUPE_KM", "ZONE_RISQUE", "GENRE", "DEUXIEME_CONDUCTEUR", "ENERGIE", "EQUIPEMENT_SECURITE", "VALEUR_DU_BIEN" ], "xaxis": "x", "y": [ "CONTRAT_ANCIENNETE", "FREQUENCE_PAIEMENT_COTISATION", "GROUPE_KM", "ZONE_RISQUE", "GENRE", "DEUXIEME_CONDUCTEUR", "ENERGIE", "EQUIPEMENT_SECURITE", "VALEUR_DU_BIEN" ], "yaxis": "y", "z": { "bdata": "AAAAAAAA8D8AAAAAAAAAACoCGzzITrA/jS6+t390sj/aAKYMJa2eP5RMqUS3uZs/ytNpsBVXkz8AAAAAAAAAAJsekiMPM4I/AAAAAAAAAAAAAAAAAADwPwAAAAAAAAAAAAAAAAAAAABgNwyfFOK3Px3tLvtk1qI/VTS7w965nj/DbHQwNU6sP6xOyIjBVMQ/KwIbPMhOsD8AAAAAAAAAAAAAAAAAAPA/JGwWgOwjwz/Y12crRVC2P1AU8aUpk3Y/tZ25v8HgyT9++YWBDBq6PxMKBP1KAMk/ki6+t390sj8AAAAAAAAAACNsFoDsI8M/AAAAAAAA8D8AAAAAAAAAAOzpAHMW1bU/OToUIB5twT+gpoD1ZjrEP/5ATjN+vpg/0gCmDCWtnj9gNwyfFOK3P9jXZytFULY/AAAAAAAAAAAAAAAAAADwPwAAAAAAAAAA2p0N4q1bwz/UsLoqS0u5PxFqf8IHB9E/lEypRLe5mz8d7S77ZNaiP1AU8aUpk3Y/7OkAcxbVtT8AAAAAAAAAAAAAAAAAAPA/AAAAAAAAAAAAAAAAAAAAAOYlMsJ0brs/ytNpsBVXkz9RNLvD3rmeP7edub/B4Mk/OjoUIB5twT/anQ3irVvDPwAAAAAAAAAAAAAAAAAA8D8nEbUEUmnAP+SA2g/TvNE/AAAAAAAAAADDbHQwNU6sP335hYEMGro/oKaA9WY6xD/UsLoqS0u5PwAAAAAAAAAAJxG1BFJpwD8AAAAAAADwP+fmCf6XRco/mx6SIw8zgj+rTsiIwVTEPxIKBP1KAMk//kBOM36+mD8Ran/CBwfRP+YlMsJ0brs/5YDaD9O80T/n5gn+l0XKPwAAAAAAAPA/", "dtype": "f8", "shape": "9, 9" } } ], "layout": { "coloraxis": { "colorscale": [ [ 0, "rgb(5,48,97)" ], [ 0.1, "rgb(33,102,172)" ], [ 0.2, "rgb(67,147,195)" ], [ 0.3, "rgb(146,197,222)" ], [ 0.4, "rgb(209,229,240)" ], [ 0.5, "rgb(247,247,247)" ], [ 0.6, "rgb(253,219,199)" ], [ 0.7, "rgb(244,165,130)" ], [ 0.8, "rgb(214,96,77)" ], [ 0.9, "rgb(178,24,43)" ], [ 1, "rgb(103,0,31)" ] ] }, "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { 
"marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "histogram": [ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" 
], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermap": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermap" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, 
"#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" ], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, 
"paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "title": { "text": "Matrice de corrélation des variables catégorielles (V de Cramér)" }, "xaxis": { "anchor": "y", "domain": [ 0, 1 ] }, "yaxis": { "anchor": "x", "autorange": "reversed", "domain": [ 0, 1 ] } } } }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Matrice de corrélation pour les variables catégorielles (V de Cramér)\n", "def cramers_v(confusion_matrix):\n", " \"\"\"Calcule le V de Cramér à partir d'une matrice de contingence\"\"\"\n", " chi2 = chi2_contingency(confusion_matrix)[0]\n", " n = confusion_matrix.sum().sum()\n", " phi2 = chi2 / n\n", " r, k 
= confusion_matrix.shape\n", " phi2corr = max(0, phi2 - ((k-1)*(r-1))/(n-1))\n", " rcorr = r - ((r-1)**2)/(n-1)\n", " kcorr = k - ((k-1)**2)/(n-1)\n", " return np.sqrt(phi2corr / min((kcorr-1), (rcorr-1)))\n", "\n", "# Créer la matrice de corrélation\n", "categorical_cols = vars_categorielles.columns\n", "n_vars = len(categorical_cols)\n", "cramers_matrix = np.zeros((n_vars, n_vars))\n", "\n", "for i, col1 in enumerate(categorical_cols):\n", " for j, col2 in enumerate(categorical_cols):\n", " if i == j:\n", " cramers_matrix[i, j] = 1.0\n", " else:\n", " confusion_matrix = pd.crosstab(vars_categorielles[col1], vars_categorielles[col2])\n", " cramers_matrix[i, j] = cramers_v(confusion_matrix)\n", "\n", "# Créer le DataFrame de corrélation\n", "correlation_cat = pd.DataFrame(cramers_matrix,\n", " index=categorical_cols,\n", " columns=categorical_cols)\n", "\n", "# Visualiser avec Plotly\n", "fig = px.imshow(correlation_cat,\n", " text_auto='.2f', # type: ignore\n", " aspect=\"auto\",\n", " color_continuous_scale='RdBu_r',\n", " title='Matrice de corrélation des variables catégorielles (V de Cramér)')\n", "fig.show()" ] }, { "cell_type": "markdown", "id": "8f615121", "metadata": {}, "source": [ "##### Corrélation des variables numériques :" ] }, { "cell_type": "code", "execution_count": 99, "id": "a16215ab", "metadata": {}, "outputs": [], "source": [ "vars_numeriques = pd.DataFrame(variables_numeriques).transpose()" ] }, { "cell_type": "code", "execution_count": 100, "id": "532ca6c4", "metadata": {}, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "config": { "plotlyServerURL": "https://plot.ly" }, "data": [ { "coloraxis": "coloraxis", "hovertemplate": "x: %{x}
y: %{y}
color: %{z}", "name": "0", "texttemplate": "%{z}", "type": "heatmap", "x": [ "ANNEE_CTR", "AGE_ASSURE_PRINCIPAL", "ANCIENNETE_PERMIS", "ANNEE_CONSTRUCTION" ], "xaxis": "x", "y": [ "ANNEE_CTR", "AGE_ASSURE_PRINCIPAL", "ANCIENNETE_PERMIS", "ANNEE_CONSTRUCTION" ], "yaxis": "y", "z": { "bdata": "AAAAAAAA8D+ybZcEUUCbP/CBLCtO46Q/qr2Q49LN2D+ybZcEUUCbPwAAAAAAAPA/slV7SAtP4T84L73yETWgv/CBLCtO46Q/slV7SAtP4T8AAAAAAADwP0I6y25dD6E/qr2Q49LN2D84L73yETWgv0I6y25dD6E/AAAAAAAA8D8=", "dtype": "f8", "shape": "4, 4" } } ], "layout": { "coloraxis": { "colorscale": [ [ 0, "rgb(5,48,97)" ], [ 0.1, "rgb(33,102,172)" ], [ 0.2, "rgb(67,147,195)" ], [ 0.3, "rgb(146,197,222)" ], [ 0.4, "rgb(209,229,240)" ], [ 0.5, "rgb(247,247,247)" ], [ 0.6, "rgb(253,219,199)" ], [ 0.7, "rgb(244,165,130)" ], [ 0.8, "rgb(214,96,77)" ], [ 0.9, "rgb(178,24,43)" ], [ 1, "rgb(103,0,31)" ] ] }, "template": { "data": { "bar": [ { "error_x": { "color": "#2a3f5f" }, "error_y": { "color": "#2a3f5f" }, "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "bar" } ], "barpolar": [ { "marker": { "line": { "color": "#E5ECF6", "width": 0.5 }, "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "barpolar" } ], "carpet": [ { "aaxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "baxis": { "endlinecolor": "#2a3f5f", "gridcolor": "white", "linecolor": "white", "minorgridcolor": "white", "startlinecolor": "#2a3f5f" }, "type": "carpet" } ], "choropleth": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "choropleth" } ], "contour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 
0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "contour" } ], "contourcarpet": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "contourcarpet" } ], "heatmap": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "heatmap" } ], "histogram": [ { "marker": { "pattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 } }, "type": "histogram" } ], "histogram2d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2d" } ], "histogram2dcontour": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "histogram2dcontour" } ], "mesh3d": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "type": "mesh3d" } ], "parcoords": [ { "line": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "parcoords" } ], "pie": [ { "automargin": true, "type": "pie" } ], "scatter": [ { "fillpattern": { "fillmode": "overlay", "size": 10, "solidity": 0.2 }, "type": "scatter" } ], "scatter3d": [ { "line": { 
"colorbar": { "outlinewidth": 0, "ticks": "" } }, "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatter3d" } ], "scattercarpet": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattercarpet" } ], "scattergeo": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergeo" } ], "scattergl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattergl" } ], "scattermap": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermap" } ], "scattermapbox": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scattermapbox" } ], "scatterpolar": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolar" } ], "scatterpolargl": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterpolargl" } ], "scatterternary": [ { "marker": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "type": "scatterternary" } ], "surface": [ { "colorbar": { "outlinewidth": 0, "ticks": "" }, "colorscale": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "type": "surface" } ], "table": [ { "cells": { "fill": { "color": "#EBF0F8" }, "line": { "color": "white" } }, "header": { "fill": { "color": "#C8D4E3" }, "line": { "color": "white" } }, "type": "table" } ] }, "layout": { "annotationdefaults": { "arrowcolor": "#2a3f5f", "arrowhead": 0, "arrowwidth": 1 }, "autotypenumbers": "strict", "coloraxis": { "colorbar": { "outlinewidth": 0, "ticks": "" } }, "colorscale": { "diverging": [ [ 0, "#8e0152" ], [ 0.1, "#c51b7d" ], [ 0.2, "#de77ae" ], [ 0.3, "#f1b6da" ], [ 0.4, "#fde0ef" ], [ 0.5, "#f7f7f7" ], [ 0.6, "#e6f5d0" ], [ 0.7, "#b8e186" 
], [ 0.8, "#7fbc41" ], [ 0.9, "#4d9221" ], [ 1, "#276419" ] ], "sequential": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ], "sequentialminus": [ [ 0, "#0d0887" ], [ 0.1111111111111111, "#46039f" ], [ 0.2222222222222222, "#7201a8" ], [ 0.3333333333333333, "#9c179e" ], [ 0.4444444444444444, "#bd3786" ], [ 0.5555555555555556, "#d8576b" ], [ 0.6666666666666666, "#ed7953" ], [ 0.7777777777777778, "#fb9f3a" ], [ 0.8888888888888888, "#fdca26" ], [ 1, "#f0f921" ] ] }, "colorway": [ "#636efa", "#EF553B", "#00cc96", "#ab63fa", "#FFA15A", "#19d3f3", "#FF6692", "#B6E880", "#FF97FF", "#FECB52" ], "font": { "color": "#2a3f5f" }, "geo": { "bgcolor": "white", "lakecolor": "white", "landcolor": "#E5ECF6", "showlakes": true, "showland": true, "subunitcolor": "white" }, "hoverlabel": { "align": "left" }, "hovermode": "closest", "mapbox": { "style": "light" }, "paper_bgcolor": "white", "plot_bgcolor": "#E5ECF6", "polar": { "angularaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "radialaxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "scene": { "xaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "yaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" }, "zaxis": { "backgroundcolor": "#E5ECF6", "gridcolor": "white", "gridwidth": 2, "linecolor": "white", "showbackground": true, "ticks": "", "zerolinecolor": "white" } }, "shapedefaults": { "line": { "color": "#2a3f5f" } }, "ternary": { "aaxis": { "gridcolor": "white", "linecolor": "white", 
"ticks": "" }, "baxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" }, "bgcolor": "#E5ECF6", "caxis": { "gridcolor": "white", "linecolor": "white", "ticks": "" } }, "title": { "x": 0.05 }, "xaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 }, "yaxis": { "automargin": true, "gridcolor": "white", "linecolor": "white", "ticks": "", "title": { "standoff": 15 }, "zerolinecolor": "white", "zerolinewidth": 2 } } }, "title": { "text": "Matrice de corrélation des variables numériques" }, "xaxis": { "anchor": "y", "domain": [ 0, 1 ] }, "yaxis": { "anchor": "x", "autorange": "reversed", "domain": [ 0, 1 ] } } } }, "metadata": {}, "output_type": "display_data" } ], "source": [ "vars_numeriques.corr()\n", "fig = px.imshow(vars_numeriques.corr(),\n", " text_auto=True,\n", " aspect=\"auto\",\n", " color_continuous_scale='RdBu_r',\n", " title='Matrice de corrélation des variables numériques')\n", "fig.show()" ] }, { "cell_type": "markdown", "id": "98c7dba6", "metadata": {}, "source": [ "**Question :** quels sont vos commentaires ?" ] }, { "cell_type": "markdown", "id": "67406b54", "metadata": {}, "source": [ "*Réponse*: Aucune des variables ne semblent corrélées." ] }, { "cell_type": "markdown", "id": "212209ec", "metadata": {}, "source": [ "#### Preprocessing" ] }, { "cell_type": "markdown", "id": "65aca700", "metadata": {}, "source": [ "Deux étapes sont nécessaires avant de lancer l'apprentissage d'un modèle, c'est ce qu'on connait comme le *Preprocessing* :\n", "\n", "* Les modèles proposés par la librairie \"sklearn\" ne gèrent que des variables numériques. 
Il est donc nécessaire de transformer les variables catégorielles en variables numériques : ce processus s'appelle le *One Hot Encoding*.\n", "* Normaliser les données numériques" ] }, { "cell_type": "markdown", "id": "95f5cc9f", "metadata": {}, "source": [ "**Exercice :** proposez un bout de code permettant de réaliser le One Hot Encoding des variables catégorielles. Vous pourrez utiliser la fonction \"preproc.OneHotEncoder\" de la librairie sklearn" ] }, { "cell_type": "code", "execution_count": 101, "id": "b8530717", "metadata": {}, "outputs": [], "source": [ "encoder = preproc.OneHotEncoder()\n", "encoder.fit(vars_categorielles)\n", "vars_categorielles_enc = encoder.transform(vars_categorielles)\n", "vars_categorielles_enc = pd.DataFrame(vars_categorielles_enc.toarray(), columns=encoder.get_feature_names_out(vars_categorielles.columns)) # type: ignore" ] }, { "cell_type": "markdown", "id": "b70abc5c", "metadata": {}, "source": [ "**Exercice :** proposez un bout de code permettant normaliser les variables numériques présentes dans la base. 
Vous pourrez utiliser la fonction \"preproc.StandardScaler\" de la librairie sklearn" ] }, { "cell_type": "code", "execution_count": 102, "id": "4ff3847d", "metadata": {}, "outputs": [], "source": [ "scaler = preproc.StandardScaler()\n", "scaler.fit(vars_numeriques)\n", "vars_numeriques_scaled = scaler.transform(vars_numeriques)\n", "vars_numeriques_scaled = pd.DataFrame(vars_numeriques_scaled, columns=vars_numeriques.columns)" ] }, { "cell_type": "code", "execution_count": 117, "id": "128d4a36", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(824, 46)\n" ] }, { "data": { "application/vnd.microsoft.datawrangler.viewer.v0+json": { "columns": [ { "name": "index", "rawType": "int64", "type": "integer" }, { "name": "ANNEE_CTR", "rawType": "float64", "type": "float" }, { "name": "AGE_ASSURE_PRINCIPAL", "rawType": "float64", "type": "float" }, { "name": "ANCIENNETE_PERMIS", "rawType": "float64", "type": "float" }, { "name": "ANNEE_CONSTRUCTION", "rawType": "float64", "type": "float" }, { "name": "CONTRAT_ANCIENNETE_(-1,0]", "rawType": "float64", "type": "float" }, { "name": "CONTRAT_ANCIENNETE_(0,1]", "rawType": "float64", "type": "float" }, { "name": "CONTRAT_ANCIENNETE_(1,2]", "rawType": "float64", "type": "float" }, { "name": "CONTRAT_ANCIENNETE_(2,5]", "rawType": "float64", "type": "float" }, { "name": "CONTRAT_ANCIENNETE_(5,10]", "rawType": "float64", "type": "float" }, { "name": "FREQUENCE_PAIEMENT_COTISATION_ANNUEL", "rawType": "float64", "type": "float" }, { "name": "FREQUENCE_PAIEMENT_COTISATION_MENSUEL", "rawType": "float64", "type": "float" }, { "name": "FREQUENCE_PAIEMENT_COTISATION_TRIMESTRIEL", "rawType": "float64", "type": "float" }, { "name": "GROUPE_KM_[0;20000[", "rawType": "float64", "type": "float" }, { "name": "GROUPE_KM_[20000;40000[", "rawType": "float64", "type": "float" }, { "name": "GROUPE_KM_[40000;60000[", "rawType": "float64", "type": "float" }, { "name": "GROUPE_KM_[60000;99999[", "rawType": "float64", 
"type": "float" }, { "name": "ZONE_RISQUE_A", "rawType": "float64", "type": "float" }, { "name": "ZONE_RISQUE_B", "rawType": "float64", "type": "float" }, { "name": "ZONE_RISQUE_C", "rawType": "float64", "type": "float" }, { "name": "ZONE_RISQUE_D", "rawType": "float64", "type": "float" }, { "name": "ZONE_RISQUE_E", "rawType": "float64", "type": "float" }, { "name": "ZONE_RISQUE_F", "rawType": "float64", "type": "float" }, { "name": "ZONE_RISQUE_G", "rawType": "float64", "type": "float" }, { "name": "ZONE_RISQUE_H", "rawType": "float64", "type": "float" }, { "name": "ZONE_RISQUE_I", "rawType": "float64", "type": "float" }, { "name": "ZONE_RISQUE_J", "rawType": "float64", "type": "float" }, { "name": "ZONE_RISQUE_K", "rawType": "float64", "type": "float" }, { "name": "ZONE_RISQUE_L", "rawType": "float64", "type": "float" }, { "name": "ZONE_RISQUE_M", "rawType": "float64", "type": "float" }, { "name": "ZONE_RISQUE_T", "rawType": "float64", "type": "float" }, { "name": "GENRE_F", "rawType": "float64", "type": "float" }, { "name": "GENRE_M", "rawType": "float64", "type": "float" }, { "name": "DEUXIEME_CONDUCTEUR_False", "rawType": "float64", "type": "float" }, { "name": "DEUXIEME_CONDUCTEUR_True", "rawType": "float64", "type": "float" }, { "name": "ENERGIE_AUTRE", "rawType": "float64", "type": "float" }, { "name": "ENERGIE_DIESEL", "rawType": "float64", "type": "float" }, { "name": "ENERGIE_ESSENCE", "rawType": "float64", "type": "float" }, { "name": "EQUIPEMENT_SECURITE_FAUX", "rawType": "float64", "type": "float" }, { "name": "EQUIPEMENT_SECURITE_VRAI", "rawType": "float64", "type": "float" }, { "name": "VALEUR_DU_BIEN_[0;10000[", "rawType": "float64", "type": "float" }, { "name": "VALEUR_DU_BIEN_[10000;15000[", "rawType": "float64", "type": "float" }, { "name": "VALEUR_DU_BIEN_[15000;20000[", "rawType": "float64", "type": "float" }, { "name": "VALEUR_DU_BIEN_[20000;25000[", "rawType": "float64", "type": "float" }, { "name": "VALEUR_DU_BIEN_[25000;35000[", "rawType": 
"float64", "type": "float" }, { "name": "VALEUR_DU_BIEN_[35000;99999[", "rawType": "float64", "type": "float" }, { "name": "CM", "rawType": "float64", "type": "float" } ], "ref": "85e30838-5a51-4c2c-8483-c3033e7d9195", "rows": [ [ "0", "0.40615626262983295", "-0.31764836563527515", "0.067767057718506", "0.5653698304986595", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1072.98" ], [ "1", "1.06626032654885", "-1.2596885906311412", "-1.1719751563806404", "0.8816391722032739", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "3750.0" ], [ "2", "0.40615626262983295", "-1.839405652167059", "-1.740190337842749", "0.5653698304986595", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1838.49" ], [ "3", "0.40615626262983295", "-0.31764836563527515", "0.48101446241822143", "0.8816391722032739", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "4892.74" ], [ "4", "-0.25394780128918387", "-1.7669410194750692", "-1.2752870075555691", "-0.38343819461518397", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", 
"0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "166.73" ], [ "5", "-0.9140518652082007", "-1.332153223323131", "-1.5335666354928914", "-0.6997075363197984", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "4859.58" ], [ "6", "-0.25394780128918387", "-0.31764836563527515", "-0.7587277516809249", "-0.38343819461518397", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "2160.98" ], [ "7", "-0.25394780128918387", "0.4069979612846219", "-0.34548034698120944", "-1.015976878024413", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "2316.165" ], [ "8", "-1.5741559291272176", "-0.8249007944792031", "-0.8103836772683893", "-0.38343819461518397", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1603.99" ], [ "9", "0.40615626262983295", "1.856290615124416", "0.7392940903555436", "0.8816391722032739", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", 
"0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1653.21" ], [ "10", "-0.25394780128918387", "1.7838259824324263", "1.4624770485800456", "0.5653698304986595", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "3537.32" ], [ "11", "0.40615626262983295", "-0.17271910025129572", "-0.34548034698120944", "0.24910048879404498", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1531.35" ], [ "12", "-0.9140518652082007", "0.2620686959006425", "0.6876381647680792", "-0.6997075363197984", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "26196.5" ], [ "13", "1.7263643904678667", "-1.0422946925551722", "-1.2236310819681047", "0.5653698304986595", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "8130.34" ], [ "14", "0.40615626262983295", "0.8417857574365601", "1.3075092718176524", "0.5653698304986595", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", 
"0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "7281.26" ], [ "15", "0.40615626262983295", "0.2620686959006425", "0.48101446241822143", "0.8816391722032739", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "835.17" ], [ "16", "1.06626032654885", "2.0736845132003854", "1.617444825342439", "0.8816391722032739", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "7598.7" ], [ "17", "-1.5741559291272176", "-0.24518373294328544", "-0.39713627256867384", "-1.9647849031382563", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "6518.33" ], [ "18", "-0.9140518652082007", "3.3780479016562", "0.9975737182928658", "-5.127478320184401", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "881.52" ], [ "19", "1.06626032654885", "-0.7524361617872134", "0.3260466856558282", "0.8816391722032739", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", 
"0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "3955.825" ], [ "20", "-1.5741559291272176", "0.9867150228205396", "0.2743907600683637", "0.5653698304986595", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "3309.14" ], [ "21", "1.06626032654885", "0.8417857574365601", "-0.13885664463135172", "0.8816391722032739", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "157.95" ], [ "22", "-1.5741559291272176", "0.9142503901285499", "1.255853346230188", "-0.38343819461518397", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "3073.62" ], [ "23", "0.40615626262983295", "2.7258662074282927", "1.51413297416751", "-0.38343819461518397", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "4719.99" ], [ "24", "0.40615626262983295", "0.2620686959006425", "0.3260466856558282", "0.5653698304986595", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", 
"0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "8519.2" ], [ "25", "0.40615626262983295", "-1.6220117540910899", "-1.2236310819681047", "-1.3322462197290275", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "3750.0" ], [ "26", "1.7263643904678667", "-0.24518373294328544", "-0.035544793456422856", "0.24910048879404498", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "819.0" ], [ "27", "-1.5741559291272176", "0.11713943051666309", "-0.39713627256867384", "-0.38343819461518397", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "5141.66" ], [ "28", "0.40615626262983295", "-0.6799715290952236", "-1.4302547843179625", "1.1979085139078884", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "8087.1" ], [ "29", "-0.25394780128918387", "-0.31764836563527515", "0.3260466856558282", "-1.015976878024413", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", 
"0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1985.24" ], [ "30", "-0.25394780128918387", "-1.2596885906311412", "-1.2236310819681047", "0.5653698304986595", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "166.73" ], [ "31", "-0.9140518652082007", "-1.4046178560151208", "-1.0686633052057115", "-0.38343819461518397", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1360.63" ], [ "32", "-0.25394780128918387", "-1.6220117540910899", "-1.3269429331430336", "-1.015976878024413", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1045.92" ], [ "33", "-0.9140518652082007", "-0.8973654271711928", "-1.2236310819681047", "-0.6997075363197984", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "3168.47" ], [ "34", "-1.5741559291272176", "-1.2596885906311412", "-1.3269429331430336", "-1.3322462197290275", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", 
"0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "3064.59" ], [ "35", "-0.9140518652082007", "0.8417857574365601", "-0.19051257021881615", "-0.38343819461518397", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1797.13" ], [ "36", "0.40615626262983295", "-0.46257763101925453", "0.48101446241822143", "0.5653698304986595", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "6445.05" ], [ "37", "0.40615626262983295", "0.33453332859263224", "-1.2752870075555691", "0.5653698304986595", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "6134.28" ], [ "38", "-1.5741559291272176", "-0.9698300598631825", "-1.0686633052057115", "-0.0671688529105695", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "16466.86" ], [ "39", "1.7263643904678667", "-0.8249007944792031", "-0.9136955284433181", "1.1979085139078884", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", 
"0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "3750.0" ], [ "40", "-0.9140518652082007", "-1.0422946925551722", "-1.120319230793176", "0.24910048879404498", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "8269.76" ], [ "41", "1.06626032654885", "0.5519272266686014", "-0.6554159005059961", "-0.38343819461518397", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "5018.84" ], [ "42", "0.40615626262983295", "-0.027789834867316315", "0.6876381647680792", "-1.015976878024413", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "3750.0" ], [ "43", "-0.25394780128918387", "-0.027789834867316315", "1.152541495055259", "0.24910048879404498", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1175.34" ], [ "44", "-0.9140518652082007", "2.2910784112763545", "1.6691007509299034", "-0.38343819461518397", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", 
"0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "759.22" ], [ "45", "-1.5741559291272176", "0.4069979612846219", "1.2041974206427235", "-0.0671688529105695", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "9043.6" ], [ "46", "1.06626032654885", "1.2765735535884983", "1.255853346230188", "0.24910048879404498", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "3750.0" ], [ "47", "-0.9140518652082007", "1.349038186280488", "-0.34548034698120944", "-3.5461316116613286", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1679.02" ], [ "48", "0.40615626262983295", "0.2620686959006425", "1.4624770485800456", "0.8816391722032739", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "1.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "6275.67" ], [ "49", "-0.25394780128918387", "0.04467479782467339", "1.4624770485800456", "0.5653698304986595", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "0.0", 
"0.0", "0.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "1.0", "0.0", "0.0", "0.0", "0.0", "1.0", "0.0", "7.5" ] ], "shape": { "columns": 46, "rows": 824 } }, "text/html": [ "
824 rows × 46 columns
" ], "text/plain": [ " ANNEE_CTR AGE_ASSURE_PRINCIPAL ANCIENNETE_PERMIS ANNEE_CONSTRUCTION \\\n", "0 0.406156 -0.317648 0.067767 0.565370 \n", "1 1.066260 -1.259689 -1.171975 0.881639 \n", "2 0.406156 -1.839406 -1.740190 0.565370 \n", "3 0.406156 -0.317648 0.481014 0.881639 \n", "4 -0.253948 -1.766941 -1.275287 -0.383438 \n", ".. ... ... ... ... \n", "819 -0.914052 0.406998 0.894262 -2.597324 \n", "820 -0.253948 0.406998 1.565789 0.249100 \n", "821 0.406156 -1.766941 -1.533567 0.565370 \n", "822 -0.253948 -1.766941 -1.275287 -1.648516 \n", "823 1.066260 0.406998 0.067767 0.565370 \n", "\n", " CONTRAT_ANCIENNETE_(-1,0] CONTRAT_ANCIENNETE_(0,1] \\\n", "0 0.0 1.0 \n", "1 1.0 0.0 \n", "2 1.0 0.0 \n", "3 1.0 0.0 \n", "4 0.0 0.0 \n", ".. ... ... \n", "819 0.0 0.0 \n", "820 0.0 1.0 \n", "821 0.0 0.0 \n", "822 0.0 1.0 \n", "823 0.0 0.0 \n", "\n", " CONTRAT_ANCIENNETE_(1,2] CONTRAT_ANCIENNETE_(2,5] \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 0.0 0.0 \n", "4 1.0 0.0 \n", ".. ... ... \n", "819 1.0 0.0 \n", "820 0.0 0.0 \n", "821 1.0 0.0 \n", "822 0.0 0.0 \n", "823 0.0 1.0 \n", "\n", " CONTRAT_ANCIENNETE_(5,10] FREQUENCE_PAIEMENT_COTISATION_ANNUEL ... \\\n", "0 0.0 0.0 ... \n", "1 0.0 0.0 ... \n", "2 0.0 0.0 ... \n", "3 0.0 0.0 ... \n", "4 0.0 0.0 ... \n", ".. ... ... ... \n", "819 0.0 0.0 ... \n", "820 0.0 0.0 ... \n", "821 0.0 0.0 ... \n", "822 0.0 0.0 ... \n", "823 0.0 0.0 ... \n", "\n", " ENERGIE_ESSENCE EQUIPEMENT_SECURITE_FAUX EQUIPEMENT_SECURITE_VRAI \\\n", "0 1.0 0.0 1.0 \n", "1 0.0 1.0 0.0 \n", "2 1.0 0.0 1.0 \n", "3 0.0 1.0 0.0 \n", "4 1.0 1.0 0.0 \n", ".. ... ... ... \n", "819 0.0 0.0 1.0 \n", "820 1.0 1.0 0.0 \n", "821 0.0 0.0 1.0 \n", "822 0.0 1.0 0.0 \n", "823 0.0 1.0 0.0 \n", "\n", " VALEUR_DU_BIEN_[0;10000[ VALEUR_DU_BIEN_[10000;15000[ \\\n", "0 0.0 0.0 \n", "1 0.0 0.0 \n", "2 1.0 0.0 \n", "3 0.0 0.0 \n", "4 0.0 0.0 \n", ".. ... ... 
\n", "819 0.0 0.0 \n", "820 0.0 1.0 \n", "821 0.0 0.0 \n", "822 0.0 1.0 \n", "823 0.0 0.0 \n", "\n", " VALEUR_DU_BIEN_[15000;20000[ VALEUR_DU_BIEN_[20000;25000[ \\\n", "0 1.0 0.0 \n", "1 0.0 0.0 \n", "2 0.0 0.0 \n", "3 1.0 0.0 \n", "4 0.0 0.0 \n", ".. ... ... \n", "819 0.0 1.0 \n", "820 0.0 0.0 \n", "821 1.0 0.0 \n", "822 0.0 0.0 \n", "823 1.0 0.0 \n", "\n", " VALEUR_DU_BIEN_[25000;35000[ VALEUR_DU_BIEN_[35000;99999[ CM \n", "0 0.0 0.0 1072.980 \n", "1 0.0 1.0 3750.000 \n", "2 0.0 0.0 1838.490 \n", "3 0.0 0.0 4892.740 \n", "4 1.0 0.0 166.730 \n", ".. ... ... ... \n", "819 0.0 0.0 1216.755 \n", "820 0.0 0.0 2071.560 \n", "821 0.0 0.0 5077.640 \n", "822 0.0 0.0 5228.550 \n", "823 0.0 0.0 5880.340 \n", "\n", "[824 rows x 46 columns]" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Concatenate the transformed variables\n", "data_model_preprocessed = pd.concat([vars_numeriques_scaled, vars_categorielles_enc], axis=1) # type: ignore\n", "\n", "# Add the CM column (target variable) to get 824x46 shape\n", "data_model_preprocessed['CM'] = data_model['CM'].values\n", "\n", "print(data_model_preprocessed.shape)\n", "data_model_preprocessed" ] }, { "cell_type": "markdown", "id": "62d49546", "metadata": {}, "source": [ "#### Sampling" ] }, { "cell_type": "markdown", "id": "64d229f4", "metadata": {}, "source": [ "**Exercise:** write a code snippet that builds the training set (80% of the data) and the test set (20%)."
] }, { "cell_type": "code", "execution_count": 118, "id": "6a1c7907", "metadata": {}, "outputs": [], "source": [ "train, test = train_test_split(data_model_preprocessed, test_size=0.2, random_state=42)" ] }, { "cell_type": "markdown", "id": "84dc7a07", "metadata": {}, "source": [ "#### Fitting" ] }, { "cell_type": "markdown", "id": "97c7b783", "metadata": {}, "source": [ "**Exercise:** write a code snippet that builds the model." ] }, { "cell_type": "code", "execution_count": 121, "id": "053e013c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
DecisionTreeRegressor(max_depth=5, random_state=42)
" ], "text/plain": [ "DecisionTreeRegressor(max_depth=5, random_state=42)" ] }, "execution_count": 121, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tree = DecisionTreeRegressor(max_depth=5, random_state=42)\n", "tree.fit(train.drop(\"CM\", axis=1), train[\"CM\"])" ] }, { "cell_type": "markdown", "id": "8d624704", "metadata": {}, "source": [ "**Exercise:** write a code snippet that evaluates the model's performance (MAE, MSE and RMSE)." ] }, { "cell_type": "code", "execution_count": 125, "id": "c4ca2cf9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "MAE: 3683.84\n", "MSE: 55216550.75\n", "RMSE: 7430.78\n" ] } ], "source": [ "y_pred = tree.predict(test.drop(\"CM\", axis=1))\n", "\n", "mae = metrics.mean_absolute_error(test[\"CM\"], y_pred)\n", "mse = metrics.mean_squared_error(test[\"CM\"], y_pred)\n", "rmse = metrics.root_mean_squared_error(test[\"CM\"], y_pred)\n", "\n", "print(f\"MAE: {mae:.2f}\")\n", "print(f\"MSE: {mse:.2f}\")\n", "print(f\"RMSE: {rmse:.2f}\")" ] }, { "cell_type": "markdown", "id": "fb2fe98c", "metadata": {}, "source": [ "**Question:** what do you think of this model's performance?" ] }, { "cell_type": "markdown", "id": "7ecba832", "metadata": {}, "source": [ "## Supervised algorithm: Random Forest " ] }, { "cell_type": "markdown", "id": "efcb8987", "metadata": {}, "source": [ "At this point, we have covered the steps needed to run a Machine Learning algorithm. However, these steps alone are not enough to build a well-performing model. \n", "Indeed, to build a well-performing model, the Data Scientist must act on how the model is trained. 
In what follows, we will:\n", "* Switch to a generally better-performing algorithm (Random Forest)\n", "* Run a *grid search* over the model's parameters\n", "* Train the model with cross-validation\n" ] }, { "cell_type": "markdown", "id": "d6723a2f", "metadata": {}, "source": [ "### Model with Cross-Validation" ] }, { "cell_type": "markdown", "id": "3716b09f", "metadata": {}, "source": [ "#### Sampling" ] }, { "cell_type": "code", "execution_count": null, "id": "ab1e1367", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "3f5d735e", "metadata": {}, "source": [ "#### Fitting with Cross-Validation" ] }, { "cell_type": "markdown", "id": "bc819f8f", "metadata": {}, "source": [ "**Exercise:** build an RF model (RandomForestRegressor) using cross-validation. Remember to store the model's performance (MAE, MSE and RMSE) on each fold in a variable/list." ] }, { "cell_type": "code", "execution_count": 106, "id": "b515460e", "metadata": {}, "outputs": [], "source": [ "# Initialization\n", "# Number of folds for cross-validation\n", "num_splits = 5\n", "\n", "# Random Forest regressor\n", "rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)\n", "\n", "# Initialize the KFold cross-validation splitter\n", "kf = KFold(n_splits=num_splits)\n", "\n", "# Lists to store the model's performance\n", "MAE_scores = []\n", "MSE_scores = []\n", "RMSE_scores = []" ] }, { "cell_type": "code", "execution_count": 107, "id": "eebb394f", "metadata": {}, "outputs": [], "source": [ "# Training with cross-validation\n" ] }, { "cell_type": "code", "execution_count": 108, "id": "b067126c", "metadata": {}, "outputs": [], "source": [ "# Metrics across all folds\n", "\n", "# MAE\n", "for fold, mae in enumerate(MAE_scores, start=1):\n", " print(f\"Fold {fold} MAE:\", mae)" ] }, { "cell_type": "code", 
"execution_count": 109, "id": "6597152c", "metadata": {}, "outputs": [], "source": [ "# MSE\n", "for fold, mse in enumerate(MSE_scores, start=1):\n", " print(f\"Fold {fold} MSE:\", mse)" ] }, { "cell_type": "code", "execution_count": 110, "id": "63ff1c9d", "metadata": {}, "outputs": [], "source": [ "# RMSE\n", "for fold, rmse in enumerate(RMSE_scores, start=1):\n", " print(f\"Fold {fold} RMSE:\", rmse)" ] }, { "cell_type": "markdown", "id": "ec1961c2", "metadata": {}, "source": [ "**Question:** Comment on the results." ] }, { "cell_type": "markdown", "id": "5a8163ef", "metadata": {}, "source": [ "### Adding a Grid Search over the hyperparameters" ] }, { "cell_type": "markdown", "id": "5a6adbfe", "metadata": {}, "source": [ "#### Sampling" ] }, { "cell_type": "code", "execution_count": null, "id": "d9342ad6", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "dce52b11", "metadata": {}, "source": [ "#### Fitting with Cross-Validation and *Grid Search*" ] }, { "cell_type": "markdown", "id": "7e3a9dd0", "metadata": {}, "source": [ "**Exercise:** Integrate the Grid Search technique to find the model's optimal parameters."
] }, { "cell_type": "code", "execution_count": 111, "id": "6d58dbc2", "metadata": {}, "outputs": [], "source": [ "# Initialization\n", "# Number of folds for cross-validation\n", "num_splits = 5\n", "\n", "# Initialize the KFold cross-validation splitter\n", "kf = KFold(n_splits=num_splits)\n", "\n", "# Lists to store the model's performance\n", "MAE_scores = []\n", "MSE_scores = []\n", "RMSE_scores = []\n", "\n", "# Hyperparameters to test\n", "n_estimators_values = [] # Fill in the values to test\n", "max_depth_values = [] # Fill in the values to test\n", "min_samples_split_values = [] # Fill in the values to test\n", "\n", "# Variables to store the best results\n", "best_score = np.inf\n", "best_params = {}\n", "\n", "MAE_best_score = []\n", "MSE_best_score = []\n", "RMSE_best_score = []" ] }, { "cell_type": "code", "execution_count": 112, "id": "47da5172", "metadata": {}, "outputs": [], "source": [ "# Fill in your code here" ] }, { "cell_type": "code", "execution_count": 113, "id": "d4936c46", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Meilleurs paramètres: {}\n", "Meilleure RMSE : inf\n" ] } ], "source": [ "# Best results\n", "print(\"Meilleurs paramètres:\", best_params)\n", "print(\"Meilleure RMSE :\", best_score)" ] }, { "cell_type": "code", "execution_count": 114, "id": "3215c463", "metadata": {}, "outputs": [], "source": [ "# Metrics across all folds\n", "\n", "# RMSE\n", "for fold, rmse in enumerate(RMSE_best_score, start=1):\n", " print(f\"Fold {fold} RMSE:\", rmse)\n" ] }, { "cell_type": "code", "execution_count": 115, "id": "bb9a5c9b", "metadata": {}, "outputs": [], "source": [ "# MSE\n", "for fold, mse in enumerate(MSE_best_score, start=1):\n", " print(f\"Fold {fold} MSE:\", mse)" ] }, { "cell_type": "code", "execution_count": 116, "id": "0f0768ad", "metadata": {}, "outputs": [], "source": [ "# MAE\n", "for fold, 
mae in enumerate(MAE_best_score, start=1):\n", " print(f\"Fold {fold} MAE:\", mae)" ] }, { "cell_type": "markdown", "id": "802a625f", "metadata": {}, "source": [ "**Question:** Comment on the results." ] } ], "metadata": { "kernelspec": { "display_name": "studies", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.3" } }, "nbformat": 4, "nbformat_minor": 5 }