mirror of
https://github.com/ArthurDanjou/ArtStudies.git
synced 2026-01-14 20:59:57 +01:00
Checkpoint from VS Code for coding agent session
14237
M2/Machine Learning/TP_2/1_inputs/base_retraitee.csv
Normal file
File diff suppressed because it is too large
541
M2/Machine Learning/TP_2/2025_TP_2_M2_ISF.ipynb
Normal file
@@ -0,0 +1,541 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8750d15b",
   "metadata": {},
   "source": [
    "# Course 2: Unsupervised Algorithms"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7c08ae5",
   "metadata": {},
   "source": [
    "## Preamble"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec7ecb4b",
   "metadata": {},
   "source": [
    "The objective of this 3-hour session is to:\n",
    "* Apply unsupervised models (K-means and hierarchical agglomerative clustering, HAC)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e99c600",
   "metadata": {},
   "source": [
    "## Workspace setup"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c1b01045",
   "metadata": {},
   "source": [
    "### Library imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97d58527",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Data\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "# Plotting\n",
    "import seaborn as sns\n",
    "sns.set()\n",
    "import plotly.express as px\n",
    "import plotly.graph_objects as gp\n",
    "\n",
    "# Statistics\n",
    "from scipy.stats import chi2_contingency\n",
    "\n",
    "# Machine Learning\n",
    "from sklearn.cluster import KMeans\n",
    "import matplotlib.pyplot as plt\n",
    "from scipy.cluster.hierarchy import dendrogram, linkage\n",
    "from sklearn.cluster import AgglomerativeClustering"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "985e4e97",
   "metadata": {},
   "source": [
    "### Constants"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9597b48",
   "metadata": {},
   "outputs": [],
   "source": [
    "input_path = \"./1_inputs\"\n",
    "output_path = \"./2_outputs\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2ff398d",
   "metadata": {},
   "source": [
    "## Exercise (implementing the exercises from the course material)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ea2a0164",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Definition of E\n",
    "x = # Fill in with your code\n",
    "\n",
    "# Plot\n",
    "y = [0, 0, 0, 0, 0]\n",
    "plt.scatter(x, y)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e4abc23",
   "metadata": {},
   "source": [
    "### K-means: Question 1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5dea6f90",
   "metadata": {},
   "source": [
    "**Determine the optimal partition by k-means, taking elements 1, 2, 18 as the initial centers**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "41cc10ba",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initial centers\n",
    "init_points = # Fill in with your code\n",
    "\n",
    "# Initialize the algorithm\n",
    "kmeans = KMeans(init=init_points.reshape(-1, 1),\n",
    "                n_clusters=# Fill in with your code,\n",
    "                n_init=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54857e7b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Reshape the data: several one-dimensional samples\n",
    "data_x = np.array(x)\n",
    "data_x = data_x.reshape(-1, 1)\n",
    "\n",
    "# Fitting\n",
    "kmeans.fit(data_x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72efd783",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Final centroids\n",
    "final_centroids = kmeans.cluster_centers_\n",
    "labels = kmeans.labels_\n",
    "\n",
    "final_centroids"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3110c8ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot\n",
    "plt.scatter(x, y, c=labels, cmap='viridis')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a24927bc",
   "metadata": {},
   "source": [
    "### K-means: Question 2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c18297ba",
   "metadata": {},
   "source": [
    "**Determine the optimal partition by k-means, taking elements 18, 20, 31 as the initial centers**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0ccbcf3-a06f-4757-bdd8-2cc3bd1626c6",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b957bbe8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill in with your code"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b85bc73",
   "metadata": {},
   "source": [
    "### K-means: Question 3"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c085473",
   "metadata": {},
   "source": [
    "**Determine the optimal partition by k-means, taking {{1},{2,18},{20,31}} as the initial partition**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0047b80a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill in with your code"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5eaad20e",
   "metadata": {},
   "source": [
    "### Hierarchical Agglomerative Clustering with single linkage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ebaaa05",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Definition of E\n",
    "x = # Fill in with your code\n",
    "\n",
    "# Plot\n",
    "y = [0, 0, 0, 0, 0, 0, 0]\n",
    "plt.scatter(x, y)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e96f7f3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# HAC with single linkage\n",
    "data = list(zip(x))\n",
    "\n",
    "linkage_data = linkage(data,\n",
    "                       method=# Fill in with your code,\n",
    "                       metric=# Fill in with your code)\n",
    "\n",
    "dendrogram(linkage_data, labels=x)\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "874c878c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compute the partition of the space\n",
    "hierarchical_cluster = AgglomerativeClustering(n_clusters=# Fill in with your code,\n",
    "                                               metric=# Fill in with your code (parameter named affinity before scikit-learn 1.2),\n",
    "                                               linkage=# Fill in with your code)\n",
    "\n",
    "labels = hierarchical_cluster.fit_predict(data)\n",
    "print(labels)\n",
    "\n",
    "# Plot\n",
    "plt.scatter(x, y, c=labels, cmap='viridis')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75420ae4",
   "metadata": {},
   "source": [
    "### Hierarchical Agglomerative Clustering with complete linkage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f098bc3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill in with your code"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "99bc3508",
   "metadata": {},
   "source": [
    "## K-means: Case study"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2b035d2",
   "metadata": {},
   "source": [
    "### Data import"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8051b5f4",
   "metadata": {},
   "outputs": [],
   "source": [
    "path = input_path + '/base_retraitee.csv'\n",
    "data_retraitee = pd.read_csv(path, sep=\",\", decimal=\".\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aeff9cff",
   "metadata": {},
   "source": [
    "**Exercise:** Group the geographic zones into 5 homogeneous zones in terms of:\n",
    "* Claim frequency (the number of claims divided by the exposure)\n",
    "* Claim amount\n",
    "* Claim frequency x claim amount\n",
    "\n",
    "In each case:\n",
    "* Display the coordinates of the centroids\n",
    "* Plot the resulting partition"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c4333b8",
   "metadata": {},
   "source": [
    "### Grouping zones by frequency"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e35f286",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill in with your code"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c738659",
   "metadata": {},
   "source": [
    "### Grouping zones by average cost"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f461bfb8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill in with your code"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b154f4a",
   "metadata": {},
   "source": [
    "### Grouping zones by (frequency; average cost)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d89f70e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill in with your code"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1cac03f",
   "metadata": {},
   "source": [
    "## HAC: Case study"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bffff328",
   "metadata": {},
   "source": [
    "**Exercise:** Compare the K-means results with those of an HAC (single linkage) for frequency and for (frequency; average cost)\n",
    "\n",
    "In each case:\n",
    "* Draw the corresponding dendrogram\n",
    "* Plot the resulting partition"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8453bf02",
   "metadata": {},
   "source": [
    "### Grouping zones by frequency"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "341bf2b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill in with your code"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ace7bc5",
   "metadata": {},
   "source": [
    "### Grouping zones by (frequency; average cost)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16103b5b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill in with your code"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12961201",
   "metadata": {},
   "source": [
    "## Application: creating Model Points (K-means)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6567c970",
   "metadata": {},
   "source": [
    "In some cases, line-by-line modeling is not appropriate. This is the case for group insurance products, or when the number of individuals is too large.\n",
    "In that situation, the information must be aggregated to obtain \"typical individuals\". Each of these individuals is called a *Model Point*.\n",
    "The k-means algorithm can be useful for grouping individuals into *Model Points* when the explanatory variables are numeric.\n",
    "\n",
    "To illustrate this, we will aggregate the dataset on the variables ANNEE_CTR, AGE_ASSURE_PRINCIPAL, ANCIENNETE_PERMIS and ANNEE_CONSTRUCTION to create 100 Model Points."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a250bff9",
   "metadata": {},
   "source": [
    "**Exercise:** Build the new modeling dataset (the Model Points become the new individuals, and each variable takes the value of its cluster centroid)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5b42c2b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fill in with your code"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
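
Note on K-means Questions 1 and 2 in this notebook: the iterations can be traced with a short plain-Python sketch of Lloyd's algorithm. This is not the `sklearn.cluster.KMeans` call the notebook uses. It assumes E = {1, 2, 18, 20, 31}, inferred from the elements named in Questions 1 to 3, since the notebook leaves `x` blank. The function name `kmeans_1d` and the rule that an empty cluster keeps its previous center are illustrative choices, not part of the course material.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    centers = [float(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        # (ties go to the lowest center index).
        labels = [
            min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its cluster.
        # Assumption: an empty cluster keeps its previous center.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else old)
        # Stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Assumed data set, inferred from the elements named in the questions.
E = [1, 2, 18, 20, 31]

centers_q1, labels_q1 = kmeans_1d(E, [1, 2, 18])    # Question 1 start
centers_q2, labels_q2 = kmeans_1d(E, [18, 20, 31])  # Question 2 start

print(centers_q1, labels_q1)
print(centers_q2, labels_q2)
```

Under these assumptions the two starts converge to different partitions ({1}, {2}, {18, 20, 31} versus {1, 2}, {18, 20}, {31}), which shows how sensitive k-means is to its initial centers.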