mirror of
https://github.com/ArthurDanjou/ArtStudies.git
synced 2026-01-25 19:52:37 +01:00
1900 lines
183 KiB
Plaintext
{
|
||
"cells": [
|
||
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"# TP4 Ridge, Lasso, CV\n",
"\n",
"### Table of Contents\n",
"\n",
"* [0. Data Preparation ](#chapter0)\n",
"* [1. Ridge and Lasso Regression ](#chapter1)\n",
"* [2. Cross-validation for the hyperparameter $\\alpha$ of Ridge and Lasso](#chapter2)\n",
"\n"
]
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 0. Data Preparation <a class=\"anchor\" id=\"chapter0\"></a>\n",
"\n",
"We will predict the salary of baseball players using the `Hitters` dataset.\n",
"\n",
"Reference: James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. *An Introduction to Statistical Learning*. Vol. 112. New York: Springer, 2013."
]
},
|
||
{
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-19T09:18:59.373219Z",
"start_time": "2025-03-19T09:18:59.369013Z"
}
},
"source": [
"\n",
"import warnings\n",
"\n",
"warnings.filterwarnings('ignore')"
],
"outputs": [],
"execution_count": 1
},
|
||
{
|
||
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-19T09:19:03.853918Z",
"start_time": "2025-03-19T09:19:02.315325Z"
}
},
"source": [
"import numpy as np\n",
"import pandas as pd # dataframes are in pandas \n",
"import matplotlib.pyplot as plt\n",
"\n",
"hitters = pd.read_csv(\"data/Hitters.csv\", index_col=\"Name\")\n",
"\n",
"hitters"
],
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
" AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits \\\n",
|
||
"Name \n",
|
||
"-Andy Allanson 293 66 1 30 29 14 1 293 66 \n",
|
||
"-Alan Ashby 315 81 7 24 38 39 14 3449 835 \n",
|
||
"-Alvin Davis 479 130 18 66 72 76 3 1624 457 \n",
|
||
"-Andre Dawson 496 141 20 65 78 37 11 5628 1575 \n",
|
||
"-Andres Galarraga 321 87 10 39 42 30 2 396 101 \n",
|
||
"... ... ... ... ... ... ... ... ... ... \n",
|
||
"-Willie McGee 497 127 7 65 48 37 5 2703 806 \n",
|
||
"-Willie Randolph 492 136 5 76 50 94 12 5511 1511 \n",
|
||
"-Wayne Tolleson 475 126 3 61 43 52 6 1700 433 \n",
|
||
"-Willie Upshaw 573 144 9 85 60 78 8 3198 857 \n",
|
||
"-Willie Wilson 631 170 9 77 44 31 11 4908 1457 \n",
|
||
"\n",
|
||
" CHmRun CRuns CRBI CWalks League Division PutOuts \\\n",
|
||
"Name \n",
|
||
"-Andy Allanson 1 30 29 14 A E 446 \n",
|
||
"-Alan Ashby 69 321 414 375 N W 632 \n",
|
||
"-Alvin Davis 63 224 266 263 A W 880 \n",
|
||
"-Andre Dawson 225 828 838 354 N E 200 \n",
|
||
"-Andres Galarraga 12 48 46 33 N E 805 \n",
|
||
"... ... ... ... ... ... ... ... \n",
|
||
"-Willie McGee 32 379 311 138 N E 325 \n",
|
||
"-Willie Randolph 39 897 451 875 A E 313 \n",
|
||
"-Wayne Tolleson 7 217 93 146 A W 37 \n",
|
||
"-Willie Upshaw 97 470 420 332 A E 1314 \n",
|
||
"-Willie Wilson 30 775 357 249 A W 408 \n",
|
||
"\n",
|
||
" Assists Errors Salary NewLeague \n",
|
||
"Name \n",
|
||
"-Andy Allanson 33 20 NaN A \n",
|
||
"-Alan Ashby 43 10 475.0 N \n",
|
||
"-Alvin Davis 82 14 480.0 A \n",
|
||
"-Andre Dawson 11 3 500.0 N \n",
|
||
"-Andres Galarraga 40 4 91.5 N \n",
|
||
"... ... ... ... ... \n",
|
||
"-Willie McGee 9 3 700.0 N \n",
|
||
"-Willie Randolph 381 20 875.0 A \n",
|
||
"-Wayne Tolleson 113 7 385.0 A \n",
|
||
"-Willie Upshaw 131 12 960.0 A \n",
|
||
"-Willie Wilson 4 3 1000.0 A \n",
|
||
"\n",
|
||
"[322 rows x 20 columns]"
|
||
],
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>AtBat</th>\n",
|
||
" <th>Hits</th>\n",
|
||
" <th>HmRun</th>\n",
|
||
" <th>Runs</th>\n",
|
||
" <th>RBI</th>\n",
|
||
" <th>Walks</th>\n",
|
||
" <th>Years</th>\n",
|
||
" <th>CAtBat</th>\n",
|
||
" <th>CHits</th>\n",
|
||
" <th>CHmRun</th>\n",
|
||
" <th>CRuns</th>\n",
|
||
" <th>CRBI</th>\n",
|
||
" <th>CWalks</th>\n",
|
||
" <th>League</th>\n",
|
||
" <th>Division</th>\n",
|
||
" <th>PutOuts</th>\n",
|
||
" <th>Assists</th>\n",
|
||
" <th>Errors</th>\n",
|
||
" <th>Salary</th>\n",
|
||
" <th>NewLeague</th>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>Name</th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" <th></th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>-Andy Allanson</th>\n",
|
||
" <td>293</td>\n",
|
||
" <td>66</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>30</td>\n",
|
||
" <td>29</td>\n",
|
||
" <td>14</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>293</td>\n",
|
||
" <td>66</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>30</td>\n",
|
||
" <td>29</td>\n",
|
||
" <td>14</td>\n",
|
||
" <td>A</td>\n",
|
||
" <td>E</td>\n",
|
||
" <td>446</td>\n",
|
||
" <td>33</td>\n",
|
||
" <td>20</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>A</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>-Alan Ashby</th>\n",
|
||
" <td>315</td>\n",
|
||
" <td>81</td>\n",
|
||
" <td>7</td>\n",
|
||
" <td>24</td>\n",
|
||
" <td>38</td>\n",
|
||
" <td>39</td>\n",
|
||
" <td>14</td>\n",
|
||
" <td>3449</td>\n",
|
||
" <td>835</td>\n",
|
||
" <td>69</td>\n",
|
||
" <td>321</td>\n",
|
||
" <td>414</td>\n",
|
||
" <td>375</td>\n",
|
||
" <td>N</td>\n",
|
||
" <td>W</td>\n",
|
||
" <td>632</td>\n",
|
||
" <td>43</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>475.0</td>\n",
|
||
" <td>N</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>-Alvin Davis</th>\n",
|
||
" <td>479</td>\n",
|
||
" <td>130</td>\n",
|
||
" <td>18</td>\n",
|
||
" <td>66</td>\n",
|
||
" <td>72</td>\n",
|
||
" <td>76</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>1624</td>\n",
|
||
" <td>457</td>\n",
|
||
" <td>63</td>\n",
|
||
" <td>224</td>\n",
|
||
" <td>266</td>\n",
|
||
" <td>263</td>\n",
|
||
" <td>A</td>\n",
|
||
" <td>W</td>\n",
|
||
" <td>880</td>\n",
|
||
" <td>82</td>\n",
|
||
" <td>14</td>\n",
|
||
" <td>480.0</td>\n",
|
||
" <td>A</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>-Andre Dawson</th>\n",
|
||
" <td>496</td>\n",
|
||
" <td>141</td>\n",
|
||
" <td>20</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>78</td>\n",
|
||
" <td>37</td>\n",
|
||
" <td>11</td>\n",
|
||
" <td>5628</td>\n",
|
||
" <td>1575</td>\n",
|
||
" <td>225</td>\n",
|
||
" <td>828</td>\n",
|
||
" <td>838</td>\n",
|
||
" <td>354</td>\n",
|
||
" <td>N</td>\n",
|
||
" <td>E</td>\n",
|
||
" <td>200</td>\n",
|
||
" <td>11</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>500.0</td>\n",
|
||
" <td>N</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>-Andres Galarraga</th>\n",
|
||
" <td>321</td>\n",
|
||
" <td>87</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>39</td>\n",
|
||
" <td>42</td>\n",
|
||
" <td>30</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>396</td>\n",
|
||
" <td>101</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>48</td>\n",
|
||
" <td>46</td>\n",
|
||
" <td>33</td>\n",
|
||
" <td>N</td>\n",
|
||
" <td>E</td>\n",
|
||
" <td>805</td>\n",
|
||
" <td>40</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>91.5</td>\n",
|
||
" <td>N</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>-Willie McGee</th>\n",
|
||
" <td>497</td>\n",
|
||
" <td>127</td>\n",
|
||
" <td>7</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>48</td>\n",
|
||
" <td>37</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>2703</td>\n",
|
||
" <td>806</td>\n",
|
||
" <td>32</td>\n",
|
||
" <td>379</td>\n",
|
||
" <td>311</td>\n",
|
||
" <td>138</td>\n",
|
||
" <td>N</td>\n",
|
||
" <td>E</td>\n",
|
||
" <td>325</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>700.0</td>\n",
|
||
" <td>N</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>-Willie Randolph</th>\n",
|
||
" <td>492</td>\n",
|
||
" <td>136</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>76</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>94</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>5511</td>\n",
|
||
" <td>1511</td>\n",
|
||
" <td>39</td>\n",
|
||
" <td>897</td>\n",
|
||
" <td>451</td>\n",
|
||
" <td>875</td>\n",
|
||
" <td>A</td>\n",
|
||
" <td>E</td>\n",
|
||
" <td>313</td>\n",
|
||
" <td>381</td>\n",
|
||
" <td>20</td>\n",
|
||
" <td>875.0</td>\n",
|
||
" <td>A</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>-Wayne Tolleson</th>\n",
|
||
" <td>475</td>\n",
|
||
" <td>126</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>61</td>\n",
|
||
" <td>43</td>\n",
|
||
" <td>52</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>1700</td>\n",
|
||
" <td>433</td>\n",
|
||
" <td>7</td>\n",
|
||
" <td>217</td>\n",
|
||
" <td>93</td>\n",
|
||
" <td>146</td>\n",
|
||
" <td>A</td>\n",
|
||
" <td>W</td>\n",
|
||
" <td>37</td>\n",
|
||
" <td>113</td>\n",
|
||
" <td>7</td>\n",
|
||
" <td>385.0</td>\n",
|
||
" <td>A</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>-Willie Upshaw</th>\n",
|
||
" <td>573</td>\n",
|
||
" <td>144</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>78</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>3198</td>\n",
|
||
" <td>857</td>\n",
|
||
" <td>97</td>\n",
|
||
" <td>470</td>\n",
|
||
" <td>420</td>\n",
|
||
" <td>332</td>\n",
|
||
" <td>A</td>\n",
|
||
" <td>E</td>\n",
|
||
" <td>1314</td>\n",
|
||
" <td>131</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>960.0</td>\n",
|
||
" <td>A</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>-Willie Wilson</th>\n",
|
||
" <td>631</td>\n",
|
||
" <td>170</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>77</td>\n",
|
||
" <td>44</td>\n",
|
||
" <td>31</td>\n",
|
||
" <td>11</td>\n",
|
||
" <td>4908</td>\n",
|
||
" <td>1457</td>\n",
|
||
" <td>30</td>\n",
|
||
" <td>775</td>\n",
|
||
" <td>357</td>\n",
|
||
" <td>249</td>\n",
|
||
" <td>A</td>\n",
|
||
" <td>W</td>\n",
|
||
" <td>408</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>1000.0</td>\n",
|
||
" <td>A</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>322 rows × 20 columns</p>\n",
|
||
"</div>"
|
||
]
|
||
},
|
||
"execution_count": 2,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"execution_count": 2
|
||
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise 1**:\n",
"\n",
"In `pd.read_csv(\"data/Hitters.csv\", index_col=\"Name\")`, what does `index_col=\"Name\"` do? Try without `index_col=\"Name\"`.\n",
"\n"
]
},
|
||
{
|
||
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-19T09:19:12.575121Z",
"start_time": "2025-03-19T09:19:12.561648Z"
}
},
"source": [
"hitters_bis = pd.read_csv(\"data/Hitters.csv\")\n",
"\n",
"hitters_bis"
],
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
" Name AtBat Hits HmRun Runs RBI Walks Years CAtBat \\\n",
|
||
"0 -Andy Allanson 293 66 1 30 29 14 1 293 \n",
|
||
"1 -Alan Ashby 315 81 7 24 38 39 14 3449 \n",
|
||
"2 -Alvin Davis 479 130 18 66 72 76 3 1624 \n",
|
||
"3 -Andre Dawson 496 141 20 65 78 37 11 5628 \n",
|
||
"4 -Andres Galarraga 321 87 10 39 42 30 2 396 \n",
|
||
".. ... ... ... ... ... ... ... ... ... \n",
|
||
"317 -Willie McGee 497 127 7 65 48 37 5 2703 \n",
|
||
"318 -Willie Randolph 492 136 5 76 50 94 12 5511 \n",
|
||
"319 -Wayne Tolleson 475 126 3 61 43 52 6 1700 \n",
|
||
"320 -Willie Upshaw 573 144 9 85 60 78 8 3198 \n",
|
||
"321 -Willie Wilson 631 170 9 77 44 31 11 4908 \n",
|
||
"\n",
|
||
" CHits ... CRuns CRBI CWalks League Division PutOuts Assists \\\n",
|
||
"0 66 ... 30 29 14 A E 446 33 \n",
|
||
"1 835 ... 321 414 375 N W 632 43 \n",
|
||
"2 457 ... 224 266 263 A W 880 82 \n",
|
||
"3 1575 ... 828 838 354 N E 200 11 \n",
|
||
"4 101 ... 48 46 33 N E 805 40 \n",
|
||
".. ... ... ... ... ... ... ... ... ... \n",
|
||
"317 806 ... 379 311 138 N E 325 9 \n",
|
||
"318 1511 ... 897 451 875 A E 313 381 \n",
|
||
"319 433 ... 217 93 146 A W 37 113 \n",
|
||
"320 857 ... 470 420 332 A E 1314 131 \n",
|
||
"321 1457 ... 775 357 249 A W 408 4 \n",
|
||
"\n",
|
||
" Errors Salary NewLeague \n",
|
||
"0 20 NaN A \n",
|
||
"1 10 475.0 N \n",
|
||
"2 14 480.0 A \n",
|
||
"3 3 500.0 N \n",
|
||
"4 4 91.5 N \n",
|
||
".. ... ... ... \n",
|
||
"317 3 700.0 N \n",
|
||
"318 20 875.0 A \n",
|
||
"319 7 385.0 A \n",
|
||
"320 12 960.0 A \n",
|
||
"321 3 1000.0 A \n",
|
||
"\n",
|
||
"[322 rows x 21 columns]"
|
||
],
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>Name</th>\n",
|
||
" <th>AtBat</th>\n",
|
||
" <th>Hits</th>\n",
|
||
" <th>HmRun</th>\n",
|
||
" <th>Runs</th>\n",
|
||
" <th>RBI</th>\n",
|
||
" <th>Walks</th>\n",
|
||
" <th>Years</th>\n",
|
||
" <th>CAtBat</th>\n",
|
||
" <th>CHits</th>\n",
|
||
" <th>...</th>\n",
|
||
" <th>CRuns</th>\n",
|
||
" <th>CRBI</th>\n",
|
||
" <th>CWalks</th>\n",
|
||
" <th>League</th>\n",
|
||
" <th>Division</th>\n",
|
||
" <th>PutOuts</th>\n",
|
||
" <th>Assists</th>\n",
|
||
" <th>Errors</th>\n",
|
||
" <th>Salary</th>\n",
|
||
" <th>NewLeague</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>-Andy Allanson</td>\n",
|
||
" <td>293</td>\n",
|
||
" <td>66</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>30</td>\n",
|
||
" <td>29</td>\n",
|
||
" <td>14</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>293</td>\n",
|
||
" <td>66</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>30</td>\n",
|
||
" <td>29</td>\n",
|
||
" <td>14</td>\n",
|
||
" <td>A</td>\n",
|
||
" <td>E</td>\n",
|
||
" <td>446</td>\n",
|
||
" <td>33</td>\n",
|
||
" <td>20</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>A</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>-Alan Ashby</td>\n",
|
||
" <td>315</td>\n",
|
||
" <td>81</td>\n",
|
||
" <td>7</td>\n",
|
||
" <td>24</td>\n",
|
||
" <td>38</td>\n",
|
||
" <td>39</td>\n",
|
||
" <td>14</td>\n",
|
||
" <td>3449</td>\n",
|
||
" <td>835</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>321</td>\n",
|
||
" <td>414</td>\n",
|
||
" <td>375</td>\n",
|
||
" <td>N</td>\n",
|
||
" <td>W</td>\n",
|
||
" <td>632</td>\n",
|
||
" <td>43</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>475.0</td>\n",
|
||
" <td>N</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>-Alvin Davis</td>\n",
|
||
" <td>479</td>\n",
|
||
" <td>130</td>\n",
|
||
" <td>18</td>\n",
|
||
" <td>66</td>\n",
|
||
" <td>72</td>\n",
|
||
" <td>76</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>1624</td>\n",
|
||
" <td>457</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>224</td>\n",
|
||
" <td>266</td>\n",
|
||
" <td>263</td>\n",
|
||
" <td>A</td>\n",
|
||
" <td>W</td>\n",
|
||
" <td>880</td>\n",
|
||
" <td>82</td>\n",
|
||
" <td>14</td>\n",
|
||
" <td>480.0</td>\n",
|
||
" <td>A</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>-Andre Dawson</td>\n",
|
||
" <td>496</td>\n",
|
||
" <td>141</td>\n",
|
||
" <td>20</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>78</td>\n",
|
||
" <td>37</td>\n",
|
||
" <td>11</td>\n",
|
||
" <td>5628</td>\n",
|
||
" <td>1575</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>828</td>\n",
|
||
" <td>838</td>\n",
|
||
" <td>354</td>\n",
|
||
" <td>N</td>\n",
|
||
" <td>E</td>\n",
|
||
" <td>200</td>\n",
|
||
" <td>11</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>500.0</td>\n",
|
||
" <td>N</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>-Andres Galarraga</td>\n",
|
||
" <td>321</td>\n",
|
||
" <td>87</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>39</td>\n",
|
||
" <td>42</td>\n",
|
||
" <td>30</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>396</td>\n",
|
||
" <td>101</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>48</td>\n",
|
||
" <td>46</td>\n",
|
||
" <td>33</td>\n",
|
||
" <td>N</td>\n",
|
||
" <td>E</td>\n",
|
||
" <td>805</td>\n",
|
||
" <td>40</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>91.5</td>\n",
|
||
" <td>N</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>...</th>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>...</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>317</th>\n",
|
||
" <td>-Willie McGee</td>\n",
|
||
" <td>497</td>\n",
|
||
" <td>127</td>\n",
|
||
" <td>7</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>48</td>\n",
|
||
" <td>37</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>2703</td>\n",
|
||
" <td>806</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>379</td>\n",
|
||
" <td>311</td>\n",
|
||
" <td>138</td>\n",
|
||
" <td>N</td>\n",
|
||
" <td>E</td>\n",
|
||
" <td>325</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>700.0</td>\n",
|
||
" <td>N</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>318</th>\n",
|
||
" <td>-Willie Randolph</td>\n",
|
||
" <td>492</td>\n",
|
||
" <td>136</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>76</td>\n",
|
||
" <td>50</td>\n",
|
||
" <td>94</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>5511</td>\n",
|
||
" <td>1511</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>897</td>\n",
|
||
" <td>451</td>\n",
|
||
" <td>875</td>\n",
|
||
" <td>A</td>\n",
|
||
" <td>E</td>\n",
|
||
" <td>313</td>\n",
|
||
" <td>381</td>\n",
|
||
" <td>20</td>\n",
|
||
" <td>875.0</td>\n",
|
||
" <td>A</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>319</th>\n",
|
||
" <td>-Wayne Tolleson</td>\n",
|
||
" <td>475</td>\n",
|
||
" <td>126</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>61</td>\n",
|
||
" <td>43</td>\n",
|
||
" <td>52</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>1700</td>\n",
|
||
" <td>433</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>217</td>\n",
|
||
" <td>93</td>\n",
|
||
" <td>146</td>\n",
|
||
" <td>A</td>\n",
|
||
" <td>W</td>\n",
|
||
" <td>37</td>\n",
|
||
" <td>113</td>\n",
|
||
" <td>7</td>\n",
|
||
" <td>385.0</td>\n",
|
||
" <td>A</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>320</th>\n",
|
||
" <td>-Willie Upshaw</td>\n",
|
||
" <td>573</td>\n",
|
||
" <td>144</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>85</td>\n",
|
||
" <td>60</td>\n",
|
||
" <td>78</td>\n",
|
||
" <td>8</td>\n",
|
||
" <td>3198</td>\n",
|
||
" <td>857</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>470</td>\n",
|
||
" <td>420</td>\n",
|
||
" <td>332</td>\n",
|
||
" <td>A</td>\n",
|
||
" <td>E</td>\n",
|
||
" <td>1314</td>\n",
|
||
" <td>131</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>960.0</td>\n",
|
||
" <td>A</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>321</th>\n",
|
||
" <td>-Willie Wilson</td>\n",
|
||
" <td>631</td>\n",
|
||
" <td>170</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>77</td>\n",
|
||
" <td>44</td>\n",
|
||
" <td>31</td>\n",
|
||
" <td>11</td>\n",
|
||
" <td>4908</td>\n",
|
||
" <td>1457</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>775</td>\n",
|
||
" <td>357</td>\n",
|
||
" <td>249</td>\n",
|
||
" <td>A</td>\n",
|
||
" <td>W</td>\n",
|
||
" <td>408</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>1000.0</td>\n",
|
||
" <td>A</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>322 rows × 21 columns</p>\n",
|
||
"</div>"
|
||
]
|
||
},
|
||
"execution_count": 4,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"execution_count": 4
|
||
},
|
||
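{
"cell_type": "markdown",
"metadata": {},
"source": [
"Answer for Ex. 1:\n",
"\n",
"`index_col=\"Name\"` tells pandas to use the `Name` column as the row index (the row labels) of the DataFrame. Without it, pandas creates a default integer index `0, ..., 321` and keeps `Name` as an ordinary column. This is why `hitters_bis` has 21 columns instead of 20.\n",
"\n"
]
},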
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise 2**:\n",
"\n",
"(1) What is the sample size of `Hitters`? How many features does it contain?\n",
"\n",
"(2) What are the features in `Hitters`?\n",
"\n",
"(3) Are all the features real-valued, i.e. in $\\mathbb{R}$?\n",
"\n",
"(4) Is there much missing data? `print` the number of missing values for each feature.\n",
"\n",
"- Hints:\n",
" - (2) and (3): use `pandas.DataFrame.dtypes`. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dtypes.html\n",
" - (4): use `pandas.DataFrame.isnull`.\n",
" https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isnull.html\n",
" See the example below."
]
},
|
||
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hint for Question (4):\n",
"# np.nan is the canonical NaN constant; the alias np.NaN was removed in NumPy 2.0\n",
"ex = pd.DataFrame(dict(nom=['Alice', 'Nicolas', 'Jean'],\n",
" age=[19, np.nan, np.nan],\n",
" exam=[15, 14, np.nan]))\n",
"\n",
"print(\"data : \\n\", ex)\n",
"print(\"First result : \\n\", ex.isnull())\n",
"print(\"Second result : \\n\", ex.isnull().sum())"
]
},
|
||
{
|
||
"cell_type": "code",
"metadata": {
"ExecuteTime": {
"end_time": "2025-03-19T09:22:51.269241Z",
"start_time": "2025-03-19T09:22:51.242330Z"
}
},
"source": [
"print(hitters.shape)\n",
"print(hitters.columns)\n",
"print(hitters.describe())\n",
"print(hitters.isnull().sum())"
],
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"(322, 20)\n",
|
||
"Index(['AtBat', 'Hits', 'HmRun', 'Runs', 'RBI', 'Walks', 'Years', 'CAtBat',\n",
|
||
" 'CHits', 'CHmRun', 'CRuns', 'CRBI', 'CWalks', 'League', 'Division',\n",
|
||
" 'PutOuts', 'Assists', 'Errors', 'Salary', 'NewLeague'],\n",
|
||
" dtype='object')\n",
|
||
" AtBat Hits HmRun Runs RBI Walks \\\n",
|
||
"count 322.000000 322.000000 322.000000 322.000000 322.000000 322.000000 \n",
|
||
"mean 380.928571 101.024845 10.770186 50.909938 48.027950 38.742236 \n",
|
||
"std 153.404981 46.454741 8.709037 26.024095 26.166895 21.639327 \n",
|
||
"min 16.000000 1.000000 0.000000 0.000000 0.000000 0.000000 \n",
|
||
"25% 255.250000 64.000000 4.000000 30.250000 28.000000 22.000000 \n",
|
||
"50% 379.500000 96.000000 8.000000 48.000000 44.000000 35.000000 \n",
|
||
"75% 512.000000 137.000000 16.000000 69.000000 64.750000 53.000000 \n",
|
||
"max 687.000000 238.000000 40.000000 130.000000 121.000000 105.000000 \n",
|
||
"\n",
|
||
" Years CAtBat CHits CHmRun CRuns \\\n",
|
||
"count 322.000000 322.00000 322.000000 322.000000 322.000000 \n",
|
||
"mean 7.444099 2648.68323 717.571429 69.490683 358.795031 \n",
|
||
"std 4.926087 2324.20587 654.472627 86.266061 334.105886 \n",
|
||
"min 1.000000 19.00000 4.000000 0.000000 1.000000 \n",
|
||
"25% 4.000000 816.75000 209.000000 14.000000 100.250000 \n",
|
||
"50% 6.000000 1928.00000 508.000000 37.500000 247.000000 \n",
|
||
"75% 11.000000 3924.25000 1059.250000 90.000000 526.250000 \n",
|
||
"max 24.000000 14053.00000 4256.000000 548.000000 2165.000000 \n",
|
||
"\n",
|
||
" CRBI CWalks PutOuts Assists Errors \\\n",
|
||
"count 322.000000 322.000000 322.000000 322.000000 322.000000 \n",
|
||
"mean 330.118012 260.239130 288.937888 106.913043 8.040373 \n",
|
||
"std 333.219617 267.058085 280.704614 136.854876 6.368359 \n",
|
||
"min 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
|
||
"25% 88.750000 67.250000 109.250000 7.000000 3.000000 \n",
|
||
"50% 220.500000 170.500000 212.000000 39.500000 6.000000 \n",
|
||
"75% 426.250000 339.250000 325.000000 166.000000 11.000000 \n",
|
||
"max 1659.000000 1566.000000 1378.000000 492.000000 32.000000 \n",
|
||
"\n",
|
||
" Salary \n",
|
||
"count 263.000000 \n",
|
||
"mean 535.925882 \n",
|
||
"std 451.118681 \n",
|
||
"min 67.500000 \n",
|
||
"25% 190.000000 \n",
|
||
"50% 425.000000 \n",
|
||
"75% 750.000000 \n",
|
||
"max 2460.000000 \n",
|
||
"AtBat 0\n",
|
||
"Hits 0\n",
|
||
"HmRun 0\n",
|
||
"Runs 0\n",
|
||
"RBI 0\n",
|
||
"Walks 0\n",
|
||
"Years 0\n",
|
||
"CAtBat 0\n",
|
||
"CHits 0\n",
|
||
"CHmRun 0\n",
|
||
"CRuns 0\n",
|
||
"CRBI 0\n",
|
||
"CWalks 0\n",
|
||
"League 0\n",
|
||
"Division 0\n",
|
||
"PutOuts 0\n",
|
||
"Assists 0\n",
|
||
"Errors 0\n",
|
||
"Salary 59\n",
|
||
"NewLeague 0\n",
|
||
"dtype: int64\n"
|
||
]
|
||
}
|
||
],
|
||
"execution_count": 8
|
||
},
|
||
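{
"metadata": {},
"cell_type": "markdown",
"source": "`Hitters` contains 322 players and 20 columns: 19 features plus the target `Salary`. Not all features are in $\\mathbb{R}$: `League`, `Division` and `NewLeague` are categorical (stored as strings), while the other 16 features are numeric. The only missing values are in **Salary** (59 of the 322 players)."
},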
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As `Salary` is the **target** we want to predict, we remove every player whose salary is missing.\n",
"\n",
"To keep things simple, **we only use the numeric features** and ignore the factors (i.e. categorical attributes) `League`, `Division` and `NewLeague`.\n",
"\n",
"**Remark:**\n",
"To use the categorical features, one can apply one-hot encoding, see https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html."
]
},
|
||
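The remark above points to scikit-learn's `OneHotEncoder`. As a minimal sketch of the same idea, with `pandas.get_dummies` swapped in and a hypothetical four-row frame that reuses the `League`/`Division` codes from the table above, each category becomes a 0/1 indicator column that Ridge or Lasso could then use:

```python
import pandas as pd

# Hypothetical toy frame mimicking two categorical columns of Hitters
df = pd.DataFrame({
    "League": ["A", "N", "A", "N"],
    "Division": ["E", "W", "W", "E"],
    "Hits": [66, 81, 130, 141],
})

# One 0/1 column per category; drop_first=True omits the first level of each
# factor, which becomes the baseline and avoids a redundant indicator.
encoded = pd.get_dummies(df, columns=["League", "Division"], drop_first=True, dtype=int)
print(encoded)
```

Here the baselines are `A` and `E`, so only `League_N` and `Division_W` remain next to the numeric column `Hits`.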
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise 3**:\n",
"\n",
"(1) Remove the players for whom `Salary` is missing.\n",
"\n",
"- Hint: use `pandas.DataFrame.dropna`. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html\n",
"\n",
"\n",
"(2) `Salary` is the **target**, denoted `Y` in the next cell. The features are denoted `X`; they exclude `Salary` as well as the categorical columns `League`, `Division` and `NewLeague`.\n",
"\n",
"- Hints: (1) You can use `dtypes == 'int64'` to select the integer-valued features (`Salary` is stored as a float, so it is excluded automatically). Alternative: use `select_dtypes(include='number')`, then drop `Salary`.\n",
" (2) Use `pandas.DataFrame.loc` to access the dataframe.\n",
"https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html"
]
},
|
||
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Answer for Ex 3\n",
"\n",
"# (1) keep only the players whose Salary is known\n",
"hitters_clean = hitters.dropna(subset=['Salary'])\n",
"\n",
"# (2) integer-valued columns = the 16 numeric features\n",
"# (Salary is float64; League, Division and NewLeague are strings)\n",
"X = hitters_clean.loc[:, hitters_clean.dtypes == 'int64']\n",
"Y = hitters_clean.loc[:, 'Salary']\n",
"\n",
"# check-point\n",
"print(Y.isnull().sum()) # should be 0\n",
"print(X.shape) # should be (322-59, 20-4)=(263,16)"
]
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise 4**\n",
"Split the data into a training set and a test set. Use 30% of the data for the test set and `random_state=42` (at the end, you can try values other than 42)."
]
},
|
||
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Answer for Exercise 4\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=0.3, random_state=42)"
]
},
|
||
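`train_test_split` shuffles the rows and holds out a fraction of them for testing. The NumPy sketch below mimics that logic on dummy arrays with the same number of rows as `X` after Exercise 3. The function name `split_train_test` is hypothetical, not a scikit-learn API. Like `train_test_split` with a float `test_size`, it rounds the test size up. However, it draws a different random permutation, so for `random_state=42` the selected rows will not match scikit-learn's.

```python
import numpy as np

def split_train_test(X, y, test_size=0.3, seed=42):
    """Hypothetical helper: shuffle row indices with a seeded generator
    and hold out a fraction of the rows as the test set."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_test = int(np.ceil(test_size * n))   # round the test size up
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    # the same indices are used for X and y, so rows stay aligned with targets
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Dummy data with 263 rows, like X after removing missing salaries
X = np.arange(263 * 2).reshape(263, 2)
y = np.arange(263)
Xtr, Xte, ytr, yte = split_train_test(X, y)
print(Xtr.shape, Xte.shape)
```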
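{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Standardization**\n",
"As is usually advised, we standardize the features before applying Lasso or Ridge. The penalty treats all coefficients alike, so without rescaling it would depend on the units of each feature (e.g. `CAtBat` is in the thousands while `Years` is below 25). For this we use the transformer `StandardScaler`, fitted on the training set only."
]
},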
|
||
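{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"scaler = StandardScaler()\n",
"# learn the mean/std of each column on the training set and rescale it\n",
"XtrainScaled = scaler.fit_transform(Xtrain)\n",
"# reuse the training statistics for the test set (no refitting on test data)\n",
"XtestScaled = scaler.transform(Xtest)"
]
},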
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us check that each column now has unit standard deviation and zero mean:"
]
},
|
||
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"XtrainScaled.std(axis=0), XtrainScaled.mean(axis=0)"
]
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `fit_transform` returns a NumPy array: the column names and the player-name index of the initial DataFrame are lost."
]
},
|
||
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"type(XtrainScaled)"
]
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us turn `Ytrain` into a NumPy array as well:"
]
},
|
||
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "Ytrain = Ytrain.to_numpy()"
},
|
||
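The arithmetic behind `StandardScaler` can be sketched in a few lines of NumPy, using synthetic data in place of `Xtrain`/`Xtest`. The mean and the population standard deviation (`ddof=0`, NumPy's default) are computed on the training set only and then reused for the test set. This is what `fit_transform` followed by `transform` does.

```python
import numpy as np

# Synthetic stand-ins for Xtrain / Xtest (assumption: 3 features)
rng = np.random.default_rng(0)
Xtrain = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
Xtest = rng.normal(loc=50.0, scale=10.0, size=(40, 3))

# "fit": statistics come from the training set only
mean = Xtrain.mean(axis=0)
std = Xtrain.std(axis=0)          # ddof=0, the population standard deviation

# "transform": the same training statistics are applied to both sets,
# so no information from the test set leaks into the preprocessing
XtrainScaled = (Xtrain - mean) / std
XtestScaled = (Xtest - mean) / std

print(XtrainScaled.mean(axis=0), XtrainScaled.std(axis=0))
print(XtestScaled.mean(axis=0))   # close to, but not exactly, 0
```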
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Ridge and Lasso Regression <a class=\"anchor\" id=\"chapter1\"></a>"
]
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Ridge**"
]
},
|
||
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise 5**: (1) Create a Ridge regression model called `ridge`, with `fit_intercept=True` and the default value `alpha=1`. What does `alpha` correspond to?\n",
"\n",
"Hint: use `sklearn.linear_model.Ridge`.\n",
"\n",
"https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html\n",
"\n",
"(2) Fit `ridge` on the data `(XtrainScaled, Ytrain)`.\n",
"\n",
"(3) Display the estimated coefficients, including the intercept."
]
},
|
||
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Answer for Ex. 5, complete with your code\n",
"coef_ridge = []\n",
"\n"
]
},
|
||
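For Exercise 5, the scikit-learn documentation states that `Ridge` minimizes $\|y - Xw\|_2^2 + \alpha \|w\|_2^2$. When `fit_intercept=True`, the intercept is left unpenalized. The NumPy sketch below computes the closed-form solution on synthetic data, using a hypothetical helper `ridge_fit`. It centers the data, solves $(X_c^\top X_c + \alpha I)\,w = X_c^\top y_c$, and then recovers the intercept.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Hypothetical helper: minimize ||y - b - X w||^2 + alpha * ||w||^2
    with an unpenalized intercept b (closed form)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean            # centering removes b from the problem
    p = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    b = y_mean - x_mean @ w                     # recover the intercept
    return b, w

# Synthetic data (assumption): 80 samples, 4 features, intercept 5
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 4))
true_w = np.array([3.0, -2.0, 0.0, 1.0])
y = 5.0 + X @ true_w + rng.normal(scale=0.5, size=80)

for alpha in [0.0, 1.0, 100.0]:
    b, w = ridge_fit(X, y, alpha)
    print(alpha, round(b, 3), np.round(w, 3), round(np.linalg.norm(w), 3))
```

With `alpha=0.0` this reduces to ordinary least squares. As `alpha` grows, the norm of `w` shrinks.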
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise 6**: (1) Create a Lasso regression model called `lasso`, with `alpha=1` and `fit_intercept=True`. What does `alpha` correspond to?\n",
"\n",
"Hint: use `sklearn.linear_model.Lasso`: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html\n",
"\n",
"(2) Fit `lasso` on the data `(XtrainScaled, Ytrain)`.\n",
"\n",
"(3) Display the estimated coefficients."
]
},
|
||
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Answer for Ex. 6\n",
"\n"
]
},
|
||
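For Exercise 6, scikit-learn's `Lasso` minimizes $\frac{1}{2n}\|y - Xw\|_2^2 + \alpha\|w\|_1$. Because of the $1/(2n)$ factor, the same numerical `alpha` does not give the same amount of regularization for `Ridge` and `Lasso`. The sketch below minimizes that objective by cyclic coordinate descent on synthetic data, using hypothetical helpers `soft_threshold` and `lasso_cd`. Each coordinate update is a soft-thresholding step, which is why Lasso coefficients can be exactly 0. It is an illustration, not scikit-learn's implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=500):
    """Hypothetical helper: cyclic coordinate descent for
    (1 / (2n)) * ||y - b - X w||^2 + alpha * ||w||_1, b unpenalized."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean             # handle the intercept by centering
    col_sq = (Xc ** 2).sum(axis=0) / n          # (1/n) * ||x_j||^2
    w = np.zeros(p)
    r = yc.copy()                               # current residual yc - Xc @ w
    for _ in range(n_iter):
        for j in range(p):
            r += Xc[:, j] * w[j]                # residual without feature j
            rho = Xc[:, j] @ r / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            r -= Xc[:, j] * w[j]                # put the updated feature j back
    return y_mean - x_mean @ w, w

# Synthetic data (assumption): two of the five true coefficients are zero
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 2.0 + X @ np.array([4.0, 0.0, -3.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=100)

for alpha in [0.01, 0.5, 5.0]:
    b, w = lasso_cd(X, y, alpha)
    print(alpha, np.round(w, 3))
```

For $\alpha \ge \alpha_{\max} = \max_j |x_j^\top y_c| / n$, computed on centered data, every coefficient is exactly zero.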
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"\n",
|
||
"-------------------\n",
|
||
"\n",
|
||
"In the next exercise, we will display the variation of the Ridge estimator coefficients as a function of the regularization (`alpha`, *shrinkage parameter* ).\n",
|
||
"Let us generate 50 values of `alpha`, evenly spaced on a logarithmic scale between $10^{-4}$ and $10^{6}$:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": "alpha_s = np.logspace(-4, 6, 50)"
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"*Info about logspace* (if necessary)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": "np.logspace(0, 3, 4)"
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": "10 ** (np.linspace(0, 3, 4))"
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": "np.logspace(-2, 2, 5) == 10 ** (np.linspace(-2, 2, 5))"
|
||
},
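{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, `np.logspace(a, b, n)` returns the $n$ values\n",
"\n",
"$$10^{\\,a + k\\,(b-a)/(n-1)}, \\qquad k = 0, \\dots, n-1,$$\n",
"\n",
"so consecutive values differ by the constant ratio $10^{(b-a)/(n-1)}$. For `alpha_s = np.logspace(-4, 6, 50)`, this ratio is $10^{10/49} \\approx 1.6$. This is why the points look evenly spaced on the $\\log_{10}$ scale (right panel of the next plot) but not on the linear scale (left panel)."
]
},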
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"print(alpha_s)\n",
|
||
"\n",
|
||
"fig, (ax1, ax2) = plt.subplots(1, 2)\n",
|
||
"ax1.plot(alpha_s, \".\")\n",
|
||
"ax2.plot(np.log10(alpha_s), \".\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Exercise 7** : (1) For each `alpha` in `alpha_s`, fit a ridge model. \n",
|
||
"You can use `ridge.set_params`. See : https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html. \n",
|
||
" \n",
|
||
"\n",
|
||
"(2) Plot the coefficients (without the intercept) w.r.t. `log10(alpha)`. What do you observe ? \n",
|
||
"\n",
|
||
"(3) (**Optional**) Fit an OLS linear regression model and compare its coefficients with those of a Ridge model with a small `alpha`. \n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Answer for Exercise 7\n",
|
||
"\n",
|
||
"coefs_ridge = []\n",
|
||
"for alpha in alpha_s:\n",
|
||
"    # complete: set alpha, fit ridge on (XtrainScaled, Ytrain) and append ridge.coef_ to coefs_ridge\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
"coefs_ridge = np.array(coefs_ridge)  # rows: values of alpha, columns: features\n",
|
||
"\n"
|
||
]
|
||
},
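{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a minimal sketch for question (3). It uses **synthetic** data generated with `make_regression`. The data and the `*_demo` names are hypothetical and for illustration only, so this is not the Hitters answer. The sketch is meant to show that the Ridge coefficients approach the OLS coefficients as `alpha` tends to 0, and shrink towards 0 as `alpha` grows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import make_regression\n",
"from sklearn.linear_model import LinearRegression, Ridge\n",
"\n",
"# Hypothetical synthetic data, for illustration only\n",
"X_demo, y_demo = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)\n",
"\n",
"ols_demo = LinearRegression().fit(X_demo, y_demo)\n",
"for a in [1e-4, 1e2, 1e6]:\n",
"    ridge_demo = Ridge(alpha=a).fit(X_demo, y_demo)\n",
"    # largest gap to the OLS coefficients, and overall size of the Ridge coefficients\n",
"    gap = np.max(np.abs(ridge_demo.coef_ - ols_demo.coef_))\n",
"    norm = np.linalg.norm(ridge_demo.coef_)\n",
"    print(f'alpha={a:g}: max |ridge - OLS| = {gap:.3g}, ||coef||_2 = {norm:.3g}')"
]
},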
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Lasso**"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We now answer the same questions as in Exercise 7, but for Lasso. \n",
|
||
"\n",
|
||
"For that, we will use `sklearn.linear_model.lasso_path`. \n",
|
||
"See https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.lasso_path.html \n",
|
||
"\n",
|
||
"Note that the output of `lasso_path` is a `tuple`. \n",
|
||
"You can read this example : https://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_lasso_lars_elasticnet_path.html#sphx-glr-auto-examples-linear-model-plot-lasso-lasso-lars-elasticnet-path-py\n",
|
||
"\n",
|
||
"(Also note the shape of the coefficient array returned by `lasso_path`.)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 114,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"((16, 50), (50, 16))"
|
||
]
|
||
},
|
||
"execution_count": 114,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"from sklearn.linear_model import lasso_path\n",
|
||
"\n",
|
||
"alphas_lasso, coefs_lasso, _ = lasso_path(XtrainScaled, Ytrain, n_alphas=50)\n",
|
||
"coefs_lasso.shape, coefs_ridge.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"(In `coefs_ridge`, the rows correspond to the values of alpha and the columns to the features. The output of `lasso_path` is transposed: its rows are the features and its columns are the values of alpha.)\n",
|
||
"\n",
|
||
"We used 50 values of alpha, chosen automatically by `lasso_path`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 115,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 640x480 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"coefs_lasso = coefs_lasso.T\n",
|
||
"plt.plot(np.log10(alphas_lasso), coefs_lasso)\n",
|
||
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 116,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"[<matplotlib.lines.Line2D at 0x1693d5520>]"
|
||
]
|
||
},
|
||
"execution_count": 116,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
},
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 640x480 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"# The alphas chosen by default by lasso_path are also evenly spaced on a logarithmic scale\n",
|
||
"plt.plot(np.log10(alphas_lasso), '.')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 117,
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"Text(0, 0.5, 'Lasso coefficients')"
|
||
]
|
||
},
|
||
"execution_count": 117,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
},
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 800x600 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
|
||
"fig, ax = plt.subplots(figsize=(8, 6))\n",
|
||
"ax.plot(np.log10(alphas_lasso), coefs_lasso)\n",
|
||
"ax.set_xlabel('log10(alpha)')\n",
|
||
"ax.set_ylabel('Lasso coefficients')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"-------------------\n",
|
||
"\n",
|
||
"Now let us show that **Lasso performs feature selection**: as `alpha` increases, more and more coefficients are set exactly to 0. \n"
|
||
]
|
||
},
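{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why does only the Lasso produce exact zeros? Below is a simplified derivation. It assumes an orthogonal design $X^\\top X = n I$, with $n$ the number of samples, and ignores the intercept. Real features are correlated, so this is only an illustration.\n",
"\n",
"Take the scikit-learn objectives $\\|y - Xw\\|_2^2 + \\alpha \\|w\\|_2^2$ (Ridge) and $\\frac{1}{2n}\\|y - Xw\\|_2^2 + \\alpha \\|w\\|_1$ (Lasso), and write $z = X^\\top y / n$. Both problems then decouple coordinate by coordinate:\n",
"\n",
"$$\\hat w_j^{\\text{ridge}} = \\frac{n}{n + \\alpha}\\, z_j, \\qquad \\hat w_j^{\\text{lasso}} = \\operatorname{sign}(z_j)\\, \\max\\big(|z_j| - \\alpha,\\; 0\\big).$$\n",
"\n",
"Ridge only rescales $z_j$, so its coefficients are never exactly 0 (unless $z_j = 0$). Lasso soft-thresholds $z_j$ and sets every coefficient with $|z_j| \\le \\alpha$ exactly to 0."
]
},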
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Exercise 8** : Compute the number `nb` of zeros among the **Lasso** coefficients for each `alpha`. (Hint in the next cell)\n",
|
||
"\n",
|
||
"Plot `nb` w.r.t. `log10(alphas_lasso)`.\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Hint for Ex. 8\n",
|
||
"\n",
|
||
"ind = np.array([[0, 1, 1], [0, 1, 1], [0, 1, 0]])\n",
|
||
"\n",
|
||
"print(\"1.\\n\", ind)\n",
|
||
"print(\"2.\\n\", ind == 0)\n",
|
||
"print(\"3. Number of zeros in each column:\\n\", (ind == 0).sum(axis=0))\n",
|
||
"print(\"4. Number of zeros in each row:\\n\", (ind == 0).sum(axis=1))\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Answer for Ex. 8\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Exercise 9** : Compute the number `nb` of zeros among the **Ridge** coefficients for each `alpha`. \n",
|
||
"\n",
|
||
"Plot `nb` w.r.t. `log10(alpha_s)`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Answer for Ex. 9\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": []
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {
|
||
"tags": []
|
||
},
|
||
"source": [
|
||
"----------------\n",
|
||
"\n",
|
||
"\n",
|
||
"## 2. Cross Validation for the Lasso and Ridge hyperparameter $\\alpha$ <a class=\"anchor\" id=\"chapter2\"></a>"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"In the next exercises, we will select an optimal `alpha` among the values in `alpha_s` by cross-validation.\n",
|
||
"\n",
|
||
"We will use `sklearn.linear_model.RidgeCV` and `sklearn.linear_model.LassoCV`.\n",
|
||
"\n",
|
||
"Reference :\n",
|
||
"\n",
|
||
"1. `sklearn.linear_model.RidgeCV` : https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html\n",
|
||
"\n",
|
||
"2. `sklearn.linear_model.LassoCV` : https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoCV.html\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from sklearn.linear_model import RidgeCV, LassoCV\n",
|
||
"\n",
|
||
"alpha_s = np.logspace(-4, 6, 50)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Exercise 10** : (1) Create a `RidgeCV` model, call it `ridgeCV`, with parameters `alphas=alpha_s` and `store_cv_values=True` (in scikit-learn 1.5 or later, this parameter is renamed `store_cv_results`). Fit `ridgeCV` on `(XtrainScaled,Ytrain)`. \n",
|
||
"\n",
|
||
"Remark: to be completely rigorous, we should have built a pipeline containing the scaler, to avoid \"data leakage\" in the CV computation. We ignore this detail here to keep things simple. \n",
|
||
"\n",
|
||
"(2) We have 50 different values for `alpha`. For each `alpha`, how many values do we get from the cross-validation? Read the description of the attribute `cv_values_` (renamed `cv_results_` in scikit-learn 1.5 or later) in https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeCV.html \n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Answer for Ex. 10\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
},
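{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a minimal sketch of the structure of the cross-validation values. It uses **synthetic** data: 60 samples generated with `make_regression`, with hypothetical `*_demo` names.\n",
"\n",
"The `try`/`except` is one possible way to cope with the renaming of `store_cv_values`/`cv_values_` to `store_cv_results`/`cv_results_` in scikit-learn 1.5.\n",
"\n",
"With the default `cv=None` (efficient leave-one-out), the stored array should have one row per training sample and one column per `alpha`. The `alpha` with the smallest mean error is expected to match `alpha_`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import make_regression\n",
"from sklearn.linear_model import RidgeCV\n",
"\n",
"# Hypothetical synthetic data (60 samples), for illustration only\n",
"X_demo, y_demo = make_regression(n_samples=60, n_features=5, noise=10.0, random_state=0)\n",
"alphas_demo = np.logspace(-4, 6, 50)\n",
"\n",
"try:  # scikit-learn 1.5 or later: new parameter and attribute names\n",
"    ridgecv_demo = RidgeCV(alphas=alphas_demo, store_cv_results=True).fit(X_demo, y_demo)\n",
"    cv_demo = ridgecv_demo.cv_results_\n",
"except TypeError:  # older versions do not accept store_cv_results\n",
"    ridgecv_demo = RidgeCV(alphas=alphas_demo, store_cv_values=True).fit(X_demo, y_demo)\n",
"    cv_demo = ridgecv_demo.cv_values_\n",
"\n",
"print(cv_demo.shape)  # (n_samples, n_alphas): one leave-one-out squared error per sample and per alpha\n",
"mean_err = cv_demo.mean(axis=0)  # average over the samples: one value per alpha\n",
"print(alphas_demo[mean_err.argmin()], ridgecv_demo.alpha_)"
]
},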
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Exercise 11** : For each `alpha`, compute the mean of the cross-validation values stored in `cv_values_` (by default, squared leave-one-out errors) and call it `alpha_score`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Hint for Exercise 11 : \n",
|
||
"\n",
|
||
"ind2 = np.array([[1, 2, 3], [4, 5, 6]])\n",
|
||
"\n",
|
||
"print(\"1\\n\", ind2.shape)\n",
|
||
"print(\"2\\n\", ind2.mean(axis=0))\n",
|
||
"print(\"3\\n\", ind2.mean(axis=1))\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Answer for Ex. 11\n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Exercise 12** : Plot `alpha_score` w.r.t. `log10(alpha_s)`. Which `alpha` are we going to choose ? Display the coefficients for the chosen `alpha`.\n",
|
||
"\n",
|
||
"Hint : read the attributes `alpha_` and `coef_` of `ridgeCV`."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Answer for Exercise 12\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"We will do the same for **Lasso** using `LassoCV`. \n",
|
||
"\n",
|
||
"**Remark** : By default, `LassoCV` uses 5-fold cross-validation, which is different from *Leave-One-Out Cross-Validation* used in `RidgeCV`. \n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from sklearn.linear_model import LassoCV\n",
|
||
"\n",
|
||
"lassoCV = LassoCV(n_alphas=50)\n",
|
||
"lassoCV.fit(XtrainScaled, Ytrain)\n",
|
||
"lassoCV.mse_path_.shape"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"In the shape `(50, 5)`, \"50\" corresponds to the 50 values of alpha and \"5\" to the 5 folds. Here we let `LassoCV` choose those 50 values. \n",
|
||
"\n",
|
||
"Note that, again, rows and columns are swapped compared to the `RidgeCV` output (here the rows correspond to the values of alpha). This is why we use `axis=1` in the next cell. \n",
|
||
"\n",
|
||
"NB: the default here is `cv=5`. Leave-one-out would be too costly for Lasso because, unlike Ridge, it has no fast closed-form formula for the leave-one-out errors. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"alpha_score = lassoCV.mse_path_.mean(axis=1)\n",
|
||
"alpha_score.shape"
|
||
]
|
||
},
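{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick consistency check, as a sketch relying on `lassoCV` and `alpha_score` defined in the cells above: the `alpha` with the smallest mean cross-validated MSE should be the one stored in `lassoCV.alpha_`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"best_idx = alpha_score.argmin()  # index of the smallest MSE averaged over the 5 folds\n",
"print(lassoCV.alphas_[best_idx], lassoCV.alpha_)"
]
},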
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"alphas = lassoCV.alphas_\n",
|
||
"plt.plot(np.log10(alphas), alpha_score)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"print(\"best alpha for lasso : \", lassoCV.alpha_)\n",
|
||
"print(\"lasso coef for the best alpha : \", lassoCV.coef_)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"## OPTIONAL : Comparing estimators"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"Remark about standardization: it must be done in three steps: \n",
|
||
"- first we use `fit` on the train set, which means the standardizer computes the mean and the std for each feature in your train set.\n",
|
||
"- Then you transform the data : it means, for each column, you remove the mean computed before (i.e. the means computed on the **train** set) and divide by the std computed before (i.e. the std computed on the **train** set).\n",
|
||
"- Pay attention to the fact that you must transform both the train set **and** the test set, but the fit must be done on the train set only!\n",
|
||
"\n",
|
||
"Previously, we used the shorter syntax `fit_transform`, which fits the scaler and then transforms the data in a single call. We could do that because we only used the train set. For the test set, we must only call `transform`. "
|
||
]
|
||
},
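{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three steps above can be sketched as follows on **synthetic** data generated with `make_regression`. The data and the names `X_tr`, `X_te`, `scaler_demo`, etc. are hypothetical.\n",
"\n",
"The second part shows one possible way to address the data-leakage remark of Exercise 10. The scaler and the model are wrapped in a pipeline, and `alpha` is tuned with `GridSearchCV`, so that the scaler is re-fitted on the training folds only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import make_regression\n",
"from sklearn.linear_model import Lasso\n",
"from sklearn.model_selection import GridSearchCV, train_test_split\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Hypothetical synthetic data, for illustration only\n",
"X_demo, y_demo = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)\n",
"X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)\n",
"\n",
"# fit on the train set only, then transform both sets with the train statistics\n",
"scaler_demo = StandardScaler().fit(X_tr)\n",
"X_tr_s = scaler_demo.transform(X_tr)\n",
"X_te_s = scaler_demo.transform(X_te)\n",
"print(np.allclose(X_tr_s.mean(axis=0), 0))  # train columns are centered (up to rounding)\n",
"print(X_te_s.mean(axis=0).round(2))  # test columns are in general only approximately centered\n",
"\n",
"# the pipeline re-fits the scaler on the training folds of each CV split\n",
"pipe = make_pipeline(StandardScaler(), Lasso(max_iter=10000))\n",
"search = GridSearchCV(pipe, {'lasso__alpha': np.logspace(-3, 2, 20)},\n",
"                      scoring='neg_mean_squared_error', cv=5)\n",
"search.fit(X_tr, y_tr)\n",
"print(search.best_params_)"
]
},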
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": "XtestScaled = scaler.transform(Xtest)"
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Prediction of Salary on the test set using the OLS estimator**"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from sklearn.linear_model import LinearRegression\n",
|
||
"\n",
|
||
"LinReg = LinearRegression()\n",
|
||
"LinReg.fit(Xtrain,\n",
|
||
" Ytrain) # no need to scale for OLS if you just want to predict (unless the solver works best with scaled data)\n",
|
||
"# the predictions should not be different with or without standardization (could differ only owing to numerical problems)\n",
|
||
"hatY_LinReg = LinReg.predict(Xtest)\n",
|
||
"\n",
|
||
"fig, ax = plt.subplots()\n",
|
||
"ax.scatter(Ytest, hatY_LinReg, s=5)\n",
|
||
"ax.plot([0, 1], [0, 1], transform=ax.transAxes, ls='--', c='gray')\n",
|
||
"ax.set_xlabel('Ytest')\n",
|
||
"ax.set_ylabel('hatY')\n",
|
||
"ax.set_title('Predicted vs true salaries for OLS estimator')\n",
|
||
"ax.axis('square')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Optional Exercise 1**\n",
|
||
"Do the same with the Ridge estimator (with alpha chosen by CV). "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Answer for Optional Exercise 1\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Optional Exercise 2** Do the same with Lasso with alpha chosen by CV. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Answer for Optional Exercise 2\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Calculation of the best alpha with the BIC (Bayesian Information Criterion).**\n",
|
||
"Here is an alternative method to choose the \"best\" alpha : "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"from sklearn.linear_model import LassoLarsIC\n",
|
||
"\n",
|
||
"lassoBIC = LassoLarsIC(criterion='bic')\n",
|
||
"lassoBIC.fit(XtrainScaled, Ytrain)\n",
|
||
"print(\"best alpha chosen by BIC criterion :\", lassoBIC.alpha_)\n",
|
||
"print(\"best alpha chosen by CV :\", lassoCV.alpha_)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Comparison between predicted salary and true salary for \"LassoBIC\" estimator :** "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"lasso.set_params(alpha=lassoBIC.alpha_)  # alpha selected by the BIC criterion\n",
|
||
"lasso.fit(XtrainScaled, Ytrain)\n",
|
||
"\n",
|
||
"hatY_BIC = lasso.predict(XtestScaled)  # the model was fitted on scaled data\n",
|
||
"\n",
|
||
"fig, ax = plt.subplots()\n",
|
||
"ax.scatter(Ytest, hatY_BIC, s=5)\n",
|
||
"ax.plot([0, 1], [0, 1], transform=ax.transAxes, ls='--', c='gray')\n",
|
||
"ax.set_xlabel('Ytest')\n",
|
||
"ax.set_ylabel('hatY')\n",
|
||
"ax.set_title('Predicted vs true salaries for LassoBIC estimator')\n",
|
||
"ax.axis('square')"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Optional Exercise 3** Compute the test MSE of these four estimators (LassoCV, LassoBIC, OLS, RidgeCV)."
|
||
]
|
||
},
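{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the test MSE of an estimator with predictions $\\hat y_i$ is\n",
"\n",
"$$\\text{MSE} = \\frac{1}{n_{\\text{test}}} \\sum_{i=1}^{n_{\\text{test}}} \\big(y_i - \\hat y_i\\big)^2 ,$$\n",
"\n",
"which is what `sklearn.metrics.mean_squared_error(Ytest, hatY)` computes (here `hatY` stands for the predictions of the estimator). Remember to feed the scaled test set `XtestScaled` to the models that were fitted on `XtrainScaled`."
]
},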
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# answer for Optional Exercise 3\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Optional Exercise 4** Display the boxplot of the absolute errors for each estimator. "
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# answer for Optional Exercise 4\n"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"metadata": {},
|
||
"source": [
|
||
"**Optional Exercise 5**\n",
|
||
"Based on the above information, which estimator would you recommend?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"# Answer for Optional Exercise 5"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.9.21"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 4
|
||
}
|