diff --git a/M1/Stats learning/TP0_Intro_Jupyter_Python.ipynb b/M1/Stats learning/TP0_Intro_Jupyter_Python.ipynb
new file mode 100644
index 0000000..377d158
--- /dev/null
+++ b/M1/Stats learning/TP0_Intro_Jupyter_Python.ipynb
@@ -0,0 +1,1356 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# TP0 Introduction\n",
+ "\n",
+ "### Table of Contents\n",
+ "\n",
+ "* [1. Introduction to Jupyter notebooks](#chapter1)\n",
+ " * [1.1 Markdown cell](#section_1_1)\n",
+ " * [1.2 Python cell](#section_1_2)\n",
+ " \n",
+ "* [2. Introduction to Python](#chapter2)\n",
+ " * [2.1 Variables and types, list and np.array](#section_2_1)\n",
+ " * [2.1.1. Variable](#section_2_1_1)\n",
+ " * [2.1.2. List](#section_2_1_2)\n",
+ " * [2.1.3. Type `array`](#section_2_1_3)\n",
+ " * [2.2 Basic operations](#section_2_2)\n",
+ " * [2.2.1 Operations on numbers](#section_2_2_1)\n",
+ " * [2.2.2 Operations on an array](#section_2_2_2)\n",
+ " * [2.3 `for`-loop and `if...else`](#section_2_3)\n",
+ " * [2.4 Functions](#section_2_4)\n",
+ " * [2.5 Graphs](#section_2_5)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 1. Introduction to Jupyter notebooks"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To launch Jupyter, run the following command in a terminal:\n",
+ "\n",
+ "`jupyter notebook`\n",
+ "\n",
+ "This will automatically open a web browser where you can work. The main tab represents the file tree starting from the directory where the command was executed.\n",
+ "\n",
+ "Notebooks are composed of cells containing code (in Python) or text (plain or formatted with Markdown markup). These notebooks allow for interactive calculations in Python and are an excellent tool for teaching.\n",
+ "\n",
+ "You can edit a cell by double-clicking on it, and evaluate it by pressing **Ctrl+Enter** (you will also often use **Shift+Enter** to evaluate and move to the next cell). The buttons in the toolbar will be very useful; hover over them to display a tooltip if their icon is not clear enough.\n",
+ "\n",
+ "Don't forget to save your work from time to time, even though Jupyter performs regular automatic saves."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You can click on *Help -> User Interface Tour* or *Help -> Keyboard Shortcuts* to get explanations."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 1** : Delete the following cell:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 2** : Add a Python cell below"
+ ]
+ },
+ {
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:11:15.477711Z",
+ "start_time": "2025-01-22T09:11:15.474249Z"
+ }
+ },
+ "cell_type": "code",
+ "source": "print(\"Hello World\")",
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Hello World\n"
+ ]
+ }
+ ],
+ "execution_count": 1
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1.1 Markdown cell \n",
+ "*Markdown* is a text format that allows minimal formatting. It enables you to quickly:\n",
+ "- create lists.\n",
+ "- make *italics*, **bold**, ~~strikethrough~~, etc.\n",
+ "- create [links](https://fr.wikipedia.org/wiki/Markdown)\n",
+ "- write mathematical formulas using $\\LaTeX$ (see [here](http://www.tuteurs.ens.fr/logiciels/latex/maths.html#s2) for a quick introduction to $\\LaTeX$)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 3** : change the first word to *italics* and the second word to **bold** "
+ ]
+ },
+ {
+ "metadata": {},
+ "cell_type": "markdown",
+ "source": [
+ "*first word*\n",
+ "\n",
+ "**second word**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 4** : Type a simple formula in $\\LaTeX$ (for example an integral or the expectation of a random variable)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": "$\\mathbb{E}[X]=\\frac{1}{\\lambda}$\n"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1.2 Python cell\n",
+ "The following cell is a Python cell. You can run it with **Shift+Enter** (or with the triangular \"run the selected cell and advance\" icon, or from the Run menu)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 5** : Run the next cell. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:13:39.085033Z",
+ "start_time": "2025-01-22T09:13:39.081416Z"
+ }
+ },
+ "source": [
+ "# This is a Python comment. \n",
+ "# Comment your code :\n",
+ "# Code should be readable and understandable by other users\n",
+ "#\n",
+ "\n",
+ "print(\"hello world !\")"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "hello world !\n"
+ ]
+ }
+ ],
+ "execution_count": 2
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 6** : The next cell is formatted as Markdown. Switch it to a Python cell. "
+ ]
+ },
+ {
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:14:10.188568Z",
+ "start_time": "2025-01-22T09:14:10.185558Z"
+ }
+ },
+ "cell_type": "code",
+ "source": "print(\"hello world !\")",
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "hello world !\n"
+ ]
+ }
+ ],
+ "execution_count": 3
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 7** : Turn the next cell into a markdown cell. "
+ ]
+ },
+ {
+ "metadata": {},
+ "cell_type": "markdown",
+ "source": "Oops! This is a Markdown cell"
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 2. Introduction to Python"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Possible resources (mostly for numpy and matplotlib) : \n",
+ "- https://numpy.org/doc/2.2/user/absolute_beginners.html\n",
+ "- https://file.cz123.top/5textbook/CODING/Numerical_Python.pdf\n",
+ "- https://cs231n.github.io/python-numpy-tutorial/\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Import a library. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:14:38.458151Z",
+ "start_time": "2025-01-22T09:14:37.524456Z"
+ }
+ },
+ "source": [
+ "# Import the library numpy\n",
+ "import numpy\n",
+ "\n",
+ "# Import the library numpy under the alias np\n",
+ "import numpy as np\n",
+ "\n",
+ "# Import part of a library\n",
+ "from scipy.stats import norm, multivariate_normal"
+ ],
+ "outputs": [],
+ "execution_count": 5
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2.1 Variables and types, list and np.array"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 2.1.1. Variable "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:14:40.801476Z",
+ "start_time": "2025-01-22T09:14:40.798177Z"
+ }
+ },
+ "source": [
+ "#Integer\n",
+ "a = 4\n",
+ "print(\"a = \", a, \"\\t\\t\\t\\t its type is\", type(a))\n",
+ "\n",
+ "# Float\n",
+ "a = 3.5\n",
+ "print(\"a = \", a, \"\\t\\t\\t its type is\", type(a))\n",
+ "a = 1e7 # = 10^7, --> float\n",
+ "print(\"a = \", a, \"\\t\\t its type is\", type(a))\n",
+ "a = np.pi\n",
+ "print(\"a = \", a, \"\\t\\t its type is\", type(a))\n",
+ "\n",
+ "# Boolean\n",
+ "a = True\n",
+ "print(\"a = \", a, \"\\t\\t\\t its type is\", type(a))\n",
+ "\n",
+ "# String\n",
+ "a = \"Hello World!\"\n",
+ "print(\"a = \", a, \"\\t\\t its type is\", type(a))\n",
+ "\n",
+ "# Lists (mutable)\n",
+ "a = [1, 2, 3]\n",
+ "print(\"a = \", a, \"\\t\\t\\t its type is\", type(a))\n",
+ "\n",
+ "# Tuples (immutable)\n",
+ "a = (1.5, [1, 2], \"coucou\")\n",
+ "print(\"a = \", a, \"\\t its type is\", type(a))"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "a = 4 \t\t\t\t its type is <class 'int'>\n",
+ "a = 3.5 \t\t\t its type is <class 'float'>\n",
+ "a = 10000000.0 \t\t its type is <class 'float'>\n",
+ "a = 3.141592653589793 \t\t its type is <class 'float'>\n",
+ "a = True \t\t\t its type is <class 'bool'>\n",
+ "a = Hello World! \t\t its type is <class 'str'>\n",
+ "a = [1, 2, 3] \t\t\t its type is <class 'list'>\n",
+ "a = (1.5, [1, 2], 'coucou') \t its type is <class 'tuple'>\n"
+ ]
+ }
+ ],
+ "execution_count": 6
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 2.1.2. List \n",
+ "Run the following cells : "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:15:47.717601Z",
+ "start_time": "2025-01-22T09:15:47.714511Z"
+ }
+ },
+ "source": [
+ "#Empty list\n",
+ "L = []\n",
+ "print(\"The empty list : L =\", L, \"\\n\")"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The empty list : L = [] \n",
+ "\n"
+ ]
+ }
+ ],
+ "execution_count": 7
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:15:49.504901Z",
+ "start_time": "2025-01-22T09:15:49.502Z"
+ }
+ },
+ "source": [
+ "L = [1, 2, 3, 4, 5]\n",
+ "print(\"L =\", L)\n",
+ "\n",
+ "#Indices in Python start at zero !\n",
+ "print(\"the first element is\", L[0])\n",
+ "\n",
+ "# Negative indices count from the end: L[-1] is the last element\n",
+ "print(\"The last element is \", L[-1])\n",
+ "\n",
+ "# Access to a sublist \n",
+ "print(\"L contains the sublist\", L[1:4])"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "L = [1, 2, 3, 4, 5]\n",
+ "the first element is 1\n",
+ "The last element is 5\n",
+ "L contains the sublist [2, 3, 4]\n"
+ ]
+ }
+ ],
+ "execution_count": 8
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:15:51.173918Z",
+ "start_time": "2025-01-22T09:15:51.170290Z"
+ }
+ },
+ "source": [
+ "L = [1, 2, 3, 4, 5]\n",
+ "print(\"L =\", L)\n",
+ "\n",
+ "# my_list.append(x) adds x at the end of the list\n",
+ "L.append(100)\n",
+ "print(\"L =\", L)\n"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "L = [1, 2, 3, 4, 5]\n",
+ "L = [1, 2, 3, 4, 5, 100]\n"
+ ]
+ }
+ ],
+ "execution_count": 9
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 2.1.3. Type `array` : \n",
+ "In addition to these basic types, we work with the numpy library, which introduces the array type. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 8** : import numpy and use `np.array` to define a matrix $a$\n",
+ "\n",
+ "$a=\\begin{pmatrix}\n",
+ "1 & 2 \\\\\n",
+ "3 & 4 \\\\\n",
+ "\\end{pmatrix}$\n",
+ "\n",
+ "Display the matrix and its type."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:16:18.283762Z",
+ "start_time": "2025-01-22T09:16:18.280869Z"
+ }
+ },
+ "source": [
+ "# Your answer for Q8:\n",
+ "a = np.array([[1, 2], [3, 4]])\n",
+ "print(a)\n",
+ "print(type(a))"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[1 2]\n",
+ " [3 4]]\n",
+ "<class 'numpy.ndarray'>\n"
+ ]
+ }
+ ],
+ "execution_count": 11
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2.2 Basic operations \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 2.2.1 Operations on numbers\n",
+ "\n",
+ "Operators : `+,-,/,*`, and many others available through the packages `math` and `numpy`.\n",
+ "\n",
+ "Package `math`: https://docs.python.org/3/library/math.html"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 9** : display $e$ and $\\log(e)$. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:17:01.140658Z",
+ "start_time": "2025-01-22T09:17:01.138212Z"
+ }
+ },
+ "source": [
+ "import math\n",
+ "\n",
+ "print(\"e = \", math.exp(1))\n",
+ "print(\"log(e) = \", math.log(math.exp(1)))\n",
+ "\n",
+ "print(\"e = \", np.exp(1))\n",
+ "print(\"log(e) = \", np.log(np.exp(1)))"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "e = 2.718281828459045\n",
+ "log(e) = 1.0\n",
+ "e = 2.718281828459045\n",
+ "log(e) = 1.0\n"
+ ]
+ }
+ ],
+ "execution_count": 14
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 2.2.2 Operations on an array"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:17:03.680124Z",
+ "start_time": "2025-01-22T09:17:03.675137Z"
+ }
+ },
+ "source": [
+ "# Creation of a 2 x 3 array filled with zeroes:\n",
+ "A = np.zeros([2, 3])\n",
+ "print(\"A = \\n\", A)\n",
+ "\n",
+ "# Creation of a 2 x 3 array filled with ones\n",
+ "B = np.ones([2, 3])\n",
+ "print(\"\\nB = \\n\", B)\n",
+ "\n",
+ "# Identity matrix\n",
+ "C = np.eye(3) # alternative : np.identity(3)\n",
+ "print(\"\\n C = \\n\", C)\n",
+ "\n",
+ "# arange: evenly spaced integers (the end value is excluded)\n",
+ "print('np.arange(3) =', np.arange(3))\n",
+ "print('np.arange(2,5)=', np.arange(2, 5))\n"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "A = \n",
+ " [[0. 0. 0.]\n",
+ " [0. 0. 0.]]\n",
+ "\n",
+ "B = \n",
+ " [[1. 1. 1.]\n",
+ " [1. 1. 1.]]\n",
+ "\n",
+ " C = \n",
+ " [[1. 0. 0.]\n",
+ " [0. 1. 0.]\n",
+ " [0. 0. 1.]]\n",
+ "np.arange(3) = [0 1 2]\n",
+ "np.arange(2,5)= [2 3 4]\n"
+ ]
+ }
+ ],
+ "execution_count": 15
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 10**\n",
+ "Create an array that contains 10 evenly spaced numbers over [3,9].\n",
+ "\n",
+ "Hint: use `np.linspace`: https://numpy.org/doc/2.1/reference/generated/numpy.linspace.html"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:17:27.970224Z",
+ "start_time": "2025-01-22T09:17:27.966232Z"
+ }
+ },
+ "source": [
+ "space = np.linspace(3, 9, 10)\n",
+ "print(space)"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[3. 3.66666667 4.33333333 5. 5.66666667 6.33333333\n",
+ " 7. 7.66666667 8.33333333 9. ]\n"
+ ]
+ }
+ ],
+ "execution_count": 17
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Matrix multiplication \n",
+ "**Beware**, `a*b` , where `a` and `b` are 2-dimensional arrays, corresponds to elementwise multiplication whereas matrix multiplication can be performed with `np.dot` or `@`.\n",
+ "The NumPy documentation recommends `@` (or `np.matmul`) over `np.dot` for 2-dimensional arrays. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:17:44.926867Z",
+ "start_time": "2025-01-22T09:17:44.922056Z"
+ }
+ },
+ "source": [
+ "a = np.array([[1, 2], [3, 4]])\n",
+ "b = np.eye(2)\n",
+ "\n",
+ "print(\"a=\\n\", a)\n",
+ "print(\"b=\\n\", b)\n",
+ "\n",
+ "print(\" a*b = \\n \", a * b)\n",
+ "print(\" np.dot(a,b) = \\n\", np.dot(a, b))\n",
+ "print(\" a@b = \\n\", a @ b) # equivalent to np.matmul(a,b)"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "a=\n",
+ " [[1 2]\n",
+ " [3 4]]\n",
+ "b=\n",
+ " [[1. 0.]\n",
+ " [0. 1.]]\n",
+ " a*b = \n",
+ " [[1. 0.]\n",
+ " [0. 4.]]\n",
+ " np.dot(a,b) = \n",
+ " [[1. 2.]\n",
+ " [3. 4.]]\n",
+ " a@b = \n",
+ " [[1. 2.]\n",
+ " [3. 4.]]\n"
+ ]
+ }
+ ],
+ "execution_count": 18
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Inner product for one dimensional arrays**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:17:48.202259Z",
+ "start_time": "2025-01-22T09:17:48.198663Z"
+ }
+ },
+ "source": [
+ "a = np.arange(3)\n",
+ "b = np.arange(4, 7)\n",
+ "c = np.dot(a, b)\n",
+ "d = np.inner(a, b)\n",
+ "print('a = ', a)\n",
+ "print('b = ', b)\n",
+ "print('np.dot(a,b)=', c)\n",
+ "print('np.inner(a,b)=', d) # alternative to np.dot for the inner product\n",
+ "# of one-dimensional arrays"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "a = [0 1 2]\n",
+ "b = [4 5 6]\n",
+ "np.dot(a,b)= 17\n",
+ "np.inner(a,b)= 17\n"
+ ]
+ }
+ ],
+ "execution_count": 19
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Reshaping**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:17:50.913721Z",
+ "start_time": "2025-01-22T09:17:50.910196Z"
+ }
+ },
+ "source": [
+ "#Turning a one-dimensional array with 6 elements into a \n",
+ "#2x3 dimensional array \n",
+ "a = np.arange(6)\n",
+ "b = a.reshape(2, 3)\n",
+ "print('a = ', a)\n",
+ "print('b =\\n', b)"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "a = [0 1 2 3 4 5]\n",
+ "b =\n",
+ " [[0 1 2]\n",
+ " [3 4 5]]\n"
+ ]
+ }
+ ],
+ "execution_count": 20
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 11**\n",
+ "Turn the previous 2x3 array `b` into a 3x2 array using `reshape`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:18:51.612181Z",
+ "start_time": "2025-01-22T09:18:51.609302Z"
+ }
+ },
+ "source": [
+ "b2 = b.copy().reshape(3, 2)\n",
+ "print(b2)\n",
+ "print(b2.shape)"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[0 1]\n",
+ " [2 3]\n",
+ " [4 5]]\n",
+ "(3, 2)\n"
+ ]
+ }
+ ],
+ "execution_count": 27
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 12** Does the new array correspond to the transpose of `b`? If it does not, display the transpose of `b`. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:19:31.699357Z",
+ "start_time": "2025-01-22T09:19:31.697031Z"
+ }
+ },
+ "source": [
+ "print(b.transpose())\n",
+ "print(b.transpose() == b2)"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[0 3]\n",
+ " [1 4]\n",
+ " [2 5]]\n",
+ "[[ True False]\n",
+ " [False False]\n",
+ " [False True]]\n"
+ ]
+ }
+ ],
+ "execution_count": 30
+ },
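+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A short illustrative sketch (not part of the original exercise) of why the two results differ: `reshape` keeps the elements in the same row-by-row reading order and only changes where the rows break, whereas the transpose swaps the two axes. The names `m`, `reshaped` and `transposed` are chosen for this example only."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "m = np.arange(6).reshape(2, 3)  # rows: [0 1 2] and [3 4 5]\n",
+ "reshaped = m.reshape(3, 2)      # same reading order: 0, 1, 2, 3, 4, 5\n",
+ "transposed = m.T                # axes swapped: m.T[i, j] == m[j, i]\n",
+ "\n",
+ "# Reading both arrays row by row shows the difference in element order\n",
+ "print('reshape   :', reshaped.ravel())\n",
+ "print('transpose :', transposed.ravel())\n",
+ "\n",
+ "# np.array_equal compares whole arrays and returns a single boolean\n",
+ "print('same array?', np.array_equal(reshaped, transposed))"
+ ]
+ },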
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2.3 `for`-loop and `if...else`\n",
+ "\n",
+ "Python code is structured by *indentation*.\n",
+ "\n",
+ "**Beware** : Python indices start at **ZERO** !"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 13** : Write a `for` loop that displays every integer `i` from 2 to 12. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:19:43.363208Z",
+ "start_time": "2025-01-22T09:19:43.359985Z"
+ }
+ },
+ "source": [
+ "for i in range(2, 13):\n",
+ " print(i)"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "2\n",
+ "3\n",
+ "4\n",
+ "5\n",
+ "6\n",
+ "7\n",
+ "8\n",
+ "9\n",
+ "10\n",
+ "11\n",
+ "12\n"
+ ]
+ }
+ ],
+ "execution_count": 31
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 14** : Write a program to check if a number is divisible by both 3 and 13 or not, using if-else. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:20:24.414268Z",
+ "start_time": "2025-01-22T09:20:24.410942Z"
+ }
+ },
+ "source": [
+ "def divisible_by_3_and13(n):\n",
+ "    # The function prints its result and returns nothing,\n",
+ "    # so it is called directly instead of being wrapped in print()\n",
+ "    if n % 3 == 0 and n % 13 == 0:\n",
+ "        print(n, \"is divisible by both 3 and 13\")\n",
+ "    else:\n",
+ "        print(n, \"is not divisible by both 3 and 13\")\n",
+ "\n",
+ "\n",
+ "divisible_by_3_and13(39)\n",
+ "divisible_by_3_and13(26)"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "39 is divisible by both 3 and 13\n",
+ "26 is not divisible by both 3 and 13\n"
+ ]
+ }
+ ],
+ "execution_count": 33
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2.4 Functions "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To define a function, use the keyword `def`. To make a function return a value, use the `return` statement. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:20:41.546944Z",
+ "start_time": "2025-01-22T09:20:41.542979Z"
+ }
+ },
+ "source": [
+ "# Example\n",
+ "def my_function(x):\n",
+ " return x + 3\n",
+ "\n",
+ "\n",
+ "print(my_function(2))"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "5\n"
+ ]
+ }
+ ],
+ "execution_count": 34
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 15** : Define a function named *square_cube*. The input is an integer, the output is its square and its cube. Display the result of `square_cube(2)`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:20:59.887492Z",
+ "start_time": "2025-01-22T09:20:59.884773Z"
+ }
+ },
+ "source": [
+ "def square_cube(x):\n",
+ " return x ** 2, x ** 3\n",
+ "\n",
+ "\n",
+ "print(square_cube(2))"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(4, 8)\n"
+ ]
+ }
+ ],
+ "execution_count": 35
+ },
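+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "When a function returns several values separated by commas, Python packs them into a tuple, which can then be unpacked into separate variables. A minimal sketch, where `power_pair` is a hypothetical helper written only for this illustration:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def power_pair(x):\n",
+ "    # The two comma-separated values are returned as one tuple\n",
+ "    return x ** 2, x ** 3\n",
+ "\n",
+ "\n",
+ "result = power_pair(3)\n",
+ "print(type(result), result)\n",
+ "\n",
+ "# Tuple unpacking: one variable per returned value\n",
+ "square, cube = power_pair(3)\n",
+ "print('square =', square, 'cube =', cube)"
+ ]
+ },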
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2.5 Graphs "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Matplotlib tutorial :\n",
+ "https://matplotlib.org/stable/tutorials/introductory/pyplot.html"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:21:02.381191Z",
+ "start_time": "2025-01-22T09:21:02.126204Z"
+ }
+ },
+ "source": [
+ "import matplotlib.pyplot as plt"
+ ],
+ "outputs": [],
+ "execution_count": 36
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The basic command is `plt.plot(x,y)` where $x$ and $y$ are lists/arrays of the same size. "
+ ]
+ },
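+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As an illustration (a minimal sketch, not part of the original exercises), the cell below plots the sine function on $[0, 2\\pi]$. The variable names `t` and `s` are arbitrary; the only requirement is that both arrays have the same size."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "t = np.linspace(0, 2 * np.pi, 200)  # 200 evenly spaced abscissas\n",
+ "s = np.sin(t)                       # ordinates, same size as t\n",
+ "\n",
+ "plt.plot(t, s)\n",
+ "plt.xlabel('t')\n",
+ "plt.ylabel('sin(t)')\n",
+ "plt.title('Sine function')\n",
+ "plt.show()"
+ ]
+ },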
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 16** : Plot the graph of the standard normal distribution density on [-5,5]. You can use `scipy.stats.norm`. \n",
+ "https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:22:37.451301Z",
+ "start_time": "2025-01-22T09:22:37.386127Z"
+ }
+ },
+ "source": [
+ "xx = np.linspace(-5, 5, 100)\n",
+ "yy = norm.pdf(xx)\n",
+ "plt.plot(xx, yy)\n",
+ "plt.ylabel('Density')\n",
+ "plt.xlabel('x')\n",
+ "plt.title('Standard normal distribution')\n",
+ "plt.show()"
+ ],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": ""
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "execution_count": 39
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Question 17** : Plot the graph of the density of the two-dimensional standard normal distribution $\\mathcal{N}(0, I_2)$ on [-3,3]$\\times$[-3,3].\n",
+ "\n",
+ "We will use `plot_surface`. You can find an example here https://matplotlib.org/stable/gallery/mplot3d/surface3d.html\n",
+ "\n",
+ "For the PDF (probability density function), you can use `scipy.stats.multivariate_normal`. \n",
+ "https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.multivariate_normal.html"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Hint: use `meshgrid`. See below for more details about `meshgrid`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:31:03.547534Z",
+ "start_time": "2025-01-22T09:31:03.467548Z"
+ }
+ },
+ "source": [
+ "from scipy.stats import multivariate_normal\n",
+ "\n",
+ "fig, ax = plt.subplots(subplot_kw={\"projection\": \"3d\"})\n",
+ "\n",
+ "X = np.arange(-3, 3, 0.25)\n",
+ "Y = np.arange(-3, 3, 0.25)\n",
+ "X, Y = np.meshgrid(X, Y)\n",
+ "R = multivariate_normal([0, 0], np.eye(2))\n",
+ "\n",
+ "surf = ax.plot_surface(X, Y, R.pdf(np.dstack((X, Y))), cmap='coolwarm', linewidth=0, antialiased=False)\n",
+ "\n",
+ "fig.colorbar(surf, shrink=0.5, aspect=5)\n",
+ "plt.show()"
+ ],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": ""
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "execution_count": 56
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you feel you need to practice more, there are lots of resources online, for instance : https://github.com/rougier/numpy-100/blob/master/100_Numpy_exercises.ipynb (exercises with solutions !)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Details about `meshgrid`: "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:31:27.147136Z",
+ "start_time": "2025-01-22T09:31:27.142313Z"
+ }
+ },
+ "source": [
+ "x = np.arange(3)\n",
+ "y = np.arange(3)\n",
+ "\n",
+ "\n",
+ "def f(x, y):\n",
+ " return x ** 2 + y ** 2\n",
+ "\n",
+ "\n",
+ "f(x, y)\n"
+ ],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([0, 2, 8])"
+ ]
+ },
+ "execution_count": 57,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "execution_count": 57
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The output is [f(0,0), f(1,1), f(2,2)].\n",
+ "\n",
+ "If the desired output is instead [f(0,0), f(0,1), f(0,2), f(1,0), f(1,1), ..., f(2,2)], i.e. f(i,j) for every pair (i,j) with i=0,1,2 and j=0,1,2, use `meshgrid` (or `mgrid`). It returns two arrays:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:31:53.126413Z",
+ "start_time": "2025-01-22T09:31:53.120541Z"
+ }
+ },
+ "source": [
+ "X, Y = np.meshgrid(x, y)\n",
+ "X, Y"
+ ],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array([[0, 1, 2],\n",
+ " [0, 1, 2],\n",
+ " [0, 1, 2]]),\n",
+ " array([[0, 0, 0],\n",
+ " [1, 1, 1],\n",
+ " [2, 2, 2]]))"
+ ]
+ },
+ "execution_count": 58,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "execution_count": 58
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:32:24.722840Z",
+ "start_time": "2025-01-22T09:32:24.717892Z"
+ }
+ },
+ "source": "X ** 2 + Y ** 2 # gives the desired output",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[0, 1, 4],\n",
+ " [1, 2, 5],\n",
+ " [4, 5, 8]])"
+ ]
+ },
+ "execution_count": 59,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "execution_count": 59
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:32:27.491058Z",
+ "start_time": "2025-01-22T09:32:27.486755Z"
+ }
+ },
+ "source": [
+ "#same calculation, except here Z is a list of lists \n",
+ "Z = [[x ** 2 + y ** 2 for x in range(3)] for y in range(3)]\n",
+ "Z"
+ ],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[[0, 1, 4], [1, 2, 5], [4, 5, 8]]"
+ ]
+ },
+ "execution_count": 60,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "execution_count": 60
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:32:34.752570Z",
+ "start_time": "2025-01-22T09:32:34.747781Z"
+ }
+ },
+ "source": [
+ "# Alternative: using mgrid (since we are using a uniform grid)\n",
+ "X, Y = np.mgrid[0:3, 0:3] # same as np.meshgrid(np.arange(3), np.arange(3), indexing='ij')\n",
+ "X, Y"
+ ],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array([[0, 0, 0],\n",
+ " [1, 1, 1],\n",
+ " [2, 2, 2]]),\n",
+ " array([[0, 1, 2],\n",
+ " [0, 1, 2],\n",
+ " [0, 1, 2]]))"
+ ]
+ },
+ "execution_count": 61,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "execution_count": 61
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:32:36.382070Z",
+ "start_time": "2025-01-22T09:32:36.377342Z"
+ }
+ },
+ "source": [
+ "# from -1 to 1 (excluded), in steps of 0.5\n",
+ "X, Y = np.mgrid[-1:1:.5, -1:1:.5]\n",
+ "X, Y"
+ ],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array([[-1. , -1. , -1. , -1. ],\n",
+ " [-0.5, -0.5, -0.5, -0.5],\n",
+ " [ 0. , 0. , 0. , 0. ],\n",
+ " [ 0.5, 0.5, 0.5, 0.5]]),\n",
+ " array([[-1. , -0.5, 0. , 0.5],\n",
+ " [-1. , -0.5, 0. , 0.5],\n",
+ " [-1. , -0.5, 0. , 0.5],\n",
+ " [-1. , -0.5, 0. , 0.5]]))"
+ ]
+ },
+ "execution_count": 62,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "execution_count": 62
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.7"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/M1/Stats learning/TP1 A first example.ipynb b/M1/Stats learning/TP1 A first example.ipynb
new file mode 100644
index 0000000..714b670
--- /dev/null
+++ b/M1/Stats learning/TP1 A first example.ipynb
@@ -0,0 +1,1700 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# TP1 A first example of statistical learning\n",
+ "\n",
+ "\n",
+ "### Table of Contents\n",
+ "\n",
+ "* [1. Linear regression](#chapter1)\n",
+ "* [2. Polynomial regression](#chapter2)\n",
+ " * [2.1 `PolynomialFeatures` 2-dimensional features](#section2_1)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We will use the following simulated data. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:32:45.876883Z",
+ "start_time": "2025-01-22T09:32:45.601010Z"
+ }
+ },
+ "source": [
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "rng = np.random.default_rng(seed=42)\n",
+ "size = 100\n",
+ "x = np.sort(rng.uniform(-5, 5, size))\n",
+ "X = x.reshape(size, 1) # See later (Question 8) for the reason why we reshape\n",
+ "# x into a 2-dimensional array X with the same number of elements\n",
+ "y = 0.5 + x ** 2 + x + 2 * rng.standard_normal(size)"
+ ],
+ "outputs": [],
+ "execution_count": 1
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now let us display the data : "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:34:02.429428Z",
+ "start_time": "2025-01-22T09:34:02.363995Z"
+ }
+ },
+ "source": [
+ "plt.scatter(x, y)\n",
+ "plt.show()"
+ ],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": ""
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "execution_count": 7
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise 1** : \n",
+ "1. Which variable is the *feature/input/covariate*? \n",
+ "2. Which variable is the *label/outcome/target*? \n",
+ "3. What is the dimension of $X$? What is the sample size? \n"
+ ]
+ },
+ {
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:34:58.459483Z",
+ "start_time": "2025-01-22T09:34:58.457153Z"
+ }
+ },
+ "cell_type": "code",
+ "source": "print(X.shape, x.shape)",
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(100, 1) (100,)\n"
+ ]
+ }
+ ],
+ "execution_count": 8
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Answer for Exercise 1 :\n",
+ "\n",
+ "1. Feature : x\n",
+ "\n",
+ "2. Target : y\n",
+ "\n",
+ "3. `X` has shape (100, 1): the feature dimension is 1 and the sample size is 100.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1. Linear regression \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We will first use linear regression to model the relationship between $x$ and $y$. That is, we are looking for $\\hat{a}$ and $\\hat{b}$ such that $\\hat{f}(x)=\\hat{a}\\cdot x+\\hat{b}$ is close to $y$ in the sense of quadratic loss.\n",
+ "\n",
+ "We will use the sklearn package : `sklearn.linear_model.LinearRegression`. \n",
+ "\n",
+ "(Ref: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) <-- See the example on this page. The goal is to get used to sklearn syntax as we will use this package for all sessions (except for the neural networks). "
+ ]
+ },
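+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a side note, the coefficients minimizing the quadratic loss can also be computed directly with NumPy. The sketch below is only an illustration under stated assumptions: it builds its own synthetic data (the names `x_demo`, `y_demo`, `rng_demo` and the seed are hypothetical, not the data of this session) and solves the least-squares problem with `np.linalg.lstsq`.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "# Hypothetical synthetic data: y = 2 x + 1 plus a little noise\n",
+ "rng_demo = np.random.default_rng(0)\n",
+ "x_demo = rng_demo.uniform(-5, 5, 50)\n",
+ "y_demo = 2.0 * x_demo + 1.0 + 0.1 * rng_demo.standard_normal(50)\n",
+ "\n",
+ "# Design matrix with columns [x, 1], so that A @ [a, b] = a * x + b\n",
+ "A = np.column_stack([x_demo, np.ones_like(x_demo)])\n",
+ "\n",
+ "# Least-squares solution: minimizes sum_i (y_i - a * x_i - b) ** 2\n",
+ "(a_hat, b_hat), *_ = np.linalg.lstsq(A, y_demo, rcond=None)\n",
+ "print('a_hat =', a_hat, 'b_hat =', b_hat)\n",
+ "```\n",
+ "\n",
+ "The sklearn class `LinearRegression` used below solves the same least-squares problem, so its fitted slope and intercept play the role of `a_hat` and `b_hat`."
+ ]
+ },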
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise 2** : Define a linear regression model, named `lin_reg`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:38:51.198238Z",
+ "start_time": "2025-01-22T09:38:51.195801Z"
+ }
+ },
+ "source": [
+ "from sklearn.linear_model import LinearRegression\n",
+ "\n",
+ "lin_reg = LinearRegression()"
+ ],
+ "outputs": [],
+ "execution_count": 14
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise 3** : Train the linear regression model `lin_reg` on `X` and `y`. If the training is successful, `lin_reg` stores the fitted model, and the results can be accessed through its attributes. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:38:55.111182Z",
+ "start_time": "2025-01-22T09:38:55.104393Z"
+ }
+ },
+ "source": "lin_reg.fit(X, y)",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "LinearRegression()"
+ ],
+ "text/html": [
+ "LinearRegression()"
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "execution_count": 16
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise 4**: Try the following code and explain the problem. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:38:57.098642Z",
+ "start_time": "2025-01-22T09:38:56.937883Z"
+ }
+ },
+ "source": "lin_reg.fit(x, y) # we used x instead of X",
+ "outputs": [
+ {
+ "ename": "ValueError",
+ "evalue": "Expected 2D array, got 1D array instead:\narray=[-4.9263773 -4.77287927 -4.69182165 -4.56196234 -4.41697258 -4.36182744\n -4.12350081 -4.09952139 -4.05822652 -3.85469926 -3.81994098 -3.71886367\n -3.70078495 -3.60247516 -3.60203002 -3.47687897 -3.45710508 -3.38728221\n -3.3302708 -3.10528641 -3.05361292 -3.00091798 -2.85415327 -2.73090651\n -2.72761278 -2.66060514 -2.18616108 -2.11671896 -2.06406242 -1.98487911\n -1.96049902 -1.87633359 -1.74174642 -1.45474032 -1.38187389 -1.29540294\n -1.29201976 -1.18978774 -1.12521621 -0.93613139 -0.91471356 -0.63282611\n -0.62848081 -0.6112156 -0.56585801 -0.53843724 -0.49614062 -0.41084224\n -0.3812277 -0.33278996 -0.30444189 -0.28903794 -0.24295074 0.01044775\n 0.53579401 0.54584787 0.57032152 0.59207161 0.65236106 0.68741196\n 1.30282593 1.31664399 1.3471832 1.4386512 1.61916515 1.6431354\n 1.64850857 1.68402962 1.69813995 1.82495504 1.83048953 1.96320375\n 1.97368029 2.00265102 2.05165379 2.22359351 2.44762156 2.5808774\n 2.61139702 2.64998857 2.73956049 2.78383497 2.80729031 2.83898209\n 2.86064305 2.86924378 3.04764357 3.14020385 3.22761613 3.27631172\n 3.32259801 3.32678196 3.53403073 3.5859792 3.93121121 4.26764989\n 4.61897665 4.67509732 4.70698024 4.75622352].\nReshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.",
+ "output_type": "error",
+ "traceback": [
+ "\u001B[0;31m---------------------------------------------------------------------------\u001B[0m",
+ "\u001B[0;31mValueError\u001B[0m Traceback (most recent call last)",
+ "Cell \u001B[0;32mIn[17], line 1\u001B[0m\n\u001B[0;32m----> 1\u001B[0m \u001B[43mlin_reg\u001B[49m\u001B[38;5;241;43m.\u001B[39;49m\u001B[43mfit\u001B[49m\u001B[43m(\u001B[49m\u001B[43mx\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43my\u001B[49m\u001B[43m)\u001B[49m \u001B[38;5;66;03m# we used x instead of X\u001B[39;00m\n",
+ "File \u001B[0;32m/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/sklearn/base.py:1389\u001B[0m, in \u001B[0;36m_fit_context..decorator..wrapper\u001B[0;34m(estimator, *args, **kwargs)\u001B[0m\n\u001B[1;32m 1382\u001B[0m estimator\u001B[38;5;241m.\u001B[39m_validate_params()\n\u001B[1;32m 1384\u001B[0m \u001B[38;5;28;01mwith\u001B[39;00m config_context(\n\u001B[1;32m 1385\u001B[0m skip_parameter_validation\u001B[38;5;241m=\u001B[39m(\n\u001B[1;32m 1386\u001B[0m prefer_skip_nested_validation \u001B[38;5;129;01mor\u001B[39;00m global_skip_validation\n\u001B[1;32m 1387\u001B[0m )\n\u001B[1;32m 1388\u001B[0m ):\n\u001B[0;32m-> 1389\u001B[0m \u001B[38;5;28;01mreturn\u001B[39;00m \u001B[43mfit_method\u001B[49m\u001B[43m(\u001B[49m\u001B[43mestimator\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43margs\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mkwargs\u001B[49m\u001B[43m)\u001B[49m\n",
+ "File \u001B[0;32m/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/sklearn/linear_model/_base.py:601\u001B[0m, in \u001B[0;36mLinearRegression.fit\u001B[0;34m(self, X, y, sample_weight)\u001B[0m\n\u001B[1;32m 597\u001B[0m n_jobs_ \u001B[38;5;241m=\u001B[39m \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39mn_jobs\n\u001B[1;32m 599\u001B[0m accept_sparse \u001B[38;5;241m=\u001B[39m \u001B[38;5;28;01mFalse\u001B[39;00m \u001B[38;5;28;01mif\u001B[39;00m \u001B[38;5;28mself\u001B[39m\u001B[38;5;241m.\u001B[39mpositive \u001B[38;5;28;01melse\u001B[39;00m [\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mcsr\u001B[39m\u001B[38;5;124m\"\u001B[39m, \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mcsc\u001B[39m\u001B[38;5;124m\"\u001B[39m, \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mcoo\u001B[39m\u001B[38;5;124m\"\u001B[39m]\n\u001B[0;32m--> 601\u001B[0m X, y \u001B[38;5;241m=\u001B[39m \u001B[43mvalidate_data\u001B[49m\u001B[43m(\u001B[49m\n\u001B[1;32m 602\u001B[0m \u001B[43m \u001B[49m\u001B[38;5;28;43mself\u001B[39;49m\u001B[43m,\u001B[49m\n\u001B[1;32m 603\u001B[0m \u001B[43m \u001B[49m\u001B[43mX\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 604\u001B[0m \u001B[43m \u001B[49m\u001B[43my\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 605\u001B[0m \u001B[43m \u001B[49m\u001B[43maccept_sparse\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43maccept_sparse\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 606\u001B[0m \u001B[43m \u001B[49m\u001B[43my_numeric\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[38;5;28;43;01mTrue\u001B[39;49;00m\u001B[43m,\u001B[49m\n\u001B[1;32m 607\u001B[0m \u001B[43m \u001B[49m\u001B[43mmulti_output\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[38;5;28;43;01mTrue\u001B[39;49;00m\u001B[43m,\u001B[49m\n\u001B[1;32m 608\u001B[0m \u001B[43m \u001B[49m\u001B[43mforce_writeable\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[38;5;28;43;01mTrue\u001B[39;49;00m\u001B[43m,\u001B[49m\n\u001B[1;32m 609\u001B[0m 
\u001B[43m\u001B[49m\u001B[43m)\u001B[49m\n\u001B[1;32m 611\u001B[0m has_sw \u001B[38;5;241m=\u001B[39m sample_weight \u001B[38;5;129;01mis\u001B[39;00m \u001B[38;5;129;01mnot\u001B[39;00m \u001B[38;5;28;01mNone\u001B[39;00m\n\u001B[1;32m 612\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m has_sw:\n",
+ "File \u001B[0;32m/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/sklearn/utils/validation.py:2961\u001B[0m, in \u001B[0;36mvalidate_data\u001B[0;34m(_estimator, X, y, reset, validate_separately, skip_check_array, **check_params)\u001B[0m\n\u001B[1;32m 2959\u001B[0m y \u001B[38;5;241m=\u001B[39m check_array(y, input_name\u001B[38;5;241m=\u001B[39m\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124my\u001B[39m\u001B[38;5;124m\"\u001B[39m, \u001B[38;5;241m*\u001B[39m\u001B[38;5;241m*\u001B[39mcheck_y_params)\n\u001B[1;32m 2960\u001B[0m \u001B[38;5;28;01melse\u001B[39;00m:\n\u001B[0;32m-> 2961\u001B[0m X, y \u001B[38;5;241m=\u001B[39m \u001B[43mcheck_X_y\u001B[49m\u001B[43m(\u001B[49m\u001B[43mX\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[43my\u001B[49m\u001B[43m,\u001B[49m\u001B[43m \u001B[49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[38;5;241;43m*\u001B[39;49m\u001B[43mcheck_params\u001B[49m\u001B[43m)\u001B[49m\n\u001B[1;32m 2962\u001B[0m out \u001B[38;5;241m=\u001B[39m X, y\n\u001B[1;32m 2964\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m \u001B[38;5;129;01mnot\u001B[39;00m no_val_X \u001B[38;5;129;01mand\u001B[39;00m check_params\u001B[38;5;241m.\u001B[39mget(\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mensure_2d\u001B[39m\u001B[38;5;124m\"\u001B[39m, \u001B[38;5;28;01mTrue\u001B[39;00m):\n",
+ "File \u001B[0;32m/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/sklearn/utils/validation.py:1370\u001B[0m, in \u001B[0;36mcheck_X_y\u001B[0;34m(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_writeable, force_all_finite, ensure_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)\u001B[0m\n\u001B[1;32m 1364\u001B[0m \u001B[38;5;28;01mraise\u001B[39;00m \u001B[38;5;167;01mValueError\u001B[39;00m(\n\u001B[1;32m 1365\u001B[0m \u001B[38;5;124mf\u001B[39m\u001B[38;5;124m\"\u001B[39m\u001B[38;5;132;01m{\u001B[39;00mestimator_name\u001B[38;5;132;01m}\u001B[39;00m\u001B[38;5;124m requires y to be passed, but the target y is None\u001B[39m\u001B[38;5;124m\"\u001B[39m\n\u001B[1;32m 1366\u001B[0m )\n\u001B[1;32m 1368\u001B[0m ensure_all_finite \u001B[38;5;241m=\u001B[39m _deprecate_force_all_finite(force_all_finite, ensure_all_finite)\n\u001B[0;32m-> 1370\u001B[0m X \u001B[38;5;241m=\u001B[39m \u001B[43mcheck_array\u001B[49m\u001B[43m(\u001B[49m\n\u001B[1;32m 1371\u001B[0m \u001B[43m \u001B[49m\u001B[43mX\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 1372\u001B[0m \u001B[43m \u001B[49m\u001B[43maccept_sparse\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43maccept_sparse\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 1373\u001B[0m \u001B[43m \u001B[49m\u001B[43maccept_large_sparse\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43maccept_large_sparse\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 1374\u001B[0m \u001B[43m \u001B[49m\u001B[43mdtype\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mdtype\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 1375\u001B[0m \u001B[43m \u001B[49m\u001B[43morder\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43morder\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 1376\u001B[0m \u001B[43m \u001B[49m\u001B[43mcopy\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mcopy\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 1377\u001B[0m 
\u001B[43m \u001B[49m\u001B[43mforce_writeable\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mforce_writeable\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 1378\u001B[0m \u001B[43m \u001B[49m\u001B[43mensure_all_finite\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mensure_all_finite\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 1379\u001B[0m \u001B[43m \u001B[49m\u001B[43mensure_2d\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mensure_2d\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 1380\u001B[0m \u001B[43m \u001B[49m\u001B[43mallow_nd\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mallow_nd\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 1381\u001B[0m \u001B[43m \u001B[49m\u001B[43mensure_min_samples\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mensure_min_samples\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 1382\u001B[0m \u001B[43m \u001B[49m\u001B[43mensure_min_features\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mensure_min_features\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 1383\u001B[0m \u001B[43m \u001B[49m\u001B[43mestimator\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[43mestimator\u001B[49m\u001B[43m,\u001B[49m\n\u001B[1;32m 1384\u001B[0m \u001B[43m \u001B[49m\u001B[43minput_name\u001B[49m\u001B[38;5;241;43m=\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[38;5;124;43mX\u001B[39;49m\u001B[38;5;124;43m\"\u001B[39;49m\u001B[43m,\u001B[49m\n\u001B[1;32m 1385\u001B[0m \u001B[43m\u001B[49m\u001B[43m)\u001B[49m\n\u001B[1;32m 1387\u001B[0m y \u001B[38;5;241m=\u001B[39m _check_y(y, multi_output\u001B[38;5;241m=\u001B[39mmulti_output, y_numeric\u001B[38;5;241m=\u001B[39my_numeric, estimator\u001B[38;5;241m=\u001B[39mestimator)\n\u001B[1;32m 1389\u001B[0m check_consistent_length(X, y)\n",
+ "File \u001B[0;32m/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/sklearn/utils/validation.py:1093\u001B[0m, in \u001B[0;36mcheck_array\u001B[0;34m(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_writeable, force_all_finite, ensure_all_finite, ensure_non_negative, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)\u001B[0m\n\u001B[1;32m 1086\u001B[0m \u001B[38;5;28;01melse\u001B[39;00m:\n\u001B[1;32m 1087\u001B[0m msg \u001B[38;5;241m=\u001B[39m (\n\u001B[1;32m 1088\u001B[0m \u001B[38;5;124mf\u001B[39m\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mExpected 2D array, got 1D array instead:\u001B[39m\u001B[38;5;130;01m\\n\u001B[39;00m\u001B[38;5;124marray=\u001B[39m\u001B[38;5;132;01m{\u001B[39;00marray\u001B[38;5;132;01m}\u001B[39;00m\u001B[38;5;124m.\u001B[39m\u001B[38;5;130;01m\\n\u001B[39;00m\u001B[38;5;124m\"\u001B[39m\n\u001B[1;32m 1089\u001B[0m \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mReshape your data either using array.reshape(-1, 1) if \u001B[39m\u001B[38;5;124m\"\u001B[39m\n\u001B[1;32m 1090\u001B[0m \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124myour data has a single feature or array.reshape(1, -1) \u001B[39m\u001B[38;5;124m\"\u001B[39m\n\u001B[1;32m 1091\u001B[0m \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mif it contains a single sample.\u001B[39m\u001B[38;5;124m\"\u001B[39m\n\u001B[1;32m 1092\u001B[0m )\n\u001B[0;32m-> 1093\u001B[0m \u001B[38;5;28;01mraise\u001B[39;00m \u001B[38;5;167;01mValueError\u001B[39;00m(msg)\n\u001B[1;32m 1095\u001B[0m \u001B[38;5;28;01mif\u001B[39;00m dtype_numeric \u001B[38;5;129;01mand\u001B[39;00m \u001B[38;5;28mhasattr\u001B[39m(array\u001B[38;5;241m.\u001B[39mdtype, \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mkind\u001B[39m\u001B[38;5;124m\"\u001B[39m) \u001B[38;5;129;01mand\u001B[39;00m array\u001B[38;5;241m.\u001B[39mdtype\u001B[38;5;241m.\u001B[39mkind \u001B[38;5;129;01min\u001B[39;00m 
\u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mUSV\u001B[39m\u001B[38;5;124m\"\u001B[39m:\n\u001B[1;32m 1096\u001B[0m \u001B[38;5;28;01mraise\u001B[39;00m \u001B[38;5;167;01mValueError\u001B[39;00m(\n\u001B[1;32m 1097\u001B[0m \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mdtype=\u001B[39m\u001B[38;5;124m'\u001B[39m\u001B[38;5;124mnumeric\u001B[39m\u001B[38;5;124m'\u001B[39m\u001B[38;5;124m is not compatible with arrays of bytes/strings.\u001B[39m\u001B[38;5;124m\"\u001B[39m\n\u001B[1;32m 1098\u001B[0m \u001B[38;5;124m\"\u001B[39m\u001B[38;5;124mConvert your data to numeric values explicitly instead.\u001B[39m\u001B[38;5;124m\"\u001B[39m\n\u001B[1;32m 1099\u001B[0m )\n",
+ "\u001B[0;31mValueError\u001B[0m: Expected 2D array, got 1D array instead:\narray=[-4.9263773 -4.77287927 -4.69182165 -4.56196234 -4.41697258 -4.36182744\n -4.12350081 -4.09952139 -4.05822652 -3.85469926 -3.81994098 -3.71886367\n -3.70078495 -3.60247516 -3.60203002 -3.47687897 -3.45710508 -3.38728221\n -3.3302708 -3.10528641 -3.05361292 -3.00091798 -2.85415327 -2.73090651\n -2.72761278 -2.66060514 -2.18616108 -2.11671896 -2.06406242 -1.98487911\n -1.96049902 -1.87633359 -1.74174642 -1.45474032 -1.38187389 -1.29540294\n -1.29201976 -1.18978774 -1.12521621 -0.93613139 -0.91471356 -0.63282611\n -0.62848081 -0.6112156 -0.56585801 -0.53843724 -0.49614062 -0.41084224\n -0.3812277 -0.33278996 -0.30444189 -0.28903794 -0.24295074 0.01044775\n 0.53579401 0.54584787 0.57032152 0.59207161 0.65236106 0.68741196\n 1.30282593 1.31664399 1.3471832 1.4386512 1.61916515 1.6431354\n 1.64850857 1.68402962 1.69813995 1.82495504 1.83048953 1.96320375\n 1.97368029 2.00265102 2.05165379 2.22359351 2.44762156 2.5808774\n 2.61139702 2.64998857 2.73956049 2.78383497 2.80729031 2.83898209\n 2.86064305 2.86924378 3.04764357 3.14020385 3.22761613 3.27631172\n 3.32259801 3.32678196 3.53403073 3.5859792 3.93121121 4.26764989\n 4.61897665 4.67509732 4.70698024 4.75622352].\nReshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."
+ ]
+ }
+ ],
+ "execution_count": 17
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This error arises very often with sklearn: estimators expect the features as a 2D array of shape (n_samples, n_features), so a one-dimensional feature must first be reshaped, e.g. with `reshape(-1, 1)`. Some extra info about `reshape`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:39:52.478784Z",
+ "start_time": "2025-01-22T09:39:52.472388Z"
+ }
+ },
+ "source": [
+ "a = np.arange(6).reshape(2, 3)\n",
+ "# 3 ways to reshape the array a into a 3 x 2 array\n",
+ "b = a.reshape(3, 2)\n",
+ "b2 = a.reshape(3, -1) # -1 lets NumPy infer this dimension from the total size\n",
+ "b3 = a.reshape(-1, 2) # same, with the number of rows inferred\n",
+ "# reshape(4, -1) would fail here, as 2 x 3 = 6 is not\n",
+ "# divisible by 4\n",
+ "print(a)\n",
+ "print(b)\n",
+ "print(b2)\n",
+ "print(b3)"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[[0 1 2]\n",
+ " [3 4 5]]\n",
+ "[[0 1]\n",
+ " [2 3]\n",
+ " [4 5]]\n",
+ "[[0 1]\n",
+ " [2 3]\n",
+ " [4 5]]\n",
+ "[[0 1]\n",
+ " [2 3]\n",
+ " [4 5]]\n"
+ ]
+ }
+ ],
+ "execution_count": 20
+ },
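+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To connect `reshape` with the error above, here is a minimal sketch of turning a one-dimensional feature into the column shape that sklearn expects. The data and the names `x_demo`, `y_demo`, `X_col`, `X_col2` and `model_demo` are made up for the illustration; they are not the variables of this session.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.linear_model import LinearRegression\n",
+ "\n",
+ "# Hypothetical 1D feature and noiseless target y = 3 x - 1\n",
+ "x_demo = np.linspace(0.0, 1.0, 20)  # shape (20,)\n",
+ "y_demo = 3.0 * x_demo - 1.0\n",
+ "\n",
+ "# Two equivalent ways to get the (n_samples, 1) shape\n",
+ "X_col = x_demo.reshape(-1, 1)\n",
+ "X_col2 = x_demo[:, np.newaxis]\n",
+ "print(x_demo.shape, X_col.shape, X_col2.shape)\n",
+ "\n",
+ "# Fitting now works because the feature array is 2D\n",
+ "model_demo = LinearRegression().fit(X_col, y_demo)\n",
+ "print(model_demo.coef_, model_demo.intercept_)\n",
+ "```\n",
+ "\n",
+ "Only the feature array needs to be 2D; the target can stay one-dimensional."
+ ]
+ },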
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise 5** : \n",
+ "\n",
+ "(1) Compute $\\hat{y}$ and then plot the line estimated by the model. \n",
+ " \n",
+ "Hint: you can use `lin_reg.predict`. \n",
+ "\n",
+ "\n",
+ "(2) Predict the value of $y$ for x=1. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:50:01.091807Z",
+ "start_time": "2025-01-22T09:50:00.964206Z"
+ }
+ },
+ "source": [
+ "y_predict = lin_reg.predict(X)\n",
+ "plt.plot(X, y_predict, color='r', label='Regression')\n",
+ "plt.scatter(X, y, label='Sample')\n",
+ "plt.legend()\n",
+ "plt.show()\n",
+ "\n",
+ "y_1 = lin_reg.predict([[1]])\n",
+ "print(\"The estimated value for x=1 is\", y_1)"
+ ],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ],
+ "image/png": ""
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The estimated value for x=1 is [8.54372371]\n"
+ ]
+ }
+ ],
+ "execution_count": 52
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise 6** : Display the coefficients $\\hat{a}$ and $\\hat{b}$ computed by `lin_reg`. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:42:26.316701Z",
+ "start_time": "2025-01-22T09:42:26.312959Z"
+ }
+ },
+ "source": "print(\"The estimated coefficients are a=\", lin_reg.coef_, \", b=\", lin_reg.intercept_)",
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The estimated coefficients are a= [0.6841386] , b= 7.8595851113619375\n"
+ ]
+ }
+ ],
+ "execution_count": 30
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise 7** : Calculate the quadratic error $\\sum_{i=1}^{100}(y_i-\\hat{y}_i)^2$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:42:50.448815Z",
+ "start_time": "2025-01-22T09:42:50.445550Z"
+ }
+ },
+ "source": "print(np.sum((y - y_predict) ** 2))",
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "4906.471161398307\n"
+ ]
+ }
+ ],
+ "execution_count": 31
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise 8** : What does the next line of code compute? (See the documentation: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:42:57.691432Z",
+ "start_time": "2025-01-22T09:42:57.686810Z"
+ }
+ },
+ "source": [
+ "lin_reg.score(X, y)"
+ ],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0.06603572054935491"
+ ]
+ },
+ "execution_count": 32,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "execution_count": 32
+ },
+ {
+ "metadata": {},
+ "cell_type": "markdown",
+ "source": "The output is the coefficient of determination $R^2$. Here it is about 0.066, far from 1: the linear model explains only about 7% of the variance of $y$, so it fits the data poorly."
+ },
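+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make the meaning of $R^2$ concrete, the sketch below recomputes it by hand as one minus the ratio of the residual sum of squares to the total sum of squares, and compares the result with `score`. It runs on its own synthetic data; the names `X_demo`, `y_demo`, `model_demo` and the seed are assumptions chosen for the example.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.linear_model import LinearRegression\n",
+ "\n",
+ "# Hypothetical data with a quadratic trend, fitted by a straight line\n",
+ "rng_demo = np.random.default_rng(1)\n",
+ "X_demo = rng_demo.uniform(-5, 5, size=(80, 1))\n",
+ "y_demo = X_demo[:, 0] ** 2 + rng_demo.standard_normal(80)\n",
+ "\n",
+ "model_demo = LinearRegression().fit(X_demo, y_demo)\n",
+ "y_hat_demo = model_demo.predict(X_demo)\n",
+ "\n",
+ "ss_res = np.sum((y_demo - y_hat_demo) ** 2)  # residual sum of squares\n",
+ "ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)  # total sum of squares\n",
+ "r2_manual = 1.0 - ss_res / ss_tot\n",
+ "print(r2_manual, model_demo.score(X_demo, y_demo))\n",
+ "```\n",
+ "\n",
+ "The two printed values should agree up to rounding. A value close to 0 means the model does hardly better than always predicting the mean of the target."
+ ]
+ },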
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2. Polynomial regression \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We seek 3 coefficients $\\hat{a}$, $\\hat{b}$ and $\\hat{c}$ such that $\\hat{f}(x)=\\hat{a}\\cdot x^{2}+\\hat{b}\\cdot x +\\hat{c}$ is close to $y$ with respect to the quadratic loss. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise 9** : Create a new 2-dimensional array named `X2`, of shape $100\\times 2$, whose first column is $x^2$ and whose second column is $x$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:44:45.265060Z",
+ "start_time": "2025-01-22T09:44:45.261420Z"
+ }
+ },
+ "source": [
+ "X2 = np.array([x ** 2, x]).T\n",
+ "print(X2.shape)"
+ ],
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(100, 2)\n"
+ ]
+ }
+ ],
+ "execution_count": 36
+ },
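+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Two other ways to build such a matrix are sketched below on a tiny made-up vector (the names `x_demo`, `X2_demo` and `X2_sk` are hypothetical): `np.column_stack`, and the sklearn transformer `PolynomialFeatures`. Note that `PolynomialFeatures` orders the columns by increasing degree, so its output has $x$ first and $x^2$ second, the reverse of the layout asked for in Exercise 9.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "from sklearn.preprocessing import PolynomialFeatures\n",
+ "\n",
+ "x_demo = np.array([1.0, 2.0, 3.0])\n",
+ "\n",
+ "# Same layout as the exercise: first column x^2, second column x\n",
+ "X2_demo = np.column_stack([x_demo ** 2, x_demo])\n",
+ "\n",
+ "# PolynomialFeatures needs a 2D input; columns come out as x, then x^2\n",
+ "poly_demo = PolynomialFeatures(degree=2, include_bias=False)\n",
+ "X2_sk = poly_demo.fit_transform(x_demo.reshape(-1, 1))\n",
+ "print(X2_demo)\n",
+ "print(X2_sk)\n",
+ "```\n",
+ "\n",
+ "The column order changes which entry of `coef_` corresponds to which power of $x$, but not the fitted polynomial."
+ ]
+ },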
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Exercise 10** : Fit a new model, named `poly2_reg`, which is the linear regression of `y` on `X2` (this corresponds to a 2nd-degree polynomial regression of `y` on `x`)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2025-01-22T09:45:05.543859Z",
+ "start_time": "2025-01-22T09:45:05.532828Z"
+ }
+ },
+ "source": [
+ "poly2_reg = LinearRegression()\n",
+ "poly2_reg.fit(X2, y)"
+ ],
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "LinearRegression()"
+ ],
+ "text/html": [
+ "
LinearRegression()