ArtStudies/M1/Statistical Learning/TP6_keras_intro.ipynb
Arthur DANJOU 8cf328e18a Refactor code in numerical methods notebooks
- Updated import order in Point_Fixe.ipynb for consistency.
- Changed lambda functions to regular function definitions for clarity in Point_Fixe.ipynb.
- Added numpy import in TP1_EDO_EulerExp.ipynb, TP2_Lokta_Volterra.ipynb, and TP3_Convergence.ipynb for better readability.
- Modified for loops in TP1_EDO_EulerExp.ipynb and TP2_Lokta_Volterra.ipynb to include strict=False for compatibility with future Python versions.
2025-09-01 16:14:53 +02:00

{
"cells": [
{
"cell_type": "markdown",
"id": "ced704ec-0d0d-45c3-9b55-2368396e5c77",
"metadata": {
"id": "ced704ec-0d0d-45c3-9b55-2368396e5c77"
},
"source": [
"# TP6 A short introduction to neural networks with Keras"
]
},
{
"cell_type": "markdown",
"id": "cd455d57-9a48-4a1c-82d6-2894f99cdb6d",
"metadata": {
"id": "cd455d57-9a48-4a1c-82d6-2894f99cdb6d"
},
"source": [
"We will use the Keras library, which serves as a high-level API for TensorFlow.\n",
"\n",
" Keras is a\n",
"deep-learning framework for Python that provides a convenient way to define and\n",
"train almost any kind of deep-learning model. Keras was initially developed for\n",
"researchers, with the aim of enabling fast experimentation.\n",
"Keras has the following key features:\n",
"- It allows the same code to run seamlessly on CPU or GPU.\n",
"- It has a user-friendly API that makes it easy to quickly prototype deep-learning\n",
"models.\n",
"- It has built-in support for convolutional networks (for computer vision), recurrent\n",
"networks (for sequence processing), and any combination of both.\n",
"- It supports arbitrary network architectures: multi-input or multi-output models,\n",
"layer sharing, model sharing, and so on.\n",
"\n",
"Extracted from the book \"Deep Learning with Python \", author : François Chollet.\n",
"\n",
"Remark : \n",
"\n",
" - PyTorch is more popular among researchers and academic practitioners for its flexibility and ease of use.\n",
"\n",
" - TensorFlow is preferred by industry professionals for large-scale applications and production deployment.\n"
]
},
{
"cell_type": "markdown",
"id": "ahhFpKIMWSxI",
"metadata": {
"id": "ahhFpKIMWSxI"
},
"source": [
"This tutorial uses the Fashion MNIST dataset which contains 70,000 grayscale images in 10 categories. The images show individual articles of clothing at low resolution (28 by 28 pixels). We will only use an MLP."
]
},
{
"cell_type": "markdown",
"id": "cT3zmP9N-Gfb",
"metadata": {
"id": "cT3zmP9N-Gfb"
},
"source": [
"Remark : in colab, try using the GPU instead of the CPU (Click on the \"Runtime\" menu at the top.\n",
"Select \"Change runtime type.\"\n",
"In the dialog box that appears, choose \"GPU\" under the Hardware accelerator dropdown menu.\n",
"Click \"Save.\" \n",
"\n",
"(GPU access is available with free Colab, but it comes with usage and performance limitations compared to the paid options)."
]
},
{
"cell_type": "markdown",
"id": "48a02457-d8ec-4c0a-b11f-7d4819c75e8d",
"metadata": {
"id": "48a02457-d8ec-4c0a-b11f-7d4819c75e8d"
},
"source": [
" Fashion MNIST is a slightly more challenging problem than regular MNIST. Both datasets are relatively small and are used to verify that an algorithm works as expected.\n",
"\n",
"Here, 60,000 images are used to train the network and 10,000 images to evaluate how accurately the network learned to classify images. You can access the Fashion MNIST directly from TensorFlow"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5260add2-2092-4849-b39b-0b4416d60275",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5260add2-2092-4849-b39b-0b4416d60275",
"outputId": "654e42f5-eb10-449a-f110-d62b0e81fc0a"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import tensorflow as tf\n",
"\n",
"tf.keras.utils.set_random_seed(42)\n",
"fashion_mnist = tf.keras.datasets.fashion_mnist\n",
"(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()"
]
},
{
"cell_type": "markdown",
"id": "1003ff8e-552e-425e-81df-85d623b062e3",
"metadata": {
"id": "1003ff8e-552e-425e-81df-85d623b062e3"
},
"source": [
"Let us take a look at the shape and the datatype of the training set :"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "cf702fe0-4b88-441e-a6c1-73fd5c57111f",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "cf702fe0-4b88-441e-a6c1-73fd5c57111f",
"outputId": "4f1ec97c-59a0-4eb7-ed01-8423374eb1e5"
},
"outputs": [
{
"data": {
"text/plain": [
"((60000, 28, 28), dtype('uint8'))"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X_train.shape, X_train.dtype"
]
},
{
"cell_type": "markdown",
"id": "oXlaIKYNYQIl",
"metadata": {
"id": "oXlaIKYNYQIl"
},
"source": [
"We will need a validation set during training. As the dataset is already shuffled, we will just use the last rows of the dataset for the validation set :"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "aoanwQnmYa3K",
"metadata": {
"id": "aoanwQnmYa3K"
},
"outputs": [],
"source": [
"X_val, y_val = X_train[-5000:], y_train[-5000:]\n",
"X_train, y_train = X_train[:-5000], y_train[:-5000]"
]
},
{
"cell_type": "markdown",
"id": "100d1fdb-d769-4dc7-aada-69d816b099aa",
"metadata": {
"id": "100d1fdb-d769-4dc7-aada-69d816b099aa"
},
"source": [
"Neural networks process inputs using small weight values, and inputs with large integer values can disrupt or slow down the learning process. As such it is good practice to normalize the pixel values. We do not really know the best way to scale the pixel values for modeling, but we know that some scaling will be required.\n",
"\n",
"A good starting point is to normalize the pixel values of grayscale images, e.g. rescale them to the range [0,1]. This involves first converting the data type from unsigned integers to floats, then dividing the pixel values by the maximum value.\n",
"\n",
"\n",
"\n",
"(If we used the sigmoid or tanh activation for the first layer, it would be even more important not to have too big input values, as they could cause saturation).\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d5d925c1-6a53-4a9d-99ee-4ebb1a9c9026",
"metadata": {
"id": "d5d925c1-6a53-4a9d-99ee-4ebb1a9c9026"
},
"outputs": [],
"source": [
"X_train01, X_val01, X_test01 = X_train / 255.0, X_val / 255.0, X_test / 255.0"
]
},
{
"cell_type": "markdown",
"id": "0d393d58-bb7b-4a0b-90d2-2d1895300aa4",
"metadata": {
"id": "0d393d58-bb7b-4a0b-90d2-2d1895300aa4"
},
"source": [
"Remarks :\n",
" Normalizing the inputs and initializing the weights properly are particularly important in the context of deep learning (see e.g. the vanishing gradient problem or the exploding gradient problem). Here we have a shallow network, but the size of the inputs and of the weights matter anyway, we just have fewer problems.\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "2e31918c-b394-410a-b879-474f95b25035",
"metadata": {
"id": "2e31918c-b394-410a-b879-474f95b25035"
},
"source": [
"Here are the class names (they are not included with the dataset) : "
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "731ad9e7-57ae-47c5-b50d-5d77c28216bd",
"metadata": {
"id": "731ad9e7-57ae-47c5-b50d-5d77c28216bd"
},
"outputs": [],
"source": [
"class_names = [\n",
" \"T-shirt/top\",\n",
" \"Trouser\",\n",
" \"Pullover\",\n",
" \"Dress\",\n",
" \"Coat\",\n",
" \"Sandal\",\n",
" \"Shirt\",\n",
" \"Sneaker\",\n",
" \"Bag\",\n",
" \"Ankle boot\",\n",
"]"
]
},
{
"cell_type": "markdown",
"id": "7e3a55f3-d4b6-450e-93a2-2d6fb339c3b5",
"metadata": {
"id": "7e3a55f3-d4b6-450e-93a2-2d6fb339c3b5"
},
"source": [
"Q1. What does the first image of the training set represent ?"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "WDW-zdxKxv13",
"metadata": {
"id": "WDW-zdxKxv13"
},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 1000x1000 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# answer Q1\n",
"import matplotlib.pyplot as plt\n",
"\n",
"plt.figure(figsize=(10, 10))\n",
"plt.imshow(X_train[0], cmap=\"gray\")\n",
"plt.xlabel(class_names[y_train[0]])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "244eb6d5-a902-41a6-ac33-c3d97036780a",
"metadata": {
"id": "244eb6d5-a902-41a6-ac33-c3d97036780a"
},
"source": [
"Let us display the first 25 images in the training set :"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "b2e30200-0700-435f-89cb-98e0ad0440bc",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 826
},
"id": "b2e30200-0700-435f-89cb-98e0ad0440bc",
"outputId": "ce7d30ed-d045-4673-b567-bf9699df5db6"
},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 1000x1000 with 25 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"plt.figure(figsize=(10, 10))\n",
"for i in range(25):\n",
" plt.subplot(5, 5, i + 1)\n",
" plt.xticks([])\n",
" plt.yticks([])\n",
" plt.grid(False)\n",
" plt.imshow(X_train[i], cmap=plt.cm.binary)\n",
" plt.xlabel(class_names[y_train[i]])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "f18ae0cd-588e-4087-add7-9d32b727f6e1",
"metadata": {
"id": "f18ae0cd-588e-4087-add7-9d32b727f6e1"
},
"source": [
"Now let us build the neural network. Here we will build a classification MLP (multi layer perceptron). "
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ed84911d-a6d5-484d-9ba6-97570069a4fb",
"metadata": {
"id": "ed84911d-a6d5-484d-9ba6-97570069a4fb"
},
"outputs": [],
"source": [
"model = tf.keras.Sequential(\n",
" [\n",
" tf.keras.layers.Input(shape=[28, 28]),\n",
" tf.keras.layers.Flatten(),\n",
" tf.keras.layers.Dense(300, activation=\"relu\", kernel_initializer=\"he_normal\"),\n",
" tf.keras.layers.Dense(100, activation=\"relu\", kernel_initializer=\"he_normal\"),\n",
" tf.keras.layers.Dense(10, activation=\"softmax\"),\n",
" ]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "d18862c4-b4b6-4aec-9312-8c2ee7644117",
"metadata": {
"id": "d18862c4-b4b6-4aec-9312-8c2ee7644117"
},
"source": [
"Q2.\n",
" a) What does the `Flatten` layer do ?\n",
"\n",
" b) What do the numbers 300, 100 and 10 represent ?\n",
"\n",
" c) How many hidden layers are there ?\n",
"\n",
" d) Why do we use the softmax activation function in the output layer and what do the outputs of the last layer represent ?\n",
"\n",
" e) What does \"sequential\" mean ?"
]
},
{
"cell_type": "markdown",
"id": "491ab5d1",
"metadata": {},
"source": [
"**Answer**:\n",
"\n",
"a) The flatten layer transforms the 2D input (28×28 pixels) into a 1D vector of size 784. This is necessary before feeding the data into fully connected (Dense) layers.\n",
"\n",
"b) They are the number of neurons in each Dense (fully connected) layer:\n",
"- 300 neurons in the first hidden layer\n",
"- 100 neurons in the second hidden layer\n",
"- 10 neurons in the output layer (one for each class, e.g., digits 09)\n",
"\n",
"c) There are 2 hidden layers (those with 300 and 100 neurons). The final layer is the output layer.\n",
"\n",
"d) Softmax turns the raw scores into probabilities that sum to 1. Each output represents the models confidence that the input belongs to a specific class (e.g., digit 09 in MNIST).\n",
"\n",
"e) It means the models layers are arranged in a linear sequence: each layer feeds directly into the next, with no branching or multiple inputs/outputs.\n",
"\n",
"f) A Dense layer (also called a fully connected layer) is a layer where every neuron is connected to all the neurons in the previous layer."
]
},
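{
"cell_type": "markdown",
"id": "note-softmax-formula",
"metadata": {},
"source": [
"**Remark (softmax in formulas)**: as a sketch of what the output layer of the model above computes, write $z_0, \\dots, z_9$ for the raw scores of the 10 output neurons and $p_0, \\dots, p_9$ for the outputs (this notation is introduced here for illustration and is not part of the Keras API).\n",
"\n",
"$$p_k = \\frac{e^{z_k}}{\\sum_{j=0}^{9} e^{z_j}}, \\qquad k = 0, \\dots, 9$$\n",
"\n",
"Each $p_k$ lies in $(0, 1)$ and the $p_k$ sum to 1, so they can be read as estimated class probabilities. The predicted class is the index $k$ with the largest $p_k$, which is also the index with the largest $z_k$."
]
},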
{
"cell_type": "markdown",
"id": "cf344a37-1423-4295-bcc9-7effe7a6e2b1",
"metadata": {
"id": "cf344a37-1423-4295-bcc9-7effe7a6e2b1"
},
"source": [
"The model's summary() method displays all the model's layers :"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "b7d520b6-738e-413d-bf00-a47cc71c1c93",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 265
},
"id": "b7d520b6-738e-413d-bf00-a47cc71c1c93",
"outputId": "42afbab3-c5a5-4f85-ee83-85b9898ebdae"
},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\">Model: \"sequential\"</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1mModel: \"sequential\"\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n",
"┃<span style=\"font-weight: bold\"> Layer (type) </span>┃<span style=\"font-weight: bold\"> Output Shape </span>┃<span style=\"font-weight: bold\"> Param # </span>┃\n",
"┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n",
"│ flatten (<span style=\"color: #0087ff; text-decoration-color: #0087ff\">Flatten</span>) │ (<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00af00; text-decoration-color: #00af00\">784</span>) │ <span style=\"color: #00af00; text-decoration-color: #00af00\">0</span> │\n",
"├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
"│ dense (<span style=\"color: #0087ff; text-decoration-color: #0087ff\">Dense</span>) │ (<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00af00; text-decoration-color: #00af00\">300</span>) │ <span style=\"color: #00af00; text-decoration-color: #00af00\">235,500</span> │\n",
"├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
"│ dense_1 (<span style=\"color: #0087ff; text-decoration-color: #0087ff\">Dense</span>) │ (<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00af00; text-decoration-color: #00af00\">100</span>) │ <span style=\"color: #00af00; text-decoration-color: #00af00\">30,100</span> │\n",
"├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
"│ dense_2 (<span style=\"color: #0087ff; text-decoration-color: #0087ff\">Dense</span>) │ (<span style=\"color: #00d7ff; text-decoration-color: #00d7ff\">None</span>, <span style=\"color: #00af00; text-decoration-color: #00af00\">10</span>) │ <span style=\"color: #00af00; text-decoration-color: #00af00\">1,010</span> │\n",
"└─────────────────────────────────┴────────────────────────┴───────────────┘\n",
"</pre>\n"
],
"text/plain": [
"┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n",
"┃\u001b[1m \u001b[0m\u001b[1mLayer (type) \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mOutput Shape \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m Param #\u001b[0m\u001b[1m \u001b[0m┃\n",
"┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n",
"│ flatten (\u001b[38;5;33mFlatten\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m784\u001b[0m) │ \u001b[38;5;34m0\u001b[0m │\n",
"├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
"│ dense (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m300\u001b[0m) │ \u001b[38;5;34m235,500\u001b[0m │\n",
"├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
"│ dense_1 (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m100\u001b[0m) │ \u001b[38;5;34m30,100\u001b[0m │\n",
"├─────────────────────────────────┼────────────────────────┼───────────────┤\n",
"│ dense_2 (\u001b[38;5;33mDense\u001b[0m) │ (\u001b[38;5;45mNone\u001b[0m, \u001b[38;5;34m10\u001b[0m) │ \u001b[38;5;34m1,010\u001b[0m │\n",
"└─────────────────────────────────┴────────────────────────┴───────────────┘\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\"> Total params: </span><span style=\"color: #00af00; text-decoration-color: #00af00\">266,610</span> (1.02 MB)\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1m Total params: \u001b[0m\u001b[38;5;34m266,610\u001b[0m (1.02 MB)\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\"> Trainable params: </span><span style=\"color: #00af00; text-decoration-color: #00af00\">266,610</span> (1.02 MB)\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1m Trainable params: \u001b[0m\u001b[38;5;34m266,610\u001b[0m (1.02 MB)\n"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"font-weight: bold\"> Non-trainable params: </span><span style=\"color: #00af00; text-decoration-color: #00af00\">0</span> (0.00 B)\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1m Non-trainable params: \u001b[0m\u001b[38;5;34m0\u001b[0m (0.00 B)\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"model.summary()"
]
},
{
"cell_type": "markdown",
"id": "c8dbecd7-0695-4eda-a03e-ad3cedaf24a1",
"metadata": {
"id": "c8dbecd7-0695-4eda-a03e-ad3cedaf24a1"
},
"source": [
"Q3. a) What does 235,500 correspond to ?\n",
"\n",
"b) What is a \"non trainable\" parameter ?"
]
},
{
"cell_type": "markdown",
"id": "b86437d9",
"metadata": {},
"source": [
"**Answer**: \n",
"\n",
"a) `235 000` represents the total number of parameters (weights + biais) for the first layer ($(784+1)*300=235 500$).\n",
"b) The parameter is not modified by back-propagation"
]
},
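{
"cell_type": "markdown",
"id": "note-param-count",
"metadata": {},
"source": [
"**Remark (checking the summary by hand)**: a Dense layer with $n_{\\text{in}}$ inputs and $n_{\\text{out}}$ neurons has $n_{\\text{in}} \\times n_{\\text{out}}$ weights and $n_{\\text{out}}$ biases, i.e. $(n_{\\text{in}} + 1) \\times n_{\\text{out}}$ parameters (the symbols $n_{\\text{in}}$ and $n_{\\text{out}}$ are notation introduced here). Applied to the three Dense layers of the summary above, this gives:\n",
"\n",
"- `dense`: $(784 + 1) \\times 300 = 235{,}500$\n",
"- `dense_1`: $(300 + 1) \\times 100 = 30{,}100$\n",
"- `dense_2`: $(100 + 1) \\times 10 = 1{,}010$\n",
"\n",
"The sum is $235{,}500 + 30{,}100 + 1{,}010 = 266{,}610$, which matches the `Total params` line. The `Flatten` layer only reshapes its input and therefore has 0 parameters."
]
},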
{
"cell_type": "markdown",
"id": "mSmqvW3tBvey",
"metadata": {
"id": "mSmqvW3tBvey"
},
"source": [
"Q4. Display the weights and the biases of the first hidden layer."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "Khr8wuf_DKW-",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Khr8wuf_DKW-",
"outputId": "18d2cb28-ec1b-4ba5-ffbf-a27c4a8d558f"
},
"outputs": [
{
"data": {
"text/plain": [
"array([[-0.02708957, -0.01657043, -0.02541305, ..., -0.0117284 ,\n",
" -0.07759066, -0.04104815],\n",
" [ 0.03956839, 0.03968426, 0.11159597, ..., 0.00709551,\n",
" 0.10016222, 0.00951595],\n",
" [ 0.0368086 , -0.00992455, 0.00582458, ..., -0.05939534,\n",
" 0.00859205, 0.04936637],\n",
" ...,\n",
" [ 0.0014246 , -0.04466628, 0.00846922, ..., -0.05190999,\n",
" 0.03495238, 0.05571212],\n",
" [ 0.00618674, 0.00718611, -0.04097459, ..., -0.04273593,\n",
" 0.03054574, -0.05612838],\n",
" [ 0.02281788, 0.00126068, -0.07944933, ..., 0.08006408,\n",
" -0.02017345, 0.07210501]], shape=(784, 300), dtype=float32)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Answer for Q4.\n",
"weights, biaises = model.layers[1].get_weights()\n",
"weights"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "S__6iEM6NwHA",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "S__6iEM6NwHA",
"outputId": "e78e4fc9-64e4-497b-b944-b3e31408b614"
},
"outputs": [
{
"data": {
"text/plain": [
"array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
" 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"biaises"
]
},
{
"cell_type": "markdown",
"id": "w0aWN8LMwFS2",
"metadata": {
"id": "w0aWN8LMwFS2"
},
"source": [
"Q5.\n",
"\n",
"a) **Why don't we initialize all the weights to zero (as is done with the biases) ?**\n",
"\n",
"b) **What happens if we initialize all the weights and the biases of a layer with the same value ?**\n",
"\n",
"**Answer**: \n",
"\n",
"a) If all weights are initialized to zero, every neuron in the layer learns the same thing. Specifically:\n",
"\n",
"During forward propagation, all neurons in the same layer produce the same output.\n",
"\n",
"During backpropagation, they also receive the same gradients.\n",
"\n",
"As a result, they remain identical throughout training — this is called the symmetry problem.\n",
"\n",
"By contrast, biases can safely be initialized to zero because they are scalars per neuron and dont affect symmetry in the same way — they dont control how inputs are mixed.\n",
"\n",
"b) Same problem: no diversity in computation.\n",
"\n",
"All neurons in the layer will compute the same output (same weights + same bias).\n",
"\n",
"The network loses its capacity to learn different features, which defeats the purpose of having multiple neurons.\n",
"\n",
"The gradients with respect to each parameter are also the same → weights evolve identically, maintaining this symmetry forever.\n",
"\n",
"Thus, random initialization (usually with small values) breaks this symmetry and allows different neurons to specialize.\n",
"\n",
"----\n",
"\n",
"The initialization is usually random : the weights are sampled from a normal distribution or a uniform distribution, usually independently. In particular in the context of deep learning, the variances of these normal distributions are important to keep a relatively constant scale (in the neurons from layer to layer and in the gradients when doing the backpropagation).\n",
"\n",
"Remarks : when you create a multilayer perceptron (MLP) using Keras, each dense layer by default uses the Glorot Uniform initializer for its weight matrix and initializes the biases to zeros.\n",
"You can always override these defaults by specifying a different initializer in the layer's constructor if needed. Actually here, as we used a ReLU activation, we used the \"He\" intializer (it does not matter that much to use \"He\" or \"Gloriot\" here as the network is shallow. However, even for a shallow network, it is important for the weights not to be far too small or far too big.)\n",
"\n",
"See also :\n",
"\n",
"https://www.deeplearning.ai/ai-notes/initialization/index.html\n",
"\n"
]
},
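{
"cell_type": "markdown",
"id": "note-init-scales",
"metadata": {},
"source": [
"**Remark (orders of magnitude)**: as a rough guide to the scales involved, here is how the two initializers mentioned above draw the weights of a layer with $n_{\\text{in}}$ inputs and $n_{\\text{out}}$ outputs (these symbols are notation introduced here; the formulas are the ones given in the Keras documentation for `he_normal` and `glorot_uniform`):\n",
"\n",
"- He normal: $w$ is drawn from a truncated normal distribution centered on 0 with standard deviation $\\sigma = \\sqrt{2 / n_{\\text{in}}}$;\n",
"- Glorot uniform: $w \\sim \\mathcal{U}(-a, a)$ with $a = \\sqrt{6 / (n_{\\text{in}} + n_{\\text{out}})}$.\n",
"\n",
"For the first hidden layer ($n_{\\text{in}} = 784$, $n_{\\text{out}} = 300$) this gives $\\sigma = \\sqrt{2/784} \\approx 0.05$ and $a = \\sqrt{6/1084} \\approx 0.07$. The He value is roughly consistent with the magnitude of the weights displayed in the answer to Q4."
]
},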
{
"cell_type": "markdown",
"id": "b0bd0771-3463-46b8-bdef-5bcadbc07052",
"metadata": {
"id": "b0bd0771-3463-46b8-bdef-5bcadbc07052"
},
"source": [
"Now we need to \"compile the model\" : it means we will specify the loss function and the optimizer we use. Optionally, you can specify a list of extra metrics to compute during training and evaluation. "
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "3fcfe918-d745-4c66-8350-dd89e34ac93c",
"metadata": {
"id": "3fcfe918-d745-4c66-8350-dd89e34ac93c"
},
"outputs": [],
"source": [
"model.compile(\n",
" loss=\"sparse_categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"]\n",
")"
]
},
{
"cell_type": "markdown",
"id": "5d06aed8-0611-461c-979c-9d14e3fe895f",
"metadata": {
"id": "5d06aed8-0611-461c-979c-9d14e3fe895f"
},
"source": [
"We used the `sparse_categorical_crossentropy` loss because we have \"sparse labels\" : for each instance, there is just a target class index : from 0 to 9. If instead we had one-hot vectors ( e.g. [0,0,1,0,0,0,0,0,0,0] to represent class 2), then we woud need to use the \"categorical_cross_entropy\" loss instead.\n",
"\n",
"Since it is a classifier, it is useful to measure its accuracy during training and evaluation, which is why we set `metrics=[\"accuracy\"]`. You can find the list of metrics proposed by keras here : https://www.tensorflow.org/api_docs/python/tf/keras/metrics\n",
"\n",
"Q6. What loss would we have chosen if we had a binary classification problem ? See : https://www.tensorflow.org/api_docs/python/tf/keras/losses\n",
"\n",
"Q7. What basic loss could we use for a regression problem ?\n"
]
},
{
"cell_type": "markdown",
"id": "813aff4b",
"metadata": {},
"source": [
"**Answer**:\n",
"\n",
"Q6) BinaryCrossentropy\n",
"\n",
"Q7) MeanSquareError or MeanAbsoluteError"
]
},
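{
"cell_type": "markdown",
"id": "note-sparse-crossentropy",
"metadata": {},
"source": [
"**Remark (the loss in formulas)**: for reference, here is a sketch of the quantity that `sparse_categorical_crossentropy` averages over a batch of $N$ examples. The notation is introduced here for illustration: $y_i \\in \\{0, \\dots, 9\\}$ is the class index of example $i$ and $p_{i,k}$ is the softmax output of the network for example $i$ and class $k$.\n",
"\n",
"$$\\mathcal{L} = -\\frac{1}{N} \\sum_{i=1}^{N} \\log p_{i, y_i}$$\n",
"\n",
"With one-hot targets $t_{i,k}$ (equal to 1 if $k = y_i$ and 0 otherwise), the categorical cross-entropy $-\\frac{1}{N} \\sum_{i=1}^{N} \\sum_{k=0}^{9} t_{i,k} \\log p_{i,k}$ takes the same value, since only the term $k = y_i$ survives in the inner sum. The two losses differ only in the label format they expect."
]
},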
{
"cell_type": "markdown",
"id": "63c7c396-cd50-4822-bbf0-abcb0361fa01",
"metadata": {
"id": "63c7c396-cd50-4822-bbf0-abcb0361fa01"
},
"source": [
"Now the model is ready to be trained. For this we simply need to call its `fit` method :"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "e784cc36-b04c-4aca-abfc-9fc081fd726b",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "e784cc36-b04c-4aca-abfc-9fc081fd726b",
"outputId": "2b3ac988-ef08-4d0f-ece5-feb82c59c986"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m4s\u001b[0m 2ms/step - accuracy: 0.7850 - loss: 0.6068 - val_accuracy: 0.8392 - val_loss: 0.4062\n",
"Epoch 2/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8627 - loss: 0.3769 - val_accuracy: 0.8496 - val_loss: 0.3903\n",
"Epoch 3/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8785 - loss: 0.3303 - val_accuracy: 0.8546 - val_loss: 0.3747\n",
"Epoch 4/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8869 - loss: 0.3037 - val_accuracy: 0.8564 - val_loss: 0.3778\n",
"Epoch 5/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8955 - loss: 0.2810 - val_accuracy: 0.8696 - val_loss: 0.3513\n",
"Epoch 6/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9019 - loss: 0.2642 - val_accuracy: 0.8624 - val_loss: 0.3813\n",
"Epoch 7/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9075 - loss: 0.2505 - val_accuracy: 0.8692 - val_loss: 0.3840\n",
"Epoch 8/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9107 - loss: 0.2379 - val_accuracy: 0.8690 - val_loss: 0.3768\n",
"Epoch 9/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9154 - loss: 0.2250 - val_accuracy: 0.8708 - val_loss: 0.3813\n",
"Epoch 10/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9199 - loss: 0.2152 - val_accuracy: 0.8698 - val_loss: 0.4006\n",
"Epoch 11/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9217 - loss: 0.2051 - val_accuracy: 0.8742 - val_loss: 0.3909\n",
"Epoch 12/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9261 - loss: 0.1969 - val_accuracy: 0.8716 - val_loss: 0.4219\n",
"Epoch 13/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9292 - loss: 0.1890 - val_accuracy: 0.8752 - val_loss: 0.4213\n",
"Epoch 14/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9312 - loss: 0.1828 - val_accuracy: 0.8738 - val_loss: 0.4402\n",
"Epoch 15/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9324 - loss: 0.1763 - val_accuracy: 0.8736 - val_loss: 0.4419\n",
"Epoch 16/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m5s\u001b[0m 3ms/step - accuracy: 0.9362 - loss: 0.1691 - val_accuracy: 0.8732 - val_loss: 0.4577\n",
"Epoch 17/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9379 - loss: 0.1626 - val_accuracy: 0.8760 - val_loss: 0.4588\n",
"Epoch 18/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9375 - loss: 0.1652 - val_accuracy: 0.8720 - val_loss: 0.4908\n",
"Epoch 19/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9436 - loss: 0.1518 - val_accuracy: 0.8768 - val_loss: 0.5365\n",
"Epoch 20/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9417 - loss: 0.1511 - val_accuracy: 0.8756 - val_loss: 0.5036\n",
"Epoch 21/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9435 - loss: 0.1457 - val_accuracy: 0.8658 - val_loss: 0.6143\n",
"Epoch 22/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9448 - loss: 0.1445 - val_accuracy: 0.8808 - val_loss: 0.4766\n",
"Epoch 23/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 1ms/step - accuracy: 0.9471 - loss: 0.1395 - val_accuracy: 0.8732 - val_loss: 0.5531\n",
"Epoch 24/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m2s\u001b[0m 1ms/step - accuracy: 0.9481 - loss: 0.1316 - val_accuracy: 0.8692 - val_loss: 0.5819\n",
"Epoch 25/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9482 - loss: 0.1355 - val_accuracy: 0.8780 - val_loss: 0.5516\n",
"Epoch 26/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9478 - loss: 0.1340 - val_accuracy: 0.8716 - val_loss: 0.6001\n",
"Epoch 27/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9520 - loss: 0.1270 - val_accuracy: 0.8726 - val_loss: 0.5817\n",
"Epoch 28/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9529 - loss: 0.1225 - val_accuracy: 0.8728 - val_loss: 0.6118\n",
"Epoch 29/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9530 - loss: 0.1208 - val_accuracy: 0.8742 - val_loss: 0.6002\n",
"Epoch 30/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9545 - loss: 0.1167 - val_accuracy: 0.8748 - val_loss: 0.6255\n",
"Epoch 31/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9550 - loss: 0.1166 - val_accuracy: 0.8772 - val_loss: 0.6220\n",
"Epoch 32/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m4s\u001b[0m 2ms/step - accuracy: 0.9566 - loss: 0.1126 - val_accuracy: 0.8652 - val_loss: 0.6661\n",
"Epoch 33/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9586 - loss: 0.1101 - val_accuracy: 0.8742 - val_loss: 0.6406\n",
"Epoch 34/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 1ms/step - accuracy: 0.9555 - loss: 0.1156 - val_accuracy: 0.8818 - val_loss: 0.6403\n",
"Epoch 35/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 1ms/step - accuracy: 0.9604 - loss: 0.1054 - val_accuracy: 0.8742 - val_loss: 0.6165\n",
"Epoch 36/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9610 - loss: 0.1020 - val_accuracy: 0.8746 - val_loss: 0.6116\n",
"Epoch 37/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9621 - loss: 0.0999 - val_accuracy: 0.8772 - val_loss: 0.6669\n",
"Epoch 38/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 1ms/step - accuracy: 0.9618 - loss: 0.0983 - val_accuracy: 0.8732 - val_loss: 0.7090\n",
"Epoch 39/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 1ms/step - accuracy: 0.9616 - loss: 0.1001 - val_accuracy: 0.8822 - val_loss: 0.6457\n",
"Epoch 40/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9631 - loss: 0.0970 - val_accuracy: 0.8748 - val_loss: 0.7724\n",
"Epoch 41/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 1ms/step - accuracy: 0.9637 - loss: 0.0975 - val_accuracy: 0.8788 - val_loss: 0.6992\n",
"Epoch 42/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9637 - loss: 0.0961 - val_accuracy: 0.8720 - val_loss: 0.7026\n",
"Epoch 43/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9649 - loss: 0.0942 - val_accuracy: 0.8766 - val_loss: 0.7901\n",
"Epoch 44/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9660 - loss: 0.0868 - val_accuracy: 0.8822 - val_loss: 0.6867\n",
"Epoch 45/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9666 - loss: 0.0859 - val_accuracy: 0.8788 - val_loss: 0.7211\n",
"Epoch 46/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9650 - loss: 0.0945 - val_accuracy: 0.8808 - val_loss: 0.7023\n",
"Epoch 47/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9678 - loss: 0.0861 - val_accuracy: 0.8802 - val_loss: 0.7661\n",
"Epoch 48/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9673 - loss: 0.0903 - val_accuracy: 0.8754 - val_loss: 0.7853\n",
"Epoch 49/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9678 - loss: 0.0856 - val_accuracy: 0.8808 - val_loss: 0.7499\n",
"Epoch 50/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9682 - loss: 0.0821 - val_accuracy: 0.8824 - val_loss: 0.7406\n",
"Epoch 51/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9691 - loss: 0.0808 - val_accuracy: 0.8826 - val_loss: 0.7655\n",
"Epoch 52/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9710 - loss: 0.0757 - val_accuracy: 0.8822 - val_loss: 0.8760\n",
"Epoch 53/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9696 - loss: 0.0804 - val_accuracy: 0.8860 - val_loss: 0.7731\n",
"Epoch 54/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9731 - loss: 0.0747 - val_accuracy: 0.8806 - val_loss: 0.8113\n",
"Epoch 55/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9705 - loss: 0.0768 - val_accuracy: 0.8864 - val_loss: 0.8137\n",
"Epoch 56/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9703 - loss: 0.0798 - val_accuracy: 0.8814 - val_loss: 0.8953\n",
"Epoch 57/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9689 - loss: 0.0814 - val_accuracy: 0.8794 - val_loss: 0.8376\n",
"Epoch 58/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9722 - loss: 0.0756 - val_accuracy: 0.8806 - val_loss: 0.8223\n",
"Epoch 59/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9736 - loss: 0.0702 - val_accuracy: 0.8868 - val_loss: 0.8391\n",
"Epoch 60/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9720 - loss: 0.0726 - val_accuracy: 0.8838 - val_loss: 0.8715\n"
]
}
],
"source": [
"history = model.fit(X_train01, y_train, epochs=60, validation_data=(X_val01, y_val))"
]
},
{
"cell_type": "markdown",
"id": "pft_8LriRFk6",
"metadata": {
"id": "pft_8LriRFk6"
},
"source": [
"Remark : if you call the fit method again, keras continues the training where it left off.\n"
]
},
{
"cell_type": "markdown",
"id": "xkoeO0kLF3fp",
"metadata": {
"id": "xkoeO0kLF3fp"
},
"source": [
"Q8. Can you recall what is an \"epoch\" ?\n",
"\n",
"**Answer**: An epoch in training a neural network is one complete pass through the entire training dataset.\n",
"\n",
"During an epoch, the model sees every training example once and updates its weights accordingly.\n",
"\n",
"Usually, data is split into mini-batches, so multiple updates happen within a single epoch (this is mini-batch gradient descent).\n",
"\n",
"Training typically involves multiple epochs so the model can gradually improve.\n",
"\n",
"Think of an epoch as one full cycle through the training data to learn patterns and adjust weights."
]
},
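{
"cell_type": "markdown",
"id": "sketch-steps-per-epoch",
"metadata": {},
"source": [
"As a rough illustration of the link between epochs and mini-batches, the sketch below counts the weight updates performed during one epoch. It assumes a training set of 55,000 examples (an assumption inferred from the 1719 steps per epoch shown in the training log) and the Keras default `batch_size=32`. The names `n_train` and `updates_per_epoch` are hypothetical.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"n_train = 55_000  # assumed number of training examples\n",
"batch_size = 32  # default batch size of fit()\n",
"\n",
"# One epoch = one pass over the data = ceil(n_train / batch_size) mini-batches\n",
"updates_per_epoch = math.ceil(n_train / batch_size)\n",
"total_updates = 60 * updates_per_epoch  # weight updates over 60 epochs\n",
"\n",
"print(updates_per_epoch, total_updates)\n",
"```\n",
"\n",
"Under these assumptions, each epoch performs 1719 updates, which matches the step counter in the log."
]
},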
{
"cell_type": "markdown",
"id": "eSKrY64nFqdj",
"metadata": {
"id": "eSKrY64nFqdj"
},
"source": [
"Q9. The `fit()` method also has the two arguments `class_weight` and `sample_weight` (not used here). When can these arguments be useful ?\n",
"\n",
"- `class_weight` is used to give more importance to underrepresented classes.\n",
"- `sample_weight` assigns weights to individual samples, useful for noisy data or time-sensitive importance."
]
},
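{
"cell_type": "markdown",
"id": "sketch-class-weight",
"metadata": {},
"source": [
"To make the role of `class_weight` more concrete, here is a minimal sketch that builds such a dictionary from integer labels. It uses the common \"balanced\" heuristic (weight = n / (k * count), the same rule as scikit-learn's `class_weight=\"balanced\"`), which is only one possible choice. The helper `balanced_class_weights` and the toy labels are hypothetical and not part of this TP.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def balanced_class_weights(labels):\n",
"    '''Return {class: weight} with weight = n / (k * count).'''\n",
"    classes, counts = np.unique(labels, return_counts=True)\n",
"    weights = len(labels) / (len(classes) * counts)\n",
"    return {int(c): float(w) for c, w in zip(classes, weights)}\n",
"\n",
"\n",
"toy_labels = np.array([0, 0, 0, 0, 0, 0, 1, 1])  # imbalanced toy labels\n",
"class_weight = balanced_class_weights(toy_labels)\n",
"print(class_weight)\n",
"\n",
"# Hypothetical usage: model.fit(X, toy_labels, class_weight=class_weight)\n",
"```\n",
"\n",
"The rare class 1 receives a larger weight than the frequent class 0, so its errors count more in the loss."
]
},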
{
"cell_type": "markdown",
"id": "a-6IJhqnMcnm",
"metadata": {
"id": "a-6IJhqnMcnm"
},
"source": [
"The `fit()` method returns a History object containing in particular a dictionary (`history.history`) containing the loss and extra metrics it measured at the end of each epoch on the training set and on the validation set (if any). Let us display the learning curves."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "_UQsOj8JPc3q",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 470
},
"id": "_UQsOj8JPc3q",
"outputId": "8f1d6388-d46b-4891-d73f-793ccf3b9d32"
},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 800x500 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"\n",
"pd.DataFrame(history.history).plot(\n",
" figsize=(8, 5),\n",
" xlim=[0, 59],\n",
" ylim=[0, 1],\n",
" grid=True,\n",
" xlabel=\"Epoch\",\n",
" style=[\"r--\", \"b--\", \"r\", \"b\"],\n",
")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "L2qi9QymVrXy",
"metadata": {
"id": "L2qi9QymVrXy"
},
"source": [
"Q10. 4 values are displayed : \"accuracy\", \"loss\", \"val_accuracy\" and \"val_loss\". Can you tell precisely what each value represents ?\n",
"\n",
"**Answer**: \n",
"\n",
"- `accuracy`: Correct prediction rate on the training set.\n",
"- `loss`: Loss function value on the training set.\n",
"- `val_accuracy`: Correct prediction rate on the validation set.\n",
"- `val_loss`: Loss function value on the validation set."
]
},
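{
"cell_type": "markdown",
"id": "sketch-accuracy-loss",
"metadata": {},
"source": [
"The two kinds of quantities can be recomputed by hand from predicted probabilities. The sketch below is a simplified NumPy version (it ignores the small clipping that Keras applies to the probabilities). The helper `accuracy_and_loss` and the toy arrays are hypothetical.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"def accuracy_and_loss(probas, labels):\n",
"    '''Accuracy and mean sparse categorical cross-entropy.'''\n",
"    predicted = probas.argmax(axis=1)\n",
"    accuracy = float(np.mean(predicted == labels))\n",
"    # Probability that the model assigns to the true class of each row\n",
"    p_true = probas[np.arange(len(labels)), labels]\n",
"    loss = float(np.mean(-np.log(p_true)))\n",
"    return accuracy, loss\n",
"\n",
"\n",
"toy_probas = np.array(\n",
"    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]]\n",
")\n",
"toy_labels = np.array([0, 1, 2, 1])\n",
"acc, loss = accuracy_and_loss(toy_probas, toy_labels)\n",
"print(round(acc, 3), round(loss, 3))\n",
"```\n",
"\n",
"The accuracy only depends on which class has the highest probability, whereas the loss also depends on how much probability is given to the true class."
]
},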
{
"cell_type": "markdown",
"id": "04TFG3NsRH0N",
"metadata": {
"id": "04TFG3NsRH0N"
},
"source": [
"Q11. Comment the curves.\n",
"\n",
"Remark : if we kept increasing the number of epochs, we would end up with a training loss close to zero and a training accuracy close to 100%. More precisely, after about 150 epochs, we get the following values\n",
"- validation loss : about 1.4\n",
"- training loss : about 0.03\n",
"- validation accuracy : about 89%\n",
"- training accuracy : about 99%\n",
"\n",
"**Answer**: There is overfitting (not so grave). This is due to the large number of parameters in the neural network (aobut 260k) while the number of parameters of the dataset is smaller (about 50k)"
]
},
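{
"cell_type": "markdown",
"id": "sketch-parameter-count",
"metadata": {},
"source": [
"The number of parameters can be checked with a short computation. This sketch assumes the architecture used elsewhere in this notebook (28x28 inputs flattened, then Dense layers with 300, 100 and 10 units). The variable names are hypothetical.\n",
"\n",
"```python\n",
"layer_sizes = [28 * 28, 300, 100, 10]  # flattened input, two hidden layers, output\n",
"\n",
"# A Dense layer with n_in inputs and n_out units has\n",
"# n_in * n_out weights plus n_out biases\n",
"n_params = sum(\n",
"    n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:])\n",
")\n",
"print(n_params)\n",
"```\n",
"\n",
"For a Keras model, `model.summary()` reports the same kind of count layer by layer."
]
},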
{
"cell_type": "markdown",
"id": "nBnufjfFXqiI",
"metadata": {
"id": "nBnufjfFXqiI"
},
"source": [
"Q12. a) Find the indices, for the first 200 rows of the validation set, where the model (that we trained for 60 epochs) gets wrong.\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "MUnWm1MfyBBk",
"metadata": {
"id": "MUnWm1MfyBBk"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[1m7/7\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 4ms/step \n"
]
}
],
"source": [
"probas_val_60 = model.predict(X_val01[:200])\n",
"classif_val_60 = probas_val_60.argmax(axis=1)\n",
"wrong_classif_val_60 = classif_val_60 != y_val[:200]"
]
},
{
"cell_type": "markdown",
"id": "vJ3wYKSDr2-L",
"metadata": {
"id": "vJ3wYKSDr2-L"
},
"source": [
"b) Now let us train the model for only 10 epochs and call this model `model_10`."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "DBsp72CAqRef",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "DBsp72CAqRef",
"outputId": "ab6e22c3-19c2-4610-d028-02a4e940b209"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/10\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7866 - loss: 0.5999 - val_accuracy: 0.8326 - val_loss: 0.4162\n",
"Epoch 2/10\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8624 - loss: 0.3776 - val_accuracy: 0.8512 - val_loss: 0.3772\n",
"Epoch 3/10\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8763 - loss: 0.3341 - val_accuracy: 0.8502 - val_loss: 0.3915\n",
"Epoch 4/10\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 1ms/step - accuracy: 0.8875 - loss: 0.3049 - val_accuracy: 0.8592 - val_loss: 0.3675\n",
"Epoch 5/10\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8936 - loss: 0.2835 - val_accuracy: 0.8720 - val_loss: 0.3465\n",
"Epoch 6/10\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9017 - loss: 0.2663 - val_accuracy: 0.8752 - val_loss: 0.3437\n",
"Epoch 7/10\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9065 - loss: 0.2499 - val_accuracy: 0.8752 - val_loss: 0.3545\n",
"Epoch 8/10\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9102 - loss: 0.2404 - val_accuracy: 0.8810 - val_loss: 0.3369\n",
"Epoch 9/10\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9145 - loss: 0.2271 - val_accuracy: 0.8740 - val_loss: 0.3714\n",
"Epoch 10/10\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9161 - loss: 0.2195 - val_accuracy: 0.8816 - val_loss: 0.3601\n"
]
},
{
"data": {
"text/plain": [
"<keras.src.callbacks.history.History at 0x31c633350>"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_10 = tf.keras.Sequential(\n",
" [\n",
" tf.keras.layers.Input(shape=[28, 28]),\n",
" tf.keras.layers.Flatten(),\n",
" tf.keras.layers.Dense(300, activation=\"relu\", kernel_initializer=\"he_normal\"),\n",
" tf.keras.layers.Dense(100, activation=\"relu\", kernel_initializer=\"he_normal\"),\n",
" tf.keras.layers.Dense(10, activation=\"softmax\"),\n",
" ]\n",
")\n",
"\n",
"model_10.compile(\n",
" loss=\"sparse_categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"]\n",
")\n",
"\n",
"model_10.fit(X_train01, y_train, epochs=10, validation_data=(X_val01, y_val))"
]
},
{
"cell_type": "markdown",
"id": "jowzB0kI6a4t",
"metadata": {
"id": "jowzB0kI6a4t"
},
"source": [
"So the accuracy on the validation set is almost the same with 10 epochs as with 60 epochs (stagnant val accuracy, and close to the training accuracy) but the validation loss is much better (and rather close to the training loss). If we want to understand better the situation, let us look at some values for the predicted probabilities."
]
},
{
"cell_type": "markdown",
"id": "ESbfJeBnsOHl",
"metadata": {
"id": "ESbfJeBnsOHl"
},
"source": [
"c) Find the indices, for the first 200 rows of the validation set, where `model_10` (that we trained for 10 epochs) gets wrong."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "7w49W7ecsb42",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "7w49W7ecsb42",
"outputId": "2cef9214-3a2c-4631-c8f2-6275baae3fbf"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[1m7/7\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 5ms/step \n"
]
}
],
"source": [
"probas_val_10 = model_10.predict(X_val01[:200])\n",
"classif_val_10 = probas_val_10.argmax(axis=1)\n",
"wrong_classif_val_10 = classif_val_10 != y_val[:200]\n",
"\n",
"indices = np.where(wrong_classif_val_10 * wrong_classif_val_60)[0]"
]
},
{
"cell_type": "markdown",
"id": "T3uLsznrwP-c",
"metadata": {
"id": "T3uLsznrwP-c"
},
"source": [
"d) Display the estimated probabilities of the two models for the right class and for the indices where both models gets wrong (among the first 200 rows of the validation set). (Round to 3 decimal places) \n",
" "
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "Hf7QyWqMFnUs",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Hf7QyWqMFnUs",
"outputId": "55185cf2-1517-49b2-f1a2-82cf1820dca8"
},
"outputs": [
{
"data": {
"text/plain": [
"array([0.001, 0.393, 0.343, 0. , 0. , 0.001, 0.023, 0.189, 0. ,\n",
" 0. , 0. , 0.001, 0.11 , 0.002, 0. , 0.005, 0.006, 0.021],\n",
" dtype=float32)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"probas_val_60[indices, y_val[indices]].round(3)"
]
},
{
"cell_type": "markdown",
"id": "qrUnU77H7FAx",
"metadata": {
"id": "qrUnU77H7FAx"
},
"source": [
" We see that most of the time, in the overfitting situation (60 epochs), for an instance that the model does not classify correctly, the probability for the correct label tends to be lower : the model gets wrong for those instances and is \"overly confident while it is wrong\" (it indicates a high probability for a wrong label so the probability for the correct label gets lower making the loss increase).\n",
"\n",
"\n",
"\n",
"\n"
]
},
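{
"cell_type": "markdown",
"id": "sketch-confident-wrong",
"metadata": {},
"source": [
"A small numerical illustration of this effect: for a single misclassified instance, the cross-entropy contribution is minus the log of the probability given to the correct class. The two probabilities below are illustrative values chosen for the example, not outputs of the models above.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"# Probability given to the correct class on a misclassified instance\n",
"p_hesitant = 0.3  # the model hesitates between several classes\n",
"p_overconfident = 0.001  # the model is almost sure of a wrong class\n",
"\n",
"loss_hesitant = -math.log(p_hesitant)\n",
"loss_overconfident = -math.log(p_overconfident)\n",
"print(round(loss_hesitant, 2), round(loss_overconfident, 2))\n",
"```\n",
"\n",
"Both instances count as one error for the accuracy, but the overconfident one contributes far more to the loss."
]
},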
{
"cell_type": "markdown",
"id": "71vLRKp0TZeo",
"metadata": {
"id": "71vLRKp0TZeo"
},
"source": [
" An idea to prevent this overfitting is to use **Early stopping** :"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "vYe04DM5cu4r",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "vYe04DM5cu4r",
"outputId": "825bd44c-888d-4820-ca98-4fe2274b7d27"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7864 - loss: 0.5990 - val_accuracy: 0.8298 - val_loss: 0.4282\n",
"Epoch 2/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8627 - loss: 0.3785 - val_accuracy: 0.8450 - val_loss: 0.3963\n",
"Epoch 3/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8772 - loss: 0.3354 - val_accuracy: 0.8544 - val_loss: 0.3724\n",
"Epoch 4/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8889 - loss: 0.3041 - val_accuracy: 0.8632 - val_loss: 0.3656\n",
"Epoch 5/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8957 - loss: 0.2829 - val_accuracy: 0.8686 - val_loss: 0.3534\n",
"Epoch 6/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9019 - loss: 0.2651 - val_accuracy: 0.8746 - val_loss: 0.3442\n",
"Epoch 7/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9071 - loss: 0.2491 - val_accuracy: 0.8844 - val_loss: 0.3403\n",
"Epoch 8/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9088 - loss: 0.2387 - val_accuracy: 0.8840 - val_loss: 0.3357\n",
"Epoch 9/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9157 - loss: 0.2245 - val_accuracy: 0.8834 - val_loss: 0.3541\n",
"Epoch 10/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9198 - loss: 0.2157 - val_accuracy: 0.8714 - val_loss: 0.4132\n",
"Epoch 11/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9220 - loss: 0.2063 - val_accuracy: 0.8770 - val_loss: 0.4210\n",
"Epoch 12/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9256 - loss: 0.1986 - val_accuracy: 0.8768 - val_loss: 0.4228\n",
"Epoch 13/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9291 - loss: 0.1875 - val_accuracy: 0.8738 - val_loss: 0.4382\n"
]
}
],
"source": [
"early_stopping_cb = tf.keras.callbacks.EarlyStopping(\n",
" patience=5, restore_best_weights=True\n",
")\n",
"\n",
"model = tf.keras.Sequential(\n",
" [\n",
" tf.keras.layers.Input(shape=[28, 28]),\n",
" tf.keras.layers.Flatten(),\n",
" tf.keras.layers.Dense(300, activation=\"relu\", kernel_initializer=\"he_normal\"),\n",
" tf.keras.layers.Dense(100, activation=\"relu\", kernel_initializer=\"he_normal\"),\n",
" tf.keras.layers.Dense(10, activation=\"softmax\"),\n",
" ]\n",
")\n",
"\n",
"model.compile(\n",
" loss=\"sparse_categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"]\n",
")\n",
"\n",
"history2 = model.fit(\n",
" X_train01,\n",
" y_train,\n",
" epochs=60,\n",
" validation_data=(X_val01, y_val),\n",
" callbacks=[early_stopping_cb],\n",
")"
]
},
{
"cell_type": "markdown",
"id": "MwTfIPcAeH6R",
"metadata": {
"id": "MwTfIPcAeH6R"
},
"source": [
"Q13. a) Explain what the previous cell does (first and last line).\n",
"\n",
" Hint : https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping\n",
"\n",
"b) Can you name another way to prevent overfitting ? (mentioned in the lecture notes).\n",
"\n",
"**Answer**: \n",
"\n",
"a) *First Line*: This creates a callback to stop training early if the validation loss doesn't improve for 5 consecutive epochs (patience=5). With restore_best_weights=True, the model will roll back to the weights from the epoch with the best validation loss, preventing overfitting.\n",
"*Last Line*: This trains the model on X_train01 and y_train for up to 60 epochs. It uses the validation set (X_val01, y_val) to monitor performance. The training may stop early thanks to the early_stopping_cb, and training history is saved in history2.\n",
"\n",
"b) We can use Dropout. It randomly \"drops\" (i.e. sets to zero) a fraction of the neurons during training, which helps the model generalize better by preventing co-adaptations of neurons. Example:\n",
"\n",
"```{python}\n",
"tf.keras.layers.Dropout(rate=0.5)\n",
"```\n",
"This can be inserted between dense layers to regularize the model.\n"
]
},
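{
"cell_type": "markdown",
"id": "sketch-early-stopping",
"metadata": {},
"source": [
"The stopping rule can be reproduced with a few lines of plain Python. This is a simplified sketch of the patience logic (lower loss counts as an improvement, no minimum delta), not the Keras implementation. The function `simulate_early_stopping` is hypothetical, and the validation losses are copied from the training log of the early-stopping run.\n",
"\n",
"```python\n",
"def simulate_early_stopping(val_losses, patience):\n",
"    '''Return (stop_epoch, best_epoch), both 1-indexed.'''\n",
"    best_loss, best_epoch, wait = float('inf'), 0, 0\n",
"    for epoch, loss in enumerate(val_losses, start=1):\n",
"        if loss < best_loss:\n",
"            best_loss, best_epoch, wait = loss, epoch, 0\n",
"        else:\n",
"            wait += 1\n",
"            if wait >= patience:\n",
"                return epoch, best_epoch  # training stops here\n",
"    return len(val_losses), best_epoch  # patience never exhausted\n",
"\n",
"\n",
"# Validation losses of epochs 1 to 13 in the early-stopping run\n",
"val_losses = [0.4282, 0.3963, 0.3724, 0.3656, 0.3534, 0.3442, 0.3403,\n",
"              0.3357, 0.3541, 0.4132, 0.4210, 0.4228, 0.4382]\n",
"stop_epoch, best_epoch = simulate_early_stopping(val_losses, patience=5)\n",
"print(stop_epoch, best_epoch)\n",
"```\n",
"\n",
"The best validation loss is reached at epoch 8, and training stops at epoch 13, after 5 epochs without improvement. With `restore_best_weights=True`, the weights of epoch 8 are the ones kept."
]
},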
{
"cell_type": "markdown",
"id": "Anus7RlTV_bL",
"metadata": {
"id": "Anus7RlTV_bL"
},
"source": [
"Remark : if you are not satisfied with the performance of your model, you should tune the hyperparameters. There are a lot of hyperparameters that you can tune : learning rate, optimizer, number of hidden layers, number of neurons, batch size etc.\n",
"If you want to fine tune the hyperparameters, you can use the Keras Tuner library : https://www.tensorflow.org/tutorials/keras/keras_tuner"
]
},
{
"cell_type": "markdown",
"id": "qZ-0xPXzcglX",
"metadata": {
"id": "qZ-0xPXzcglX"
},
"source": [
"Now let us see the performance of our model on the test set."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "q6BXm1YJUmKK",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "q6BXm1YJUmKK",
"outputId": "6c7f3f57-fd81-4727-8422-a4480be423af"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[1m313/313\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 547us/step - accuracy: 0.8669 - loss: 55.3559\n"
]
},
{
"data": {
"text/plain": [
"[58.230831146240234, 0.864799976348877]"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model.evaluate(X_test, y_test)"
]
},
{
"cell_type": "markdown",
"id": "MexBSb7NCYCP",
"metadata": {
"id": "MexBSb7NCYCP"
},
"source": [
"**Optional 1** Train a logistic regression model : this model has far fewer parameters : you should not see the same overfitting problem when increasing the number of epochs."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "GUZldkyGyYwF",
"metadata": {
"id": "GUZldkyGyYwF"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 464us/step - accuracy: 0.7328 - loss: 0.7938 - val_accuracy: 0.8340 - val_loss: 0.4826\n",
"Epoch 2/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 430us/step - accuracy: 0.8391 - loss: 0.4794 - val_accuracy: 0.8422 - val_loss: 0.4516\n",
"Epoch 3/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 425us/step - accuracy: 0.8498 - loss: 0.4464 - val_accuracy: 0.8442 - val_loss: 0.4396\n",
"Epoch 4/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 425us/step - accuracy: 0.8539 - loss: 0.4302 - val_accuracy: 0.8468 - val_loss: 0.4333\n",
"Epoch 5/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 428us/step - accuracy: 0.8566 - loss: 0.4200 - val_accuracy: 0.8476 - val_loss: 0.4295\n",
"Epoch 6/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 427us/step - accuracy: 0.8587 - loss: 0.4128 - val_accuracy: 0.8486 - val_loss: 0.4271\n",
"Epoch 7/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 428us/step - accuracy: 0.8605 - loss: 0.4072 - val_accuracy: 0.8470 - val_loss: 0.4256\n",
"Epoch 8/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 461us/step - accuracy: 0.8622 - loss: 0.4028 - val_accuracy: 0.8486 - val_loss: 0.4246\n",
"Epoch 9/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 427us/step - accuracy: 0.8629 - loss: 0.3992 - val_accuracy: 0.8488 - val_loss: 0.4239\n",
"Epoch 10/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 427us/step - accuracy: 0.8634 - loss: 0.3960 - val_accuracy: 0.8482 - val_loss: 0.4236\n",
"Epoch 11/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 427us/step - accuracy: 0.8641 - loss: 0.3933 - val_accuracy: 0.8480 - val_loss: 0.4235\n",
"Epoch 12/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 428us/step - accuracy: 0.8648 - loss: 0.3910 - val_accuracy: 0.8476 - val_loss: 0.4235\n",
"Epoch 13/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 428us/step - accuracy: 0.8659 - loss: 0.3889 - val_accuracy: 0.8484 - val_loss: 0.4236\n",
"Epoch 14/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 423us/step - accuracy: 0.8665 - loss: 0.3870 - val_accuracy: 0.8490 - val_loss: 0.4238\n",
"Epoch 15/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 424us/step - accuracy: 0.8671 - loss: 0.3852 - val_accuracy: 0.8486 - val_loss: 0.4241\n",
"Epoch 16/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 423us/step - accuracy: 0.8677 - loss: 0.3837 - val_accuracy: 0.8486 - val_loss: 0.4244\n",
"Epoch 17/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 466us/step - accuracy: 0.8683 - loss: 0.3822 - val_accuracy: 0.8482 - val_loss: 0.4248\n",
"Epoch 18/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 439us/step - accuracy: 0.8689 - loss: 0.3809 - val_accuracy: 0.8486 - val_loss: 0.4252\n",
"Epoch 19/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 429us/step - accuracy: 0.8694 - loss: 0.3797 - val_accuracy: 0.8476 - val_loss: 0.4256\n",
"Epoch 20/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 424us/step - accuracy: 0.8698 - loss: 0.3785 - val_accuracy: 0.8472 - val_loss: 0.4260\n",
"Epoch 21/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 429us/step - accuracy: 0.8701 - loss: 0.3774 - val_accuracy: 0.8468 - val_loss: 0.4264\n",
"Epoch 22/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 426us/step - accuracy: 0.8704 - loss: 0.3764 - val_accuracy: 0.8468 - val_loss: 0.4269\n",
"Epoch 23/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 425us/step - accuracy: 0.8705 - loss: 0.3755 - val_accuracy: 0.8468 - val_loss: 0.4274\n",
"Epoch 24/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 426us/step - accuracy: 0.8705 - loss: 0.3746 - val_accuracy: 0.8470 - val_loss: 0.4278\n",
"Epoch 25/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 453us/step - accuracy: 0.8707 - loss: 0.3737 - val_accuracy: 0.8468 - val_loss: 0.4283\n",
"Epoch 26/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 448us/step - accuracy: 0.8708 - loss: 0.3729 - val_accuracy: 0.8470 - val_loss: 0.4288\n",
"Epoch 27/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 443us/step - accuracy: 0.8709 - loss: 0.3721 - val_accuracy: 0.8464 - val_loss: 0.4292\n",
"Epoch 28/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 443us/step - accuracy: 0.8709 - loss: 0.3714 - val_accuracy: 0.8464 - val_loss: 0.4297\n",
"Epoch 29/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 462us/step - accuracy: 0.8710 - loss: 0.3707 - val_accuracy: 0.8464 - val_loss: 0.4302\n",
"Epoch 30/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 453us/step - accuracy: 0.8711 - loss: 0.3700 - val_accuracy: 0.8468 - val_loss: 0.4307\n",
"Epoch 31/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 420us/step - accuracy: 0.8710 - loss: 0.3693 - val_accuracy: 0.8462 - val_loss: 0.4311\n",
"Epoch 32/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 418us/step - accuracy: 0.8714 - loss: 0.3687 - val_accuracy: 0.8458 - val_loss: 0.4316\n",
"Epoch 33/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 417us/step - accuracy: 0.8716 - loss: 0.3681 - val_accuracy: 0.8456 - val_loss: 0.4321\n",
"Epoch 34/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 420us/step - accuracy: 0.8721 - loss: 0.3675 - val_accuracy: 0.8452 - val_loss: 0.4326\n",
"Epoch 35/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 415us/step - accuracy: 0.8724 - loss: 0.3670 - val_accuracy: 0.8448 - val_loss: 0.4331\n",
"Epoch 36/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 420us/step - accuracy: 0.8726 - loss: 0.3665 - val_accuracy: 0.8450 - val_loss: 0.4335\n",
"Epoch 37/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 418us/step - accuracy: 0.8726 - loss: 0.3659 - val_accuracy: 0.8456 - val_loss: 0.4340\n",
"Epoch 38/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 418us/step - accuracy: 0.8727 - loss: 0.3654 - val_accuracy: 0.8450 - val_loss: 0.4345\n",
"Epoch 39/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 419us/step - accuracy: 0.8729 - loss: 0.3650 - val_accuracy: 0.8452 - val_loss: 0.4350\n",
"Epoch 40/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 415us/step - accuracy: 0.8731 - loss: 0.3645 - val_accuracy: 0.8448 - val_loss: 0.4354\n",
"Epoch 41/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 417us/step - accuracy: 0.8732 - loss: 0.3640 - val_accuracy: 0.8448 - val_loss: 0.4359\n",
"Epoch 42/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 417us/step - accuracy: 0.8733 - loss: 0.3636 - val_accuracy: 0.8446 - val_loss: 0.4364\n",
"Epoch 43/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 415us/step - accuracy: 0.8736 - loss: 0.3632 - val_accuracy: 0.8440 - val_loss: 0.4369\n",
"Epoch 44/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 453us/step - accuracy: 0.8738 - loss: 0.3628 - val_accuracy: 0.8434 - val_loss: 0.4373\n",
"Epoch 45/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 417us/step - accuracy: 0.8740 - loss: 0.3624 - val_accuracy: 0.8434 - val_loss: 0.4378\n",
"Epoch 46/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 418us/step - accuracy: 0.8740 - loss: 0.3620 - val_accuracy: 0.8426 - val_loss: 0.4382\n",
"Epoch 47/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 416us/step - accuracy: 0.8741 - loss: 0.3616 - val_accuracy: 0.8428 - val_loss: 0.4387\n",
"Epoch 48/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 448us/step - accuracy: 0.8743 - loss: 0.3612 - val_accuracy: 0.8428 - val_loss: 0.4392\n",
"Epoch 49/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 417us/step - accuracy: 0.8745 - loss: 0.3608 - val_accuracy: 0.8424 - val_loss: 0.4396\n",
"Epoch 50/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 418us/step - accuracy: 0.8746 - loss: 0.3605 - val_accuracy: 0.8420 - val_loss: 0.4401\n",
"Epoch 51/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 416us/step - accuracy: 0.8747 - loss: 0.3601 - val_accuracy: 0.8420 - val_loss: 0.4405\n",
"Epoch 52/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 417us/step - accuracy: 0.8747 - loss: 0.3598 - val_accuracy: 0.8422 - val_loss: 0.4410\n",
"Epoch 53/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 419us/step - accuracy: 0.8746 - loss: 0.3595 - val_accuracy: 0.8420 - val_loss: 0.4414\n",
"Epoch 54/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 414us/step - accuracy: 0.8748 - loss: 0.3592 - val_accuracy: 0.8418 - val_loss: 0.4419\n",
"Epoch 55/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 416us/step - accuracy: 0.8747 - loss: 0.3589 - val_accuracy: 0.8420 - val_loss: 0.4423\n",
"Epoch 56/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 416us/step - accuracy: 0.8748 - loss: 0.3585 - val_accuracy: 0.8422 - val_loss: 0.4428\n",
"Epoch 57/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 450us/step - accuracy: 0.8747 - loss: 0.3582 - val_accuracy: 0.8424 - val_loss: 0.4432\n",
"Epoch 58/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 416us/step - accuracy: 0.8749 - loss: 0.3580 - val_accuracy: 0.8426 - val_loss: 0.4436\n",
"Epoch 59/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 416us/step - accuracy: 0.8751 - loss: 0.3577 - val_accuracy: 0.8424 - val_loss: 0.4441\n",
"Epoch 60/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 414us/step - accuracy: 0.8752 - loss: 0.3574 - val_accuracy: 0.8422 - val_loss: 0.4445\n",
"Epoch 61/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 421us/step - accuracy: 0.8751 - loss: 0.3571 - val_accuracy: 0.8424 - val_loss: 0.4449\n",
"Epoch 62/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 419us/step - accuracy: 0.8752 - loss: 0.3568 - val_accuracy: 0.8420 - val_loss: 0.4453\n",
"Epoch 63/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 415us/step - accuracy: 0.8752 - loss: 0.3566 - val_accuracy: 0.8420 - val_loss: 0.4458\n",
"Epoch 64/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 424us/step - accuracy: 0.8754 - loss: 0.3563 - val_accuracy: 0.8418 - val_loss: 0.4462\n",
"Epoch 65/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 416us/step - accuracy: 0.8753 - loss: 0.3561 - val_accuracy: 0.8418 - val_loss: 0.4466\n",
"Epoch 66/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 508us/step - accuracy: 0.8754 - loss: 0.3558 - val_accuracy: 0.8416 - val_loss: 0.4470\n",
"Epoch 67/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 444us/step - accuracy: 0.8755 - loss: 0.3556 - val_accuracy: 0.8418 - val_loss: 0.4474\n",
"Epoch 68/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 435us/step - accuracy: 0.8755 - loss: 0.3553 - val_accuracy: 0.8416 - val_loss: 0.4478\n",
"Epoch 69/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 491us/step - accuracy: 0.8756 - loss: 0.3551 - val_accuracy: 0.8410 - val_loss: 0.4482\n",
"Epoch 70/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 435us/step - accuracy: 0.8756 - loss: 0.3549 - val_accuracy: 0.8408 - val_loss: 0.4486\n",
"Epoch 71/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 423us/step - accuracy: 0.8758 - loss: 0.3546 - val_accuracy: 0.8408 - val_loss: 0.4490\n",
"Epoch 72/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 422us/step - accuracy: 0.8758 - loss: 0.3544 - val_accuracy: 0.8408 - val_loss: 0.4494\n",
"Epoch 73/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 421us/step - accuracy: 0.8758 - loss: 0.3542 - val_accuracy: 0.8410 - val_loss: 0.4498\n",
"Epoch 74/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 421us/step - accuracy: 0.8759 - loss: 0.3540 - val_accuracy: 0.8410 - val_loss: 0.4502\n",
"Epoch 75/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 426us/step - accuracy: 0.8760 - loss: 0.3538 - val_accuracy: 0.8410 - val_loss: 0.4506\n",
"Epoch 76/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 471us/step - accuracy: 0.8760 - loss: 0.3536 - val_accuracy: 0.8410 - val_loss: 0.4510\n",
"Epoch 77/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 420us/step - accuracy: 0.8760 - loss: 0.3533 - val_accuracy: 0.8412 - val_loss: 0.4514\n",
"Epoch 78/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 421us/step - accuracy: 0.8761 - loss: 0.3531 - val_accuracy: 0.8412 - val_loss: 0.4518\n",
"Epoch 79/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 457us/step - accuracy: 0.8759 - loss: 0.3529 - val_accuracy: 0.8408 - val_loss: 0.4522\n",
"Epoch 80/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 420us/step - accuracy: 0.8761 - loss: 0.3527 - val_accuracy: 0.8406 - val_loss: 0.4526\n",
"Epoch 81/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 418us/step - accuracy: 0.8761 - loss: 0.3526 - val_accuracy: 0.8408 - val_loss: 0.4529\n",
"Epoch 82/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 418us/step - accuracy: 0.8763 - loss: 0.3524 - val_accuracy: 0.8404 - val_loss: 0.4533\n",
"Epoch 83/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 421us/step - accuracy: 0.8763 - loss: 0.3522 - val_accuracy: 0.8402 - val_loss: 0.4537\n",
"Epoch 84/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 421us/step - accuracy: 0.8763 - loss: 0.3520 - val_accuracy: 0.8400 - val_loss: 0.4541\n",
"Epoch 85/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 464us/step - accuracy: 0.8764 - loss: 0.3518 - val_accuracy: 0.8398 - val_loss: 0.4544\n",
"Epoch 86/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 421us/step - accuracy: 0.8765 - loss: 0.3516 - val_accuracy: 0.8398 - val_loss: 0.4548\n",
"Epoch 87/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 420us/step - accuracy: 0.8764 - loss: 0.3514 - val_accuracy: 0.8396 - val_loss: 0.4552\n",
"Epoch 88/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 450us/step - accuracy: 0.8764 - loss: 0.3513 - val_accuracy: 0.8394 - val_loss: 0.4555\n",
"Epoch 89/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 456us/step - accuracy: 0.8765 - loss: 0.3511 - val_accuracy: 0.8392 - val_loss: 0.4559\n",
"Epoch 90/90\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 419us/step - accuracy: 0.8765 - loss: 0.3509 - val_accuracy: 0.8390 - val_loss: 0.4563\n"
]
},
{
"data": {
"text/plain": [
"<keras.src.callbacks.history.History at 0x17fa78dd0>"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
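],
"source": [
"# Multinomial logistic regression written as a network: flatten the image, then a single softmax layer.\n",
"reg_log = tf.keras.Sequential(\n",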
" [\n",
" tf.keras.layers.Input(shape=[28, 28]),\n",
" tf.keras.layers.Flatten(),\n",
" tf.keras.layers.Dense(10, activation=\"softmax\"),\n",
" ]\n",
")\n",
"reg_log.compile(\n",
" loss=\"sparse_categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"]\n",
")\n",
"reg_log.fit(X_train01, y_train, epochs=90, validation_data=(X_val01, y_val))"
]
},
{
"cell_type": "markdown",
"id": "GR2JJnwv3zUS",
"metadata": {
"id": "GR2JJnwv3zUS"
},
"source": [
"**Optional 2**: Train the model on the un-normalized dataset (use only 30 epochs, since training takes time)."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "zv-yV-xQyVd8",
"metadata": {
"id": "zv-yV-xQyVd8"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.6586 - loss: 7.7191 - val_accuracy: 0.7086 - val_loss: 0.7708\n",
"Epoch 2/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7257 - loss: 0.7339 - val_accuracy: 0.7668 - val_loss: 0.5891\n",
"Epoch 3/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7667 - loss: 0.6082 - val_accuracy: 0.7880 - val_loss: 0.5681\n",
"Epoch 4/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 1ms/step - accuracy: 0.8169 - loss: 0.5174 - val_accuracy: 0.7946 - val_loss: 0.5648\n",
"Epoch 5/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 1ms/step - accuracy: 0.8377 - loss: 0.4704 - val_accuracy: 0.8270 - val_loss: 0.5107\n",
"Epoch 6/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8448 - loss: 0.4490 - val_accuracy: 0.8330 - val_loss: 0.4713\n",
"Epoch 7/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8521 - loss: 0.4277 - val_accuracy: 0.8534 - val_loss: 0.4069\n",
"Epoch 8/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8628 - loss: 0.3925 - val_accuracy: 0.8542 - val_loss: 0.4263\n",
"Epoch 9/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8668 - loss: 0.3824 - val_accuracy: 0.8584 - val_loss: 0.4099\n",
"Epoch 10/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8705 - loss: 0.3706 - val_accuracy: 0.8496 - val_loss: 0.4153\n",
"Epoch 11/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8745 - loss: 0.3588 - val_accuracy: 0.8550 - val_loss: 0.4377\n",
"Epoch 12/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8766 - loss: 0.3548 - val_accuracy: 0.8652 - val_loss: 0.4027\n",
"Epoch 13/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8786 - loss: 0.3448 - val_accuracy: 0.8562 - val_loss: 0.4194\n",
"Epoch 14/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8769 - loss: 0.3622 - val_accuracy: 0.8440 - val_loss: 0.4315\n",
"Epoch 15/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8831 - loss: 0.3358 - val_accuracy: 0.8530 - val_loss: 0.4754\n",
"Epoch 16/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8853 - loss: 0.3256 - val_accuracy: 0.8578 - val_loss: 0.4148\n",
"Epoch 17/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8865 - loss: 0.3263 - val_accuracy: 0.8510 - val_loss: 0.4600\n",
"Epoch 18/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8854 - loss: 0.3193 - val_accuracy: 0.8552 - val_loss: 0.4512\n",
"Epoch 19/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8879 - loss: 0.3197 - val_accuracy: 0.8530 - val_loss: 0.4577\n",
"Epoch 20/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8876 - loss: 0.3229 - val_accuracy: 0.8572 - val_loss: 0.4659\n",
"Epoch 21/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8892 - loss: 0.3099 - val_accuracy: 0.8506 - val_loss: 0.4705\n",
"Epoch 22/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8880 - loss: 0.3213 - val_accuracy: 0.8626 - val_loss: 0.4255\n",
"Epoch 23/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8911 - loss: 0.3110 - val_accuracy: 0.8508 - val_loss: 0.4534\n",
"Epoch 24/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8951 - loss: 0.2982 - val_accuracy: 0.8580 - val_loss: 0.4383\n",
"Epoch 25/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8930 - loss: 0.3023 - val_accuracy: 0.8578 - val_loss: 0.4875\n",
"Epoch 26/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8929 - loss: 0.3012 - val_accuracy: 0.8634 - val_loss: 0.4421\n",
"Epoch 27/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8950 - loss: 0.2971 - val_accuracy: 0.8592 - val_loss: 0.4446\n",
"Epoch 28/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8946 - loss: 0.2928 - val_accuracy: 0.8622 - val_loss: 0.4505\n",
"Epoch 29/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8993 - loss: 0.2810 - val_accuracy: 0.8548 - val_loss: 0.5223\n",
"Epoch 30/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8961 - loss: 0.2942 - val_accuracy: 0.8574 - val_loss: 0.4575\n"
]
},
{
"data": {
"text/plain": [
"<keras.src.callbacks.history.History at 0x17f78e8a0>"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# MLP (300-100-10) trained on the raw, un-normalized pixel values.\n",
"model_ter = tf.keras.Sequential(\n",
" [\n",
" tf.keras.layers.Input(shape=[28, 28]),\n",
" tf.keras.layers.Flatten(),\n",
" tf.keras.layers.Dense(300, activation=\"relu\", kernel_initializer=\"he_normal\"),\n",
" tf.keras.layers.Dense(100, activation=\"relu\", kernel_initializer=\"he_normal\"),\n",
" tf.keras.layers.Dense(10, activation=\"softmax\"),\n",
" ]\n",
")\n",
"model_ter.compile(\n",
" loss=\"sparse_categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"]\n",
")\n",
"model_ter.fit(X_train, y_train, epochs=30, validation_data=(X_val, y_val))"
]
},
{
"cell_type": "markdown",
"id": "bsTVF2Dy6cGo",
"metadata": {
"id": "bsTVF2Dy6cGo"
},
"source": [
"**Optional 3**: Train the model on the dataset with a different normalization: divide by 25000 so that the inputs are much smaller (use only 30 epochs, since training takes time; the code cell below uses 25500.0, i.e. 100 times 255). Compare with the case where you normalize by dividing by 255.0. You can also try dividing by 2500.0: you should not see a big difference from dividing by 255.0. Remember that we have a shallow network here."
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "1-KComPFy1wS",
"metadata": {
"id": "1-KComPFy1wS"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.6645 - loss: 0.9781 - val_accuracy: 0.8202 - val_loss: 0.4931\n",
"Epoch 2/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8294 - loss: 0.4769 - val_accuracy: 0.8336 - val_loss: 0.4414\n",
"Epoch 3/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8493 - loss: 0.4246 - val_accuracy: 0.8432 - val_loss: 0.4204\n",
"Epoch 4/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8587 - loss: 0.3927 - val_accuracy: 0.8468 - val_loss: 0.4029\n",
"Epoch 5/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8652 - loss: 0.3678 - val_accuracy: 0.8494 - val_loss: 0.3902\n",
"Epoch 6/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8718 - loss: 0.3475 - val_accuracy: 0.8546 - val_loss: 0.3777\n",
"Epoch 7/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8785 - loss: 0.3303 - val_accuracy: 0.8578 - val_loss: 0.3656\n",
"Epoch 8/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8828 - loss: 0.3153 - val_accuracy: 0.8598 - val_loss: 0.3589\n",
"Epoch 9/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8879 - loss: 0.3020 - val_accuracy: 0.8626 - val_loss: 0.3541\n",
"Epoch 10/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8918 - loss: 0.2908 - val_accuracy: 0.8648 - val_loss: 0.3486\n",
"Epoch 11/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8953 - loss: 0.2805 - val_accuracy: 0.8674 - val_loss: 0.3464\n",
"Epoch 12/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8993 - loss: 0.2703 - val_accuracy: 0.8688 - val_loss: 0.3441\n",
"Epoch 13/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9020 - loss: 0.2616 - val_accuracy: 0.8708 - val_loss: 0.3416\n",
"Epoch 14/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9050 - loss: 0.2533 - val_accuracy: 0.8732 - val_loss: 0.3379\n",
"Epoch 15/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9088 - loss: 0.2455 - val_accuracy: 0.8752 - val_loss: 0.3380\n",
"Epoch 16/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9120 - loss: 0.2377 - val_accuracy: 0.8758 - val_loss: 0.3348\n",
"Epoch 17/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9158 - loss: 0.2304 - val_accuracy: 0.8762 - val_loss: 0.3331\n",
"Epoch 18/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9181 - loss: 0.2232 - val_accuracy: 0.8756 - val_loss: 0.3341\n",
"Epoch 19/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9216 - loss: 0.2164 - val_accuracy: 0.8782 - val_loss: 0.3340\n",
"Epoch 20/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9231 - loss: 0.2097 - val_accuracy: 0.8802 - val_loss: 0.3352\n",
"Epoch 21/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9264 - loss: 0.2034 - val_accuracy: 0.8824 - val_loss: 0.3378\n",
"Epoch 22/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9279 - loss: 0.1971 - val_accuracy: 0.8824 - val_loss: 0.3412\n",
"Epoch 23/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9308 - loss: 0.1910 - val_accuracy: 0.8812 - val_loss: 0.3456\n",
"Epoch 24/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9327 - loss: 0.1850 - val_accuracy: 0.8844 - val_loss: 0.3487\n",
"Epoch 25/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9347 - loss: 0.1800 - val_accuracy: 0.8840 - val_loss: 0.3529\n",
"Epoch 26/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9370 - loss: 0.1740 - val_accuracy: 0.8836 - val_loss: 0.3559\n",
"Epoch 27/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9393 - loss: 0.1683 - val_accuracy: 0.8836 - val_loss: 0.3620\n",
"Epoch 28/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9410 - loss: 0.1630 - val_accuracy: 0.8830 - val_loss: 0.3677\n",
"Epoch 29/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9425 - loss: 0.1580 - val_accuracy: 0.8828 - val_loss: 0.3725\n",
"Epoch 30/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9439 - loss: 0.1531 - val_accuracy: 0.8850 - val_loss: 0.3794\n"
]
},
{
"data": {
"text/plain": [
"<keras.src.callbacks.history.History at 0x17f68e870>"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
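],
"source": [
"# Same architecture as model_ter, trained on inputs divided by 25500.0 (100 times smaller than with the /255.0 scaling).\n",
"model_5 = tf.keras.Sequential(\n",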
" [\n",
" tf.keras.layers.Input(shape=[28, 28]),\n",
" tf.keras.layers.Flatten(),\n",
" tf.keras.layers.Dense(300, activation=\"relu\", kernel_initializer=\"he_normal\"),\n",
" tf.keras.layers.Dense(100, activation=\"relu\", kernel_initializer=\"he_normal\"),\n",
" tf.keras.layers.Dense(10, activation=\"softmax\"),\n",
" ]\n",
")\n",
"model_5.compile(\n",
" loss=\"sparse_categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"]\n",
")\n",
"\n",
"X_train_far_too_small, X_val_far_too_small = X_train / 25500.0, X_val / 25500.0\n",
"\n",
"model_5.fit(\n",
" X_train_far_too_small,\n",
" y_train,\n",
" epochs=30,\n",
" validation_data=(X_val_far_too_small, y_val),\n",
")"
]
},
{
"cell_type": "markdown",
"id": "kH6PsYnL9Tmz",
"metadata": {
"id": "kH6PsYnL9Tmz"
},
"source": [
"**Optional 4**: Try using the sigmoid activation for the hidden layers instead of ReLU. First train the model on normalized data, then on un-normalized data."
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "dShxjtDny8HD",
"metadata": {
"id": "dShxjtDny8HD"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7407 - loss: 0.7888 - val_accuracy: 0.8360 - val_loss: 0.4605\n",
"Epoch 2/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8553 - loss: 0.3999 - val_accuracy: 0.8386 - val_loss: 0.4300\n",
"Epoch 3/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8715 - loss: 0.3539 - val_accuracy: 0.8498 - val_loss: 0.4042\n",
"Epoch 4/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8813 - loss: 0.3239 - val_accuracy: 0.8580 - val_loss: 0.3892\n",
"Epoch 5/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8894 - loss: 0.3013 - val_accuracy: 0.8626 - val_loss: 0.3787\n",
"Epoch 6/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8959 - loss: 0.2828 - val_accuracy: 0.8678 - val_loss: 0.3787\n",
"Epoch 7/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9021 - loss: 0.2669 - val_accuracy: 0.8678 - val_loss: 0.3790\n",
"Epoch 8/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9069 - loss: 0.2533 - val_accuracy: 0.8710 - val_loss: 0.3780\n",
"Epoch 9/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9120 - loss: 0.2399 - val_accuracy: 0.8704 - val_loss: 0.3782\n",
"Epoch 10/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9176 - loss: 0.2273 - val_accuracy: 0.8710 - val_loss: 0.3808\n",
"Epoch 11/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9214 - loss: 0.2158 - val_accuracy: 0.8726 - val_loss: 0.3805\n",
"Epoch 12/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9257 - loss: 0.2049 - val_accuracy: 0.8694 - val_loss: 0.3859\n",
"Epoch 13/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9304 - loss: 0.1945 - val_accuracy: 0.8720 - val_loss: 0.3934\n",
"Epoch 14/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9336 - loss: 0.1846 - val_accuracy: 0.8716 - val_loss: 0.4023\n",
"Epoch 15/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9372 - loss: 0.1754 - val_accuracy: 0.8676 - val_loss: 0.4147\n",
"Epoch 16/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9404 - loss: 0.1666 - val_accuracy: 0.8676 - val_loss: 0.4186\n",
"Epoch 17/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9441 - loss: 0.1582 - val_accuracy: 0.8696 - val_loss: 0.4338\n",
"Epoch 18/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9460 - loss: 0.1509 - val_accuracy: 0.8670 - val_loss: 0.4443\n",
"Epoch 19/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9500 - loss: 0.1429 - val_accuracy: 0.8706 - val_loss: 0.4454\n",
"Epoch 20/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9526 - loss: 0.1362 - val_accuracy: 0.8686 - val_loss: 0.4618\n",
"Epoch 21/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9554 - loss: 0.1292 - val_accuracy: 0.8656 - val_loss: 0.4863\n",
"Epoch 22/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9577 - loss: 0.1220 - val_accuracy: 0.8662 - val_loss: 0.4932\n",
"Epoch 23/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m4s\u001b[0m 2ms/step - accuracy: 0.9597 - loss: 0.1170 - val_accuracy: 0.8658 - val_loss: 0.5143\n",
"Epoch 24/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9599 - loss: 0.1135 - val_accuracy: 0.8648 - val_loss: 0.5415\n",
"Epoch 25/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9629 - loss: 0.1068 - val_accuracy: 0.8620 - val_loss: 0.5629\n",
"Epoch 26/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9644 - loss: 0.1021 - val_accuracy: 0.8590 - val_loss: 0.6076\n",
"Epoch 27/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9667 - loss: 0.0971 - val_accuracy: 0.8640 - val_loss: 0.6027\n",
"Epoch 28/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9673 - loss: 0.0953 - val_accuracy: 0.8594 - val_loss: 0.6309\n",
"Epoch 29/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9688 - loss: 0.0910 - val_accuracy: 0.8630 - val_loss: 0.6526\n",
"Epoch 30/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9710 - loss: 0.0864 - val_accuracy: 0.8640 - val_loss: 0.6396\n"
]
},
{
"data": {
"text/plain": [
"<keras.src.callbacks.history.History at 0x31e048dd0>"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# sigmoid activation, normalized data (scale: [0, 1])\n",
"model_sig_norm = tf.keras.Sequential(\n",
" [\n",
" tf.keras.layers.Input(shape=[28, 28]),\n",
" tf.keras.layers.Flatten(),\n",
" tf.keras.layers.Dense(\n",
" 300, activation=\"sigmoid\", kernel_initializer=\"he_normal\"\n",
" ),\n",
" tf.keras.layers.Dense(\n",
" 100, activation=\"sigmoid\", kernel_initializer=\"he_normal\"\n",
" ),\n",
" tf.keras.layers.Dense(10, activation=\"softmax\"),\n",
" ]\n",
")\n",
"model_sig_norm.compile(\n",
" loss=\"sparse_categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"]\n",
")\n",
"model_sig_norm.fit(X_train01, y_train, epochs=30, validation_data=(X_val01, y_val))"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "1O32YLVuy8k3",
"metadata": {
"id": "1O32YLVuy8k3"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.6551 - loss: 1.0137 - val_accuracy: 0.6920 - val_loss: 0.7548\n",
"Epoch 2/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7053 - loss: 0.7489 - val_accuracy: 0.7290 - val_loss: 0.7050\n",
"Epoch 3/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7162 - loss: 0.7261 - val_accuracy: 0.7220 - val_loss: 0.7043\n",
"Epoch 4/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7244 - loss: 0.6950 - val_accuracy: 0.7410 - val_loss: 0.6906\n",
"Epoch 5/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7329 - loss: 0.6854 - val_accuracy: 0.7522 - val_loss: 0.6677\n",
"Epoch 6/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7434 - loss: 0.6751 - val_accuracy: 0.7538 - val_loss: 0.6704\n",
"Epoch 7/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7419 - loss: 0.6836 - val_accuracy: 0.7470 - val_loss: 0.6619\n",
"Epoch 8/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7357 - loss: 0.6719 - val_accuracy: 0.7456 - val_loss: 0.6474\n",
"Epoch 9/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7386 - loss: 0.6690 - val_accuracy: 0.7386 - val_loss: 0.6632\n",
"Epoch 10/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7397 - loss: 0.6665 - val_accuracy: 0.7442 - val_loss: 0.6630\n",
"Epoch 11/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7473 - loss: 0.6595 - val_accuracy: 0.7466 - val_loss: 0.6551\n",
"Epoch 12/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7376 - loss: 0.6605 - val_accuracy: 0.7662 - val_loss: 0.6222\n",
"Epoch 13/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7563 - loss: 0.6315 - val_accuracy: 0.7732 - val_loss: 0.5941\n",
"Epoch 14/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7706 - loss: 0.6042 - val_accuracy: 0.7624 - val_loss: 0.6283\n",
"Epoch 15/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7615 - loss: 0.6226 - val_accuracy: 0.7598 - val_loss: 0.6130\n",
"Epoch 16/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7657 - loss: 0.6080 - val_accuracy: 0.7798 - val_loss: 0.5883\n",
"Epoch 17/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7628 - loss: 0.6154 - val_accuracy: 0.7702 - val_loss: 0.6045\n",
"Epoch 18/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7724 - loss: 0.5999 - val_accuracy: 0.7810 - val_loss: 0.5828\n",
"Epoch 19/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7839 - loss: 0.5758 - val_accuracy: 0.7930 - val_loss: 0.5618\n",
"Epoch 20/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7901 - loss: 0.5718 - val_accuracy: 0.7860 - val_loss: 0.5895\n",
"Epoch 21/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7845 - loss: 0.5713 - val_accuracy: 0.7808 - val_loss: 0.5898\n",
"Epoch 22/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7779 - loss: 0.5845 - val_accuracy: 0.7874 - val_loss: 0.5695\n",
"Epoch 23/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7872 - loss: 0.5686 - val_accuracy: 0.7676 - val_loss: 0.5934\n",
"Epoch 24/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7815 - loss: 0.5739 - val_accuracy: 0.7920 - val_loss: 0.5528\n",
"Epoch 25/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7892 - loss: 0.5618 - val_accuracy: 0.7928 - val_loss: 0.5675\n",
"Epoch 26/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7882 - loss: 0.5590 - val_accuracy: 0.7988 - val_loss: 0.5464\n",
"Epoch 27/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7866 - loss: 0.5598 - val_accuracy: 0.7764 - val_loss: 0.5784\n",
"Epoch 28/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7838 - loss: 0.5673 - val_accuracy: 0.7848 - val_loss: 0.5651\n",
"Epoch 29/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7886 - loss: 0.5563 - val_accuracy: 0.8008 - val_loss: 0.5436\n",
"Epoch 30/30\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7971 - loss: 0.5387 - val_accuracy: 0.8010 - val_loss: 0.5349\n"
]
},
{
"data": {
"text/plain": [
"<keras.src.callbacks.history.History at 0x361d65190>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# sigmoid activation, unnormalized data (same architecture as model_sig_norm)\n",
"model_sig_un_norm = tf.keras.Sequential(\n",
" [\n",
" tf.keras.layers.Input(shape=[28, 28]),\n",
" tf.keras.layers.Flatten(),\n",
" tf.keras.layers.Dense(\n",
" 300, activation=\"sigmoid\", kernel_initializer=\"he_normal\"\n",
" ),\n",
" tf.keras.layers.Dense(\n",
" 100, activation=\"sigmoid\", kernel_initializer=\"he_normal\"\n",
" ),\n",
" tf.keras.layers.Dense(10, activation=\"softmax\"),\n",
" ]\n",
")\n",
"model_sig_un_norm.compile(\n",
" loss=\"sparse_categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"]\n",
")\n",
"model_sig_un_norm.fit(X_train, y_train, epochs=30, validation_data=(X_val, y_val))"
]
},
{
"cell_type": "markdown",
"id": "z9Tmg3PRUXCe",
"metadata": {
"id": "z9Tmg3PRUXCe"
},
"source": [
"**Optional 5** Use ReLU again and try initializing the weights as independent, normally distributed values with a high variance."
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "p5nkk6t8zIzz",
"metadata": {
"id": "p5nkk6t8zIzz"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.6039 - loss: 86.8092 - val_accuracy: 0.7452 - val_loss: 19.3635\n",
"Epoch 2/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7728 - loss: 14.7505 - val_accuracy: 0.7648 - val_loss: 8.7317\n",
"Epoch 3/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7867 - loss: 6.8569 - val_accuracy: 0.7916 - val_loss: 3.7573\n",
"Epoch 4/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7893 - loss: 3.0035 - val_accuracy: 0.7756 - val_loss: 1.6859\n",
"Epoch 5/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7805 - loss: 1.3666 - val_accuracy: 0.7798 - val_loss: 1.0656\n",
"Epoch 6/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7814 - loss: 0.9145 - val_accuracy: 0.7768 - val_loss: 0.9132\n",
"Epoch 7/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7917 - loss: 0.7225 - val_accuracy: 0.7832 - val_loss: 0.8141\n",
"Epoch 8/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.7993 - loss: 0.6555 - val_accuracy: 0.7934 - val_loss: 0.7410\n",
"Epoch 9/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8125 - loss: 0.5778 - val_accuracy: 0.7986 - val_loss: 0.6734\n",
"Epoch 10/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8231 - loss: 0.5296 - val_accuracy: 0.8092 - val_loss: 0.6434\n",
"Epoch 11/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8347 - loss: 0.4960 - val_accuracy: 0.8220 - val_loss: 0.6397\n",
"Epoch 12/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8415 - loss: 0.4696 - val_accuracy: 0.8220 - val_loss: 0.6570\n",
"Epoch 13/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8482 - loss: 0.4438 - val_accuracy: 0.8270 - val_loss: 0.6685\n",
"Epoch 14/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8525 - loss: 0.4270 - val_accuracy: 0.8238 - val_loss: 0.6539\n",
"Epoch 15/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8581 - loss: 0.4113 - val_accuracy: 0.8290 - val_loss: 0.6831\n",
"Epoch 16/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8640 - loss: 0.3913 - val_accuracy: 0.8298 - val_loss: 0.7134\n",
"Epoch 17/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8691 - loss: 0.3778 - val_accuracy: 0.8262 - val_loss: 0.7223\n",
"Epoch 18/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8705 - loss: 0.3728 - val_accuracy: 0.8280 - val_loss: 0.7357\n",
"Epoch 19/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8743 - loss: 0.3598 - val_accuracy: 0.8288 - val_loss: 0.7622\n",
"Epoch 20/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8787 - loss: 0.3488 - val_accuracy: 0.8250 - val_loss: 0.7780\n",
"Epoch 21/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8801 - loss: 0.3455 - val_accuracy: 0.8354 - val_loss: 0.7774\n",
"Epoch 22/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8834 - loss: 0.3304 - val_accuracy: 0.8282 - val_loss: 0.7972\n",
"Epoch 23/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8846 - loss: 0.3250 - val_accuracy: 0.8316 - val_loss: 0.8399\n",
"Epoch 24/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8864 - loss: 0.3191 - val_accuracy: 0.8364 - val_loss: 0.8428\n",
"Epoch 25/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8881 - loss: 0.3186 - val_accuracy: 0.8266 - val_loss: 0.9485\n",
"Epoch 26/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8901 - loss: 0.3106 - val_accuracy: 0.8376 - val_loss: 0.9145\n",
"Epoch 27/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8923 - loss: 0.3062 - val_accuracy: 0.8310 - val_loss: 0.9605\n",
"Epoch 28/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8931 - loss: 0.3016 - val_accuracy: 0.8420 - val_loss: 0.9259\n",
"Epoch 29/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8948 - loss: 0.2966 - val_accuracy: 0.8424 - val_loss: 0.9897\n",
"Epoch 30/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8977 - loss: 0.2851 - val_accuracy: 0.8406 - val_loss: 1.0251\n",
"Epoch 31/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8985 - loss: 0.2840 - val_accuracy: 0.8340 - val_loss: 1.0227\n",
"Epoch 32/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.8979 - loss: 0.2931 - val_accuracy: 0.8336 - val_loss: 1.0029\n",
"Epoch 33/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9007 - loss: 0.2807 - val_accuracy: 0.8366 - val_loss: 1.0235\n",
"Epoch 34/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9013 - loss: 0.2741 - val_accuracy: 0.8410 - val_loss: 0.9453\n",
"Epoch 35/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9049 - loss: 0.2732 - val_accuracy: 0.8376 - val_loss: 1.0164\n",
"Epoch 36/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9045 - loss: 0.2665 - val_accuracy: 0.8408 - val_loss: 1.0273\n",
"Epoch 37/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9060 - loss: 0.2627 - val_accuracy: 0.8380 - val_loss: 1.0743\n",
"Epoch 38/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9054 - loss: 0.2668 - val_accuracy: 0.8386 - val_loss: 1.0879\n",
"Epoch 39/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9070 - loss: 0.2561 - val_accuracy: 0.8424 - val_loss: 1.0748\n",
"Epoch 40/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9097 - loss: 0.2540 - val_accuracy: 0.8308 - val_loss: 1.1934\n",
"Epoch 41/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9112 - loss: 0.2570 - val_accuracy: 0.8412 - val_loss: 1.0743\n",
"Epoch 42/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9100 - loss: 0.2514 - val_accuracy: 0.8400 - val_loss: 1.1461\n",
"Epoch 43/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9093 - loss: 0.2524 - val_accuracy: 0.8370 - val_loss: 1.1910\n",
"Epoch 44/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9118 - loss: 0.2487 - val_accuracy: 0.8394 - val_loss: 1.1850\n",
"Epoch 45/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9132 - loss: 0.2391 - val_accuracy: 0.8398 - val_loss: 1.2205\n",
"Epoch 46/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9142 - loss: 0.2433 - val_accuracy: 0.8356 - val_loss: 1.3211\n",
"Epoch 47/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9157 - loss: 0.2344 - val_accuracy: 0.8372 - val_loss: 1.2488\n",
"Epoch 48/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9160 - loss: 0.2348 - val_accuracy: 0.8420 - val_loss: 1.2971\n",
"Epoch 49/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9162 - loss: 0.2360 - val_accuracy: 0.8468 - val_loss: 1.2902\n",
"Epoch 50/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9179 - loss: 0.2328 - val_accuracy: 0.8336 - val_loss: 1.2355\n",
"Epoch 51/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9202 - loss: 0.2245 - val_accuracy: 0.8302 - val_loss: 1.4387\n",
"Epoch 52/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9182 - loss: 0.2329 - val_accuracy: 0.8346 - val_loss: 1.3602\n",
"Epoch 53/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9196 - loss: 0.2244 - val_accuracy: 0.8370 - val_loss: 1.2779\n",
"Epoch 54/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9206 - loss: 0.2242 - val_accuracy: 0.8444 - val_loss: 1.3331\n",
"Epoch 55/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9234 - loss: 0.2143 - val_accuracy: 0.8542 - val_loss: 1.2873\n",
"Epoch 56/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9205 - loss: 0.2204 - val_accuracy: 0.8406 - val_loss: 1.3728\n",
"Epoch 57/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9207 - loss: 0.2259 - val_accuracy: 0.8414 - val_loss: 1.3721\n",
"Epoch 58/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9237 - loss: 0.2151 - val_accuracy: 0.8474 - val_loss: 1.3719\n",
"Epoch 59/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9228 - loss: 0.2189 - val_accuracy: 0.8414 - val_loss: 1.3527\n",
"Epoch 60/60\n",
"\u001b[1m1719/1719\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m3s\u001b[0m 2ms/step - accuracy: 0.9254 - loss: 0.2101 - val_accuracy: 0.8394 - val_loss: 1.4407\n"
]
},
{
"data": {
"text/plain": [
"<keras.src.callbacks.history.History at 0x36270fad0>"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_high_variance = tf.keras.Sequential(\n",
" [\n",
" tf.keras.layers.Input(shape=[28, 28]),\n",
" tf.keras.layers.Flatten(),\n",
" tf.keras.layers.Dense(300, activation=\"relu\"),\n",
" tf.keras.layers.Dense(100, activation=\"relu\"),\n",
" tf.keras.layers.Dense(10, activation=\"softmax\"),\n",
" ]\n",
")\n",
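"# Overwrite the two hidden Dense layers (layers[0] is Flatten) with\n",
"# i.i.d. N(0, 2^2) kernels and zero biases: a deliberately high variance.\n",
"model_high_variance.layers[1].set_weights(\n",
"    [2.0 * np.random.randn(28 * 28, 300), np.zeros(300)]\n",
")\n",
"model_high_variance.layers[2].set_weights(\n",
"    [2.0 * np.random.randn(300, 100), np.zeros(100)]\n",
")\n",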
"\n",
"model_high_variance.compile(\n",
" loss=\"sparse_categorical_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"]\n",
")\n",
"\n",
"model_high_variance.fit(X_train01, y_train, epochs=60, validation_data=(X_val01, y_val))"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "studies",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}