diff --git a/knn.ipynb b/knn.ipynb
index 6ef404f..d4ee2e1 100644
--- a/knn.ipynb
+++ b/knn.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "code",
-   "execution_count": 10,
+   "execution_count": 1,
    "id": "4e6f6cb1",
    "metadata": {},
    "outputs": [],
@@ -20,7 +20,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 2,
    "id": "4dd5223b",
    "metadata": {},
    "outputs": [],
@@ -31,7 +31,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 12,
+   "execution_count": 3,
    "id": "c1ab7ec9",
    "metadata": {},
    "outputs": [
@@ -86,7 +86,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 4,
    "id": "754dce9b",
    "metadata": {},
    "outputs": [
@@ -95,13 +95,15 @@
      "output_type": "stream",
      "text": [
       "Number of samples: 116\n",
-      "Number of features: 9\n"
+      "Number of features: 9\n",
+      "Number of classes: 2\n"
      ]
     }
    ],
    "source": [
     "print(\"Number of samples:\", X.shape[0])\n",
-    "print(\"Number of features:\", X.shape[1])"
+    "print(\"Number of features:\", X.shape[1])\n",
+    "print(\"Number of classes:\", len(np.unique(y)))"
    ]
   },
@@ -122,7 +124,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": 5,
    "id": "b2e03ac1",
    "metadata": {},
    "outputs": [
@@ -150,6 +152,16 @@
     "print(\"The best k for k-NN is k =\", k_optimal)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "698d03a8",
+   "metadata": {},
+   "source": [
+    "In k-NN classification, to achieve the best prediction performance, we need to find the number of neighbors that maximizes the evaluation score of our models. Here, we use the `f1_score` from sklearn.metrics, as it balances precision (the fraction of patients predicted as sick who really are sick) and recall (the fraction of truly sick patients who are correctly identified as sick).\n",
+    "\n",
+    "To determine this hyperparameter, we use 5-fold cross-validation. We chose 5 folds instead of 10 due to the limited amount of data, as this provides a better balance between the sizes of the training and validation sets. After cross-validation, the optimal number of neighbors turns out to be $k = 23$."
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "9f74eaee",
    "metadata": {},
@@ -168,28 +180,19 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": 6,
    "id": "70281897",
    "metadata": {},
    "outputs": [],
    "source": [
     "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42) # train/test split with 70% training and 30% testing\n",
     "\n",
-    "\n",
     "# Feature scaling\n",
     "scaler = StandardScaler()\n",
     "\n",
-    "X_train_scaled = pd.DataFrame(\n",
-    "    scaler.fit_transform(X_train),\n",
-    "    columns=X_train.columns,\n",
-    "    index=X_train.index\n",
-    ")\n",
+    "X_train_scaled = scaler.fit_transform(X_train)\n",
     "\n",
-    "X_test_scaled = pd.DataFrame(\n",
-    "    scaler.transform(X_test),\n",
-    "    columns=X_test.columns,\n",
-    "    index=X_test.index\n",
-    ")"
+    "X_test_scaled = scaler.transform(X_test)"
    ]
   },
   {
@@ -202,7 +205,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": 7,
    "id": "064a5aa7",
    "metadata": {},
    "outputs": [
@@ -247,7 +250,6 @@
     }
    ],
    "source": [
-    "\n",
     "knn = KNeighborsClassifier(n_neighbors=k_optimal) # using the best k found earlier\n",
     "knn.fit(X_train_scaled, y_train)\n",
     "\n",
@@ -295,12 +297,12 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c158b385",
+   "cell_type": "markdown",
+   "id": "9bf7ed62",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "source": [
+    "In this optimized k-NN classification, we aim to maximize recall while maintaining good accuracy, in order to minimize the most harmful misclassifications: cases where a sick patient is incorrectly predicted as healthy (false negatives). We achieve this goal with a recall of 89% and an accuracy of 80%."
+   ]
  }
 ],
 "metadata": {
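Note on the new markdown cell 698d03a8: the diff does not show the search cell that produces `k_optimal`, only its print statement and the reported result $k = 23$. A minimal sketch of that selection, assuming a `cross_val_score`-based grid over odd k (the candidate range is a guess; `X`, `y`, and `k_optimal` are the notebook's own names), might look like this:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical candidate range; the notebook only reports the winner, k = 23.
k_values = list(range(1, 50, 2))

mean_f1 = []
for k in k_values:
    # Scaling inside the pipeline so each CV fold is standardized
    # using only its own training split (no leakage into validation).
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")  # 5-fold CV, F1 metric
    mean_f1.append(scores.mean())

k_optimal = k_values[int(np.argmax(mean_f1))]
print("The best k for k-NN is k =", k_optimal)
```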
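Likewise, the "optimized" variant described in new cell 9bf7ed62 (recall 89%, accuracy 80%) trades precision for recall by some means the diff does not show. One common way to do that, sketched here purely as an assumption, is lowering the decision threshold on `predict_proba` below the default 0.5; the threshold value is illustrative, and `knn`, `X_test_scaled`, and `y_test` refer to the notebook's variables:

```python
from sklearn.metrics import accuracy_score, recall_score

# Probability of the positive ("sick") class for each test sample,
# assuming labels are encoded as 0 = healthy, 1 = sick.
y_prob = knn.predict_proba(X_test_scaled)[:, 1]

# Hypothetical threshold below 0.5: more patients get flagged as sick,
# raising recall at some cost in precision/accuracy.
threshold = 0.35
y_pred_opt = (y_prob >= threshold).astype(int)

print("Recall:  ", recall_score(y_test, y_pred_opt))
print("Accuracy:", accuracy_score(y_test, y_pred_opt))
```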