Mirror of https://github.com/ArthurDanjou/ml_exercises.git (synced 2026-01-14 12:14:38 +01:00)
further explanations in the notebooks
@@ -5,7 +5,10 @@
 "metadata": {},
 "source": [
 "# Color Quantization using K-Means\n",
-"In this notebook, we want to transform a regular RGB image (where each pixel is represented as a Red-Green-Blue triplet) into a [compressed representation](https://en.wikipedia.org/wiki/Color_quantization), where each pixel is represented as a single number (color index) together with a limited color palette (RGB triplets corresponding to the color indices). "
+"In this notebook, we want to transform a regular RGB image (where each pixel is represented as a Red-Green-Blue triplet) into a [compressed representation](https://en.wikipedia.org/wiki/Color_quantization), where each pixel is represented as a single number (color index) together with a limited color palette (RGB triplets corresponding to the color indices). \n",
+"\n",
+"Example from Wikipedia (original image and after quantization):\n",
+"<img src=\"https://upload.wikimedia.org/wikipedia/commons/e/e3/Dithering_example_undithered.png\" alt=\"\" /> <img src=\"https://upload.wikimedia.org/wikipedia/en/4/48/Dithering_example_undithered_16color_palette.png\" alt=\"\" />"
 ]
 },
 {
@@ -18,10 +21,7 @@
 "import matplotlib.pyplot as plt\n",
 "from PIL import Image # library for loading image files\n",
 "from sklearn.cluster import KMeans\n",
-"from sklearn.utils import shuffle\n",
-"\n",
-"%load_ext autoreload\n",
-"%autoreload 2"
+"from sklearn.utils import shuffle"
 ]
 },
 {
@@ -68,7 +68,7 @@
 "X_sample = shuffle(X, random_state=0)[:1000]\n",
 "# initialize k-means and set n_clusters to the number of colors you want in your image (e.g. 10)\n",
 "kmeans = ...\n",
-"# fit the model on the data (i.e. find the cluster indices)\n",
+"# fit the model on the data (i.e., find the cluster indices)\n",
 "kmeans.fit(X_sample)"
 ]
 },
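The `kmeans = ...` placeholder in the hunk above is left open as an exercise. A minimal sketch of one way the step could look, assuming `X` is an array with one RGB row per pixel: the synthetic image, `n_colors`, and the `n_init=10` setting are illustrative assumptions, not taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.utils import shuffle

# Synthetic stand-in for the notebook's image (assumption): 64x64 RGB values in [0, 1].
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))

# One row per pixel, three columns (R, G, B).
X = img.reshape(-1, 3)

# Fit on a random subset of pixels to keep training fast, as in the notebook.
X_sample = shuffle(X, random_state=0)[:1000]

n_colors = 10  # illustrative palette size
kmeans = KMeans(n_clusters=n_colors, n_init=10, random_state=0)
kmeans.fit(X_sample)

# The fitted cluster centers act as the color palette.
palette = kmeans.cluster_centers_
print(palette.shape)
```

Fitting on a subsample trades a little palette accuracy for speed; the full set of pixels is only needed later, when each pixel is assigned to a palette entry.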
@@ -88,7 +88,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"# use the predict function of kmeans to compute the cluster index for each data point (i.e. pixel) in X\n",
+"# use the predict function of kmeans to compute the cluster index for each data point (i.e., pixel) in X\n",
 "# (cluster indices together with the color palette would be the compressed representation of the image)\n",
 "cluster_idx = ...\n",
 "print(cluster_idx.shape) # same first dimension as X"
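The comments in this hunk describe the compressed representation as cluster indices plus a palette. A hedged sketch of how that encoding and its decoding might look, again on a synthetic image; the names `palette` and `img_quantized` and the choice of 8 clusters are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the notebook's image (assumption): 32x32 RGB values in [0, 1].
rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
X = img.reshape(-1, 3)

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)

# Encode: one palette index per pixel.
cluster_idx = kmeans.predict(X)
palette = kmeans.cluster_centers_

# Decode: look up each pixel's palette color and restore the image shape.
img_quantized = palette[cluster_idx].reshape(img.shape)
print(cluster_idx.shape, img_quantized.shape)
```

Indexing the palette with the index array (`palette[cluster_idx]`) is plain NumPy fancy indexing, so decoding needs no loop over pixels.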
@@ -142,17 +142,17 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Heuristic to determine the number of clusters _k_\n",
+"### Heuristic to determine the number of clusters *k*\n",
 "\n",
 "The objective that k-means internally optimizes is the sum of squared distances of the samples to their assigned cluster centers, i.e., it tries to find clusters such that all the points in the cluster are very close to the respective cluster center.\n",
 "\n",
 "After fitting k-means, the final value of this objective function can be computed with the `score` function on the dataset (this actually gives you the negative value, since this is more convenient for some optimization algorithms).\n",
 "\n",
-"We can now simply fit k-means with different settings for _k_ and observe how the value of the score function changes as we increase the number of clusters.\n",
+"We can now simply fit k-means with different settings for *k* and observe how the value of the score function changes as we increase the number of clusters.\n",
 "\n",
 "#### Questions: \n",
-"* What would happen (i.e. what would the score be) if you set _k_ to a very large value, e.g., the number of data points? \n",
-"* Based on the plot that we compute below, what do you think might be a good value for _k_? (Of course, this will be different for every dataset, i.e., in this example, a different image might need more or less colors to look ok.)"
+"* What would happen (i.e., what would the score be) if you set *k* to a very large value, e.g., the number of data points? \n",
+"* Based on the plot that we compute below, what do you think might be a good value for *k*? (Of course, this will be different for every dataset, i.e., in this example, a different image might need more or less colors to look ok.)"
 ]
 },
 {
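The heuristic described in this markdown cell can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's own code: the random `X_sample`, the candidate values in `ks`, and `n_init=10` are all made up for the example. `KMeans.score` returns the negative of the k-means objective on the given data, so values closer to zero indicate a tighter fit.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the sampled pixels (assumption): 500 RGB rows in [0, 1].
rng = np.random.default_rng(0)
X_sample = rng.random((500, 3))

ks = [2, 4, 8, 16]  # illustrative candidate numbers of clusters
scores = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_sample)
    # score() is the negated objective, so it is never positive.
    scores.append(km.score(X_sample))

for k, s in zip(ks, scores):
    print(k, round(s, 2))
```

Plotting `scores` against `ks` (for example with `plt.plot(ks, scores)`) gives the curve the questions refer to; a common reading is to pick *k* near the point where the curve starts to flatten.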