# Color Quantization using K-Means
In this notebook, we want to transform a regular [RGB image](https://en.wikipedia.org/wiki/RGB_color_model#Numeric_representations) (where each pixel is represented as a Red-Green-Blue triplet) into a [compressed representation](https://en.wikipedia.org/wiki/Color_quantization), where each pixel is represented as a single number (color index) together with a limited color palette (RGB triplets corresponding to the color indices). 
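To make this representation concrete, here is a toy NumPy sketch (the arrays and the 1000 x 1000 image size are made up for illustration). The palette is a k x 3 array, the compressed image stores one index per pixel, and decompression is a single palette lookup. The storage estimate assumes 8-bit indices (enough for up to 256 colors) versus 3 bytes per pixel for plain RGB.

```python
import numpy as np

# hypothetical toy example: a 2 x 3 "image" that uses only 3 distinct colors
toy_palette = np.array([[255, 0, 0],    # index 0: red
                        [0, 128, 0],    # index 1: green
                        [0, 0, 255]],   # index 2: blue
                       dtype=np.uint8)
toy_idx = np.array([[0, 1, 1],
                    [2, 2, 0]], dtype=np.uint8)  # one color index per pixel

# decompress: look up each pixel's index in the palette (NumPy fancy indexing)
toy_rgb = toy_palette[toy_idx]
print(toy_rgb.shape)  # (2, 3, 3): height x width x RGB

# storage estimate for a hypothetical 1000 x 1000 image with 16 colors
n_pixels, n_colors = 1000 * 1000, 16
bytes_rgb = n_pixels * 3                        # 3 bytes (uint8) per pixel
bytes_quantized = n_pixels * 1 + n_colors * 3   # 1-byte index per pixel + palette
print(bytes_rgb, bytes_quantized)  # 3000000 1000048
```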

Example from Wikipedia (original image and after quantization):

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image # library for loading image files
from sklearn.cluster import KMeans
from sklearn.utils import shuffle

In [None]:
# load the original image -> change the path to an image of your choice
img_org = Image.open("../data/cat.jpg")
img_org

In [None]:
# transform the image into a numpy array
img_array = np.asarray(img_org)
print(img_array.shape) # height x width x 3 (RGB channels)

In [None]:
# reshape image into a matrix with RGB values for each pixel
h, w, d = img_array.shape
X = img_array.reshape(h*w, d)
# 1 pixel = 1 data point; RGB values = features
print(X.shape)
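The reshape only regroups the values; it does not reorder them. With NumPy's default row-major (C) order, row `i * w + j` of the pixel matrix is the pixel at image position (i, j), so the same kind of `reshape` can later turn a pixel matrix back into an image. A quick check on a made-up 2 x 4 image (the `toy_` names are hypothetical, chosen so they don't overwrite the notebook's variables):

```python
import numpy as np

toy_img = np.arange(2 * 4 * 3, dtype=np.uint8).reshape(2, 4, 3)  # toy 2 x 4 RGB "image"
toy_h, toy_w, toy_d = toy_img.shape
toy_X = toy_img.reshape(toy_h * toy_w, toy_d)  # one row per pixel, columns = R, G, B
print(toy_X.shape)                             # (8, 3)

# row i*w + j of the pixel matrix holds the pixel at (row i, column j)
print(np.array_equal(toy_X[1 * toy_w + 2], toy_img[1, 2]))

# undo the reshape: back to height x width x channels
toy_back = toy_X.reshape(toy_h, toy_w, toy_d)
```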

In [None]:
# to speed things up a little, we only take a random subsample of the original pixels
X_sample = shuffle(X, random_state=0)[:1000]
# initialize k-means and set n_clusters to the number of colors you want in your image (e.g. 10)
kmeans = ...
# fit the model on the data (i.e., find the cluster centers)
kmeans.fit(X_sample)
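Calling `fit` runs an iterative procedure. The classic variant, Lloyd's algorithm, alternates two steps: assign every point to its nearest center, then move every center to the mean of its assigned points. The sketch below is a simplified NumPy illustration, not scikit-learn's implementation (which adds k-means++ initialization by default, restarts controlled by `n_init`, and various optimizations); the helper `lloyd_kmeans` and the toy data are hypothetical. It also records the objective after each assignment step, which never increases.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Simplified k-means (Lloyd's algorithm); hypothetical helper, not scikit-learn's code."""
    def sq_dists(centers):
        # squared Euclidean distance of every point to every center: shape (n, k)
        return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

    rng = np.random.default_rng(seed)
    # initialize the centers with k distinct, randomly chosen data points
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    history = []  # value of the k-means objective after each assignment step
    for _ in range(n_iter):
        # assignment step: each point goes to its nearest center
        d2 = sq_dists(centers)
        labels = d2.argmin(axis=1)
        history.append(d2.min(axis=1).sum())
        # update step: move each center to the mean of its assigned points
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                new_centers[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    labels = sq_dists(centers).argmin(axis=1)  # final assignment for the returned centers
    return centers, labels, history

X_toy = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=float)
toy_centers, toy_labels, toy_history = lloyd_kmeans(X_toy, k=2)
print(toy_centers)
print(toy_history)
```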

In [None]:
# the cluster centers now contain the RGB triplets for each cluster, i.e., our new color palette
kmeans.cluster_centers_

In [None]:
# use the predict function of kmeans to compute the cluster index for each data point (i.e., pixel) in X
# (cluster indices together with the color palette would be the compressed representation of the image)
cluster_idx = ...
print(cluster_idx.shape) # same first dimension as X
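`predict` assigns each point to the cluster center with the smallest Euclidean distance. The sketch below checks this on synthetic random "pixels" (an assumption for illustration; `X_demo` and `km_demo` are hypothetical names that avoid overwriting the notebook's `X` and `kmeans`) by computing the nearest center by hand with NumPy broadcasting. `n_init=10` is set explicitly so the result does not depend on the scikit-learn version's default.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X_demo = rng.random((500, 3)) * 255  # synthetic "pixels" with RGB values in [0, 255)
km_demo = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_demo)

# squared Euclidean distance of every pixel to every cluster center: shape (500, 5)
d2_demo = ((X_demo[:, None, :] - km_demo.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
manual_idx = d2_demo.argmin(axis=1)  # index of the nearest center for each pixel
print(np.array_equal(manual_idx, km_demo.predict(X_demo)))
```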

In [None]:
# to visualize what the compressed image looks like, map each pixel to the corresponding new color
new_X = kmeans.cluster_centers_[cluster_idx]
print(new_X.shape)

In [None]:
# cast to unsigned 8-bit integers (0-255, fractional part truncated) so PIL can interpret the values as RGB colors
new_X = np.array(new_X, dtype=np.uint8)
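Casting with `np.array(..., dtype=np.uint8)` truncates the fractional part of the float cluster centers (e.g. 254.9 becomes 254). The difference is at most one intensity level per channel, so it is not visible in practice; if you prefer rounding, here is a sketch with a made-up center value (`demo_centers` is hypothetical). The clip is only a safeguard, since centers are means of values in [0, 255].

```python
import numpy as np

demo_centers = np.array([[12.7, 200.2, 254.9]])  # hypothetical float cluster center

# what the cell above does: the cast truncates the fractional part
truncated = np.array(demo_centers, dtype=np.uint8)
# alternative: round to the nearest integer first (clip is only a safeguard)
rounded = np.clip(np.rint(demo_centers), 0, 255).astype(np.uint8)

print(truncated.tolist())  # [[12, 200, 254]]
print(rounded.tolist())    # [[13, 200, 255]]
```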

In [None]:
# reshape back into image format
img_new = ... # TODO: reshape new_X such that img_new is a matrix of shape height x width x 3 RGB channels
print(img_new.shape)

In [None]:
# transform into PIL image (and possibly save)
img_new = Image.fromarray(img_new)
# img_new.save("cat_new.png") # -> save & share your image with the other participants
img_new
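The image above is an ordinary RGB image again. To actually store the compressed representation (one index per pixel plus a palette), one option is a Pillow palette image (mode "P"), which PNG saves as indices plus a palette. A minimal sketch with a made-up 2 x 2 index array and 4-color palette (the `demo_` names are hypothetical); for the notebook's data you would first reshape `cluster_idx` to height x width and cast it to `np.uint8` (fine for up to 256 colors), and use `kmeans.cluster_centers_` cast to `np.uint8` as the palette. The palette is padded with zeros to the full 256 entries (768 values) as a precaution.

```python
import numpy as np
from PIL import Image

# hypothetical inputs: a 4-color palette and a 2 x 2 grid of color indices
demo_palette = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]], dtype=np.uint8)
demo_idx = np.array([[0, 1], [2, 3]], dtype=np.uint8)

img_p = Image.fromarray(demo_idx)  # 2-D uint8 array -> grayscale mode "L"
flat = demo_palette.flatten().tolist()
img_p.putpalette(flat + [0] * (768 - len(flat)))  # attaching a palette switches the mode to "P"
print(img_p.mode)
# img_p.save("cat_palette.png")  # hypothetical file name; the PNG stores indices + palette

# converting back to RGB looks up each index in the palette
demo_rgb = np.asarray(img_p.convert("RGB"))
```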

### Heuristic to determine the number of clusters *k*

The objective that k-means minimizes is the sum of squared distances of the samples to their assigned cluster centers (also called the *inertia*), i.e., it tries to find clusters such that all the points in a cluster are very close to the respective cluster center.

After fitting k-means, the final value of this objective on the training data is stored in the `inertia_` attribute, and the `score` function computes it for any given dataset. Note that `score` returns the *negative* of the objective, following scikit-learn's convention that a higher score is better.
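A quick check of what `score` computes, on synthetic random "pixels" (the data and the names `X_demo`, `km_demo` are hypothetical): the sum of squared distances from each point to its assigned center should match both `inertia_` and the negated `score`, up to floating-point rounding.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X_demo = rng.random((300, 3)) * 255  # synthetic RGB "pixels"
km_demo = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_demo)

# k-means objective: sum of squared distances of each point to its assigned center
assigned_centers = km_demo.cluster_centers_[km_demo.predict(X_demo)]
objective = ((X_demo - assigned_centers) ** 2).sum()

print(objective, km_demo.inertia_, -km_demo.score(X_demo))
```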

We can now simply fit k-means with different settings for *k* and observe how the value of the score function changes as we increase the number of clusters.

#### Questions: 
* What would happen (i.e., what would the score be) if you set *k* to a very large value, e.g., the number of data points? 
* Based on the plot that we compute below, what do you think might be a good value for *k*? (Of course, this will be different for every dataset, e.g., in this example, a different image might need more or fewer colors to look good.)

In [None]:
# how many clusters (i.e. distinct colors) are needed?
scores = []
for n in range(1, 16):
    # fit k-means with k = n and record its score (the negative k-means objective)
    kmeans = KMeans(n_clusters=n, random_state=0).fit(X_sample)
    scores.append(kmeans.score(X_sample))
# check out how much the score improves as we use more clusters
plt.figure()
plt.plot(range(1, 16), scores)
plt.xlabel("number of clusters")
plt.ylabel("score");
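To see what such a curve looks like when the true number of clusters is known, here is a sketch on synthetic 2-D data with three well-separated clusters (generated with `make_blobs`; the data and the names `X_blobs`, `inertias`, `drops` are assumptions for illustration, not the cat image). It prints how much the objective (`inertia_`, i.e. the negated score) decreases with each additional cluster. For this data the decrease should be large up to k = 3 and small afterwards.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# synthetic data with 3 well-separated clusters (an assumption for illustration)
X_blobs, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 0], [0, 10]],
                        cluster_std=0.5, random_state=0)

ks = range(1, 7)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_blobs).inertia_ for k in ks]

# how much the objective drops when going from k-1 to k clusters
drops = -np.diff(inertias)
for k, drop in zip(ks[1:], drops):
    print(f"k={k - 1} -> {k}: objective decreases by {drop:.1f}")
```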