Recently, I discovered this neat little algorithm called “self-organizing maps” that can be used to create a low-dimensional “map” (as in cartography) of high-dimensional data.

The algorithm is very simple. Say you have a set of high-dimensional vectors and you want to represent them in an image, such that each vector is associated with a pixel of that image, and the similarity between vectors should correspond to the distance between pixels.

As a first step, we associate a random vector with each pixel in our map. Then we go through the input vectors and compute the “best matching unit” (BMU), which is the vector in our map that is closest to the input.

Now we update the BMU, and vectors that are close to it on the map, to be more like the input vector. How strongly we shift the vectors may depend on the distance from the BMU, and with each new input, the sphere of influence should decrease.

For example, we can take the RGB vectors of 1000 random colors as inputs, and create a 2D color map:

FIGURE 1: Self-organized color map.

Here is a simple (non-optimized) python source code that generated the image above:

import numpy as np
class SelfOrganizingMap(object):
def __init__(self, map_dims, input_dim):
self.map_dims = map_dims
self.input_dim = input_dim
self.weights = np.random.uniform(size=(map_dims + [input_dim]), low=0.0, high=1.0)
def __call__(self, input_batch, training=False):
# Compute how similar each vector in the map is to the input vectors
activations = np.array([np.sum(np.square(self.weights - b), axis=-1) for b in input_batch])
if training:
num_batch = np.shape(input_batch)[0]
num_indices = np.product(self.map_dims)
# Loop through all vectors on the input batch
for b in range(num_batch):
# Find which point on the map has the vector that is closest to the input
best_matching_unit = np.argmax(activations[b], axis=None)
x0 = np.unravel_index(best_matching_unit, self.map_dims)
# Determine a radius of influence
r = 5.0 * np.exp(-1.0 * b / num_batch)
# Update all vectors associated with the points in the vicinity of the best-matching-unit
for i in range(num_indices):
# Compute the squared Euclidean distance between points on the map
x = np.unravel_index(i, self.map_dims)
dist2 = np.sum(np.square(np.array(list(x)) - np.array(list(x0))))
# Set the update strength to be a Gaussian, centered at the best matching unit
w = 1.0 * np.exp(-dist2 / (2 * r**2))
# Update weights
self.weights[x] = (1.0 - w) * self.weights[x] + w * input_batch[b]
return activations
if __name__ == '__main__':
# Example: Self-organizing map for 1000 colors (reducing 3 to 2 dimensions)
import matplotlib.pyplot as plt
som = SelfOrganizingMap([12, 12], 3)
colors = np.random.uniform(0.0, 1.0, size=[1000, 3])
som(colors, training=True)
plt.imshow(som.weights)
plt.show()
view raw simple_som.py hosted with ❤ by GitHub