Kohonen Network

From Scholarpedia
Teuvo Kohonen and Timo Honkela (2007), Scholarpedia, 2(1):1568. doi:10.4249/scholarpedia.1568 revision #127841 [link to/cite this article]

Curator: Teuvo Kohonen

Figure 1: The array of nodes in a two-dimensional SOM grid.

The Self-Organizing Map (SOM), commonly also known as the Kohonen network (Kohonen 1982, Kohonen 2001), is a computational method for the visualization and analysis of high-dimensional data, especially experimentally acquired information.

Introduction

The Self-Organizing Map defines an ordered mapping, a kind of projection from a set of given data items onto a regular, usually two-dimensional grid. A model \(m_i\) is associated with each grid node (Figure 1). These models are computed by the SOM algorithm. A data item will be mapped into the node whose model is most similar to the data item, i.e., has the smallest distance from the data item in some metric.
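
As a concrete illustration, the following minimal Python sketch (an assumption of this presentation, not part of any official SOM software) maps one data item to its best-matching node, i.e. the node whose model has the smallest Euclidean distance from the item:

    import numpy as np

    def best_matching_node(x, models):
        """Return the index of the model closest to the data item x.

        models: array of shape (n_nodes, n_dims), one model vector per grid node.
        x:      array of shape (n_dims,), a single data item.
        The Euclidean metric is assumed here.
        """
        distances = np.linalg.norm(models - x, axis=1)
        return int(np.argmin(distances))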

Like a codebook vector in vector quantization, each model is usually a weighted local average of the given data items in the data space. In addition, when the models are computed by the SOM algorithm, they are more similar at nearby nodes than at nodes located farther away from each other on the grid. In this way the set of models can be regarded as constituting a similarity graph, a structured 'skeleton' of the distribution of the given data items.

The SOM was originally developed for the visualization of distributions of metric vectors, such as ordered sets of measurement values or statistical attributes, but it can be shown that a SOM-type mapping can be defined for any data items whose mutual pairwise distances can be defined. Examples of non-vectorial data that are amenable to this method are strings of symbols and sequences of segments in organic molecules (Kohonen and Somervuo 2002).

History

The SOM algorithm grew out of early neural network models, especially models of associative memory and adaptive learning (cf. Kohonen 1984). A new incentive was to explain the spatial organization of the brain's functions, as observed especially in the cerebral cortex. Nonetheless the SOM was not the first step in that direction: one has to mention at least the spatially ordered line detectors of von der Malsburg (1973) and the neural field model of Amari (1980). However, the self-organizing power of these early models was rather weak. The crucial invention of Kohonen was to introduce a system model that is composed of at least two interacting subsystems of different natures. One of these subsystems is a competitive neural network that implements the winner-take-all function, but there is also another subsystem that is controlled by the neural network and which modifies the local synaptic plasticity of the neurons in learning. The learning is restricted spatially to the local neighborhood of the most active neurons. The plasticity-control subsystem could be based on nonspecific neural interactions, but more probably it is a chemical control effect. Only by means of the separation of the neural signal transfer and the plasticity control has it become possible to implement an effective and robust self-organizing system.

Figure 2: Left image: Models of acoustic spectra of Finnish phonemes, organized on an SOM.
Right image: Calibration of the models into phonemic classes. The symbol # stands for the stop consonants /k,p,t/.

Nonetheless, the SOM principle can also be expressed mathematically in a purely abstract form, without reference to any underlying neural or other components. The first application area of the SOM was speech recognition (see Figure 2). In its abstract form, the SOM has come into widespread use in data analysis and data exploration (Kaski et al. 1998, Oja et al. 2003, Pöllä et al. 2007).

Mathematical definition of the SOM

Consider first data items that are \(n\)-dimensional Euclidean vectors \[ x(t) = [\xi_1 (t), \xi_2 (t), \ldots , \xi_n (t)] \ .\]

Here \(t\) is the index of the data item in a given sequence. Let the \(i\)th model be \(m_i (t) = [\mu_{i1} (t), \mu_{i2} (t), \ldots , \mu_{in} (t)]\ ,\) where now \(t\) denotes the index in the sequence in which the models are generated. This sequence is defined as a smoothing-type process in which the new value \(m_i (t+1)\) is computed iteratively from the old value \(m_i (t)\) and the new data item \(x (t)\) as

\[ m_i(t+1) = m_i(t) + \alpha(t) h_{ci}(t) [ x(t)-m_i(t) ].\]

Here \(\alpha (t)\) is a scalar factor that defines the size of the correction; its value decreases with the step index \(t.\) The index \(i\) refers to the model under processing, and \(c\) is the index of the model that has the smallest distance from \(x(t)\) in the Euclidean signal space. The factor \(h_{ci} (t)\) is a kind of smoothing kernel, also called the neighborhood function. It is equal to 1 when \(i = c\) and its value decreases when the distance between the models \(m_i\) and \(m_c\) on the grid increases. Moreover, the spatial width of the kernel on the grid decreases with the step index \(t\ .\) These functions of the step index, which determine the convergence, must be chosen very delicately: cf., e.g., (Kohonen 2001). The initialization of the \(m_i(1)\) is a problem also discussed in (Kohonen 2001).
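
To make the iteration concrete, the following Python sketch implements the above rule with a Gaussian neighborhood function on a rectangular grid; the particular schedules chosen for \(\alpha(t)\) and for the kernel width are only illustrative assumptions, not the carefully tuned choices discussed in (Kohonen 2001).

    import numpy as np

    def train_som_online(X, grid_h, grid_w, n_steps, rng=None):
        """Online SOM training: m_i(t+1) = m_i(t) + alpha(t) h_ci(t) [x(t) - m_i(t)].

        X:              data items, array of shape (n_samples, n_dims).
        grid_h, grid_w: height and width of the node array.
        Returns the models as an array of shape (grid_h * grid_w, n_dims).
        """
        rng = np.random.default_rng() if rng is None else rng
        n_nodes = grid_h * grid_w
        # Grid coordinates of the nodes, used for distances on the map.
        coords = np.array([(r, c) for r in range(grid_h) for c in range(grid_w)], float)
        # Simple random initialization; better initializations are discussed in (Kohonen 2001).
        models = X[rng.integers(0, len(X), n_nodes)].astype(float)

        for t in range(n_steps):
            x = X[rng.integers(0, len(X))]                     # next data item x(t)
            c = np.argmin(np.linalg.norm(models - x, axis=1))  # index c of the winner node
            alpha = 0.5 * (1.0 - t / n_steps)                  # decreasing correction factor (illustrative)
            sigma = 0.5 + 0.5 * max(grid_h, grid_w) * (1.0 - t / n_steps)  # shrinking kernel width
            d2 = np.sum((coords - coords[c]) ** 2, axis=1)     # squared grid distances to the winner
            h = np.exp(-d2 / (2.0 * sigma ** 2))               # neighborhood function h_ci(t)
            models += alpha * h[:, None] * (x - models)        # move all models toward x(t)
        return models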

Although the above iterative algorithm has been used with success in numerous applications, it has turned out that the scheme termed the Batch Map produces essentially similar results but is an order of magnitude faster. The basic idea is that for every node \(j\) in the grid, the average \(\overline{x}_j\) is formed of all those input items \(x(t)\) that have \(m_j\) as the closest model. After that, new models are computed as

\[m_i = \sum_j{n_j h_{ji}} \overline{x}_j / \sum_j{n_j h_{ji}}\]

where \(n_j\) is the number of input items mapped into node \(j\ ,\) and the index \(j\) runs over the nodes in the neighborhood of node \(i\ .\) To obtain the final \(m_i\ ,\) this scheme is iterated a few times, always using the same batch of input data items to determine the updated \( \overline{x}_j\ .\)
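
A minimal sketch of one Batch Map round, again assuming a Gaussian neighborhood function, could look as follows; in use, the round is repeated a few times on the same batch of data, typically with a shrinking kernel width sigma.

    import numpy as np

    def batch_som_round(X, models, coords, sigma):
        """One Batch Map round: m_i = sum_j n_j h_ji x_bar_j / sum_j n_j h_ji.

        X:      data items, shape (n_samples, n_dims).
        models: current models, shape (n_nodes, n_dims).
        coords: grid coordinates of the nodes, shape (n_nodes, 2).
        sigma:  width of the (assumed Gaussian) neighborhood function on the grid.
        """
        n_nodes = len(models)
        # The closest model (winner node) of every input item.
        winners = np.argmin(
            np.linalg.norm(X[:, None, :] - models[None, :, :], axis=2), axis=1)
        # n_j = number of items mapped into node j; sums_j = n_j * x_bar_j.
        n = np.bincount(winners, minlength=n_nodes).astype(float)
        sums = np.zeros_like(models)
        np.add.at(sums, winners, X)
        # Neighborhood weights h_ji between all pairs of nodes.
        d2 = np.sum((coords[:, None, :] - coords[None, :, :]) ** 2, axis=2)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        num = h.T @ sums          # sum_j h_ji n_j x_bar_j, per node i
        den = h.T @ n             # sum_j h_ji n_j, per node i
        new_models = models.copy()
        nonzero = den > 0         # keep the old model where the weight sum vanishes
        new_models[nonzero] = num[nonzero] / den[nonzero][:, None]
        return new_models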

The mathematical theory of the SOM is very complicated and only the one-dimensional case has been analyzed completely (Fort 2006). Only for a modified adaptation rule, in which the matching is averaged over \(h_{ci}\ ,\) can the SOM be derived from a specific potential function (Heskes 2001). Apparently, the SOM belongs to the "ill posed" problems in mathematics.

Useful instructions when applying the SOM

It is advisable to pay attention to at least the following recommendations so that the resulting mappings are stable, well oriented, and topologically correct.

Form of the array: A hexagonal grid of nodes is to be preferred for visual inspection. The array ought to be rectangular rather than square, because the "elastic network" formed of the model vectors \(m_i\) must be oriented along the distribution of the input \(x\) in the signal space. On the other hand, since the \(m_i\) have to approximate the distribution of the \(x\ ,\) it is further desirable that the width and the height of the array at least roughly correspond to the major dimensions (e.g. the principal components) of the distribution of \(x\ .\)
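
One common heuristic for the last point, sketched below under the assumption that the data have at least two input dimensions, is to choose the height and width of the array so that their ratio follows the standard deviations of the data along its two largest principal components; this is an illustrative rule of thumb, not a prescription from the sources cited above.

    import numpy as np

    def grid_shape_from_data(X, n_nodes):
        """Suggest (height, width) of the node array so that its aspect ratio roughly
        follows the spread of the data X along its two largest principal components."""
        cov = np.cov(X, rowvar=False)
        eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
        ratio = np.sqrt(eigvals[0] / eigvals[1])        # std along PC1 relative to PC2
        height = max(int(round(np.sqrt(n_nodes / ratio))), 1)
        width = max(int(round(n_nodes / height)), 1)
        return height, width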

Scaling of the vector components is a very subtle problem. One may easily realize that the orientation, or ordered "regression", of the model vectors in the input space depends on the scaling of the components (or dimensions) of the input data vectors. Especially if the data elements represent variables of different kinds and are therefore given in different scales, there is no simple rule to determine what kind of rescaling is best before entering the training data into the learning algorithm. One may try heuristically justifiable rescalings and check the quality of the resulting maps, e.g., by means of Sammon's mapping (Sammon 1969) or the average quantization error. Often the normalization of all input variables such that, e.g., their variances become equal is a useful strategy.
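
The last strategy mentioned above, equalizing the variances of the input variables, could be sketched as follows (a simple z-score scaling; whether it is appropriate still has to be judged from the quality of the resulting maps):

    import numpy as np

    def scale_to_unit_variance(X):
        """Rescale each input variable (column of X) to zero mean and unit variance,
        so that all variables initially carry equal weight in the Euclidean metric."""
        mean = X.mean(axis=0)
        std = X.std(axis=0)
        std[std == 0] = 1.0      # leave constant variables unchanged
        return (X - mean) / std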

Learning with a small number of available training samples: In some problems, e.g., when making maps for items defined by a finite set of attributes, one does not have a large set of training samples. However, for a good statistical accuracy, the learning process may require a rather large number of training steps, and if the number of available samples is much smaller, the samples can be used reiteratively in training. The samples may be applied cyclically or in a randomly permuted order, or picked up at random from the basic set with replacement (this is called bootstrap sampling). It has turned out in practice that ordered cyclic application of the available samples is not noticeably worse than the other, statistically stringent sampling methods.
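
The three reuse schemes mentioned above can be sketched as follows; the function only produces the order in which the available samples are presented to the learning algorithm, and its name and interface are assumptions of this sketch:

    import numpy as np

    def sample_order(n_samples, n_steps, mode="cyclic", rng=None):
        """Indices of the training items for n_steps learning steps when only
        n_samples items are available and must be reused."""
        rng = np.random.default_rng() if rng is None else rng
        if mode == "cyclic":        # ordered cyclic application of the samples
            return np.arange(n_steps) % n_samples
        if mode == "permuted":      # repeated random permutations of the sample set
            reps = int(np.ceil(n_steps / n_samples))
            order = np.concatenate([rng.permutation(n_samples) for _ in range(reps)])
            return order[:n_steps]
        if mode == "bootstrap":     # sampling with replacement
            return rng.integers(0, n_samples, n_steps)
        raise ValueError("unknown mode: " + mode)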

Quality of learning: Different learning processes will be obtained when starting with different initial values \(m_i(1)\ ,\) and applying different sequences of the training vectors \(\{x(t)\}\) and different learning parameters. Of course it would be desirable for the resulting map to always be the least ambiguous one. It is obvious that some optimal map for the same input data must exist. When comparing maps with the same array structure and definition of \(h_{ci}\ ,\) the mean of \(|| x - m_c ||\) (quantization error) over the training data is a useful performance index. Therefore, an appreciable number (say, several dozen) of random initializations of the \(m_i(1)\) may be tried, and the map with the least quantization error selected.

Notice, however, that there would be no sense in comparing quantization errors for maps with different structures or \(h_{ci}\ ,\) because, e.g., it is a trivial fact that the error is minimum if \(h_{ci} = \delta_{ci}\) (Kronecker delta). When using this singular kernel, however, there is no self-organizing power left, because the algorithm will be reduced to classical vector quantization.
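
The selection procedure described above can be sketched as follows, assuming the train_som_online function from the earlier sketch is in scope; the array structure and the neighborhood function are kept fixed while only the random initialization varies.

    import numpy as np

    def quantization_error(X, models):
        """Mean distance ||x - m_c|| from each data item to its closest model."""
        dists = np.linalg.norm(X[:, None, :] - models[None, :, :], axis=2)
        return dists.min(axis=1).mean()

    def best_of_random_inits(X, grid_h, grid_w, n_steps, n_trials=30, seed=0):
        """Train several maps from different random initializations and keep the one
        with the smallest mean quantization error."""
        best_models, best_err = None, np.inf
        for trial in range(n_trials):
            rng = np.random.default_rng(seed + trial)
            models = train_som_online(X, grid_h, grid_w, n_steps, rng=rng)
            err = quantization_error(X, models)
            if err < best_err:
                best_models, best_err = models, err
        return best_models, best_err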

Software packages

Data analysis, clustering and visualization by the SOM can be done using either public domain, commercial, or self-coded software. The use of self-coded software is not encouraged as there are many subtle aspects that need to be taken into account and which affect the convergence and accuracy of the algorithm.

The SOM_PAK is a collection of tools for the correct application of the SOM. This original program package was created by the SOM Programming Team of the Helsinki University of Technology especially for very large problems and it is available for download at http://www.cis.hut.fi/research/som_lvq_pak.shtml.

The SOM Toolbox is a more flexible, general-purpose software library that implements the SOM algorithm in MATLAB and uses MATLAB graphics. The SOM Toolbox software package is available for download at http://www.cis.hut.fi/projects/somtoolbox/.

Although many commercial implementations of the SOM exist for specific applications, it must be cautioned that many of them contain fatal flaws, because not all of the above precautions have been taken into account.

Extensions of SOM

Numerous variants of the basic SOM have been suggested (cf. Kaski et al. 1998, Oja et al. 2003, and Pöllä et al. 2007). They are too many to be reviewed here. The main alternatives for the definition of the SOM models are the Generative Topographic Map (Bishop et al. 1998) and the information-based computation of the SOM models (Van Hulle 2000).

The structure of the SOM array can be made adaptive ("growing maps", Fritzke 1991), or alternatively the neighborhood relations can be defined in the input space (Martinetz et al. 1993).

The SOM principle can be modified to analyze subspace relations (Kohonen et al. 1997) or dynamic patterns (Hammer et al. 2004).

In its basic form, the SOM described in this article, as contrasted with its many variants, is in a special position because it continues classical methods such as principal component analysis (Hotelling 1933) and multidimensional scaling (Kruskal and Wish 1978), and directly visualizes the clustering results as an easily comprehensible geometric display.

SOM variants are often introduced in the WSOM (Workshop on Self-Organizing Maps) conference series, which is dedicated to research related to the Self-Organizing Map. Recent proceedings include (Laaksonen and Honkela 2011) and (Príncipe and Miikkulainen 2009).

References

Amari, S. (1980). Topographic organization of nerve fields. Bulletin of Mathematical Biology, 42:339-364.

Bishop, C. M.; Svensén, M.; and Williams, C. K. I. (1998). GTM: The Generative Topographic Mapping. Neural Computation 10(1): 215-234.

Fort, J.-C. (2006). SOM’s mathematics. Neural Networks, 19(6–7): 812–816.

Fritzke, B. (1991). Let it grow - self-organizing feature maps with problem dependent cell structure. Proceedings of ICANN 1991, Amsterdam: North-Holland, pp. 403-408.

Hammer, B.; Micheli, A.; Sperduti, A.; and Strickert, M. (2004). Recursive self-organizing network models. Neural Networks, 17(8-9):1061-1086.

Heskes, T. (2001). Self-organizing maps, vector quantization, and mixture modeling. IEEE Transactions on Neural Networks, 12:1299-1305.

Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24: 417-441, 498-520.

Kaski, S.; Kangas, J.; and Kohonen, T. (1998). Bibliography of self-organizing map (SOM) papers: 1981-1997. Neural Computing Surveys, 1: 102-350.

Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43:59-69.

Kohonen T. (1984). Self-Organization and Associative Memory. Berlin: Springer.

Kohonen, T. (2001). Self-Organizing Maps. Third, Extended Edition. Springer Series in Information Sciences vol. 30, Berlin, Germany: Springer-Verlag, ISBN 978-3-540-67921-9

Kohonen, T.; Kaski, S.; and Lappalainen, H. (1997). Self-organized formation of various invariant-feature filters in the adaptive-subspace SOM. Neural Computation, 9: 1321-1344.

Kohonen, T.; and Somervuo, P. (2002). How to make large self-organizing maps for nonvectorial data. Neural Networks 15(8-9): 945-952.

Kruskal, J. B.; and Wish, M. (1978). Multidimensional Scaling. Beverly Hills, CA: Sage Publications.

Laaksonen, J.; and Honkela, T. (eds.) (2011). "Advances in Self-Organizing Maps, WSOM 2011", Berlin: Springer.

von der Malsburg, C. (1973). Self-organization of orientation sensitive cells in the striate cortex. Kybernetik, 14:85-100.

Martinetz, T. M.; Berkovich, S. G.; and Schulten, K. (1993). "Neural gas" for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 4: 558-569.

Oja, M.; Kaski, S.; and Kohonen, T. (2003). Bibliography of Self-Organizing Map (SOM) Papers: 1998-2001 Addendum. Neural Computing Surveys, 3: 1-156.

Pöllä, M.; Honkela, T.; and Kohonen, T. (2007). Bibliography of Self-Organizing Map (SOM) Papers: 2002-2005 Addendum. Neural Computing Surveys, forthcoming.

Príncipe, J. C.; and Miikkulainen, R. (eds.) (2009). "Advances in Self-Organizing Maps, WSOM 2009", Berlin: Springer.

Sammon, J. (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, 18(5):401-409.

Van Hulle, M. M. (2000). Faithful Representations and Topographic Maps: From Distortion- to Information-Based Self-Organization. New York: John Wiley & Sons.

Internal references

  • Braitenberg, Valentino (2007). Brain. Scholarpedia, 2(11):2918.
  • Schreiber, Rob (2007). MATLAB. Scholarpedia, 2(7):2929.

See Also

Adaptive Resonance Theory, Associative Memory, Bayesian Learning, Neural Networks, Self Organization
