To be able to run the SOM algorithm, you have to load the package called 
SOMbrero. The function used to run it is called trainSOM() and is 
detailed below.
This documentation only considers the case of contingency tables.
The trainSOM function has several arguments, but only the first one, x.data,
is required: it is the dataset used to train the SOM. In this documentation,
it is passed to the function as a matrix or a data frame. This dataset must be
a contingency table, i.e., it must contain only non-negative integers (0 or
positive integers). Row and column names must be supplied.
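The requirements on x.data can be checked before training. The helper below, is_contingency_table(), is hypothetical (it is not part of SOMbrero) and only encodes the rules stated above: numeric, no missing values, non-negative integers, and both row and column names present.

```r
# Hypothetical helper (not part of SOMbrero): checks the requirements on
# x.data stated above before calling trainSOM().
is_contingency_table <- function(x) {
  x <- as.matrix(x)                 # accepts a matrix or a data frame
  is.numeric(x) &&
    !anyNA(x) &&
    all(x >= 0) &&                  # no negative counts
    all(x == round(x)) &&           # integer counts only
    !is.null(rownames(x)) &&
    !is.null(colnames(x))
}

# Toy table: 3 districts (rows) x 2 candidates (columns)
toy <- matrix(c(10, 0, 5, 3, 7, 2), nrow = 3,
              dimnames = list(c("d1", "d2", "d3"), c("A", "B")))
is_contingency_table(toy)            # TRUE
is_contingency_table(unname(toy))    # FALSE: row and column names missing
is_contingency_table(toy - 1)        # FALSE: one count becomes negative
```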
The other arguments are the same as the arguments passed to the initSOM
function (they are parameters defining the algorithm, see help(initSOM)
for further details).
The trainSOM function returns an object of class somRes (see 
help(trainSOM) for further details on this class).
presidentielles2002 data set
The presidentielles2002 data set provides the number of votes at the first
round of the 2002 French presidential election for each of the 16 candidates in
all of the 106 French administrative districts called “departements”. Further
details about this data set and the 2002 French presidential election are given
with help(presidentielles2002).
data(presidentielles2002)
apply(presidentielles2002, 2, sum)
##      MEGRET      LEPAGE  GLUCKSTEIN      BAYROU      CHIRAC      LE_PEN 
##      667043      535875      132696     1949219     5666021     4804772 
##     TAUBIRA SAINT_JOSSE      MAMERE      JOSPIN      BOUTIN         HUE 
##      660515     1204801     1495774     4610267      339157      960548 
## CHEVENEMENT     MADELIN   LAGUILLER  BESANCENOT 
##     1518568     1113551     1630118     1210562
The two candidates who reached the second round of the election were Jacques Chirac and the far-right candidate Jean-Marie Le Pen.
set.seed(4031719)
korresp.som <- trainSOM(x.data=presidentielles2002, dimension=c(8,8),
                        type="korresp", scaling="chi2", nb.save=10)
korresp.som
##       Self-Organizing Map object...
##          online learning, type: korresp 
##          8 x 8 grid with square topology
##          neighbourhood type: gaussian 
##          distance type: euclidean
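The call above uses scaling="chi2". As a rough illustration of the idea, and not of SOMbrero's internal code, the textbook correspondence-analysis transformation rescales each row profile so that ordinary Euclidean distances between transformed rows equal chi-square distances between row profiles. The function names chi2_row_profiles() and chi2_dist() are ours.

```r
# Textbook chi-square transformation of row profiles (correspondence analysis).
# Assumption: this illustrates the idea behind scaling="chi2"; SOMbrero's
# internal preprocessing may differ in its details.
chi2_row_profiles <- function(x) {
  p <- x / sum(x)              # relative frequencies p_ij
  row_mass <- rowSums(p)       # p_i.
  col_mass <- colSums(p)       # p_.j
  # z_ij = p_ij / (p_i. * sqrt(p_.j))
  sweep(sweep(p, 1, row_mass, "/"), 2, sqrt(col_mass), "/")
}

# Chi-square distance between the profiles of rows i and k, computed directly
chi2_dist <- function(x, i, k) {
  p <- x / sum(x)
  row_mass <- rowSums(p)
  col_mass <- colSums(p)
  sqrt(sum((p[i, ] / row_mass[i] - p[k, ] / row_mass[k])^2 / col_mass))
}

toy <- matrix(c(10, 2, 5, 3, 7, 2, 4, 4, 9), nrow = 3,
              dimnames = list(c("d1", "d2", "d3"), c("A", "B", "C")))
z <- chi2_row_profiles(toy)
# Euclidean distance between transformed rows equals the chi-square distance
all.equal(sqrt(sum((z["d1", ] - z["d2", ])^2)), chi2_dist(toy, "d1", "d2"))
```

This equivalence is why a Euclidean SOM can be trained on the transformed table while still respecting the chi-square geometry of the contingency table.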
Since the energy is recorded at each intermediate backup (nb.save=10), we can look at its evolution:
plot(korresp.som, what="energy")
 
The energy stabilizes during the last 100 iterations.
The clustering component contains the final classification of the dataset. As both row and column variables are classified, the length of the resulting vector is equal to the sum of the number of rows and the number of columns.
NB: The clustering component shows first the column variables (here, the candidates) and then the row variables (here, the departements).
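Because the column variables come first, the clustering vector can be split by position. For presidentielles2002 its length is 16 + 106 = 122. The helper split_korresp_clustering() below is hypothetical and is shown on a small hand-made vector rather than on the real object.

```r
# Hypothetical helper: split a korresp clustering vector into its column part
# (listed first) and its row part, following the ordering described above.
split_korresp_clustering <- function(clustering, n_col) {
  stopifnot(n_col >= 0, n_col <= length(clustering))
  n_row <- length(clustering) - n_col
  list(columns = clustering[seq_len(n_col)],
       rows    = clustering[n_col + seq_len(n_row)])
}

# Toy vector: 2 columns (candidates) followed by 3 rows (departements)
cl <- c(A = 1, B = 4, d1 = 1, d2 = 4, d3 = 2)
parts <- split_korresp_clustering(cl, n_col = 2)
parts$columns   # named vector: A = 1, B = 4
parts$rows      # named vector: d1 = 1, d2 = 4, d3 = 2
```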
The following table indicates which graphics are available for a korresp SOM.
| Type | Energy | Obs | Prototypes | Add | Super Cluster |
|---|---|---|---|---|---|
| no type | x | | | | |
| hitmap | | x | | | x |
| color | | | x2 | | x2 |
| lines | | | x2 | | x2 |
| barplot | | | x | | |
| radar | | | x | | |
| pie | | | | | |
| boxplot | | | | | |
| 3d | | | x2 | | |
| poly.dist | | | x | | x |
| umatrix | | | x | | |
| smooth.dist | | | x | | |
| words | | | | | |
| names | | x | | | |
| graph | | | | | |
| mds | | | x | | x |
| grid.dist | | | x | | |
| grid | | | | | x |
| dendrogram | | | | | x |
| dendro3d | | | | | x |
In the column “Prototypes”, a plot marked “x2” is available for both row and column variables. In the “Super Cluster” column, an “x2” cell means the plot is available for both data set variables and additional variables.
korresp.som$clustering
##                   MEGRET                   LEPAGE               GLUCKSTEIN 
##                        8                       57                       46 
##                   BAYROU                   CHIRAC                   LE_PEN 
##                       57                       42                        8 
##                  TAUBIRA              SAINT_JOSSE                   MAMERE 
##                        1                       64                       57 
##                   JOSPIN                   BOUTIN                      HUE 
##                       60                       57                       48 
##              CHEVENEMENT                  MADELIN                LAGUILLER 
##                       57                       57                       47 
##               BESANCENOT                      ain                    aisne 
##                       62                        8                       24 
##                   allier  alpes_de_haute_provence             hautes_alpes 
##                       48                       64                       64 
##          alpes_maritimes                  ardeche                 ardennes 
##                        8                       64                       16 
##                   ariege                     aube                     aude 
##                       64                       24                       48 
##                  aveyron         bouches_du_rhone                 calvados 
##                       63                        8                       64 
##                   cantal                 charente        charente_maritime 
##                       62                       64                       64 
##                     cher                  correze                corse_sud 
##                       48                       53                       63 
##              haute_corse                cote_d'or            cotes_d'armor 
##                       21                        8                       62 
##                   creuse                 dordogne                    doubs 
##                       63                       64                        8 
##                    drome                     eure             eure_et_loir 
##                        8                       24                       24 
##                finistere                     gard            haute_garonne 
##                       61                       16                       23 
##                     gers                  gironde                  herault 
##                       64                       64                       24 
##          ille_et_vilaine                    indre          indre_et_loire_ 
##                       60                       64                       62 
##                    isere                     jura                   landes 
##                        8                        8                       64 
##             loir_et_cher                    loire              haute_loire 
##                       56                        8                       32 
##         loire_atlantique                   loiret                      lot 
##                       61                       16                       64 
##          lot_et_garonne_                   lozere          maine_et_loire_ 
##                       64                       63                       61 
##                   manche                    marne              haute_marne 
##                       63                       16                       16 
##                  mayenne       meurthe_et_moselle                    meuse 
##                       60                        8                       24 
##                 morbihan                  moselle                   nievre 
##                       61                        8                       40 
##                     nord                     oise                     orne 
##                       24                        8                       55 
##            pas_de_calais              puy_de_dome     pyrenees_atlantiques 
##                       48                       39                       64 
##          hautes_pyrenees      pyrenees_orientales                 bas_rhin 
##                       64                       32                        8 
##                haut_rhin                    rhone              haute_saone 
##                        8                        8                        8 
##          saone_et_loire_                   sarthe                   savoie 
##                       24                       46                        8 
##             haute_savoie                    paris          seine_maritime_ 
##                        8                       57                       32 
##          seine_et_marne_                 yvelines              deux_sevres 
##                        7                       58                       62 
##                    somme                     tarn          tarn_et_garonne 
##                       64                       64                       64 
##                      var                 vaucluse                   vendee 
##                        8                        8                       61 
##                   vienne             haute_vienne                   vosges 
##                       63                       63                        8 
##                    yonne    territoire_de_belfort                  essonne 
##                        8                        8                        6 
##          hauts_de_seine_        seine_saint-denis             val_de_marne 
##                       57                        6                       21 
##               val_d'oise               guadeloupe               martinique 
##                        6                        1                        1 
##                   guyane               la_reunion                  mayotte 
##                        1                       41                       33 
##       nouvelle_caledonie      polynesie_francaise saint_pierre_et_miquelon 
##                       41                       41                       27 
##         wallis_et_futuna   francais_de_l'etranger 
##                       41                       49
The resulting distribution of the clustering on the map can also be visualized by a hitmap:
plot(korresp.som, what="obs", type="hitmap")
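The hitmap essentially displays the number of rows and columns assigned to each neuron. Assuming it counts every entry of the clustering vector, the same counts can be obtained in base R with tabulate(korresp.som$clustering, nbins = 64); the sketch below uses a few values copied from the clustering printed above, so that it runs without the package.

```r
# Counts per neuron on an 8 x 8 grid (64 neurons), from a small subset of the
# clustering printed above. tabulate() keeps empty neurons (count 0), which a
# hitmap shows as empty cells.
cl <- c(MEGRET = 8, LE_PEN = 8, TAUBIRA = 1, ain = 8, guadeloupe = 1)
hits <- tabulate(cl, nbins = 64)
hits[c(1, 8)]        # 2 and 3
sum(hits == 0)       # 62 empty neurons
which.max(hits)      # neuron 8
```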
 
For a more detailed view, the "names" plot prints, in each neuron, the names of
the variables assigned to it; in a korresp SOM, both row and column variable
names are printed.
plot(korresp.som, what="obs", type="names", scale=c(0.9,0.5))
 
The map is organized as follows: the bottom left side of the map is associated with the candidate “Taubira”, who obtained her best scores in the overseas departements “Guadeloupe”, “Martinique” and “Guyane”.
This area is opposed to the top left-hand side of the map (cluster 8), which is associated with the far-right candidates “Le Pen” and “Megret”, who traditionally obtain higher scores in some departements of the South of France, such as “Vaucluse”, and in some north-eastern departements, such as “Haut Rhin”. The top right-hand side of the map is composed of clusters characterized by far-left candidates (“HUE”, “LAGUILLER”, “BESANCENOT”) and progressively moves to the traditional left candidates in the right part of the map (“JOSPIN”) and finally to the traditional right candidates in the bottom right corner of the map (“CHIRAC”, “BAYROU”). Note that the vote for far-right candidates is more similar to the vote for far-left candidates than to the vote for traditional right candidates. The cluster containing the largest number of departements is cluster 8, in the top left corner of the map, which is also Le Pen's cluster: in this election, the far-right candidate succeeded for the first time in reaching the second round of the presidential election.
Some graphics from the numeric SOM algorithm are still available in the korresp 
case. They are detailed below. As the resulting clustering provides the 
classification for both rows and columns, a new argument view is used to 
specify which one should be considered. Its possible values are either 
"r" for row variables (the default value) or "c" for column 
variables.
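Three representations are available. For instance, the "lines" representation plots either the row or the column prototypes, depending on the value of the view argument:
# plot the row prototypes (106 French departements)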
plot(korresp.som, what="prototypes", type="lines", view="r", print.title=TRUE)
 
# plot the column prototypes (16 candidates)
plot(korresp.som, what="prototypes", type="lines", view="c", print.title=TRUE)
 
The peaks in neurons 1, 2 and 9 correspond, in the row view, to the overseas departements and, in the column view, to the candidate “Taubira”. In the column view, the two peaks clearly visible in the right-side clusters correspond to the two “main” traditional candidates, “Jospin” and “Chirac” (the left and right candidates, respectively).
A more precise, variable-by-variable view is given by the “color” and “3d” graphics, drawn here, as examples, for the candidate “Le Pen” and for the departement “Martinique”.
par(mfrow=c(1,2))
plot(korresp.som, what="prototypes", type="color", variable="LE_PEN")
plot(korresp.som, what="prototypes", type="3d", variable="martinique")
 
The first graphic shows that “Le Pen” obtained his best scores in the departements located in the top left-hand side of the map and his lowest scores in the departements located in the bottom left side of the map (the overseas departements).
The second graphic shows that the candidates who obtained the highest scores in Martinique are located in the bottom right-hand side of the map (mainly Taubira).
The graphics can also be drawn by giving the index of the variable and the view, either “r” or “c” (here, as an example, “Chirac”, who is the 5th candidate):
par(mfrow=c(1,2))
plot(korresp.som, what="prototypes", type="color", variable=5, view="c")
plot(korresp.som, what="prototypes", type="3d", variable=5, view="c")
 
Hence “Chirac” is located in the bottom right corner of the map and, more generally, at the bottom of the map (he traditionally also obtains high scores in the overseas departements).
These graphics are exactly the same as in the numerical case:
- "poly.dist" represents the distances between neighboring prototypes with polygons plotted for each cell of the grid. The smaller the distance between a polygon's vertex and a cell border, the closer the pair of prototypes. The colors indicate the number of observations in the neuron (white = empty);
- "umatrix" fills the neurons of the grid using colors that represent the average distance between the current prototype and its neighbors;
- "smooth.dist" plots the mean distance between the current prototype and its neighbors with a color gradation;
- "mds" plots the number of each neuron on a map according to a multidimensional scaling (MDS) projection;
- "grid.dist" plots a point for each pair of prototypes, with x coordinates representing the distance between the prototypes in the input space, and y coordinates representing the distance between the corresponding neurons on the grid.
plot(korresp.som, what="prototypes", type="poly.dist", print.title=TRUE)
 
plot(korresp.som, what="prototypes", type="umatrix", print.title=TRUE)
 
plot(korresp.som, what="prototypes", type="smooth.dist", print.title=TRUE)
 
plot(korresp.som, what="prototypes", type="mds")
 
plot(korresp.som, what="prototypes", type="grid.dist")
 
Three neurons (1, 2 and 9) were already singled out in the section Clustering interpretation as having prototypes quite different from the rest of the map. The graphics above confirm this: there is a noticeable peak in prototype distances around these three neurons. The MDS visualization also shows that these three prototypes are clearly different.
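The quantities behind "umatrix", "grid.dist" and "mds" are simple to compute from the prototypes. The base R sketch below uses random toy prototypes on a 3 x 3 grid; the neuron layout (expand.grid order) and the definition of neighbours (the up to 8 cells at Chebyshev distance 1) are our assumptions, and SOMbrero's own conventions may differ.

```r
set.seed(1)
grid_xy <- expand.grid(x = 1:3, y = 1:3)            # neuron k sits at row k
protos  <- matrix(runif(9 * 4), nrow = 9)           # 9 prototypes, 4 variables

input_dist <- as.matrix(dist(protos))                      # input space
grid_dist  <- as.matrix(dist(grid_xy, method = "maximum")) # Chebyshev, on grid

# "umatrix"-style value: mean input-space distance to the grid neighbours
umat <- sapply(seq_len(nrow(protos)), function(i) {
  nb <- which(grid_dist[i, ] == 1)
  mean(input_dist[i, nb])
})

# "grid.dist"-style data: one point per pair of prototypes
pairs <- which(upper.tri(input_dist), arr.ind = TRUE)
gd <- data.frame(input = input_dist[pairs], grid = grid_dist[pairs])

# "mds"-style coordinates: classical MDS of the prototype distances
mds_xy <- cmdscale(dist(protos), k = 2)
```

plot(gd) then gives a grid.dist-like scatter plot: on a well-organized map, pairs of prototypes that are far apart on the grid should also tend to be far apart in the input space.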
quality(korresp.som)
## $topographic
## [1] 0.009433962
## 
## $quantization
## [1] 0.1493657
By default, the quality function calculates both quantization and topographic 
errors. It is also possible to specify which one you want to obtain, by using
the argument quality.type.
The topographic error value varies between 0 (good projection quality) and 1 (poor projection quality). Here, the topographic quality of the mapping is quite good with a topographic error equal to 0.009.
The quantization error is an unbounded positive number. The closer it is to 0, the better the projection quality.
In the SOM algorithm, the number of clusters is close to the number of neurons on the grid (not necessarily equal, since some neurons may have no observations assigned to them). This rather large number may not suit the original data for clustering purposes.
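To make the two measures concrete, here is a base R sketch of their usual definitions: the topographic error is the share of observations whose best and second-best matching neurons are not neighbours on the grid, and the quantization error is taken here as the mean squared distance between each observation and its winning prototype. The function som_quality() is ours, and both conventions (Chebyshev neighbourhood, squared distance) are assumptions that may differ from SOMbrero's quality().

```r
som_quality <- function(x, protos, grid_xy) {
  # squared Euclidean distances: one row per observation, one column per neuron
  d2 <- outer(rowSums(x^2), rowSums(protos^2), "+") - 2 * x %*% t(protos)
  # indices of the best and second-best matching neurons of each observation
  best2 <- t(apply(d2, 1, function(d) order(d)[1:2]))
  grid_dist <- as.matrix(dist(grid_xy, method = "maximum"))
  list(
    # share of observations whose two best neurons are not grid neighbours
    topographic  = mean(grid_dist[best2] > 1),
    # mean squared distance to the winning prototype
    quantization = mean(d2[cbind(seq_len(nrow(x)), best2[, 1])])
  )
}

# Toy "twisted" 1-D map: neurons 1 and 3 have close prototypes
grid_xy <- expand.grid(x = 1:3, y = 1)
protos  <- matrix(c(0, 5, 0.2), ncol = 1)
x       <- matrix(c(0.05, 4.9), ncol = 1)
som_quality(x, protos, grid_xy)   # topographic 0.5, quantization about 0.00625
```

The first observation has its two closest prototypes on neurons 1 and 3, which are not adjacent, so half of the observations contribute to the topographic error.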
A usual way to address clustering with SOM is to perform a hierarchical
clustering on the prototypes. This clustering is directly available in the
package SOMbrero using the function superClass. To do so, you can
first have a quick overview to decide on the number of super clusters which 
suits your data.
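The principle can be sketched in base R with hclust() and cutree() on a matrix of prototypes (one row per neuron). Ward linkage ("ward.D2") and the helper explained() are our assumptions for illustration; they are not necessarily what superClass uses internally. The curve of explained variance computed at the end is the kind of information used to choose the number of super clusters.

```r
set.seed(2)
# Toy prototypes: 3 well-separated groups of 4 neurons in 2 dimensions
protos <- rbind(matrix(rnorm(8, mean = 0, sd = 0.1), ncol = 2),
                matrix(rnorm(8, mean = 3, sd = 0.1), ncol = 2),
                matrix(rnorm(8, mean = 6, sd = 0.1), ncol = 2))
hc <- hclust(dist(protos), method = "ward.D2")
sc <- cutree(hc, k = 3)             # super cluster of each neuron

# Percentage of the prototypes' variance explained by a grouping
explained <- function(protos, groups) {
  total  <- sum(sweep(protos, 2, colMeans(protos))^2)
  within <- sum(sapply(split(seq_len(nrow(protos)), groups), function(idx) {
    g <- protos[idx, , drop = FALSE]
    sum(sweep(g, 2, colMeans(g))^2)
  }))
  100 * (1 - within / total)
}
sapply(1:6, function(k) explained(protos, cutree(hc, k = k)))
```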
plot(superClass(korresp.som))
## Warning in plot.somSC(superClass(korresp.som)): Impossible to plot the rectangles: no super clusters.
 
By default, the function plots both a dendrogram and the evolution of the
percentage of explained variance. Here, 3 super clusters seem to be a good
choice. The output of superClass is a somSC class object.
Basic functions have been defined for this class:
my.sc <- superClass(korresp.som, k=3)
summary(my.sc)
## 
##    SOM Super Classes
##      Initial number of clusters :  64 
##      Number of super clusters   :  3 
## 
## 
##   Frequency table
##  1  2  3 
##  8 24 32 
## 
##   Clustering
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 
##  1  1  1  2  3  3  3  3  1  1  1  2  3  3  3  3  1  1  2  2  3  3  3  3  2 
## 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 
##  2  2  2  3  3  3  3  2  2  2  2  3  3  3  3  2  2  2  2  3  3  3  3  2  2 
## 51 52 53 54 55 56 57 58 59 60 61 62 63 64 
##  2  2  3  3  3  3  2  2  2  2  3  3  3  3
plot(my.sc, plot.var=FALSE)
 
Like plot.somRes, the function plot.somSC has an
argument 'type' which offers many different plots and can thus be
combined with most of the graphics produced by plot.somRes:
Case "grid" fills the grid with colors according to the super clustering 
(and can provide a legend).
Case "dendro3d" plots a 3d dendrogram.
plot(my.sc, type="grid", plot.legend=TRUE)
 
plot(my.sc, type="dendro3d")
 
The three super clusters correspond to overseas votes (super cluster 1), traditional votes (super cluster 2) and far-left/far-right votes (super cluster 3). The 3 distinct neurons mentioned earlier have been gathered together in super cluster 1.
A couple of plots from plot.somRes are also available for the super 
clustering. Some identify the super clusters with colors:
plot(my.sc, type="hitmap", plot.legend=TRUE)
 
plot(my.sc, type="lines", print.title=TRUE)
 
plot(my.sc, type="lines", print.title=TRUE, view="c")
 
plot(my.sc, type="mds", plot.legend=TRUE)
 
And some others identify the super clusters with titles:
plot(my.sc, type="color", view="r", variable="correze")
 
plot(my.sc, type="color", view="c", variable="JOSPIN")
 
plot(my.sc, type="poly.dist")
 
Let us consider the first super cluster. It contains 3 departements and 1 candidate:
## [1] "TAUBIRA"    "guadeloupe" "martinique" "guyane"
These departements are the three overseas departements located in the Americas. Because of their history and culture, they differ from metropolitan France and also share a distinct electoral behaviour. In particular, during the 2002 French presidential election, they strongly supported Christiane Taubira, the candidate assigned to this super cluster, a woman originally from one of the overseas departements.
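The list of members printed above combines two pieces of information: the super cluster of each neuron (as printed by summary(my.sc)) and the neuron of each row and column (korresp.som$clustering). The base R sketch below reproduces it from values copied by hand from those outputs (neurons 1 to 9 only, and six of the 122 entries); the variable names neuron_sc and clustering are ours.

```r
# Super cluster of neurons 1..9, copied from summary(my.sc)
neuron_sc <- c(1, 1, 1, 2, 3, 3, 3, 3, 1)
# Neuron of a few rows/columns, copied from korresp.som$clustering
clustering <- c(TAUBIRA = 1, LE_PEN = 8, guadeloupe = 1, martinique = 1,
                guyane = 1, ain = 8)
obs_sc <- neuron_sc[clustering]          # super cluster of each row/column
names(clustering)[obs_sc == 1]           # "TAUBIRA" "guadeloupe" "martinique" "guyane"
```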