rSDI

Mehmet Gençer

1 Spatial Dispersion Index (SDI) for Analysis of Activity Outreach in Spatial and Geographic Networks

Consider a network of movements or exchanges between places. Such networks are commonplace in socio-economic activities. For example, when you order something from Amazon, the movement of your package from one warehouse to the next is part of Amazon’s shipment network, or even of the global shipment network. Our morning commutes to work can be considered a commute network between neighborhoods, cities, or offices. Each of these cases can be considered a specialized instance of the mathematical concept of a graph, called a spatial graph: a graph consisting of vertices with fixed locations, and arcs/edges connecting these vertices.

In the social and economic sciences, the analysis of relations in such networks is of great interest. With the recent availability and coverage of spatial network data, it is also very useful for managerial planning in private firms and for policy decisions, such as urban planning, in public agencies. On the other hand, metrics concerning the spatial aspects of networks are almost always problem-specific rather than general. The Spatial Dispersion Index (SDI) aims to address this problem: it is a generalized measurement index, or rather a family of indices, for evaluating the spatial distances of movements in a network in a problem-neutral way. rSDI computes and optionally visualizes this index with minimal hassle:

library(rSDI)
library(magrittr)  # provides the %>% pipe operator
SDI(TurkiyeMigration.flows, TurkiyeMigration.nodes, variant="vow") %>% plotSDI(variant="vow")

The core idea of the SDI index was conceived as part of a large-scale, government-commissioned study report in Turkish (Gençer et al. 2020), whose results are updated yearly with new data and are available as a live analysis at https://yersis.gov.tr/web. The SDI index was later generalized, published on its own merit, and explained in detail in Gençer (2023).

The rSDI package provides functions to compute the SDI family of indices for spatial graphs, in accordance with their definition in the paper (Gençer 2023). rSDI also provides some convenience functions to visualize SDI measurements. While this is not its primary purpose, it is often very practical for the user to have some preliminary visualization at arm’s length. In sections 2 and 3 below we first explain the concept of spatial networks and their data, then review the mathematical graph formalism used to represent spatial networks. In section 4 we introduce the calculation of the SDI index family, its interpretation, and rules of thumb for choosing an index for your analyses. Sections 5 and 6 walk through index calculation and the visualization features of the rSDI package, using a toy network and an example data set provided by the package on human migration between the provinces of Türkiye.

2 Spatial networks and their data

Spatial networks are represented as a particular type of graph in which the graph nodes (vertices) are fixed locations and each graph arc/edge represents a flow/relation between two of these nodes. In most real-life cases these networks represent varying flows of people (e.g. transportation), goods (e.g. trade, shopping), or information (e.g. Internet data transfer, phone calls). Thus the graph is weighted and directed, and has arcs rather than edges. Also, in most cases the network is geospatial. In geospatial networks the vertices of the representing graph are, for example, cities or airports, and are defined by their latitude and longitude. This is the case for most examples of movements related to trade, migration, education, services, etc. In other cases the spatial network may span a smaller space and is measured in its own Cartesian reference frame, for example student movement on a campus, or the movement of parts in a production facility. In these latter cases the vertices (e.g. a campus library, a welding station) have an x-y position defined with respect to a chosen corner or center of the campus, production facility, etc.

Spatial network data consists of two data frames: one representing the flows, and the other detailing the locations (and possibly the labels) of the nodes in the network. The following is a simple, imaginary spatial network:

Data frames providing flows (first) and nodes (second) for an imaginary spatial network

from  to  weight
A     B   10
B     A   20
A     C   5

id  x  y
A   0  3
B   4  0
C   0  0
D   4  3

This spatial network is visualized below, showing node locations as well as flow amounts (weights) on lines representing edges:

3 Graph notation to represent spatial networks

A spatial network, \(N\), is represented with the mathematical concept of a graph, which consists of vertices, \(V\), representing the locations/nodes in the spatial network, and ties/edges, \(E\), representing the flows that tie them together into a network; thus \(N=(V,E)\). To capture a flow over an edge \(e_{ij}\) from vertex \(i\) to vertex \(j\), let us denote the amount of flow on the edge as the edge weight \(w_{i\rightarrow j}\). In graph-theoretic terms this corresponds to a directed and weighted graph.

To capture the spatial aspects of the network, let \(p_i=\langle x_i,y_i\rangle\) and \(p_j=\langle x_j, y_j\rangle\) denote the locations of vertices \(i\) and \(j\), respectively, in some two-dimensional space, such as a Cartesian plane or geographic coordinates. In the latter case, the coordinates \(x\) and \(y\) denote the longitude and latitude of a geographical location, respectively. One can now speak of a spatial distance, \(\delta_{ij}\), between any two vertices. For geographical networks the Haversine distance is appropriate for determining spherical distances between two locations:

\[\begin{equation} \delta^{H}_{ij}=2R\arcsin\left(\sqrt{\sin^{2}\left({\frac{y_j-y_i}{2}}\right)+\cos(y_{i})\cos(y_{j})\sin^{2}\left({\frac{x_j-x_i}{2}}\right)}\right) \end{equation}\] where \(R\) is the radius of the Earth, roughly \(6,371\) km, and the coordinates are expressed in radians.

In the case of a more local spatial network we would probably have Cartesian coordinates, e.g. x-y coordinates within a production plant where we analyze flows of parts between stations. In those cases a Euclidean distance can be used instead: \[\begin{equation} \delta^E_{ij}=\sqrt{(x_i-x_j)^2+(y_i-y_j)^2} \end{equation}\]
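As a rough illustration, the two distance formulas can be written as plain base R functions. This is a minimal sketch, not rSDI's internal implementation: the names `haversine_km()` and `euclidean_dist()` are hypothetical, and the Haversine version assumes coordinates given in decimal degrees (converted to radians before the trigonometric functions are applied) and \(R = 6371\) km.

```r
# Hypothetical helpers illustrating the two distance formulas above;
# they are not part of the rSDI API.

# Haversine (great-circle) distance in km between points given as
# longitude (x) and latitude (y) in decimal degrees.
haversine_km <- function(x_i, y_i, x_j, y_j, R = 6371) {
  rad <- function(deg) deg * pi / 180   # degrees -> radians
  dlat <- rad(y_j - y_i)
  dlon <- rad(x_j - x_i)
  a <- sin(dlat / 2)^2 + cos(rad(y_i)) * cos(rad(y_j)) * sin(dlon / 2)^2
  # pmin() guards against sqrt(a) slightly exceeding 1 from rounding
  2 * R * asin(pmin(1, sqrt(a)))
}

# Euclidean distance for local Cartesian coordinates.
euclidean_dist <- function(x_i, y_i, x_j, y_j) {
  sqrt((x_i - x_j)^2 + (y_i - y_j)^2)
}

euclidean_dist(0, 3, 4, 0)   # 5: the A-B edge of the toy example
haversine_km(0, 0, 0, 1)     # one degree of latitude, about 111.19 km
```

Both functions are vectorized, so they can be applied to whole columns of source and target coordinates at once.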

In our toy example from the previous section the Euclidean distances can be easily calculated (since it is a simple 3-4-5 triangle) for each edge as follows (defaulting to Euclidean distance for Middle Earth, since we have no latitude/longitude information about it):

The flow data with the distance between the source and target locations of each flow added

from  to  weight  distance
A     B   10      5
B     A   20      5
A     C   5       3
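The distance column can be derived by looking up the coordinates of each flow's source and target node. The base R sketch below does this with `match()` for the toy data; it follows the description above but is not rSDI's own code.

```r
# Toy flow and node data from the tables above
flows <- data.frame(from = c("A", "B", "A"), to = c("B", "A", "C"),
                    weight = c(10, 20, 5))
nodes <- data.frame(id = c("A", "B", "C", "D"),
                    x = c(0, 4, 0, 4), y = c(3, 0, 0, 3))

# Row index (in nodes) of the source and target node of every flow
src <- match(flows$from, nodes$id)
dst <- match(flows$to, nodes$id)

# Euclidean distance between the two endpoints of each flow
flows$distance <- sqrt((nodes$x[src] - nodes$x[dst])^2 +
                       (nodes$y[src] - nodes$y[dst])^2)
flows$distance
#> [1] 5 5 3
```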

4 SDI: Definition and uses

In order to quantify the spatial reach of the flows in a spatial network, the spatial distance between two nodes should be combined with the flow between them. The Spatial Dispersion Index is a direct translation of this idea and is broadly defined as the average distance spanned by the network flows, weighted by flow amounts. The key idea was conceived by the author as part of a broader field study report (Gençer et al. 2020) and explained thoroughly in Gençer (2023). A brief discussion and definition is presented here.

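SDI is a family of indices rather than a single index, because different research interests call for different variants when analyzing spatial networks. The variants differ in the level at which the index is computed (the whole network or a single node), in the directionality of the flows considered (undirected, in-flows, or out-flows), and in how flow amounts are used (unweighted, weighted, or generalized). Further below we introduce a three-letter notation to symbolize the corresponding SDI variants.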

As an illustrative example, the network-level, weighted SDI index would be computed as follows (see footnote 1): \[\begin{equation} \textrm{SDI}^w(N)=\frac{\sum_{i \rightarrow j \in E}{(w_{i\rightarrow j} \cdot \delta_{ij})}}{\sum_{i\rightarrow j \in E}{w_{i\rightarrow j}}} \end{equation}\] For our toy problem this is computed as \(\textrm{SDI}^w(N)=(10\cdot 5+20\cdot 5+5\cdot 3)/(10+20+5)=165/35\approx 4.714\).
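The same arithmetic in base R, as a small sketch with the toy weights and distances hard-coded; base R's `weighted.mean()` gives the identical result. The value agrees with the network-level result that `SDI()` reports for this graph in Section 5.

```r
# Toy flows: weights w and edge distances delta (see the tables above)
weight   <- c(10, 20, 5)
distance <- c(5, 5, 3)

# Network-level weighted SDI: flow-weighted mean of the edge distances
sdi_nw <- sum(weight * distance) / sum(weight)
sdi_nw
#> [1] 4.714286

weighted.mean(distance, weight)   # same value
```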

A node-level, unweighted, out-flows-only index, on the other hand, is computed by replacing all weights with 1s: \[\begin{equation} \textrm{SDI}^u_{+}(i)=\frac{\sum_{i\rightarrow j \in E}{(1 \cdot \delta_{ij})}}{\sum_{i\rightarrow j \in E}{1}} \end{equation}\] This is simply the average distance of the flows originating from the focal node \(i\). For our toy problem’s node A, this is computed as \(\textrm{SDI}^u_{+}(A)=(5+3)/2=4\).
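A base R sketch of this node-level computation for every node with out-flows. It groups the toy flows by their source node; for comparison it also computes a weighted counterpart, which is our reading of an out-flows, weighted node-level variant rather than rSDI's code. Nodes without out-flows (C and D) simply do not appear here, whereas rSDI stores NA for nodes whose index cannot be computed, as it does for node D in Section 5.

```r
# Toy flows with their edge distances (from the tables above)
flows <- data.frame(from = c("A", "B", "A"), to = c("B", "A", "C"),
                    weight = c(10, 20, 5), distance = c(5, 5, 3))

# Unweighted, out-flows only: plain mean distance of the flows leaving
# each node, e.g. A: (5 + 3) / 2 = 4 and B: 5 / 1 = 5
sdi_out_unweighted <- tapply(flows$distance, flows$from, mean)

# Weighted counterpart for comparison: flow-weighted mean distance of the
# flows leaving each node, e.g. A: (10*5 + 5*3) / (10 + 5) = 65/15
sdi_out_weighted <- sapply(split(flows, flows$from),
                           function(d) weighted.mean(d$distance, d$weight))

sdi_out_unweighted
sdi_out_weighted
```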

Please consult the source paper, Gençer (2023), and help pages for an extensive description of index calculation for the above cases.

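SDI computation uses a three-letter variant code to represent a variant of the index. The LDS code stands for the Level, Directionality, and Strength (weight use) of network ties, respectively. For example, an LDS code of “nuw” means a network-level, undirected, and weighted SDI variant. Each part of the LDS code can take the following values:

Level: n (network) or v (vertex)
Directionality: u (undirected), i (in-flows), or o (out-flows)
Strength: u (unweighted), w (weighted), or g (generalized, which requires the α parameter)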

5 A simple example

rSDI functions consume an igraph object and return their output as an igraph object with additional edge, vertex, and/or graph attributes. Let us start with an example involving the helper function dist_calc(). This function does not need to be called explicitly in a normal workflow; it is invoked by SDI(), the main entry point of SDI calculations. It computes the distance between the pair of nodes connected by each graph edge, and the computed distances are returned as edge attributes of the returned graph. Consider the following data frames for the fictional spatial network above:

flows<-data.frame(from=c("A","B","A"), to=c("B","A","C"), weight=c(10,20,5))
nodes<-data.frame(id=c("A","B","C","D"),x=c(0,4,0,4),y=c(3,0,0,3))
library(igraph)
toyGraph <- graph_from_data_frame(flows, directed=TRUE, vertices=nodes)

The edges of the graph have only the ‘weight’ attribute:

edge_attr_names(toyGraph)
#> [1] "weight"

rSDI’s main function is SDI(). It works in a similar fashion: it adds its output as graph or vertex attributes (and also computes and adds edge distance attributes if they are missing, since distances are a prerequisite for all SDI metrics):

toyGraphWithSDI <- SDI(toyGraph) #same as SDI(toyGraph, level="vertex", directionality="undirected", weight.use="weighted")
edge_attr_names(toyGraphWithSDI)
#> [1] "weight"   "distance"
vertex_attr_names(toyGraphWithSDI)
#> [1] "name"    "x"       "y"       "SDI_vuw"

To help its user follow the theoretical distinctions explained in the previous section, rSDI labels the index measurements it computes with letter codes in accordance with that classification. In the example above, the call to SDI() computes the (1) vertex-level, (2) undirected, and (3) weighted SDI index, which are the defaults. Thus it adds an attribute named ‘SDI_vuw’ to each vertex of its input graph. The attribute is added to each vertex even if the index cannot be computed. This is the case for vertex D, which has no flows and therefore has an NA value stored in its ‘SDI_vuw’ attribute:

vertex_attr(toyGraphWithSDI, "SDI_vuw")
#> [1] 4.714286 5.000000 3.000000       NA
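To see where these numbers come from, the following base R sketch recomputes them without rSDI. It assumes, consistent with the output above, that the undirected vertex-level variant pools all flows entering or leaving a node and averages their distances weighted by flow amount; `sdi_vuw` here is only a local variable, not an rSDI object.

```r
flows <- data.frame(from = c("A", "B", "A"), to = c("B", "A", "C"),
                    weight = c(10, 20, 5), distance = c(5, 5, 3))
node_ids <- c("A", "B", "C", "D")

# For each node: flow-weighted mean distance of all flows touching it,
# regardless of direction; NA when the node has no flows at all.
sdi_vuw <- sapply(node_ids, function(v) {
  touching <- flows$from == v | flows$to == v   # in- and out-flows of v
  if (!any(touching)) return(NA_real_)          # no flows, e.g. node D
  weighted.mean(flows$distance[touching], flows$weight[touching])
})
sdi_vuw   # A = 165/35, B = 150/30 = 5, C = 15/5 = 3, D = NA
```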

If the index is computed at the network level the vertices will not have additional attributes but the graph itself will, following the same convention:

toyGraphWithNetworkSDI <- SDI(toyGraph, level="network", directionality="undirected", weight.use="weighted")
graph_attr_names(toyGraphWithNetworkSDI) 
#> [1] "SDI_nuw"
graph_attr(toyGraphWithNetworkSDI,"SDI_nuw")
#> [1] 4.714286

Once you are comfortable with this convention you can shorten your calls to SDI() using the ‘variant’ parameter as follows, which is equivalent to the call in the example above:

toyGraphWithNetworkSDI <- SDI(toyGraph, variant="nuw")

SDI will leave previously computed indices untouched. Thus, for example, you can compute several indices in a pipe:

toyGraph %>% 
  SDI(variant="nuw") %>%
  SDI(variant="nuu") %>%
  SDI(variant="vuw") %>%
  SDI(variant="vuu") -> toyGraphWithSeveralSDI
graph_attr_names(toyGraphWithSeveralSDI)
#> [1] "SDI_nuw" "SDI_nuu"
vertex_attr_names(toyGraphWithSeveralSDI)
#> [1] "name"    "x"       "y"       "SDI_vuw" "SDI_vuu"

The same can be achieved by using a vector of variants in a single call:

toyGraphWithSeveralSDI <- SDI(toyGraph, variant=c("nuw","nuu","vuw","vuu"))
graph_attr_names(toyGraphWithSeveralSDI)
#> [1] "SDI_nuw" "SDI_nuu"
vertex_attr_names(toyGraphWithSeveralSDI)
#> [1] "name"    "x"       "y"       "SDI_vuw" "SDI_vuu"

Note that for the generalized SDI variant you must provide the additional \(\alpha\) parameter:

toyGraphWithGeneralizedSDI <- SDI(toyGraph, variant="vug", alpha=0.5) 
vertex_attr_names(toyGraphWithGeneralizedSDI) 
#> [1] "name"    "x"       "y"       "SDI_vug"
vertex_attr(toyGraphWithGeneralizedSDI,"SDI_vug")
#> [1] 4.252907 4.472136 3.464102       NA

5.1 Optional distance calculation

Calling the dist_calc() helper function adds a distance attribute to the edges of an input graph. This is performed automatically when SDI() is called, but you may also call it separately if needed. For the example in the previous section the call is made as follows:

toyGraphWithDistances <- dist_calc(toyGraph)
edge_attr_names(toyGraphWithDistances)
#> [1] "weight"   "distance"

Because the coordinate attributes are named ‘x’ and ‘y’ (rather than ‘latitude’ and ‘longitude’), the function opts for a Euclidean distance calculation and returns the 3-4-5 triangle distances:

edge_attr(toyGraphWithDistances, "distance")
#> [1] 5 5 3
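The choice described above can be pictured as a check on the coordinate column names. The function below is a hypothetical sketch of such a rule, not rSDI's actual code; it only illustrates the idea that ‘longitude’/‘latitude’ columns lead to a Haversine distance while ‘x’/‘y’ columns lead to a Euclidean one. The geographic example row reuses the İstanbul coordinates from the package data shown in the next section.

```r
# Hypothetical sketch (not rSDI's actual code): pick a distance metric
# from the coordinate column names of a node table.
pick_distance_method <- function(nodes) {
  if (all(c("longitude", "latitude") %in% names(nodes))) {
    "haversine"   # geographic coordinates
  } else if (all(c("x", "y") %in% names(nodes))) {
    "euclidean"   # local Cartesian coordinates
  } else {
    stop("nodes need either x/y or longitude/latitude columns")
  }
}

toy_nodes <- data.frame(id = c("A", "B", "C", "D"),
                        x = c(0, 4, 0, 4), y = c(3, 0, 0, 3))
geo_nodes <- data.frame(id = "TR100", longitude = 28.96711, latitude = 41.00893)

pick_distance_method(toy_nodes)   # "euclidean"
pick_distance_method(geo_nodes)   # "haversine"
```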

6 Example: Computing and plotting SDI for a geospatial network

The rSDI package comes with a real-world data set consisting of two data frames. TurkiyeMigration.flows contains data on the migration of people between Türkiye’s provinces in the period 2016-2018, a consolidated version of raw data from the Turkish Statistical Institute. TurkiyeMigration.nodes contains the labels and geographic coordinates (latitude and longitude) of the provinces:

head(TurkiyeMigration.flows)
#>    from    to    weight
#> 1 TRC12 TR621  737.0000
#> 2 TR332 TR621  319.6667
#> 3 TRA21 TR621  213.0000
#> 4 TR712 TR621  412.6667
#> 5 TR834 TR621  158.3333
#> 6 TR510 TR621 2594.6667
head(TurkiyeMigration.nodes)
#>      id            label longitude latitude
#> 1 TR100   \\u0130stanbul  28.96711 41.00893
#> 2 TR211   Tekirda\\u011f  27.51167 40.97809
#> 3 TR212           Edirne  26.55596 41.67717
#> 4 TR213 K\\u0131rklareli  27.22437 41.73547
#> 5 TR221  Bal\\u0131kesir  27.88834 39.65046
#> 6 TR222  \\u00c7anakkale  26.40859 40.14672

You may call the SDI() function either with an igraph object that you compose yourself from the flow and node data, or by passing the two data frames directly to SDI(), as follows:

TMSDI <- SDI(TurkiyeMigration.flows, TurkiyeMigration.nodes, variant="vuw")
#   -- OR --
library(igraph)
TMgraph <- graph_from_data_frame(TurkiyeMigration.flows, directed=TRUE, vertices=TurkiyeMigration.nodes)
TMSDI <- SDI(TMgraph, variant="vuw")

rSDI plotting functions make use of open map packages available in the R ecosystem to produce a geographical plot of SDI measurements. The plotSDI() function produces a visualization in which the circle for each node has an area proportional to the node’s selected SDI measure. The function tries to choose suitable circle sizes, but you can customize circle sizes, fill colors, etc. by overriding its parameters. For example, you can scale the circle sizes relative to their default as follows:

plotSDI(TMSDI, variant="vuw", circle.size.scale=1)

Please refer to the documentation of plotSDI() for further fine-grained control of its plotting parameters.

You may want to visualize the network flows along with the SDI index measurements. This particular combination is provided as a convenience: you can turn on the display of network edges using the ‘edges’ argument of plotSDI():

plotSDI(TMSDI, variant="vuw", edges=TRUE)

Please note that this combination is built on several graph visualization and geospatial packages. If you need fine control over all of these underlying visualization layers, you are advised to build your own solution using packages such as ggraph, sf, and rnaturalearth.

7 Custom visualization capabilities

The visualization features of rSDI mainly leverage the fact that spatial graphs are often geospatial, so geospatial libraries in R can be combined with graph plotting to visualize these networks on a map. The current version of rSDI does not provide a way to use your own map, for example when working with a network of flows within a production plant, a schoolyard, etc. The following example, which can be adapted to your use case, uses a custom image as the background of a network plot:

library(rSDI)
library(igraph)
library(ggplot2)
library(ggraph)
library(ggimage)

flows <- data.frame(from=c("A","B","A"), to=c("B","A","C"), weight=c(10,20,5))
nodes <- data.frame(id=c("A","B","C","D"), x=c(0,4,0,4), y=c(3,0,0,3))
g <- SDI(flows, nodes, variant="vuw")

# Background image (a Middle-earth map) placed behind the network plot
url <- "https://static.wikia.nocookie.net/lotr/images/5/59/Middle-earth.jpg/revision/latest?cb=20060726004750"

# Place the nodes at their own x-y coordinates instead of a computed layout
lay <- create_layout(g, 'manual', x=V(g)$x, y=V(g)$y)
p <- ggraph(lay) +
  # curved, semi-transparent arrows labelled with the flow weights
  geom_edge_bend(aes(label=weight), label_size=10, strength=0.4, edge_width=3,
                 alpha=0.3, arrow=arrow(length=unit(10, 'mm'))) +
  # node circles sized by vertex-level SDI (NA for node D, which is dropped)
  geom_node_point(aes(size=SDI_vuw), color="red") +
  geom_node_text(aes(label=name), size=10, vjust=-0.7, hjust=1) +
  xlim(-3,5) + ylim(-2,4)
p <- ggbackground(p, url)
p

References

Gençer, Mehmet. 2023. “An Index for Measuring Spatial Graph Dispersion in Socio-Economic Networks.” Applied Spatial Analysis and Policy, 1–21. https://dx.doi.org/10.1007/s12061-023-09545-8.
Gençer, Mehmet, Mustafa Işık, M. Caner Meydan, Leyla Bilen Kazancık, Zeyneb Ersayın, Adnan Saygılı, Yatmaz Fulya, et al. 2020. İller ve Bölgeler Arası Sosyo-Ekonomik Ağ İlişkileri Raporu [Report on Socio-Economic Network Relations Between Provinces and Regions]. Turkish Ministry of Industry and Technology, Development Agency. https://www.kalkinmakutuphanesi.gov.tr/dokuman/yer-sis-iller-ve-bolgeler-arasi-sosyo-ekonomik-ag-iliskileri-raporu/2591.
Opsahl, Tore, Filip Agneessens, and John Skvoretz. 2010. “Node Centrality in Weighted Networks: Generalizing Degree and Shortest Paths.” Social Networks 32 (3): 245–51. https://doi.org/10.1016/j.socnet.2010.03.006.

1. Please note that when the index is computed over the whole network, directionality makes no difference, so we omit the \(\textrm{SDI}_{\pm}(\ldots)\) notation in this case.