OpenStreetMap (OSM) data has a unique structure that is not directly reconcilable with other modes of representing spatial data, notably including the widely-adopted Simple Features (SF) scheme of the Open Geospatial Consortium (OGC). The three primary spatial objects of OSM are:
nodes, which are directly translatable to spatial
points
ways, which may be closed, in which case they form
polygons, or unclosed, in which case they are (non-polygonal)
lines.
relations, which are higher-level objects used to
specify relationships between collections of ways and nodes. While there
are several recognised categories of relations, in spatial
terms these may be reduced to a binary distinction between:
multipolygon relations, which specify relationships
between an exterior polygon (through designating
role='outer') and possible inner polygons
(role='inner'). These may or may not be designated with
type=multipolygon. Political boundaries, for example, often
have type=boundary rather than explicit
type=multipolygon. osmdata identifies
multipolygons as those relation objects having at least one
member with role=outer or role=inner.
In the absence of inner and outer
roles, an OSM relation is assumed to be non-polygonal, and to instead
form a collection of non-enclosing lines.
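The role-based rule described above can be sketched in a few lines of base R. The function name `classify_relation` and the use of a plain character vector of member roles are assumptions made for illustration; they are not part of the osmdata API.

```r
# Hypothetical helper (not an osmdata function): classify an OSM relation
# from the roles of its members, following the rule described above.
classify_relation <- function (roles) {
    if (any (roles %in% c ("outer", "inner"))) {
        "multipolygon"
    } else {
        "multilinestring"
    }
}

classify_relation (c ("outer", "inner", "inner")) # "multipolygon"
classify_relation (c ("", "alternative"))         # "multilinestring"
```

Note that the `type` tag of the relation plays no part in this sketch, in keeping with the observation that boundaries are often tagged `type=boundary` while still being polygonal.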
The representation of spatial objects as Simple Features is described at length by the OGC, with this document merely reviewing relevant aspects. The SF system assumes that spatial features can be represented in one of seven distinct primary classes, which by convention are referred to in all capital letters. Relevant classes for OSM data are POINT, MULTIPOINT, LINESTRING, MULTILINESTRING, POLYGON, and MULTIPOLYGON.
(The seventh primary class is GEOMETRYCOLLECTION, which
contains several objects with different geometries.) An SF (where that
acronym may connote both singular and plural) consists of a sequence of
spatial coordinates, which for OSM data are only ever XY
coordinates represented as strings enclosed within brackets. In addition
to coordinate data and associated coordinate reference systems, an SF
may include any number of additional data which quantify or qualify the
feature of interest. In the sf extension
to R, for example, a single SF is represented by one
row of a data.frame, with the geometry stored in a single
column, and any number of other columns containing these additional
data.
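As a minimal sketch of this one-row-per-feature layout, the following constructs a single feature by hand with the sf package. The coordinates and attribute values are invented for the example, and the column names `osm_id` and `name` are chosen only to resemble OSM data.

```r
library (sf)

# One feature: a point with two illustrative attribute columns.
# The coordinates and attribute values are made up for this example.
pt <- st_sfc (st_point (c (144.3, -37.4)), crs = 4326)
feat <- st_sf (osm_id = "1", name = "example", geometry = pt)

nrow (feat)             # one row per feature
class (feat)            # includes both "sf" and "data.frame"
st_geometry_type (feat) # POINT
```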
Simple Feature geometries are referred to in this vignette using all
capital letters (such as POLYGON), while OSM geometries use
lower case (such as polygon). Similarly, the Simple
Features standard of the OGC is referred to as SF, while
the R package of the same name is referred to as
R::sf (upper case R followed by lower case
sf). Much functionality of R::sf is determined
by the underlying Geospatial Data Abstraction
Library (GDAL; described below). Representations of
data are often discussed here with reference to GDAL/sf, in
which case it may always be assumed that the translation and
representation of data are determined by GDAL and not
directly by the creators of R::sf.
osmdata translates OSM into Simple Features
OSM nodes translate directly into SF::POINT objects,
with all OSM key-value pairs stored in additional
data.frame columns.
OSM ways may be either polygons or (non-polygonal) lines.
osmdata translates these into SF::POLYGON
and SF::LINESTRING objects, respectively. Although polygonal
and non-polygonal ways may have systematically different
key fields, they are conflated here to the single set of
key values common to all way objects
regardless of shape. This enables direct comparison and uniform
operation on both SF::LINESTRING and
SF::POLYGON objects.
OSM relations comprising members with role=outer or
role=inner are translated into
SF::MULTIPOLYGON objects; otherwise they form
SF::MULTILINESTRING objects. As in the preceding case of
OSM ways, potentially systematic differences between OSM
key fields for multipolygon and other
relation objects are ignored in favour of returning
identical key fields in both cases, whether or not
value fields for those keys exist.
An OSM multipolygon is translated by osmdata into a
single SF::MULTIPOLYGON object which has an additional
column specifying num_members. The SF geometry
thus consists of a list (an R::List object) of this number
of polygons, the first of which is the outer polygon, with
all subsequent members forming closed inner rings (either individually
or in combination).
Each of these inner polygons is also represented as one or more OSM
objects, which will generally include detailed data on the individual
components not able to be represented in the single
multipolygon representation. Each inner polygon is therefore
additionally stored in the SF::MULTIPOLYGON
data.frame along with all associated data. Thus the row
containing a multipolygon of num_members polygons is
followed by num_members - 1 rows containing the data for
each inner polygon.
Note that OSM relation objects generally have fewer (or
different) key-value pairs than do OSM way
objects. In the OSM system, data describing the detailed properties of
the constituent ways of a given OSM relation are stored with those
ways rather than with the relation.
osmdata follows this general principle, and stores the
geometry of all ways of a relation with the
relation itself (that is, as part of the MULTIPOLYGON or
MULTILINESTRING object), while those ways are also stored
themselves as LINESTRING (or potentially
POLYGON) objects, from where their additional
key-value data may be accessed.
OSM relations that are not multipolygons
are translated into SF::MULTILINESTRING objects. Each
member of any OSM relation is attributed a role, which may
be empty. osmdata collates all ways within a relation
according to their role attributes. Thus, unlike
multipolygon relations which are always translated into a single
SF::MULTIPOLYGON object, multilinestring relations are
translated by osmdata into potentially several
SF::MULTILINESTRING objects, one for each unique role.
This is particularly useful because relations are often
used to designate extended highways (for example,
designated bicycle routes or motorways), yet these often exist in
primary and alternative forms, with these
categories specified in roles. Separating these roles enables ready
access to any desired role.
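The collation of ways by role can be sketched in base R as follows. The member table, its column names, and the placeholder label for empty roles are all assumptions made for illustration, and do not reflect how osmdata stores members internally.

```r
# Hypothetical members of one route relation: way IDs and their roles.
members <- data.frame (
    way_id = c ("w1", "w2", "w3", "w4"),
    role = c ("", "alternative", "", "alternative"),
    stringsAsFactors = FALSE
)

# Empty roles are given a placeholder label so that they can be used as
# list names (the label itself is an arbitrary choice for this sketch).
members$role [members$role == ""] <- "(no role)"

# One group of ways per unique role; each group would become its own
# MULTILINESTRING in a role-wise translation.
ways_by_role <- split (members$way_id, members$role)
names (ways_by_role)
ways_by_role$alternative
```

Each element of `ways_by_role` then holds the ways of one role, so the ways of any desired role can be extracted by name.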
These multilinestring objects also have a column specifying
num_members, as for multipolygons, with the primary member
followed by num_members rows, one for each member of the
multilinestring.
GDAL Translation of OSM into Simple Features
The R package sf provides
an R implementation of Simple Features, along with a
wrapper around GDAL for reading geospatial data. GDAL
provides a ‘driver’ to
read OSM data, and thus sf can also be used to read
OSM data in R, as
detailed in the main osmdata vignette. However, the
GDAL translation of OSM data differs in several important
ways from the osmdata translation.
The primary difference is that GDAL only returns unique
objects of each spatial (SF) type. Thus SF::POINT objects
consist of only those points that are not otherwise members of some
‘higher’ object (line, polygon, or
relation objects). Although a given set of OSM data may
actually contain a great many points, attempting to load these with
sf::st_read (file, layer = "points")
will generally return surprisingly few points.
Apart from the numerical difference arising through
osmdata returning an SF::POINT structure
containing all nodes within a given set of OSM data,
while sf::st_read (file, layer = "points") returns only those
points not represented in other structures, the representation of points
remains otherwise broadly similar. The only other major difference is
that osmdata retains all key-value pairs
present in a given set of OSM data, whereas GDAL/sf only
retains a select few of these. Moreover, the keys returned
by GDAL/sf are pre-defined and invariant, meaning that data
returned from sf::st_read (...) may often contain
key columns in the resultant data.frame which
contain no (non-NA) data. This difference is illustrated in
an example repeated here from the
main
osmdata vignette, with the same principles applying to
all of the following classes of OSM data.
The following three lines define a query and download the resultant
data to an XML file.
q <- opq (bbox = "Trentham, Australia")
q <- add_osm_feature (q, key = "name") # any named objects
osmdata_xml (q, "trentham.osm")
These data may then be converted into SF representations using either
R::sf or osmdata, with OSM keys
being the column names of the resultant data.frame
objects.
names (sf::st_read ("trentham.osm", layer = "points", quiet = TRUE))
## [1] "osm_id" "name" "barrier" "highway" "ref"
## [6] "address" "is_in" "place" "man_made" "other_tags"
## [11] "geometry"
names (osmdata_sf (q, "trentham.osm")$osm_points)
## [1] "osm_id" "name" "X_description_" "X_waypoint_"
## [5] "addr.city" "addr.housenumber" "addr.postcode" "addr.street"
## [9] "amenity" "barrier" "denomination" "foot"
## [13] "ford" "highway" "leisure" "note_1"
## [17] "phone" "place" "railway" "railway.historic"
## [21] "ref" "religion" "shop" "source"
## [25] "tourism" "waterway" "geometry"
osmdata returns far more key fields than
does GDAL/sf. More importantly, however,
GDAL/sf returns pre-defined key fields
regardless of whether they contain any data:
addr <- sf::st_read ("trentham.osm", layer = "points", quiet = TRUE)$address
all (is.na (addr))
## [1] TRUE
In contrast, osmdata returns only those key
fields which contain data (and so excludes address in the
above example).
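The effect of retaining only key fields that contain data can be imitated on any table by dropping columns that are entirely NA. The sketch below uses a plain data.frame as a stand-in for a points layer; the helper name `drop_empty_cols` and the toy values are hypothetical, and this is not the mechanism osmdata itself uses.

```r
# Hypothetical helper: drop attribute columns that hold no (non-NA) data.
drop_empty_cols <- function (x) {
    keep <- vapply (x, function (col) !all (is.na (col)), logical (1))
    x [, keep, drop = FALSE]
}

# Toy stand-in for a points table with a pre-defined but empty key column.
pts <- data.frame (
    osm_id = c ("1", "2"),
    name = c ("a", NA),
    address = c (NA_character_, NA_character_),
    stringsAsFactors = FALSE
)
names (drop_empty_cols (pts))
```

Here the `address` column is removed because it holds no values, while `name` is kept because at least one entry is present.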
As for points, GDAL/sf only returns those ways that are
not represented or contained in ‘higher’ objects (OSM relations
interpreted as SF::MULTIPOLYGON or
SF::MULTILINESTRING objects). osmdata returns
all ways, and thus enables, for example, examination of the full
attributes of any member of a multigeometry object. This is not possible
with the GDAL/sf translation. As for points, the only
additional difference between osmdata and
GDAL/sf is that osmdata retains all
key-value pairs, whereas GDAL retains only a
select few.
Translation of OSM relations into Simple Features differs more
significantly between osmdata and GDAL/sf.
As indicated above, multipolygon relations are translated in broadly
comparable ways by both osmdata and sf/GDAL.
Note, however, that the way members of an OSM relation may be
specified in arbitrary order, and the multipolygonal way may not
necessarily be traced through simply following the segments in the order
returned by sf/GDAL.
Linestring relations are simply read by GDAL directly in terms of
their constituent ways, resulting in a single
SF::MULTILINESTRING object that contains exactly the same
number of lines as the ways in the OSM relation, regardless of their
role attributes. Note that roles are
frequently used to specify alternative multi-way routes
through a single OSM relation. Such distinctions between primary and
alternative are erased with GDAL/sf reading.
Navigable paths, routes, and ways are all tagged within OSM as
highway, readily enabling an overpass query to
return only ways that can be used for routing purposes.
Routes are nevertheless commonly assembled within OSM relations,
particularly where they form major, designated transport ways such as
long-distance foot or bicycle paths or major motorways.
sf/GDAL
A query for key=highway translated through
GDAL/sf will return those ways not part of any ‘higher’
structure as SF::LINESTRING objects, but components of an
entire transport network might also be returned as:
SF::MULTIPOLYGON objects, holding all single ways which
form simple polygons (that is, in which start and end points are the
same);
SF::MULTIPOLYGON objects holding all single
(non-polygonal) ways which combine to form an
OSM multipolygon relation (that is, in which the collection
of ways ultimately forms a closed role=outer polygon);
SF::MULTILINESTRING objects holding all single
(non-polygonal) ways which combine to form an OSM relation that is not a
multipolygon.
Translating these data into a single form usable for routing purposes
is not simple. A particular problem that is extremely difficult to
resolve is reconciling the SF::MULTIPOLYGON objects with
the geometry of the SF::LINESTRING objects. Highway
components contained in SF::MULTIPOLYGON objects need to be
re-connected with the network represented by the
SF::LINESTRING objects, yet the OSM identifiers of the
MULTIPOLYGON components are removed by
sf/GDAL, preventing these components from being directly
re-connected. The only way to ensure connection would be to re-connect
those geographic points sharing identical coordinates. This would
require code too long and complicated to be worthwhile demonstrating
here.
osmdata
osmdata retains all of the underlying ways of ‘higher’
structures (SF::MULTIPOLYGON or
SF::MULTILINESTRING objects) as SF::LINESTRING
or SF::POLYGON objects. The geometries of the latter
objects duplicate those of the ‘higher’ relations, yet contain
additional key-value pairs corresponding to each way. Most
importantly, the OSM ID values for all members of a
relation are stored within that relation, readily enabling
the individual ways (LINESTRING or POLYGON
objects) to be identified from the relation
(MULTIPOLYGON or MULTILINESTRING object).
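The lookup from a relation to its member ways can be sketched with plain data.frames standing in for the osmdata layers. The `highway` values are invented, and holding the member IDs in a character vector is an assumption made for illustration rather than a description of how osmdata stores them.

```r
# Toy stand-in for a table of ways; values are invented for this sketch.
ways <- data.frame (
    osm_id = c ("11", "12", "13"),
    highway = c ("cycleway", "residential", "cycleway"),
    stringsAsFactors = FALSE
)
# Hypothetical OSM IDs of the ways belonging to one relation.
relation_member_ids <- c ("11", "13")

# Recover the full key-value data of each member way from the way table.
member_ways <- ways [ways$osm_id %in% relation_member_ids, ]
member_ways$highway
```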
The osmdata translation thus readily enables a
singularly complete network to be reconstructed by simply combining the
SF::LINESTRING layer with the SF::POLYGON
layer. These layers will always contain entirely independent members,
and so will always be able to be directly combined without duplicating
any objects.
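Combining the two layers can be sketched as a row-wise bind over their shared columns. Plain data.frames are used here in place of real sf layers, so geometry handling is omitted entirely; the variable names and values are hypothetical.

```r
# Toy attribute tables standing in for the LINESTRING and POLYGON layers.
way_lines <- data.frame (
    osm_id = c ("11", "12"),
    highway = c ("path", "track"),
    stringsAsFactors = FALSE
)
way_polys <- data.frame (
    osm_id = "21",
    highway = "pedestrian",
    stringsAsFactors = FALSE
)

# Bind on the columns common to both layers, then confirm that no OSM ID
# appears twice in the combined table.
common <- intersect (names (way_lines), names (way_polys))
network <- rbind (way_lines [, common], way_polys [, common])
anyDuplicated (network$osm_id) == 0
```

Restricting to `common` columns is a defensive choice for this sketch; with real sf objects the differing geometry types of the two layers would also need to be taken into account.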