Useful resources

  1. Geocomputation with R: a comprehensive, but very practical book on GIS with R. It is freely available online at this link.
  2. r-spatial.org

Installing sf package

Install the package using: {r, echo=TRUE, eval=FALSE} ) install.packages(sf)

The installation should work out of the box, but one might get compatibility issues with the gdal library. Depending on the error, googling it should do the job.

After the installation, we simply load the package, and also set the R working directory appropriately.

library(sf)

setwd(file.path(path.expand('~'), 'workplace', 'gis-slovakia'))

Maps

In order to be able to work with maps, we need obtain the files that have the geometrical representation of the maps, with the corresponding coordinate system. When working with global data, it is useful to use a coordinate system which takes into account the curvature of the Earth, but for analysis that involves small area (like Slovakia), it is practical to worth with planar coordinates where we can use Euclidean geometry that we are familiar with. In our analysis we will use the krovak coordinate system which covers Czechia and Slovak Republic. The legally-binding system that the cadastre uses is coded as EPSG:5513.

This website is useful for navigating the map and reading off the coordinates for a particular point. To change the coordinate system, change the coordinates system code.

File formats

The map files come in different formats. One of the most common ones are shapefiles (.shp), geodatabase (.gdb), geopackage (.gpk), geoJSON (.geojson).

Obtaining Slovakia maps

This page in the geoportal.sk has the necessary files in different formats. We stick to the shapefiles (.shp). The portal allows downloading the maps at different resolutions. In this guide, we will be working with the most precise one, which is referred to as ‘Základná úroveň/ ZBGIS - Administratívne hranice’.

Loading the maps

sf package provides st_read() function with two parameters: 1. dsn: this stands for data source name, but it is effectively a file name. 2. layer: certain file formats allow saving multiple layers, we will work with single layer files (it does not mean we can’t have multiple indicators/statistic for each location).

We start off with the maps at the municipality level. We simply run:

mapa.obec.sf <- st_read(dsn = file.path(getwd(), 'maps', 'ah_shp_0', 'obec_0.shp'))
## Reading layer `obec_0' from data source `/Users/jp2011/workplace/gis-slovakia/maps/ah_shp_0/obec_0.shp' using driver `ESRI Shapefile'
## Simple feature collection with 2927 features and 14 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -591311.3 ymin: -1334762 xmax: -165436.9 ymax: -1132697
## epsg (SRID):    NA
## proj4string:    +proj=krovak +lat_0=49.5 +lon_0=24.83333333333333 +alpha=30.28813975277778 +k=0.9999 +x_0=0 +y_0=0 +ellps=bessel +units=m +no_defs

Now that, we loaded the object, we familiarise ourselves with it.

Coordinates system

To see what coordinate system it uses, we run:

st_crs(mapa.obec.sf)
## Coordinate Reference System:
##   No EPSG code
##   proj4string: "+proj=krovak +lat_0=49.5 +lon_0=24.83333333333333 +alpha=30.28813975277778 +k=0.9999 +x_0=0 +y_0=0 +ellps=bessel +units=m +no_defs"

We see that the CRS used is krovak, but we do not get the code, e.g. 5513. Since, the krovak CRS comes in different flavours, we do a transformation to the one that the cadastre office in Slovakia uses by using st_transform() function. We also save that code into a constant for future use.

CRS.CODE.KU.SK <- 5513    # save the cadastre CRS to a constant
mapa.obec.sf <- st_transform(x = mapa.obec.sf, crs = CRS.CODE.KU.SK)
st_crs(mapa.obec.sf)                   # 
## Coordinate Reference System:
##   EPSG: 5513 
##   proj4string: "+proj=krovak +lat_0=49.5 +lon_0=24.83333333333333 +alpha=30.28813972222222 +k=0.9999 +x_0=0 +y_0=0 +ellps=bessel +towgs84=589,76,480,0,0,0,0 +units=m +no_defs"

Now, we see that the EPSG code for our coordinates system is set correctly.

Data associated with the maps

Now, we explore the data that the maps comes with. First we check the column names:

names(mapa.obec.sf)
##  [1] "DOW"        "AUT"        "ACH"        "SOI"        "FACC"      
##  [6] "IDN4"       "NM4"        "IDN3"       "NM3"        "IDN2"      
## [11] "NM2"        "VYMERA"     "Shape_Leng" "Shape_Area" "geometry"

Some of these names are intuitive, but have a one more peek at the data to understand more. R’s built-in command summary() is a useful start.

summary(mapa.obec.sf)
##       DOW                  AUT         ACH         SOI       FACC     
##  Min.   :2019-04-30   Min.   :1   Min.   :1   Min.   :9   FA004:2927  
##  1st Qu.:2019-04-30   1st Qu.:1   1st Qu.:1   1st Qu.:9               
##  Median :2019-04-30   Median :1   Median :1   Median :9               
##  Mean   :2019-04-30   Mean   :1   Mean   :1   Mean   :9               
##  3rd Qu.:2019-04-30   3rd Qu.:1   3rd Qu.:1   3rd Qu.:9               
##  Max.   :2019-04-30   Max.   :1   Max.   :1   Max.   :9               
##                                                                       
##       IDN4              NM4            IDN3                    NM3      
##  Min.   :500011   Lúčka   :   4   Min.   :101.0   Košice - okolie: 114  
##  1st Qu.:508989   Porúbka :   4   1st Qu.:402.0   Rimavská Sobota: 107  
##  Median :518204   Dúbrava :   3   Median :606.0   Prešov         :  91  
##  Mean   :521206   Lúčky   :   3   Mean   :547.7   Levice         :  89  
##  3rd Qu.:526138   Petrovce:   3   3rd Qu.:708.0   Bardejov       :  86  
##  Max.   :599972   Píla    :   3   Max.   :811.0   Trebišov       :  82  
##                   (Other) :2907                   (Other)        :2358  
##       IDN2                    NM2          VYMERA         
##  Min.   :1.000   Prešovský      :665   Min.   :   361255  
##  1st Qu.:4.000   Banskobystrický:516   1st Qu.:  7207245  
##  Median :6.000   Košický        :461   Median : 11655767  
##  Mean   :5.415   Nitriansky     :354   Mean   : 16752332  
##  3rd Qu.:7.000   Žilinský       :315   3rd Qu.: 19769322  
##  Max.   :8.000   Trenčiansky    :276   Max.   :359784254  
##                  (Other)        :340                      
##    Shape_Leng       Shape_Area                 geometry   
##  Min.   :  3211   Min.   :   354208   MULTIPOLYGON :2927  
##  1st Qu.: 14031   1st Qu.:  7207035   epsg:5513    :   0  
##  Median : 18582   Median : 11647728   +proj=krov...:   0  
##  Mean   : 21367   Mean   : 16747701                       
##  3rd Qu.: 25503   3rd Qu.: 19794868                       
##  Max.   :204284   Max.   :359936992                       
## 

From this, it is obvious that: 1. IDN4 is a unique identifier for each municipality. 2. NM4 is the name of the municipality. 3. IDN3 is the identifier for okres. 4. NM3 is the name for okres. 5. IDN2 is the identifier for kraj. 5. NM2 is the name for kraj. 7. Shape_Leng and Shape_Area are the perimeter and the area of the municipality. 8. It’s not clear to me what VYMERA is given that the values slightly deviate from the Shape_Area.

The other columns are not relevant for this tutorial.

In the visualisation section, we will stick to these values even though they are not particularly interesting. In later sections we will add more interesting data.

Before, we proceed, we rename them to more meaningful names:

library(dplyr)
mapa.obec.sf <- mapa.obec.sf %>%
  rename(obec = NM4) %>%
  rename(obec_id = IDN4) %>%
  rename(okres = NM3) %>%
  rename(okres_id = IDN3) %>%
  rename(kraj = NM2) %>%
  rename(kraj_id = IDN2) %>%
  rename(obvod_uzemia = Shape_Leng) %>%
  rename(plocha_uzemia = Shape_Area)

Maps for other geometries

For convenience, we also load other geometries: okres, kraj. We also transform them into the correct CRS.

mapa.okres.sf <- st_read(dsn = file.path(getwd(), 'maps', 'ah_shp_0', 'okres_0.shp'))
## Reading layer `okres_0' from data source `/Users/jp2011/workplace/gis-slovakia/maps/ah_shp_0/okres_0.shp' using driver `ESRI Shapefile'
## Simple feature collection with 79 features and 12 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -591311.3 ymin: -1334762 xmax: -165436.9 ymax: -1132697
## epsg (SRID):    NA
## proj4string:    +proj=krovak +lat_0=49.5 +lon_0=24.83333333333333 +alpha=30.28813975277778 +k=0.9999 +x_0=0 +y_0=0 +ellps=bessel +units=m +no_defs
mapa.okres.sf <- st_transform(x = mapa.okres.sf, crs = CRS.CODE.KU.SK)

mapa.kraj.sf <- st_read(dsn = file.path(getwd(), 'maps', 'ah_shp_0', 'kraj_0.shp'))
## Reading layer `kraj_0' from data source `/Users/jp2011/workplace/gis-slovakia/maps/ah_shp_0/kraj_0.shp' using driver `ESRI Shapefile'
## Simple feature collection with 8 features and 10 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -591311.3 ymin: -1334762 xmax: -165436.9 ymax: -1132697
## epsg (SRID):    NA
## proj4string:    +proj=krovak +lat_0=49.5 +lon_0=24.83333333333333 +alpha=30.28813975277778 +k=0.9999 +x_0=0 +y_0=0 +ellps=bessel +units=m +no_defs
mapa.kraj.sf <- st_transform(x = mapa.kraj.sf, crs = CRS.CODE.KU.SK)

And we do the renaming as before (not shown here).

Visualisations

One of the mosti important aspects of GIS is data visualisation. There are different way of going about it, here is a selection of what I’ve used personally.

Default method

Instinctively, the first thing one does when exploring an unknown object in R is call the plot() method. Indeed, there is a default method that does surprisingly good job with little effort.

plot(mapa.okres.sf)
## Warning: plotting the first 9 out of 12 attributes; use max.plot = 12 to
## plot all

The command tried to plot all the columns, but by default it takes the first 9. Usually, we want to have more control over what we plot, so we choose what we are interested in. In this case, we plot the area and the perimeter.

plot(x = mapa.okres.sf[c('obvod_uzemia', 'plocha_uzemia')],
     key.pos = 1) # this add the legend to the bottom (1)

This is a good place to start when doing exploratory analysis, but it’s better to use more sophisticated functions to produce better-looking plots.

ggplot

ggplot is the tool of choice for general X-Y plots. It is possible to

library(ggplot2)  # Load the ggplot library
library(viridis)  # Load the library for a nice colour palette
ggplot(mapa.okres.sf) +
  geom_sf(aes(fill = plocha_uzemia)) +
  scale_fill_viridis("Plocha m\u00B2") +
  ggtitle("Plocha okresov") +
  theme_bw()

Whatever applied to ggplot craft, applies here as well.

tmap

This is a very versatile library that allows integration with Open Street Map data and interactive maps on top of regular plotting. What we show here is only scratching the surface, for more examples, please visit the tmap’s github page.

Before we proceed, we install the libraries and load them.

install.packages('tmap')
library(tmap)
tmap_mode("plot")
## tmap mode set to plotting
tm_shape(mapa.okres.sf) +
  tm_fill(col = "plocha_uzemia", style = "cont", title="Plocha m\u00B2") + 
  tm_style('gray', title = 'Plocha okresov m\u00B2')

For example, we can do a breakdown by kraj.

tmap_mode("plot")
## tmap mode set to plotting
tm_shape(mapa.okres.sf) +
  tm_fill(col = "plocha_uzemia", style = "cont", title="Plocha m\u00B2") + 
  tm_style('gray', title = 'Okresy SR') + 
  tm_facets('kraj')

For plots with fewer components (e.g. kraj), we might want to add labels. Be careful with the overlaps of the labels and overflows outside the frame. This post gives a good solution for the latter.

tmap_mode("plot")
## tmap mode set to plotting
tm_shape(mapa.kraj.sf) +
  tm_fill(col = "plocha_uzemia", style = "cont", title="Plocha m\u00B2") + 
  tm_style('gray', title = 'Okresy SR') + 
  tm_text('kraj', auto.placement = TRUE)

To save the map, use tmap_save() function.

mapa.okres.sf.gps <- st_transform(x = mapa.okres.sf, crs=4326)
osm_tiles = tmaptools::read_osm(st_bbox(mapa.okres.sf.gps), type = "osm")
tmap.object <- qtm(osm_tiles, raster.alpha = 0.6) + 
  tm_shape(mapa.okres.sf.gps) +
  tm_fill(col = "Shape_Area", style = "cont", title="Plocha m\u00B2") + 
  tm_style('gray', title = 'Okresy SR')

Interactive maps

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(mapa.okres.sf) +
  tm_fill(col = "plocha_uzemia", style = "cont", title="Plocha m\u00B2", scale = 0.8, alpha = 0.7) + 
  tm_style('gray', title = 'Okresy SR')

The interactive maps can be saved in the html format and can be easily shared. The background maps are loaded dynamically from the internet.

Operations

Subsetting

Distance metrics

Working with different geometries

Working with grids and the administrative districts.

Case study: prezidentske volby, prve kolo.

  1. Download the raw dataset of the results from here.
  2. Load the results, merge it with the geography you are interested in (obec, kraj)
  3. Plot the results.

First, load up the data from an Excel sheet. We check manually that first 4 rows are

library(readxl)
results.obec.file.path <- file.path(getwd(), 'data', "raw", 
                                    "PRE_2019_KOLO1_xlsx",
                                    "PRE_2019_KOLO1_tab03e.xlsx")
results.obec <- read_excel(results.obec.file.path)
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ... and 8 more problems
results.obec <- results.obec[-(1:4), ]
names(results.obec) <- c('kraj_id', 'kraj', 'uzemny_obvod_id', 'uzemny_obvod',
                         'okres_id', 'okres', 'obec_id', 'obec',
                         'candidate_id', 'candidate_fname', 'candidate_sname',
                         'platne_hlasy_count', 'platne_hlasy_percent',
                         'pullouts')
results.obec <- results.obec %>%
  dplyr::mutate_at(c("platne_hlasy_count", "platne_hlasy_percent"), as.numeric)

Now, we join the data with the map.

results.obec.sf <- mapa.obec.sf %>%
  dplyr::select(obec_id) %>%
  dplyr::mutate(obec_id = as.character(obec_id)) %>%
  dplyr::full_join(results.obec, by = c("obec_id" = "obec_id")) %>%
  dplyr::mutate(name = paste(candidate_fname, candidate_sname, sep = " "))

Since we are working with ‘obec’ level, we focus only on one ‘kraj’ region and we only compare top 4 candidates in that region.

zilinsky.kraj.results.sf <- results.obec.sf %>%
  dplyr::filter(kraj_id == 5)  # zilinsky kraj

# select top 4 candidates in Zilinsky kraj
zilinsky.kraj.top.candidates  <- zilinsky.kraj.results.sf %>%
  dplyr::group_by(name) %>%
  dplyr::summarise(vote_total = sum(platne_hlasy_count)) %>%
  dplyr::arrange(vote_total) %>%
  dplyr::top_n(4) %>%
  dplyr::pull(name)
## Selecting by geometry
# filter out only the 4 candidates
zilinsky.kraj.results.top.candidates.sf <- zilinsky.kraj.results.sf %>%
  dplyr::filter(name %in% zilinsky.kraj.top.candidates)

tmap_mode("plot")
## tmap mode set to plotting
tm_shape(zilinsky.kraj.results.top.candidates.sf) +
  tm_fill(col = "platne_hlasy_percent", style = "cont", title="Podiel hlasov %") + 
  tm_style('gray', title = 'Žilinsky kraj') + 
  tm_facets('name')

As we can see this visualisation is not ideal, so we go for the interactive map where we can zoom in/out as we please.

library(tidyr)
zilinsky.results.wide.sf <- zilinsky.kraj.results.top.candidates.sf %>%
  dplyr::select(name, platne_hlasy_percent, obec) %>%
  tidyr::spread(name, platne_hlasy_percent)

tmap_mode("view")
## tmap mode set to interactive viewing
tm <- tm_shape(zilinsky.results.wide.sf) + 
  tm_polygons(zilinsky.kraj.top.candidates) + 
  tm_facets(sync = TRUE, ncol = 2)
tm