# GIS with R: examples from Slovakia

## Geospatial analysis. Why are we doing this?

1. Most of our data we collect are a recording of some phenomena that happened at a specific place and time.
2. Good visualisations and methods from spatio-temporal analysis field can help us understand our data.

## This tutorial

1. Spatial data and coordinate systems.
2. Introduction to `sf` package.
3. Visualisations.
4. Example of analysis: spatial autocorrelation.

## Extra resources

Recommended textbooks:

• Geocomputation with R. Available online.
• Gelfand, Alan E., ed. 2010. Handbook of Spatial Statistics. Chapman & Hall/CRC Handbooks of Modern Statistical Methods. Boca Raton: CRC Press.
• Wikle, Christopher K., Andrew Zammit-Mangion, and Noel Cressie. 2019. Spatio-Temporal Statistics with R. 1st ed. Boca Raton, Florida : CRC Press, : Chapman and Hall/CRC. https://doi.org/10.1201/9781351769723.

Other sources:

• r-spatial.org is a portal for people interested in spatial modelling.
• Packages like `sf`, `sp` tend to be well-documented, and with worked examples.

## Spatial data

• Data frames, but now each row has a geometry associated with it (usually a point or an area).
• We need to be careful with the coordinate systems:
• If working at macro scale (at world level), we need to take into account curvature of the Earth.
• We stick to coordinate systems where we can work with Euclidean geometry.
• The cadaster office in Slovakia uses krovak system, which is the legally binding coordinate system (coded as `EPSG:5513`).
• This website is useful for reading off the coordinates for a particular point.

## Obtaining geometry (map) files

• The map files come in different formats. One of the most common ones are shapefiles (`.shp`), geodatabase (`.gdb`), geopackage (`.gpk`), geoJSON (`.geojson`).
• This page in the geoportal.sk has the necessary files in different formats. We stick to the shapefiles (`.shp`) format. The portal allows downloading the maps at different resolutions. In this guide, we will be working with the most precise one, which is referred to as ‘Základná úroveň/ ZBGIS - Administratívne hranice’.

## `sf` package

Install the package using:

``install.packages('sf')``
``library('sf')``

## `st_read()` to load maps

``st_read(dsn, layer)``
• `dsn`: file name of the map file.
• `layer`: certain file formats allow saving multiple layers, we will work with single layer files (it does not mean we can’t have multiple indicators/statistic for each location).

Load the map of Slovakia at the municipality level:

``````mapa.obec.sf <- st_read(dsn = file.path(getwd(),
'maps',
'ah_shp_0',
'obec_0.shp'))``````

## Inspecting the map object - coordinates

To see what coordinate system it uses, `st_crs()` function can be used:

``st_crs(mapa.obec.sf)``
``````## Coordinate Reference System:
##   No EPSG code
##   proj4string: "+proj=krovak +lat_0=49.5 +lon_0=24.83333333333333 +alpha=30.28813975277778 +k=0.9999 +x_0=0 +y_0=0 +ellps=bessel +units=m +no_defs"``````

Due to the missing code, we do a transformation to the official code using `st_transform()` function.

``````crsCodeSkCadastreOffice <- 5513  # save the cadastre CRS to a constant
mapa.obec.sf <- st_transform(x = mapa.obec.sf,
crs = crsCodeSkCadastreOffice)
st_crs(mapa.obec.sf)``````

## Inspecting the map object - data

``summary(mapa.obec.sf)``
``````##       DOW                  AUT         ACH         SOI       FACC           IDN4              NM4            IDN3
##  Min.   :2019-04-30   Min.   :1   Min.   :1   Min.   :9   FA004:2927   Min.   :500011   Lúčka   :   4   Min.   :101.0
##  1st Qu.:2019-04-30   1st Qu.:1   1st Qu.:1   1st Qu.:9                1st Qu.:508989   Porúbka :   4   1st Qu.:402.0
##  Median :2019-04-30   Median :1   Median :1   Median :9                Median :518204   Dúbrava :   3   Median :606.0
##  Mean   :2019-04-30   Mean   :1   Mean   :1   Mean   :9                Mean   :521206   Lúčky   :   3   Mean   :547.7
##  3rd Qu.:2019-04-30   3rd Qu.:1   3rd Qu.:1   3rd Qu.:9                3rd Qu.:526138   Petrovce:   3   3rd Qu.:708.0
##  Max.   :2019-04-30   Max.   :1   Max.   :1   Max.   :9                Max.   :599972   Píla    :   3   Max.   :811.0
##                                                                                         (Other) :2907
##               NM3            IDN2                    NM2          VYMERA            Shape_Leng       Shape_Area
##  Košice - okolie: 114   Min.   :1.000   Prešovský      :665   Min.   :   361255   Min.   :  3211   Min.   :   354208
##  Rimavská Sobota: 107   1st Qu.:4.000   Banskobystrický:516   1st Qu.:  7207245   1st Qu.: 14031   1st Qu.:  7207035
##  Prešov         :  91   Median :6.000   Košický        :461   Median : 11655767   Median : 18582   Median : 11647728
##  Levice         :  89   Mean   :5.415   Nitriansky     :354   Mean   : 16752332   Mean   : 21367   Mean   : 16747701
##  Bardejov       :  86   3rd Qu.:7.000   Žilinský       :315   3rd Qu.: 19769322   3rd Qu.: 25503   3rd Qu.: 19794868
##  Trebišov       :  82   Max.   :8.000   Trenčiansky    :276   Max.   :359784254   Max.   :204284   Max.   :359936992
##  (Other)        :2358                   (Other)        :340
##           geometry
##  MULTIPOLYGON :2927
##  epsg:5513    :   0
##  +proj=krov...:   0
##
##
##
## ``````

## Inspecting the map object - data

From the columns included in the shapefiles, we see that:

1. `IDN4` is a unique identifier for each municipality.
2. `NM4` is the name of the municipality.
3. `IDN3` is the identifier for okres.
4. `NM3` is the name for okres.
5. `IDN2` is the identifier for kraj.
6. `NM2` is the name for kraj.
7. `Shape_Leng` and `Shape_Area` are the perimeter and the area of a municipality.
8. It’s not clear to me what `VYMERA` is given that the values slightly deviate from the `Shape_Area`.

The other columns are not relevant for this tutorial.

## Data preparation

Before, we proceed, we rename the columns to more meaningful names:

``````library(dplyr)
mapa.obec.sf <- mapa.obec.sf %>%
rename(obec = NM4) %>%
rename(obec_id = IDN4) %>%
rename(okres = NM3) %>%
rename(okres_id = IDN3) %>%
rename(kraj = NM2) %>%
rename(kraj_id = IDN2) %>%
rename(obvod_uzemia = Shape_Leng) %>%
rename(plocha_uzemia = Shape_Area)``````

## Visualisation - options

Some of the most popular options are:

1. default `plot()` method in the `sf` package,
2. `ggplot`,
3. `tmap`.

We will give examples of all throughout the rest of this tutorial.

## Example: migration in Slovakia

1. Load the migration and population data.
2. Merge the data with the map files.
3. Visualise the data: inflow / outflow.

## Example: migration in Slovakia

Load migration and population data

``````migration.district.df <- read_excel(file.path(getwd(), 'data', "raw", "migration_by_district.xlsx"),
skip=8, col_names=FALSE)
colnames(migration.district.df) <- c('district', 'inflow_sk', 'inflow_abroad', 'outflow_sk', 'outflow_abroad')
migration.district.df <- migration.district.df %>%
dplyr::mutate(district = gsub("^District of (.+)\$", "\\1", district))

# load population data and clean them
population.municipality.df <- read_excel(file.path(getwd(), 'data', "raw", "population_municipality.xlsx"),
skip=7, col_names=FALSE)
colnames(population.municipality.df) <- c('district', 'municipality', 'population')
population.district.df <- population.municipality.df  %>%
dplyr::mutate(district = gsub("^District of (.+)\$", "\\1", district)) %>%
dplyr::group_by(district) %>%
dplyr::summarise(population = sum(population))``````

## Example: migration in Slovakia

Merge the data with maps

``````library(stringi) # for removing diacritics for joins
pop.migration.district.df <- migration.district.df %>%
dplyr::full_join(population.district.df, by = 'district') %>%
dplyr::mutate(district_ascii = stri_trans_general(str = district, id = "Latin-ASCII"))

pop.migration.district.sf <- mapa.okres.sf %>%
dplyr::mutate(district_ascii = stri_trans_general(str = okres, id = "Latin-ASCII")) %>%
dplyr::left_join(pop.migration.district.df, by = "district_ascii")

pop.migration.district.sf <- pop.migration.district.sf %>%
dplyr::mutate(outflow_abroad_percent = 100*(outflow_abroad / population)) %>%
dplyr::mutate(outflow_sk_percent = 100*(outflow_sk / population)) %>%
dplyr::mutate(inflow_abroad_percent = 100*(inflow_abroad / population)) %>%
dplyr::mutate(inflow_sk_percent = 100*(inflow_sk / population))``````

## Example: migration in Slovakia

Plot using the default `plot()` method

``````plot(x = pop.migration.district.sf[c('outflow_abroad_percent')],
key.pos = 1, main='% of population moved abroad')``````