Data Exploration with Leaflet

Leaflet Data Analysis Data Visualization Spatial Mapping

Using interactive maps for exploratory data analysis with PMA GPS data

Finn Roberts https://www.linkedin.com/in/finn-roberts-b04b98148/ (ISRDI Senior Data Analyst)
2022-09-15

Several recent posts have introduced techniques for integrating spatial data sources with PMA survey data. We’ve explored precipitation patterns in Burkina Faso using terra, considered the evolution of conflict areas in Uganda from 2019-2020, and created multi-country faceted maps using cowplot.

In each of these cases, we’ve relied on the familiar ggplot2 and ggspatial packages as our cartographic workhorses. ggplot2 can be a powerful tool when making maps, and its standardized syntax for both spatial and non-spatial data visualization is appealing, but in practice it is best for producing static maps.

In many — probably most — cases, a static map is all that we need to display the spatial patterns present in our data, but there are situations in which it can be beneficial to be able to interact with the map directly. Interactivity is often thought of as a feature to be employed in a final visualization product, but it can be extremely useful during data exploration, as well.

By allowing the reader to click to receive specific information about certain spatial features, zoom and pan to see both macro- and micro-level patterns, toggle between different metrics, and more, interactive maps can make it easier to:

In this post, we’ll use the leaflet package to explore the spatial distribution of two competing metrics for urbanism using PMA panel data from Kenya. IPUMS PMA has a variable indicating whether an enumeration area is urban or rural (URBAN), but the definition of this variable is not well-standardized across countries, so an external data source may provide a more reliable indication of urbanism.

The leaflet package provides an R interface with the leaflet.js JavaScript library. While not all of the functionality available in the JavaScript version is available in R, the package makes it surprisingly easy to quickly assemble professional-quality interactive maps for data exploration and presentation.

Setup

To showcase the advantages of interactive maps, we’ll use the variable PANELBIRTH, which indicates whether a woman gave birth in the year between Phase 1 and Phase 2 of the Kenya panel study. As you might imagine, PMA surveys already contain a number of key predictors for childbirth. We’ll feature only a few variables representing each woman’s

As we’ll see, an interactive map allows the user to zoom-in on each PMA sample cluster - known as an enumeration area (EA) - and dynamically render a summary table for the average age, marital status, and wealth of reproductive age women in that specific location.

Mapping also allows us to examine the relationship between PMA variables and data from external sources. We mentioned above that IPUMS PMA includes the variable URBAN, which indicates whether each country’s statistical bureau classifies a woman’s EA as “urban” or “rural”. Because this definition varies by country, PMA researchers like Zhang (2022) sometimes use road network density as a standardized proxy for urbanism. To demonstrate, we’ll obtain road network data from Geofabrik and plot local road density together with PMA summary data for each EA.

To get started, we’ll load the following packages in R. (If you haven’t installed all of these yet, you can do so with install.packages first.)

Next, we’ll obtain a long-format longitudinal extract with members of the Kenya panel study (Female Respondents only). To keep things simple, we’ll filter to include only records from the Phase 2 interview.1

dat <- read_ipums_micro(
  ddi = "data/pma_00149.xml",
  data = "data/pma_00149.dat.gz"
)

dat <- dat %>% filter(YEAR == 2020 & RESULTFQ == 1)

We’ve mentioned previously on this blog that researchers must apply with PMA to access the GPS coordinates for each EA. These GPS data come in the form of a CSV file of displaced centroid points in latitude/longitude format. The PMA documentation goes into more detail about this displacement methodology. For our purposes, it is enough to keep in mind that the point locations in our GPS data are approximate: they represent the center of each sample cluster, but this center is randomly displaced up to 10 kilometers in order to further protect participant confidentiality.

The GPS data come in CSV format, which we’ll load into R with read_csv. We’ll then use the sf package to convert the imported data into an sf spatial object, which will use the coordinate reference system WGS 84.2 We can pass EPSG code 4326 to st_set_crs to indicate that our coordinates are relative to the WGS 84 system.

ea_coords <- read_csv("data/ea/gps_kenya.csv") %>%
  st_as_sf(coords = c("GPSLONG", "GPSLAT")) %>%
  st_set_crs("epsg:4326") %>% 
  select(EAID, geometry)

Now that we have our PMA data and EA coordinates prepared, we can start working with leaflet itself.

Introduction to Leaflet

For those familiar with ggplot2, the conceptual framework when building a leaflet map should be fairly familiar. In both cases, we’ll construct our final visual product in a series of layers, which will be superimposed to form the final output.

Accordingly, when we initialize a leaflet map with its eponymous function, leaflet, we see only a blank gray canvas, just like we might expect with ggplot2:

As with ggplot, the leaflet function allows you to attach data and set several other map-wide parameters.

At its core, though, calling leaflet is enough to get a map going.

Basemaps

Most web basemaps rely on a set of square tiles that can be stitched together to form an entire map. This improves processing since only the new tiles in the region being viewed need to be downloaded as the user pans or zooms. leaflet has several built-in basemaps, which can be added with addProviderTiles. Available map providers can be accessed using providers.

For our maps, we’ll use the Positron tiles from Carto. These understated basemap tiles are well-suited for emphasizing our data, which we will overlay later.

leaflet() %>%
  addProviderTiles(providers$CartoDB.PositronNoLabels)

We can set some initial parameters using the options argument in addProviderTiles. This argument itself takes a function, providerTileOptions, whose arguments correspond to several of the options available for standard JavaScript leaflet maps. Here, we set the minimum and maximum zoom range for the map. We then use the setView function to initialize the map over our area of interest with an appropriate zoom level.

basemap <- leaflet() %>%
  addProviderTiles(
    provider = providers$CartoDB.PositronNoLabels,
    options = providerTileOptions(minZoom = 6, maxZoom = 12),
  ) %>%
  setView(lat = -0.06, lng = 37.48, zoom = 7)

basemap

Adding Spatial Features

Even though our basemap shows up-to-date borders from OpenStreetMap, it would be nice to emphasize Kenya’s administrative regions to draw the viewer’s attention to our area of interest. We load a shapefile of Kenya’s administrative borders for use in our maps:

borders_sub <- st_read("data/ke_boundaries/ke_boundaries.shp")

To get a border for the entire country, we can dissolve the internal administrative region borders to the external border with st_union:

borders_cntry <- borders_sub %>% st_union() %>% st_sf()

Just like with ggplot2, leaflet has a variety of similar layer types, each of which is specific to the types of spatial features being mapped. For instance, data of point locations call for the use of addCircles or addCircleMarkers, whereas data of polygon areas would require addPolygons.

Each of these functions anticipates a map object as its first argument. This makes leaflet maps respond well to the pipe syntax you may be familiar with from data manipulation in the tidyverse, since each of these functions also returns a map object. The only difference is that instead of piping data through each step in the processing pipeline, we’re piping our map.

So, to add both our borders and our EA coordinates to the map, we simply have to pipe our basemap into a call to addPolygons where we specify that the data associated with this map layer come from our borders_sub object. Then we can pipe this output directly into a call to addCircleMarkers, adding the EA points to the map. The other function arguments, like fillColor, opacity, and so on, control the aesthetics of these layers.

basemap %>%
  addPolygons(
    data = borders_sub,
    fill = NA,
    opacity = 0.8,
    weight = 2,
    color = "#9D9D9D"
  ) %>%
  addCircleMarkers(
    data = ea_coords,
    radius = 4,
    fillColor = "#575757",
    fillOpacity = 0.8,
    color = "#1D1D1D",
    opacity = 0,
    weight = 1.5
  )