Using interactive maps for exploratory data analysis with PMA GPS data
Several recent posts have introduced techniques for integrating spatial data sources with PMA survey data. We’ve explored precipitation patterns in Burkina Faso using terra, considered the evolution of conflict areas in Uganda from 2019-2020, and created multi-country faceted maps using cowplot.
In each of these cases, we’ve relied on the familiar ggplot2 and ggspatial packages
as our cartographic workhorses. ggplot2
can be a powerful
tool when making maps, and its standardized syntax for both spatial and
non-spatial data visualization is appealing, but in practice it is best
for producing static maps.
In many — probably most — cases, a static map is all that we need to display the spatial patterns present in our data, but there are situations in which it can be beneficial to be able to interact with the map directly. Interactivity is often thought of as a feature to be employed in a final visualization product, but it can be extremely useful during data exploration, as well.
By allowing the reader to click to receive specific information about certain spatial features, zoom and pan to see both macro- and micro-level patterns, toggle between different metrics, and more, interactive maps can make it easier to:
In this post, we’ll use the leaflet package to explore the spatial distribution of two competing metrics for urbanism using PMA panel data from Kenya. IPUMS PMA has a variable indicating whether an enumeration area is urban or rural (URBAN), but the definition of this variable is not well-standardized across countries, so an external data source may provide a more reliable indication of urbanism.
The leaflet
package provides an R interface with the leaflet.js JavaScript library. While
not all of the functionality available in the JavaScript version is
available in R, the package makes it surprisingly easy to quickly
assemble professional-quality interactive maps for data exploration and
presentation.
To showcase the advantages of interactive maps, we’ll use the variable PANELBIRTH, which indicates whether a woman gave birth in the year between Phase 1 and Phase 2 of the Kenya panel study. As you might imagine, PMA surveys already contain a number of key predictors for childbirth. We’ll feature only a few variables representing each woman’s
As we’ll see, an interactive map allows the user to zoom-in on each PMA sample cluster - known as an enumeration area (EA) - and dynamically render a summary table for the average age, marital status, and wealth of reproductive age women in that specific location.
Mapping also allows us to examine the relationship between PMA
variables and data from external sources. We mentioned above that IPUMS
PMA includes the variable URBAN
, which indicates whether
each country’s statistical bureau classifies a woman’s EA as “urban” or
“rural”. Because this definition varies by country, PMA researchers like
Zhang (2022) sometimes
use road network density as a standardized proxy for urbanism. To
demonstrate, we’ll obtain road network data from Geofabrik and plot local road
density together with PMA summary data for each EA.
To get started, we’ll load the following packages in R. (If you haven’t installed all of these yet, you can do so with install.packages first.)
Next, we’ll obtain a long-format longitudinal extract with members of the Kenya panel study (Female Respondents only). To keep things simple, we’ll filter to include only records from the Phase 2 interview.1
dat <- read_ipums_micro(
ddi = "data/pma_00149.xml",
data = "data/pma_00149.dat.gz"
)
dat <- dat %>% filter(YEAR == 2020 & RESULTFQ == 1)
We’ve mentioned previously on this blog that researchers must apply with PMA to access the GPS coordinates for each EA. These GPS data come in the form of a CSV file of displaced centroid points in latitude/longitude format. The PMA documentation goes into more detail about this displacement methodology. For our purposes, it is enough to keep in mind that the point locations in our GPS data are approximate: they represent the center of each sample cluster, but this center is randomly displaced up to 10 kilometers in order to further protect participant confidentiality.
The GPS data come in CSV format, which we’ll load into R with read_csv.
We’ll then use the sf
package to convert the imported data into an sf
spatial
object, which will use the coordinate reference system
WGS 84.2 We can pass EPSG code 4326 to st_set_crs
to indicate that our coordinates are relative to the WGS 84 system.
Now that we have our PMA data and EA coordinates prepared, we can
start working with leaflet
itself.
For those familiar with ggplot2
, the conceptual
framework when building a leaflet
map should be fairly
familiar. In both cases, we’ll construct our final visual product in a
series of layers, which will be superimposed to form the final
output.
Accordingly, when we initialize a leaflet
map with its
eponymous function, leaflet, we see
only a blank gray canvas, just like we might expect with
ggplot2
:
leaflet()
As with ggplot
, the leaflet
function allows
you to attach data and set several other map-wide parameters.
At its core, though, calling leaflet
is enough to get a
map going.
Most web basemaps rely on a set of square tiles that can be stitched
together to form an entire map. This improves processing since only the
new tiles in the region being viewed need to be downloaded as the user
pans or zooms. leaflet
has several built-in basemaps, which
can be added with addProviderTiles.
Available map providers can be accessed using providers.
For our maps, we’ll use the Positron tiles from Carto. These understated basemap tiles are well-suited for emphasizing our data, which we will overlay later.
leaflet() %>%
addProviderTiles(providers$CartoDB.PositronNoLabels)
We can set some initial parameters using the options
argument in addProviderTiles
. This argument itself takes a
function, providerTileOptions,
whose arguments correspond to several of the options available for
standard JavaScript leaflet
maps. Here, we set the minimum
and maximum zoom range for the map. We then use the setView
function to initialize the map over our area of interest with an
appropriate zoom level.
basemap <- leaflet() %>%
addProviderTiles(
provider = providers$CartoDB.PositronNoLabels,
options = providerTileOptions(minZoom = 6, maxZoom = 12),
) %>%
setView(lat = -0.06, lng = 37.48, zoom = 7)
basemap
Even though our basemap shows up-to-date borders from OpenStreetMap, it would be nice to emphasize Kenya’s administrative regions to draw the viewer’s attention to our area of interest. We load a shapefile of Kenya’s administrative borders for use in our maps:
borders_sub <- st_read("data/ke_boundaries/ke_boundaries.shp")
To get a border for the entire country, we can dissolve the internal administrative region borders to the external border with st_union:
Just like with ggplot2
, leaflet
has a
variety of similar layer types,
each of which is specific to the types of spatial features being mapped.
For instance, data of point locations call for the use of
addCircles
or addCircleMarkers
, whereas data
of polygon areas would require addPolygons
.
Each of these functions anticipates a map
object as its
first argument. This makes leaflet
maps respond well to the
pipe syntax you may be familiar with from data manipulation in the
tidyverse
, since each of these functions also
returns a map
object. The only difference is that
instead of piping data through each step in the processing pipeline,
we’re piping our map.
So, to add both our borders and our EA coordinates to the map, we
simply have to pipe our basemap into a call to addPolygons
where we specify that the data associated with this map layer come from
our borders_sub
object. Then we can pipe this output
directly into a call to addCircleMarkers
, adding the EA
points to the map. The other function arguments, like
fillColor
, opacity
, and so on, control the
aesthetics of these layers.
basemap %>%
addPolygons(
data = borders_sub,
fill = NA,
opacity = 0.8,
weight = 2,
color = "#9D9D9D"
) %>%
addCircleMarkers(
data = ea_coords,
radius = 4,
fillColor = "#575757",
fillOpacity = 0.8,
color = "#1D1D1D",
opacity = 0,
weight = 1.5
)