Because the IPUMS PMA data website allows users to combine samples from multiple countries and multiple years together in a single download, tools like facet_wrap and facet_grid are an indispensable part of our data visualization toolkit. As we’ve shown elsewhere on this blog, facets are a simple way to create plots for multiple samples displayed as an array of panels.

For instance, in this recent post, we used faceted bar charts to show change in contraceptive use status for women in six samples arranged over 18 panels.

In this post, we’ll practice building choropleth maps showing modern contraceptive prevalence¹ for major (admin 1) regions in several PMA samples. As with the bar charts shown above, we’ll use facets to arrange data for each sample together in an array of panels; however, we’ll see that facet_wrap and facet_grid may not always be the best option for maps. Instead, we’ll introduce a new package, cowplot, designed to bring additional flexibility to the alignment and arrangement of panels built with ggplot2.

We’ve created a summary table modern_tbl for this post with variables from the current or recent family planning use variable group, where the column PCT gives the estimated modern method prevalence within the population of women aged 15-49 in each region GEO in each sample year YEAR for each of three countries - Burkina Faso, Ethiopia, and Uganda - listed in COUNTRY We’ve also attached GPS coordinate vectors representing the boundary of each region in the geometry column - these come from a shapefile downloaded from this page on the IPUMS PMA data website. (If you’re interested, check out our data preparation steps by clicking the button below).

Show data preparation steps

library(tidyverse)
library(ipumsr)
library(srvyr)
library(sf)
library(ggspatial)

# Load IPUMS PMA data extract into R
# This sample includes data from 3 countries, all available cross-sections
dat <- read_ipums_micro(
  ddi = "data/pma_00143.xml",
  data = "data/pma_00143.dat.gz"
)

# `COUNTRY` as a factor (for readability)
dat <- dat %>% mutate(COUNTRY = as_factor(COUNTRY))

# Create unique `YEAR` labels where multiple rounds of data were drawn
dat <- dat %>%
  mutate(
    YEAR = case_when(
      COUNTRY == "Ethiopia" & ROUND == 1 ~ "2014a",
      COUNTRY == "Ethiopia" & ROUND == 2 ~ "2014b",
      COUNTRY == "Uganda" & ROUND == 2 ~ "2015a",
      COUNTRY == "Uganda" & ROUND == 3 ~ "2015b",
      COUNTRY == "Burkina Faso" & ROUND == 3 ~ "2016a",
      COUNTRY == "Burkina Faso" & ROUND == 4 ~ "2016b",
      TRUE ~ as.character(YEAR)
    )
  )

# Uganda has 15 regions in one sample, but just 10 in others 
# Harmonize regions so that all samples have the same 
dat <- dat %>% 
  mutate(
    GEOUG = GEOUG %>% 
      lbl_relabel(
        lbl(5, "Kampala") ~ .val == 0,
        lbl(1, "Central 1") ~ .val == 1,
        lbl(2, "Central 2") ~ .val == 2,
        lbl(3, "East Central, Eastern") ~ .val %in% 3:6,
        lbl(6, "Karamoja") ~ .val == 7,
        lbl(7, "North") ~ .val %in% 8:9,
        lbl(8, "South West") ~ .val %in% 13:14,
        lbl(9, "West Nile") ~ .val == 10,
        lbl(10, "Western") ~ .val %in% 11:12
      ) %>% 
      as_factor() %>% 
      if_else(is.na(.), as_factor(GEOUGSH), .)
  )

# Now, rename any regions that do not match the labels in our shapefile 
# Harmonize region vars for all countries together in a single var, `GEO`
dat <- dat %>% 
  mutate(
    across(starts_with("GEO"), ~as_factor(.x) %>% as.character),
    GEOBF = GEOBF %>% recode("Plateau-Central" = "Plateau Central"),
    GEOET = GEOET %>% recode("Oromiya" = "Oromia"),
    GEOUG = GEOUG %>% recode("South West" = "Southwest"),
    GEO = case_when(
      !is.na(GEOUG) ~ GEOUG,
      !is.na(GEOET) ~ GEOET,
      !is.na(GEOBF) ~ GEOBF
    )
  )

# A `modern_user` is anyone using a method other than rhythm, withdrawal, or
# other traditional 
dat <- dat %>% 
  mutate(
    modern_user = if_any(
      starts_with("FPNOW") & -c(FPNOWUSRHY, FPNOWUSWD, FPNOWUSTRAD),
      ~.x %in% 1 # note: `%in%` generates FALSE if NA 
    )
  )

# Estimate the population proportion for modern use / non-use in each `GEO`
modern_tbl <- dat %>% 
  as_survey_design(weight = FQWEIGHT, id = EAID, strata = COUNTRY) %>%
  group_by(YEAR, COUNTRY, GEO, modern_user) %>%
  summarise(pct = 100*survey_mean(vartype = NULL)) %>% 
  ungroup()

# We use `pivot_wider` to create 0% values if no modern users were found
modern_tbl <- modern_tbl %>% 
  pivot_wider(
    c(YEAR, COUNTRY, GEO), 
    names_from = modern_user, 
    values_from = pct,
    values_fill = 0
  ) %>% 
  select(-`FALSE`) %>% 
  rename(PCT = `TRUE`)

# Load shapefile and harmonize variable names with `dat`
# Include only our 4 countries of interest
shape <- st_read("../../data_local/shapefiles/subnational") %>% 
  st_transform(crs = 4326) %>% 
  select(COUNTRY = CNTRY_NAME, GEO = ADMIN_NAME) %>% 
  filter(COUNTRY %in% c("Burkina Faso", "Ethiopia", "Uganda"))

# Ensure that all shapes are fully enclosed, and smooth borders to 1000 meters 
# Drop any regions marked "Waterbodies" (no households are there)
shape <- shape %>% 
  st_make_valid() %>% 
  st_simplify(dTolerance = 1000) %>% 
  filter(GEO != "Waterbodies")

# Attach shapes for each `GEO`
modern_tbl <- modern_tbl %>% full_join(shape) 

# If no data collected in a region, duplicate `NA` for each `YEAR`
# This requires `pivot_wider` to create placeholder `NA` values
# Then, `pivot_longer` again: new rows appear for each `YEAR`
modern_tbl <- modern_tbl %>% 
  pivot_wider(
    names_from = YEAR,
    values_from = PCT
  ) %>% 
  select(-`NA`) %>% 
  pivot_longer(
    where(is.double),
    names_to = "YEAR",
    values_to = "PCT"
  ) 

# Note that some countries may have 2 samples drawn in the same year, but others 
# included only 1 sample. Drop any extra `YEAR` labels for each `COUNTRY` here
modern_tbl <- modern_tbl %>% 
  group_by(COUNTRY, YEAR) %>% 
  filter(!all(is.na(PCT))) %>% 
  ungroup()

# Rearrange columns and rows for improved display 
modern_tbl <- modern_tbl %>% 
  relocate(c(YEAR, PCT), .after = GEO) %>% 
  arrange(COUNTRY, YEAR, GEO)

modern_tbl

# A tibble: 260 × 5
   COUNTRY      GEO               YEAR    PCT                           geometry
   <chr>        <chr>             <chr> <dbl>                 <MULTIPOLYGON [°]>
 1 Burkina Faso Boucle du Mouhoun 2014  20.2  (((-3.190611 13.68045, -3.226559 …
 2 Burkina Faso Cascades          2014  12.9  (((-5.39324 10.99398, -5.416136 1…
 3 Burkina Faso Centre            2014  21.9  (((-1.572501 12.60137, -1.566564 …
 4 Burkina Faso Centre-Est        2014  17.2  (((-0.2427228 12.55885, -0.292054…
 5 Burkina Faso Centre-Nord       2014  11.4  (((-0.625751 13.99783, -0.6899 13…
 6 Burkina Faso Centre-Ouest      2014  10.7  (((-2.519715 12.83308, -2.557285 …
 7 Burkina Faso Centre-Sud        2014  28.1  (((-1.459765 12.19329, -1.508548 …
 8 Burkina Faso Est               2014  19.0  (((0.0653133 13.48729, 0.0587278 …
 9 Burkina Faso Hauts-Bassins     2014  18.6  (((-4.4633 12.07738, -4.499922 12…
10 Burkina Faso Nord              2014   9.68 (((-2.119268 14.15666, -2.353057 …
# … with 250 more rows
# ℹ Use `print(n = ...)` to see more rows

Let’s look at an example featuring data from just one country first. We’ll use our favorite mapping package ggspatial, since it allows us to use familiar tools from ggplot2 to customize the layout of our panel array. Most crucially, we use facet_wrap to create one panel for each sample YEAR. Additionally, we’ll define the fill color for each region with scale_fill_gradient: we’ll use a red color #BD262D for high-levels on modern method prevalence, and we’ll use transparent for regions where no modern method users were sampled. The grey shade #CFCFCF will represent regions were no women were sampled.

Finally, we’ll customize the look of our plot with a theme we’ll call theme_pma. We encourage you to experiment with your own theme, but feel free to use ours as a template if you’re interested.

How to build theme_pma

# Define `theme_pma` for maps
library(showtext)
sysfonts::font_add(
  family = "cabrito", 
  regular = "../../fonts/cabritosansnormregular-webfont.ttf"
)
showtext::showtext_auto()

theme_pma <- theme_void() %+replace% 
  theme(
    text = element_text(family = "cabrito", size = 13),
    plot.title = element_text(
      size = 22, 
      color = "#00263A", 
      hjust = .5, 
      margin = margin(b = 10)
    ),
    plot.caption = element_text(
      size = 22, 
      color = "#00263A", 
      hjust = .5, 
      margin = margin(t = 10)
    ),
    panel.spacing = unit(1, "lines"),
    legend.position = "right"
  )

ggplot() + 
  layer_spatial(modern_tbl %>% filter(COUNTRY == "Ethiopia"), aes(fill = PCT)) + 
  facet_wrap(vars(YEAR), strip.position = "bottom", ) + 
  scale_fill_gradient(
    low = "transparent",
    high = "#BD262D",
    na.value = "#CFCFCF"
  ) + 
  labs(
    title = "Modern Contraceptive Prevalence in Ethiopia: 2014-2019",
    fill = "Percent Women \nAge 15-49"
  ) + 
  theme_pma

Our use of facet_wrap works well in this case because the spatial extent of each panel is identical: we’ve only plotted the same country seven times. Watch what happens, though, when we try the same approach to faceting maps with different spatial extents. Here, we’ll create one choloropleth for each of three countries sampled by PMA in 2019.

ggplot() + 
  layer_spatial(modern_tbl %>% filter(YEAR == "2019"), aes(fill = PCT)) + 
  facet_wrap(vars(COUNTRY), strip.position = "bottom", ) + 
  scale_fill_gradient(
    low = "transparent",
    high = "#BD262D",
    na.value = "#CFCFCF"
  ) + 
  labs(
    title = "Modern Contraceptive Prevalence: 2019",
    fill = "Percent Women \nAge 15-49"
  ) + 
  theme_pma

The problem here is that facet_wrap attempts to use the same scales along the x and y-axes by default. When we facet by COUNTRY, each map is drawn to scale and in absolute position. Instead, we want each country’s map to appear approximately the same size and in the center of each panel.

In most ggplot2 applications, we’d resolve this problem by setting scales = "free". This should allow facet_wrap to identify axes that fit the data in each panel. Instead, it generates the following error:

ggplot() + 
  layer_spatial(modern_tbl %>% filter(YEAR == "2019"), aes(fill = PCT)) + 
  facet_wrap(vars(COUNTRY), strip.position = "bottom", scales = "free") + 
  scale_fill_gradient(
    low = "transparent",
    high = "#BD262D",
    na.value = "#CFCFCF"
  ) + 
  labs(
    title = "Modern Contraceptive Prevalence: 2019",
    fill = "Percent Women \nAge 15-49"
  ) + 
  theme_pma

Error in `f()`:
! coord_sf doesn't support free scales

Fortunately, we were able to find this excellent blog post with an explanation and work-around.

It turns out the the ggplot2 codebase assumes that it can maniulate axes independently of one another. This is very much not the case with geographic data where a meter vertically needs to equal a meter horizontally, so coord_sf() locks the axes in much the same manner as coord_fixed().

— Rob Williams, Data Scientist

Williams suggests a different approach using the cowplot package developed by biologist Claus O. Wilke at the University of Texas at Austin. We’ve never featured cowplot before on our blog, so we first needed to install it from CRAN:

install.packages("cowplot")

library(cowplot)

The key difference in this approach is that you create each panel separately, and then use cowplot to align them. Panels can be arranged in rows or columns, and their sizes can be identical or individually specified. You can also overlay plots on top of each other to produce annotations or even watermarks. A general introduction is available here.

Cowplot: combining countries

The function cowplot::plot_grid takes a list of plots as input, so we first need to write a function that will generate one plot for each of our four 2019 samples. We’ll start with Ethiopia first:

ggplot() + 
  layer_spatial(
    modern_tbl %>% filter(COUNTRY == "Ethiopia" & YEAR == 2019),
    aes(fill = PCT)
  ) + 
  scale_fill_gradient(
    low = "transparent",
    high = "#BD262D",
    na.value = "#CFCFCF",
    guide = guide_colorbar(
      title.position = "left", 
      direction = "horizontal", 
      barwidth = unit(5, "cm")
    )
  ) + 
  labs(
    caption = "Ethiopia",
    fill = "Percent Women \nAge 15-49"
  ) + 
  theme_pma

If we want to switch countries to Burkina Faso, for example, we only need to replace the string "Ethiopia" in two places.

ggplot() + 
  layer_spatial(
    modern_tbl %>% filter(COUNTRY == "Burkina Faso" & YEAR == 2019),
    aes(fill = PCT)
  ) + 
  scale_fill_gradient(
    low = "transparent",
    high = "#BD262D",
    na.value = "#CFCFCF",
    guide = guide_colorbar(
      title.position = "left", 
      direction = "horizontal", 
      barwidth = unit(5, "cm")
    )
  ) + 
  labs(
    caption = "Burkina Faso",
    fill = "Percent Women \nAge 15-49"
  ) + 
  theme_pma

Notice that the range of values shown on our legend changes automatically when we switch countries? When we combine countries into a single plot, we’ll want to specify these values manually so that the same fill color represents the same value on every plot. We’ll use 0% as a minimum value, but the maximum value should be only slightly higher than the maximum value in our data: this maximizes the dynamic color range between any two data points.

modern_tbl %>% filter(YEAR == 2019) %>% summarise(max(PCT, na.rm = TRUE))

# A tibble: 1 × 1
  `max(PCT, na.rm = TRUE)`
                     <dbl>
1                     36.5

We’ll now set limits at 0% and 37% in scale_fill_gradient. (As expected, this decreases the dynamic color range between each region.)

ggplot() + 
  layer_spatial(
    modern_tbl %>% filter(COUNTRY == "Burkina Faso" & YEAR == 2019),
    aes(fill = PCT)
  ) + 
  scale_fill_gradient(
    low = "transparent",
    high = "#BD262D",
    na.value = "#CFCFCF",
    guide = guide_colorbar(
      title.position = "left", 
      direction = "horizontal", 
      barwidth = unit(5, "cm")
    ),
    limits = c(0, 37)  # Legend limits inserted here 
  ) + 
  labs(
    caption = "Burkina Faso",
    fill = "Percent Women \nAge 15-49"
  ) + 
  theme_pma

Finally, we’ll use the map function from the purrr package to build a list containing one map for each country. Remember: here map refers to a function applied iteratively to each item in a list - not the kind of spatial map we’re trying to add to our plot! The easiest way to reference a list item within a mapped function is to use the symbol ~, which signals an anonymous funciton where the list item is represented by the variable .x. Here, we use .x in place of each country name in our data.

plotlist <- unique(modern_tbl$COUNTRY) %>% 
  map(~{
    ggplot() + 
      layer_spatial(
        modern_tbl %>% filter(COUNTRY == .x & YEAR == 2019),
        aes(fill = PCT)
      ) + 
      scale_fill_gradient(
        low = "transparent",
        high = "#BD262D",
        na.value = "#CFCFCF",
        guide = guide_colorbar(
          title.position = "left", 
          direction = "horizontal", 
          barwidth = unit(10, "cm")
        ),
        limits = c(0, 37)
      ) + 
      labs(
        caption = .x,
        fill = "Percent Women \nAge 15-49"
      ) + 
      theme_pma 
  })

Now, plotlist is a list with four items: one plot per country. The function unique(modern_tbl$COUNTRY) listed our countries alphabetically:

plotlist[[1]]

plotlist[[2]]

plotlist[[3]]

Before we assemble these plots in an array, we’ll want to extract one copy of the shared legend. Here, we use get_legend to extract the legend from Burkina Faso:

legend <- get_legend(plotlist[[1]])

We can now strip all of the legends from each of the individual panels with another map function.

plotlist <- plotlist %>% map(~.x + theme(legend.position = "none"))

We’ll also build a title with ggdraw that will span all three panels. Make sure to check out the complete list of cowplot functions available to help customize your plot.

title <- ggdraw() + 
  draw_label(
    toupper("Modern Contraceptive Prevalence: 2019"),
    fontfamily = 'cabrito',
    x = 0.5, 
    y = 0.5,
    hjust = 0.5,
    color = "#00263A",
    size = 30
  )

title

We’ll build our final plot in two steps, both using plot_grid. The first arranges our three panels in a single row with three columns. It took some trial and error to get the set the relative width of each panel with rel_widths (for example, the Uganda plot is about 72% narrower than the Burkina Faso plot: if both were plotted with the same width, the Uganda plot would be taller).

output <- plot_grid(plotlist = plotlist, nrow = 1, rel_widths = c(1, 0.95, 0.72))

output

The second step arranges this output between our title and legend, creating three rows total. Here, we use rel_heights to make the output row three times taller than the rows containing title and legend.

plot_grid(
  title,
  output,
  legend,
  ncol = 1, 
  rel_heights = c(1, 3, 1)
)

And there you have it! With cowplot, it’s easy to build an array of maps - or any combination of plots - with a shared title, legend, or any other annotation you might need. And, because it’s built to work with ggplot2, cowplot works perfectly with our existing data visualization workflow.

Surveyed methods may vary by country and within the same country following a survey redesign period between rounds. Modern methods could include: female sterilization, male sterilization, implants, IUDs, injectables, pills, emergency contraception, male condoms, female condoms, diaphragms, foam, beads, N tablet, and lactational amenorrhea method (LAM).↩︎

Arranging maps in a grid

Facets: one country over time

Cowplot: combining countries

Corrections