When we announced the release of new longitudinal data from PMA this month, we mentioned two surveys from Côte d’Ivoire and Nigeria that include a special focus on women’s experiences seeking and accessing abortion services.

Both surveys are follow-up interviews with a subset of women who participated in a nationally representative, cross-sectional survey conducted in 2018. These are women who responded “yes” in 2018 to one of these questions:

Have you ever done something to remove a pregnancy when you were pregnant or worried you were pregnant?
Have you ever done something to regulate your period when you were worried you were pregnant?

Why do PMA surveys include multiple ways of asking about abortion? The answer, explained by Bell and Fissell (2021, pg. 507), is that conventional measures of abortion incidence are prone to substantial underreporting if they fail to account for procedures undertaken when a woman’s pregnancy status is ambiguous:

For example, most surveys studying induced abortion unambiguously ask women about actions they may have taken to end a pregnancy, but such questions leave no room to account for women’s actions that consciously or unconsciously attempt to regulate fertility in the period of time that we are characterizing as ambiguous, neither pregnant nor nonpregnant. Nor can we say that there is a simple third state of “maybe pregnant”; both ethnographic and historical data suggest that complex mixtures of perception and denial, a wide range of models of the reproductive body, and myriad other factors create rich and complex ambiguity rather than merely a simple third state.

Elsewhere, Bell et al. (2020a; 2020b) discuss methods for estimating abortion incidence with both of these 2018 PMA samples. They demonstrate how to estimate incidence with:

Affirmative responses only to the pregnancy removal question
Affirmative responses to either question
An average of both of the above measures.¹

In this post, we’ll share code you can use to reproduce the abortion incidence shown in both of these publications.

Because readers may wish to compare the authors’ final averaged measure against the incidence derived from either or both of the source questions, this is a particularly good opportunity to leverage an interactive dashboard built with the shiny package for R.

In this case, we’ll want to allow the user to switch between samples, incidence measures, and several disaggregation variables representing each woman’s:

age
education
household wealth
marital status
area of residence (urban / rural)

Setup

Throughout this ongoing series on PMA abortion data, we’ll be showcasing the same longitudinal (wide format) extract downloaded from IPUMS PMA. For this post, we’ll focus only on responses from the 2018 baseline surveys in both Côte d’Ivoire and Nigeria - over the next few weeks, we’ll also investigate abortion data collected each woman’s closest confidantes, and on her responses to the abortion follow-up survey conducted in 2019-2020.

To follow along, you’ll need to select a longitudinal (wide format) with Female Respondents only - these are women who completed the 2018 baseline survey (even if they were not included in the 2019-2020 follow-up). You’ll also need the following variables:

COUNTRY - country of residence
FQWEIGHT - sampling weight for 2018 survey
EAID - sample cluster (enumeration area)
STRATA - sample strata
INTFQCMC - CMC for date of 2018 interview
AGE - age
URBAN - area of residence (urban / rural)
EDUCATTGEN - education
WEALTHQ - household wealth
MARSTAT - marital status
ABORYR - year of last aborted pregnancy
REGYR - year of regulated period

In our extract, these variables will each appear twice: those ending with _1 come from the 2018 baseline surveys, with those ending with _2 come from the 2019-2020 follow-ups. Today’s analysis will focus only on the former.

After your extract has been downloaded and saved to a data folder in your working directory, attach the following packages and load your extract as an object named pma.

library(tidyverse)
library(ipumsr)
library(srvyr)

pma <- read_ipums_micro(
  ddi = "data/pma_00192.xml",
  data = "data/pma_00192.dat.gz"
)

Next, drop any women who are not part of the de facto population - in these samples, women outside of this population have a sampling weight of 0 in FQWEIGHT_1.

pma <- pma %>% filter(FQWEIGHT_1 != 0)

Analytic Dataset

Now, we’ll build an analytic dataset dat derived from the variables shown above. Most importantly, we’ll need to consider how Bell et al. report abortion incidence: they show an estimated annual abortion count per 1,000 women in each country.

The variables ABORYR and REGYR report the calendar year for each woman’s most recent pregnancy termination and period regulation, but PMA does not collect the precise month in which each event might have occurred.

Because of this limitation, we’ll include any procedure undertaken throughout all of the 2017 calendar year and any month before each woman’s 2018 interview. Create these four variables:

abor indicating whether the woman’s most recent pregnancy termination occurred after January 1, 2017
reg indicating whether the woman’s most recent period regulation occurred after January 1, 2017
any indicating whether the woman’s most recent pregnancy termination or period regulation occurred after January 1, 2017
avg the mean value of abor and any (reported by Bell et al.)
yrs - the number of months between January 1, 2017 and the date of a woman’s interview divided by 12 (e.g. 1.5 represents 18 months)

dat <- pma %>% 
  mutate(
    .keep = "none",
    reg = REGYR_1 %in% 2017:2018,
    abor = ABORYR_1 %in% 2017:2018,
    any = reg | abor,
    avg = pick(abor, any) %>% rowMeans,
    yrs = (INTFQCMC_1 - 1405)/12, # Jan 2017 as CMC = 1405
  )

transmute was superseded in favor of mutate(.keep = "none") in the latest major version of dplyr (version 1.0.0). We use this to drop any variables from pma that aren’t created directly by mutate.

Next, we’ll consider how to disaggregate abortion incidence measures in groups defined by those variables mentioned above. It will be helpful to create a few meaningful categories for each variable, and then transform the result into a factor (this will control the ordering of categories along the x-axis in our final plots). We’ll also create a variable all to show the overall abortion incidence for all women of reproductive age in each country.

dat <- pma %>% 
  mutate(
    .keep = "none",
    age = AGE_1 %>% 
      case_match(
        15:19 ~ "15-19",
        20:24 ~ "20-24",
        25:29 ~ "25-29",
        30:34 ~ "30-34",
        35:39 ~ "35-39",
        40:49 ~ "40+",
      ) %>% 
      fct_relevel("15-19","20-24","25-29","30-34","35-39"),
    urban = URBAN %>% as_factor(),
    edu = EDUCATTGEN_1 %>% 
      case_match(
        1 ~ "Never",
        2 ~ "Primary",
        3 ~ "Secondary",
        4 ~ "Higher"
      ) %>% 
      fct_relevel("Never", "Primary", "Secondary"),
    wealth = WEALTHQ_1 %>% 
      as_factor() %>% 
      str_remove(" quintile") %>% 
      fct_relevel("Lowest", "Lower", "Middle", "Higher"),
    marstat = MARSTAT_1 %>% 
      case_match(
        10 ~ "Never Married",
        21:22 ~ "Partnered",
        31:32 ~ "Separated / Widowed"
      ) %>% 
      fct_relevel("Never Married", "Partnered"),
    all = "All" 
  ) %>% 
  bind_cols(dat)

Finally, we’ll tweak the remaining technical variables for improved readability.

dat <- pma %>% 
  mutate(
    .keep = "none",
    country = COUNTRY %>% as_factor, 
    weight = FQWEIGHT_1 %>% zap_labels,
    eaid = EAID_1 %>% zap_labels,
    strata = STRATA_1 %>% zap_labels,
  ) %>% 
  bind_cols(dat)

Summary statistics

Your analytic dataset should now look something like this:

dat

# A tibble: 13,844 × 15
   country  weight  eaid strata   yrs age   urban edu   wealth marstat all   reg   abor  any     avg
   <fct>     <dbl> <dbl>  <int> <dbl> <fct> <fct> <fct> <fct>  <fct>   <chr> <lgl> <lgl> <lgl> <dbl>
 1 Cote d'…  0.574 11543  38401  1.5  30-34 Urban Seco… Higher Partne… All   FALSE FALSE FALSE   0  
 2 Cote d'…  1.26  11498  38401  1.5  15-19 Urban Never Highe… Never … All   FALSE FALSE FALSE   0  
 3 Cote d'…  0.766 11148  38402  1.5  20-24 Rural Prim… Middle Partne… All   FALSE FALSE FALSE   0  
 4 Cote d'…  0.983 11142  38401  1.5  15-19 Urban Seco… Higher Never … All   FALSE FALSE FALSE   0  
 5 Cote d'…  1.45  11279  38401  1.5  30-34 Urban Seco… Highe… Never … All   FALSE FALSE FALSE   0  
 6 Cote d'…  0.653 11989  38401  1.5  40+   Urban Never Middle Separa… All   FALSE FALSE FALSE   0  
 7 Cote d'…  1.15  11040  38402  1.58 40+   Rural Prim… Middle Separa… All   FALSE FALSE FALSE   0  
 8 Cote d'…  0.627 11331  38401  1.58 15-19 Urban Prim… Highe… Never … All   FALSE FALSE FALSE   0  
 9 Cote d'…  0.875 11898  38401  1.5  20-24 Urban High… Higher Never … All   TRUE  FALSE TRUE    0.5
10 Cote d'…  0.983 11142  38401  1.5  25-29 Urban Never Highe… Partne… All   FALSE FALSE FALSE   0  
# … with 13,834 more rows

If so, you’re ready to calculate incidence measures for each of the derived variables reg, abor, any, and avg. Because our data extract includes two completely independent samples (one per country), we’ll need to think about three iterative steps:

We’ll cycle once through the data collected from each country
Within the data for each country, we’ll cycle through each of our disaggregation variables
For each of those variables, we’ll calculate one estimate (and a 95% confidence interval) for each of our four abortion incidence measures

Before we show how to write code that performs each of these steps in sequence, let’s start with a simple example.

Incidence by education

Suppose we only wanted to estimate incidence via pregnancy termination (leaving aside period regulation) for women sorted by educational attainment in Nigeria.

Bell and Fissell (2021, pg 520) point to differences in education as a possible source of error in measures of incidence like those derived from our variable abor (termination only).

…if more educated women are more likely to interpret a missed menses as a possible pregnancy and to seek out pregnancy confirmation, researchers may be more likely to capture their behaviors with traditional questions regarding induced abortion/ending a pregnancy. Conversely, if less-educated women are less likely to view their experience in these terms, we may disproportionally underestimate their post-coital behaviors to attempt to regulate fertility. This difference could result in invalid estimates of these behaviors among different subpopulations.

Let’s check! We’ll first define information about the Nigeria survey design and then group_by our education variable edu (dropping any missing values only afterward to preserve the correct degrees of freedom from our full sample).²

dat %>% 
  filter(country == "Nigeria") %>% 
  as_survey_design(weight = weight, id = eaid, strata = strata, nest = TRUE) %>% 
  filter(!is.na(edu)) %>%
  group_by(edu) %>% 
  summarise(1000 * survey_mean(abor /yrs, vartype = "ci", proportion = TRUE))

# A tibble: 4 × 4
  edu        coef `_low` `_upp`
  <fct>     <dbl>  <dbl>  <dbl>
1 Never      6.32   3.19   12.5
2 Primary   15.2   10.1    22.6
3 Secondary 24.2   19.3    30.4
4 Higher    24.8   16.1    38.0

If we only use abor, we estimate that about 6 in 1,000 women with no formal education will report an abortion within one year. Conversely, 24 in 1,000 women with secondary education or higher education report an abortion within one year: the proportions for these groups are nearly quadrupled compared with women with no formal education! This disparity seems to support Bell and Fissell’s argument, but we’ll want to generate estimates from our other abortion measures for further context.

Here, we’ll use across to iterate over each of our measures - including avg, corresponding with the results reported in Bell et al. (2020b).

dat %>% 
  filter(country == "Nigeria") %>% 
  reframe(across(
    c(reg, abor, any, avg), 
    ~pick(everything()) %>% 
      as_survey_design(weight = weight, id = eaid, strata = strata, nest = TRUE) %>% 
      group_by(edu) %>% 
      summarise(1000 * survey_mean(.x / yrs, vartype = "ci", proportion = TRUE))
  )) %>% 
  pivot_longer(where(is.list), names_to = "measure")

# A tibble: 16 × 2
   measure value$edu $coef $`_low` $`_upp`
   <chr>   <fct>     <dbl>   <dbl>   <dbl>
 1 reg     Never      8.88    4.66    16.9
 2 abor    Never      6.32    3.19    12.5
 3 any     Never     13.6     8.17    22.5
 4 avg     Never      9.95    5.94    16.6
 5 reg     Primary   21.3    14.0     32.2
 6 abor    Primary   15.2    10.1     22.6
 7 any     Primary   33.8    24.8     46.0
 8 avg     Primary   24.5    18.0     33.3
 9 reg     Secondary 29.2    20.6     41.3
10 abor    Secondary 24.2    19.3     30.4
11 any     Secondary 48.7    38.5     61.4
12 avg     Secondary 36.5    29.5     44.9
13 reg     Higher    33.1    23.2     47.1
14 abor    Higher    24.8    16.1     38.0
15 any     Higher    56.5    43.2     73.6
16 avg     Higher    40.7    30.5     54.1

Notice here that the number of women with no formal education reporting an abortion roughly doubles (from 6 in 1,000 to nearly 14) if we define incidence with any (either pregnancy termination or period regulation). The measure avg puts our estimate directly between abor and any at about 10 in 1,000 women.

Other groups

As a reader, wouldn’t it be nice to be able to cycle through multiple samples and disaggregation measures? Next, we’ll build an summary table that we can incorporate into an application that will do exactly that!

Building on the previous example, we’ll now iterate through each of the disaggregation measures we’ve discussed. We’ll wrap everything within one larger iteration completed with group_by for each country.

dat <- dat %>% 
  group_by(country) %>% 
  reframe(across(
    c(urban, edu, wealth, marstat, age, all), 
    ~pick(everything()) %>% 
      rename(group = .x) %>% 
      filter(!is.na(group)) %>% 
      reframe(across(
        c(reg, abor, any, avg), 
        ~pick(everything()) %>% 
          as_survey_design(weight = weight, id = eaid, strata = strata, nest = TRUE) %>% 
          group_by(group) %>% 
          summarise(1000 * survey_mean(.x / yrs, vartype = "ci", proportion = TRUE)) 
      )) %>% 
      pivot_longer(where(is.list), names_to = "measure") %>% 
      unnest(value) %>% 
      list()
  )) 

dat

# A tibble: 2 × 7
  country       urban    edu      wealth   marstat  age      all     
  <fct>         <list>   <list>   <list>   <list>   <list>   <list>  
1 Nigeria       <tibble> <tibble> <tibble> <tibble> <tibble> <tibble>
2 Cote d'Ivoire <tibble> <tibble> <tibble> <tibble> <tibble> <tibble>

We’ll reference each of these lists by country within our shiny app. For example, here are the urban results for each country.

dat %>% 
  select(country, urban) %>% 
  unnest(urban)

# A tibble: 16 × 6
   country       measure group  coef `_low` `_upp`
   <fct>         <chr>   <fct> <dbl>  <dbl>  <dbl>
 1 Nigeria       reg     Rural  14.7   8.99   23.9
 2 Nigeria       abor    Rural  12.7   9.42   17.0
 3 Nigeria       any     Rural  24.9  17.7    34.8
 4 Nigeria       avg     Rural  18.8  13.8    25.4
 5 Nigeria       reg     Urban  33.2  23.6    46.3
 6 Nigeria       abor    Urban  25.2  18.7    33.8
 7 Nigeria       any     Urban  54.6  43.0    69.1
 8 Nigeria       avg     Urban  39.9  31.6    50.2
 9 Cote d'Ivoire reg     Rural  23.3  12.0    44.9
10 Cote d'Ivoire abor    Rural  15.5   7.88   30.4
11 Cote d'Ivoire any     Rural  38.9  20.7    71.9
12 Cote d'Ivoire avg     Rural  27.2  14.5    50.5
13 Cote d'Ivoire reg     Urban  19.6  12.7    30.0
14 Cote d'Ivoire abor    Urban  21.6  13.6    34.1
15 Cote d'Ivoire any     Urban  37.1  26.0    52.7
16 Cote d'Ivoire avg     Urban  29.4  20.1    42.7

Shiny Dashboard

Dashboards like this one can be run locally in RStudio, or you can publish them on a server like shinyapps.io. Ours is built as a Quarto document with the following header:

---
title: "ESTIMATED ONE-YEAR ABORTION INCIDENCE"
format: 
  html:
    page-layout: custom
    css: 'css/pma.css'
server: shiny
---

css/pma.css is responsible for incorporating the IPUMS fonts you see throughout this blog. Feel free to omit that line, or include your own css document.

The rest of the document consists of four named chunks:

context: setup - load your data and packages here
panel: sidebar - home to all user inputs
panel: fill - home to plots generated from user inputs
context: server - all back-end computation performed by R

In summary, we use ggplot2 to build a dot-whisker plot for selected abortion estimates for a user-provided country and disaggregation measure. As an extra flourish, the choiceNames argument in checkboxGroupInput uses color-coded text to save space where you would normally see a legend.

```{r, context="setup"}

library(shiny)
library(tidyverse)
dat <- read_rds("www/dat.rds")

```

```{r, panel="sidebar"}

selectInput(
  inputId = "country", 
  label = "Country:", 
  choices = unique(dat$country)
)

selectInput(
  inputId = 'group', 
  label = 'Group by:', 
  choices = list(Age = "age", `Urban / Rural` = "urban", Education = "edu", 
       Wealth = "wealth", `Marital Status` = "marstat"),
)

checkboxGroupInput(
  inputId = "measure",
  label="Abortion Measure:",
  choiceNames = list(
    HTML("<span style='color: #00263A;'>Period Regulation Only</span>"),
    HTML("<span style='color: #98579B;'>Termination Only</span>"),
    HTML("<span style='color: #B56576;'>Termination or Regulation</span>"),
    HTML("<span style='color: #EAAC8B;'>Average of Measures</span>")
  ),
  choiceValues = list("reg", "abor", "any", "avg"),
  selected =  list("reg", "abor", "any", "avg")
)

```

Your inputID in these inputs can be anything: it’s what we use to reference user choices in the server context below.

selectInput creates a dropdown menu with options provided by choices. If you want to change choice labels from the values in your dataset, you can provide them via list names as we do in the group dropdown.

checkboxGroupInput creates a group of check-boxes. Adding color is more complicated than providing list names, so we write HTML code for each option in choiceNames.

```{r, context="server"}

selectedData <- reactive({
  dat %>% 
    filter(country == input$country) %>% 
    unnest(any_of(input$group)) %>% 
    filter(measure %in% input$measure)
})

output$plot <- renderPlot(width = 700, height = 400, expr = {
  selectedData() %>%
    ggplot(aes(x = group, y = coef, color = measure)) + 
    geom_point(position = position_dodge(width = 0.3)) + 
    geom_errorbar(
      aes(ymin = `_low`, ymax = `_upp`), 
      width = 0, 
      position = position_dodge(width = 0.3)
    ) + 
    theme_minimal()  %+replace% theme(
      panel.grid.major.x = element_blank(),
      legend.position = "none"
    ) + 
    labs(x = NULL,color = NULL, y = "Number of abortions per 1,000 women") + 
    scale_y_continuous(limits = c(0, 110)) + 
    scale_color_manual(values = c(
      "reg" = "#00263A",
      "abor" = "#98579B",
      "any" = "#B56576",
      "avg" = "#EAAC8B"
    )) 
})

```

selectedData is a reactive function that reacts to user input referenced by each inputID created in our sidebar.

The object output$plot stores a new ggplot created in reaction to these inputs.

```{r, panel="fill"}

plotOutput('plot')

```

That’s it! Simply hit “Run Document” to test your application in RStudio. For information about deploying it for use by others, check out this article.

Here’s our finished app:

Bell, Suzanne O, and Mary E Fissell. 2021. “A Little Bit Pregnant? Productive Ambiguity and Fertility Research.” Population and Development Review 47 (2): 505–26. https://onlinelibrary.wiley.com/doi/10.1111/padr.12403.

Bell, Suzanne O, Elizabeth Omoluabi, Funmilola OlaOlorun, Mridula Shankar, and Caroline Moreau. 2020. “Inequities in the Incidence and Safety of Abortion in Nigeria.” BMJ Global Health 5 (1): e001814. http://dx.doi.org/10.1136/bmjgh-2019-001814.

Bell, Suzanne O, Grace Sheehy, Andoh Kouakou Hyacinthe, Georges Guiella, and Caroline Moreau. 2020. “Induced Abortion Incidence and Safety in côte d’ivoire.” PloS One 15 (5): e0232364. http://dx.doi.org/10.1371/journal.pone.0232364.

“We averaged the two point estimates because we believe the pregnancy removal data fails to capture some abortions (that women may not view as abortions or are not willing to admit are abortions) while the period regulation data likely include some experiences that we would not consider to be abortions.” (Bell et al. 2020a, pg 5)↩︎
Note: in Nigeria, the same EAID codes are recycled in multiple sample STRATA. You must use nest = TRUE to ensure that these EAs are treated as separate sampling locations!↩︎

Comparing Measures of Abortion Incidence with a Shiny Dashboard