New abortion data from PMA includes women who reported doing something to remove a pregnancy, or to regulate a late period because of a suspected pregnancy.
When we announced the release of new longitudinal data from PMA this month, we mentioned two surveys from Côte d’Ivoire and Nigeria that include a special focus on women’s experiences seeking and accessing abortion services.
Both surveys are follow-up interviews with a subset of women who participated in a nationally representative, cross-sectional survey conducted in 2018. These are women who responded “yes” in 2018 to one of these questions:
Why do PMA surveys include multiple ways of asking about abortion? The answer, explained by Bell and Fissell (2021, pg. 507), is that conventional measures of abortion incidence are prone to substantial underreporting if they fail to account for procedures undertaken when a woman’s pregnancy status is ambiguous:
For example, most surveys studying induced abortion unambiguously ask women about actions they may have taken to end a pregnancy, but such questions leave no room to account for women’s actions that consciously or unconsciously attempt to regulate fertility in the period of time that we are characterizing as ambiguous, neither pregnant nor nonpregnant. Nor can we say that there is a simple third state of “maybe pregnant”; both ethnographic and historical data suggest that complex mixtures of perception and denial, a wide range of models of the reproductive body, and myriad other factors create rich and complex ambiguity rather than merely a simple third state.
Elsewhere, Bell et al. (2020a; 2020b) discuss methods for estimating abortion incidence with both of these 2018 PMA samples. They demonstrate how to estimate incidence with:
In this post, we’ll share code you can use to reproduce the abortion incidence shown in both of these publications.
Because readers may wish to compare the authors’ final averaged measure against the incidence derived from either or both of the source questions, this is a particularly good opportunity to leverage an interactive dashboard built with the shiny package for R.
In this case, we’ll want to allow the user to switch between samples, incidence measures, and several disaggregation variables representing each woman’s:
Throughout this ongoing series on PMA abortion data, we’ll be showcasing the same longitudinal (wide format) extract downloaded from IPUMS PMA. For this post, we’ll focus only on responses from the 2018 baseline surveys in both Côte d’Ivoire and Nigeria - over the next few weeks, we’ll also investigate abortion data collected each woman’s closest confidantes, and on her responses to the abortion follow-up survey conducted in 2019-2020.
To follow along, you’ll need to select a longitudinal (wide format) with Female Respondents only - these are women who completed the 2018 baseline survey (even if they were not included in the 2019-2020 follow-up). You’ll also need the following variables:
In our extract, these variables will each appear twice: those ending with _1
come from the 2018 baseline surveys, with those ending with _2
come from the 2019-2020 follow-ups. Today’s analysis will focus only on the former.
After your extract has been downloaded and saved to a data
folder in your working directory, attach the following packages and load your extract as an object named pma
.
Next, drop any women who are not part of the de facto population - in these samples, women outside of this population have a sampling weight of 0 in FQWEIGHT_1.
Now, we’ll build an analytic dataset dat
derived from the variables shown above. Most importantly, we’ll need to consider how Bell et al. report abortion incidence: they show an estimated annual abortion count per 1,000 women in each country.
The variables ABORYR and REGYR report the calendar year for each woman’s most recent pregnancy termination and period regulation, but PMA does not collect the precise month in which each event might have occurred.
Because of this limitation, we’ll include any procedure undertaken throughout all of the 2017 calendar year and any month before each woman’s 2018 interview. Create these four variables:
abor
indicating whether the woman’s most recent pregnancy termination occurred after January 1, 2017reg
indicating whether the woman’s most recent period regulation occurred after January 1, 2017any
indicating whether the woman’s most recent pregnancy termination or period regulation occurred after January 1, 2017avg
the mean value of abor
and any
(reported by Bell et al.)yrs
- the number of months between January 1, 2017 and the date of a woman’s interview divided by 12 (e.g. 1.5
represents 18 months)Next, we’ll consider how to disaggregate abortion incidence measures in groups defined by those variables mentioned above. It will be helpful to create a few meaningful categories for each variable, and then transform the result into a factor (this will control the ordering of categories along the x-axis in our final plots). We’ll also create a variable all
to show the overall abortion incidence for all women of reproductive age in each country.
dat <- pma %>%
mutate(
.keep = "none",
age = AGE_1 %>%
case_match(
15:19 ~ "15-19",
20:24 ~ "20-24",
25:29 ~ "25-29",
30:34 ~ "30-34",
35:39 ~ "35-39",
40:49 ~ "40+",
) %>%
fct_relevel("15-19","20-24","25-29","30-34","35-39"),
urban = URBAN %>% as_factor(),
edu = EDUCATTGEN_1 %>%
case_match(
1 ~ "Never",
2 ~ "Primary",
3 ~ "Secondary",
4 ~ "Higher"
) %>%
fct_relevel("Never", "Primary", "Secondary"),
wealth = WEALTHQ_1 %>%
as_factor() %>%
str_remove(" quintile") %>%
fct_relevel("Lowest", "Lower", "Middle", "Higher"),
marstat = MARSTAT_1 %>%
case_match(
10 ~ "Never Married",
21:22 ~ "Partnered",
31:32 ~ "Separated / Widowed"
) %>%
fct_relevel("Never Married", "Partnered"),
all = "All"
) %>%
bind_cols(dat)
Finally, we’ll tweak the remaining technical variables for improved readability.
Your analytic dataset should now look something like this:
dat
# A tibble: 13,844 × 15
country weight eaid strata yrs age urban edu wealth marstat all reg abor any avg
<fct> <dbl> <dbl> <int> <dbl> <fct> <fct> <fct> <fct> <fct> <chr> <lgl> <lgl> <lgl> <dbl>
1 Cote d'… 0.574 11543 38401 1.5 30-34 Urban Seco… Higher Partne… All FALSE FALSE FALSE 0
2 Cote d'… 1.26 11498 38401 1.5 15-19 Urban Never Highe… Never … All FALSE FALSE FALSE 0
3 Cote d'… 0.766 11148 38402 1.5 20-24 Rural Prim… Middle Partne… All FALSE FALSE FALSE 0
4 Cote d'… 0.983 11142 38401 1.5 15-19 Urban Seco… Higher Never … All FALSE FALSE FALSE 0
5 Cote d'… 1.45 11279 38401 1.5 30-34 Urban Seco… Highe… Never … All FALSE FALSE FALSE 0
6 Cote d'… 0.653 11989 38401 1.5 40+ Urban Never Middle Separa… All FALSE FALSE FALSE 0
7 Cote d'… 1.15 11040 38402 1.58 40+ Rural Prim… Middle Separa… All FALSE FALSE FALSE 0
8 Cote d'… 0.627 11331 38401 1.58 15-19 Urban Prim… Highe… Never … All FALSE FALSE FALSE 0
9 Cote d'… 0.875 11898 38401 1.5 20-24 Urban High… Higher Never … All TRUE FALSE TRUE 0.5
10 Cote d'… 0.983 11142 38401 1.5 25-29 Urban Never Highe… Partne… All FALSE FALSE FALSE 0
# … with 13,834 more rows
If so, you’re ready to calculate incidence measures for each of the derived variables reg
, abor
, any
, and avg
. Because our data extract includes two completely independent samples (one per country), we’ll need to think about three iterative steps:
Before we show how to write code that performs each of these steps in sequence, let’s start with a simple example.
Suppose we only wanted to estimate incidence via pregnancy termination (leaving aside period regulation) for women sorted by educational attainment in Nigeria.
Bell and Fissell (2021, pg 520) point to differences in education as a possible source of error in measures of incidence like those derived from our variable abor
(termination only).
…if more educated women are more likely to interpret a missed menses as a possible pregnancy and to seek out pregnancy confirmation, researchers may be more likely to capture their behaviors with traditional questions regarding induced abortion/ending a pregnancy. Conversely, if less-educated women are less likely to view their experience in these terms, we may disproportionally underestimate their post-coital behaviors to attempt to regulate fertility. This difference could result in invalid estimates of these behaviors among different subpopulations.
Let’s check! We’ll first define information about the Nigeria survey design and then group_by our education variable edu
(dropping any missing values only afterward to preserve the correct degrees of freedom from our full sample).2
dat %>%
filter(country == "Nigeria") %>%
as_survey_design(weight = weight, id = eaid, strata = strata, nest = TRUE) %>%
filter(!is.na(edu)) %>%
group_by(edu) %>%
summarise(1000 * survey_mean(abor /yrs, vartype = "ci", proportion = TRUE))
# A tibble: 4 × 4
edu coef `_low` `_upp`
<fct> <dbl> <dbl> <dbl>
1 Never 6.32 3.19 12.5
2 Primary 15.2 10.1 22.6
3 Secondary 24.2 19.3 30.4
4 Higher 24.8 16.1 38.0
If we only use abor
, we estimate that about 6 in 1,000 women with no formal education will report an abortion within one year. Conversely, 24 in 1,000 women with secondary education or higher education report an abortion within one year: the proportions for these groups are nearly quadrupled compared with women with no formal education! This disparity seems to support Bell and Fissell’s argument, but we’ll want to generate estimates from our other abortion measures for further context.
Here, we’ll use across to iterate over each of our measures - including avg
, corresponding with the results reported in Bell et al. (2020b).
dat %>%
filter(country == "Nigeria") %>%
reframe(across(
c(reg, abor, any, avg),
~pick(everything()) %>%
as_survey_design(weight = weight, id = eaid, strata = strata, nest = TRUE) %>%
group_by(edu) %>%
summarise(1000 * survey_mean(.x / yrs, vartype = "ci", proportion = TRUE))
)) %>%
pivot_longer(where(is.list), names_to = "measure")
# A tibble: 16 × 2
measure value$edu $coef $`_low` $`_upp`
<chr> <fct> <dbl> <dbl> <dbl>
1 reg Never 8.88 4.66 16.9
2 abor Never 6.32 3.19 12.5
3 any Never 13.6 8.17 22.5
4 avg Never 9.95 5.94 16.6
5 reg Primary 21.3 14.0 32.2
6 abor Primary 15.2 10.1 22.6
7 any Primary 33.8 24.8 46.0
8 avg Primary 24.5 18.0 33.3
9 reg Secondary 29.2 20.6 41.3
10 abor Secondary 24.2 19.3 30.4
11 any Secondary 48.7 38.5 61.4
12 avg Secondary 36.5 29.5 44.9
13 reg Higher 33.1 23.2 47.1
14 abor Higher 24.8 16.1 38.0
15 any Higher 56.5 43.2 73.6
16 avg Higher 40.7 30.5 54.1
Notice here that the number of women with no formal education reporting an abortion roughly doubles (from 6 in 1,000 to nearly 14) if we define incidence with any
(either pregnancy termination or period regulation). The measure avg
puts our estimate directly between abor
and any
at about 10 in 1,000 women.
As a reader, wouldn’t it be nice to be able to cycle through multiple samples and disaggregation measures? Next, we’ll build an summary table that we can incorporate into an application that will do exactly that!
Building on the previous example, we’ll now iterate through each of the disaggregation measures we’ve discussed. We’ll wrap everything within one larger iteration completed with group_by for each country
.
dat <- dat %>%
group_by(country) %>%
reframe(across(
c(urban, edu, wealth, marstat, age, all),
~pick(everything()) %>%
rename(group = .x) %>%
filter(!is.na(group)) %>%
reframe(across(
c(reg, abor, any, avg),
~pick(everything()) %>%
as_survey_design(weight = weight, id = eaid, strata = strata, nest = TRUE) %>%
group_by(group) %>%
summarise(1000 * survey_mean(.x / yrs, vartype = "ci", proportion = TRUE))
)) %>%
pivot_longer(where(is.list), names_to = "measure") %>%
unnest(value) %>%
list()
))
dat
# A tibble: 2 × 7
country urban edu wealth marstat age all
<fct> <list> <list> <list> <list> <list> <list>
1 Nigeria <tibble> <tibble> <tibble> <tibble> <tibble> <tibble>
2 Cote d'Ivoire <tibble> <tibble> <tibble> <tibble> <tibble> <tibble>
We’ll reference each of these lists by country
within our shiny app. For example, here are the urban
results for each country.
# A tibble: 16 × 6
country measure group coef `_low` `_upp`
<fct> <chr> <fct> <dbl> <dbl> <dbl>
1 Nigeria reg Rural 14.7 8.99 23.9
2 Nigeria abor Rural 12.7 9.42 17.0
3 Nigeria any Rural 24.9 17.7 34.8
4 Nigeria avg Rural 18.8 13.8 25.4
5 Nigeria reg Urban 33.2 23.6 46.3
6 Nigeria abor Urban 25.2 18.7 33.8
7 Nigeria any Urban 54.6 43.0 69.1
8 Nigeria avg Urban 39.9 31.6 50.2
9 Cote d'Ivoire reg Rural 23.3 12.0 44.9
10 Cote d'Ivoire abor Rural 15.5 7.88 30.4
11 Cote d'Ivoire any Rural 38.9 20.7 71.9
12 Cote d'Ivoire avg Rural 27.2 14.5 50.5
13 Cote d'Ivoire reg Urban 19.6 12.7 30.0
14 Cote d'Ivoire abor Urban 21.6 13.6 34.1
15 Cote d'Ivoire any Urban 37.1 26.0 52.7
16 Cote d'Ivoire avg Urban 29.4 20.1 42.7
Dashboards like this one can be run locally in RStudio, or you can publish them on a server like shinyapps.io. Ours is built as a Quarto document with the following header:
---
title: "ESTIMATED ONE-YEAR ABORTION INCIDENCE"
format:
html:
page-layout: custom
css: 'css/pma.css'
server: shiny
---
The rest of the document consists of four named chunks:
context: setup
- load your data and packages herepanel: sidebar
- home to all user inputspanel: fill
- home to plots generated from user inputscontext: server
- all back-end computation performed by RIn summary, we use ggplot2 to build a dot-whisker plot for selected abortion estimates for a user-provided country and disaggregation measure. As an extra flourish, the choiceNames
argument in checkboxGroupInput uses color-coded text to save space where you would normally see a legend.
```{r, panel="sidebar"}
selectInput(
inputId = "country",
label = "Country:",
choices = unique(dat$country)
)
selectInput(
inputId = 'group',
label = 'Group by:',
choices = list(Age = "age", `Urban / Rural` = "urban", Education = "edu",
Wealth = "wealth", `Marital Status` = "marstat"),
)
checkboxGroupInput(
inputId = "measure",
label="Abortion Measure:",
choiceNames = list(
HTML("<span style='color: #00263A;'>Period Regulation Only</span>"),
HTML("<span style='color: #98579B;'>Termination Only</span>"),
HTML("<span style='color: #B56576;'>Termination or Regulation</span>"),
HTML("<span style='color: #EAAC8B;'>Average of Measures</span>")
),
choiceValues = list("reg", "abor", "any", "avg"),
selected = list("reg", "abor", "any", "avg")
)
```
```{r, context="server"}
selectedData <- reactive({
dat %>%
filter(country == input$country) %>%
unnest(any_of(input$group)) %>%
filter(measure %in% input$measure)
})
output$plot <- renderPlot(width = 700, height = 400, expr = {
selectedData() %>%
ggplot(aes(x = group, y = coef, color = measure)) +
geom_point(position = position_dodge(width = 0.3)) +
geom_errorbar(
aes(ymin = `_low`, ymax = `_upp`),
width = 0,
position = position_dodge(width = 0.3)
) +
theme_minimal() %+replace% theme(
panel.grid.major.x = element_blank(),
legend.position = "none"
) +
labs(x = NULL,color = NULL, y = "Number of abortions per 1,000 women") +
scale_y_continuous(limits = c(0, 110)) +
scale_color_manual(values = c(
"reg" = "#00263A",
"abor" = "#98579B",
"any" = "#B56576",
"avg" = "#EAAC8B"
))
})
```
That’s it! Simply hit “Run Document” to test your application in RStudio. For information about deploying it for use by others, check out this article.
Here’s our finished app:
“We averaged the two point estimates because we believe the pregnancy removal data fails to capture some abortions (that women may not view as abortions or are not willing to admit are abortions) while the period regulation data likely include some experiences that we would not consider to be abortions.” (Bell et al. 2020a, pg 5)↩︎
Note: in Nigeria, the same EAID codes are recycled in multiple sample STRATA
. You must use nest = TRUE
to ensure that these EAs are treated as separate sampling locations!↩︎
If you see mistakes or want to suggest changes, please create an issue on the source repository.