Recode and Summarize Variables from Multiple Response Questions

Individuals in Context Data Manipulation Service Delivery Points across ipumsr

Use dplyr::across to summarize variables with a similar naming pattern.

Matt Gunther (IPUMS PMA Senior Data Analyst)
01-28-2021

In our last post, we introduced PMA Service Delivery Point (SDP) data as an important resource for understanding the health services environment experienced by individuals sampled in PMA Household and Female data. For our second post in this Individuals in Context series, we’ll now take deeper dive into one of the important topics in SDP data: the range and availability of contraceptive methods provided at each facility.

A common feature of the variables in this topic - and in many other topics - is that you’ll find several binary indicators constructed from the same multiple resposne item on the SDP questionnaire. We’ll see that IPUMS PMA uses a common naming convention to help users group these variables together for use in functions like dplyr::across.

SDP Multiple Response Questions

Every SDP respondent receives a question associated with the variable FPOFFERED, which indicates whether to facility usually offers family planning services or products:

Do you usually offer family planning services / products?

  [] Yes
  [] No
  [] No response

If yes, they’ll then receive a multiple response-type question asking about the contraceptive methods provided to clients. The range of options provided on the questionnaire may vary across samples, but most look something like this:

Which of the following methods are provided to clients at this facility? 

  [] Female sterilization
  [] Male sterilization
  [] Implant
  [] IUD
  [] Injectables - Depo Provera
  [] Injectables - Sayana Press
  [] Pill
  [] Emergency Contraception
  [] Male Condom
  [] Female Condom
  [] Diaphragm
  [] Foam/Jelly
  [] Std. Days / Cycle beads
  [] None of the above
  [] No response

This is a multiple response question: each method in the list could be answered individually (Yes or No), or the respondent could reply None of the above or provide No response. The IPUMS PMA extract system generates one variable for each of the methods in the list:

The questionnaire continues for each one of the methods provided at a given facility. Next, it checks for the current availability of each of the provided methods:

You mentioned that you typically provide the [METHOD] at this facility,
can you show it to me? If no, probe: Is the [METHOD] out of stock today?

  [] In-stock and observed
  [] In-stock but not observed
  [] Out of stock
  [] No Response

The variables associated with each response end with the same suffix OBS:

If a facility did have a particular method in-stock, it received a question asking whether supplies were unavailable any time in the previous three months:

Has the [METHOD] been out of stock at any time in the last 3 months?

  [] Yes 
  [] No 
  [] Don't know
  [] No response

This question becomes a series of variables ending with the suffix OUT3MO:

On the other hand, if a facility that normally provides a given method did not have supplies in-stock during the interview, it received a different question about the duration of the current stockout:

How many days has the [METHOD] been out of stock?

  Number of days____

The resulting variables - each ending with the suffix OUTDAY - take an integer value representing the stockout duration in days (except where the value is a non-response code, see below):

Setup: Load an Example Dataset into R

As you can see, we’re left with quite a few variables from just these 4 questions! That’s very useful if you’re interested in the availability of one method, in particular, but what if you want to get a picture of the full range of methods provided at a particular facility?

Fortunately, the repeated use of variable suffixes (PROV, OBS, OUT3MO, and OUTDAY) make these variables highly suitable for column-wise processing with dplyr::across.

Let’s start with an example data extract containing all of the variables listed above, collected from just two samples:

Once you’ve downloaded an extract, open RStudio and load the packages tidyverse and ipumsr:

Next, use the file paths for your data extract to load it into R:

sdp <- ipumsr::read_ipums_micro(
  ddi = "data/pma_00008.xml",
  data = "data/pma_00008.dat.gz"
)

Using dplyr::ends_with, we’ll select only FACILITYID, SAMPLE, EAID, and the variables using one of the four suffixes PROV, OBS, OUT3MO, or OUTDAY.

sdp <- sdp %>%  
  select(
    FACILITYID,
    SAMPLE, 
    EAID, 
    ends_with("PROV"),
    ends_with("OBS"),
    ends_with("OUT3MO"),
    ends_with("OUTDAY")
  )

That leaves us with 234 rows - each a facility from one of our two samples - and 49 variables:

sdp
# A tibble: 234 x 49
   FACILITYID          SAMPLE  EAID  CONPROV CYCBPROV DEPOPROV DIAPROV
    <int+lbl>       <int+lbl> <dbl> <int+lb> <int+lb> <int+lb> <int+l>
 1       7250 85405 [Burkina…  7142  1 [Yes]  1 [Yes]  1 [Yes]  0 [No]
 2       7399 85405 [Burkina…  7879  1 [Yes]  1 [Yes]  1 [Yes]  0 [No]
 3       7506 85405 [Burkina…  7483  1 [Yes]  1 [Yes]  1 [Yes]  0 [No]
 4       7982 85405 [Burkina…  7185  1 [Yes]  1 [Yes]  1 [Yes]  0 [No]
 5       7065 85405 [Burkina…  7859  1 [Yes]  1 [Yes]  1 [Yes]  0 [No]
 6       7729 85405 [Burkina…  7082  1 [Yes]  0 [No]   1 [Yes]  0 [No]
 7       7490 85405 [Burkina…  7650  1 [Yes]  1 [Yes]  1 [Yes]  0 [No]
 8       7311 85405 [Burkina…  7955  1 [Yes]  1 [Yes]  1 [Yes]  0 [No]
 9       7524 85405 [Burkina…  7323  1 [Yes]  1 [Yes]  1 [Yes]  0 [No]
10       7932 85405 [Burkina…  7774  1 [Yes]  1 [Yes]  1 [Yes]  0 [No]
# … with 224 more rows, and 42 more variables: EMRGPROV <int+lbl>,
#   FCPROV <int+lbl>, FSTPROV <int+lbl>, FJPROV <int+lbl>,
#   IMPPROV <int+lbl>, IUDPROV <int+lbl>, MSTPROV <int+lbl>,
#   PILLPROV <int+lbl>, SAYPROV <int+lbl>, CONOBS <int+lbl>,
#   CYCBOBS <int+lbl>, DEPOOBS <int+lbl>, DIAOBS <int+lbl>,
#   EMRGOBS <int+lbl>, FCOBS <int+lbl>, FJOBS <int+lbl>,
#   IMPOBS <int+lbl>, IUDOBS <int+lbl>, PILLOBS <int+lbl>,
#   SAYOBS <int+lbl>, CONOUT3MO <int+lbl>, CYCBOUT3MO <int+lbl>,
#   DEPOOUT3MO <int+lbl>, DIAOUT3MO <int+lbl>, EMRGOUT3MO <int+lbl>,
#   FCOUT3MO <int+lbl>, FJOUT3MO <int+lbl>, IMPOUT3MO <int+lbl>,
#   IUDOUT3MO <int+lbl>, PILLOUT3MO <int+lbl>, SAYOUT3MO <int+lbl>,
#   CONOUTDAY <int+lbl>, CYCBOUTDAY <int+lbl>, DEPOOUTDAY <int+lbl>,
#   DIAOUTDAY <int+lbl>, EMRGOUTDAY <int+lbl>, FCOUTDAY <int+lbl>,
#   FJOUTDAY <int+lbl>, IMPOUTDAY <int+lbl>, IUDOUTDAY <int+lbl>,
#   PILLOUTDAY <int+lbl>, SAYOUTDAY <int+lbl>

Recoding Variables with ipumsr

A key feature to remember about IPUMS PMA extracts is that variables often have value labels, which are text labels assigned to the different values taken by a variable. When we load the extract into R with an ipumsr function, these variables are imported as labelled objects rather than the more common factor class of objects.

As a result, IPUMS data users need to take some unusual steps when recoding a variable or handling NA values. Happily, the ipumsr package provide a few functions (starting with the prefix lbl_) that make this process very easy.

Let’s take a look at the variable CONOBS:

sdp %>% count(CONOBS)
# A tibble: 5 x 2
                                    CONOBS     n
                                 <int+lbl> <int>
1  1 [In-stock and observed]                 204
2  2 [In-stock but not observed]               3
3  3 [Out of stock]                            5
4 94 [Not interviewed (SDP questionnaire)]     4
5 99 [NIU (not in universe)]                  18

Notice that we have two values representing SDPs with male condoms “in-stock”: SDPs where the interviewer personally observed the condoms get 1, while those where condoms where reported in-stock - but not actually observed by the interviewer - get 2.

Depending on your research question, the interviewer’s personal observation of each method may or may not be important. You might decide that you’d prefer to recode this variable into a simple binary measure that could be easily plugged into a regression model as a dummy variable later on. To do that, you could use the ipumsr function lbl_relabel:

sdp %>% 
  mutate(CONOBS = lbl_relabel(
      CONOBS,
      lbl(1, "In-stock") ~ .val %in% 1:2,
      lbl(0, "Out of stock") ~ .val == 3
    )) %>% 
  count(CONOBS)
# A tibble: 4 x 2
                                    CONOBS     n
                                 <dbl+lbl> <int>
1  0 [Out of stock]                            5
2  1 [In-stock]                              207
3 94 [Not interviewed (SDP questionnaire)]     4
4 99 [NIU (not in universe)]                  18

That collapses the values 1 and 2 together, and it moves the value 3 (“Out of stock”) to 0. However, we’ve still got a the values 94 and 99, which are each a different type of non-response. The easiest strategy here would be to recode any value larger than 90 as NA, and we could do that with another ipumsr function, lbl_na_if:

sdp %>% 
  mutate(
    CONOBS = lbl_relabel(
      CONOBS,
      lbl(1, "in-stock") ~ .val %in% 1:2,
      lbl(0, "out of stock") ~ .val == 3
    ),
    CONOBS = lbl_na_if(
      CONOBS,
      ~.val > 90
    )
  ) %>% 
  count(CONOBS)
# A tibble: 3 x 2
             CONOBS     n
          <dbl+lbl> <int>
1  0 [out of stock]     5
2  1 [in-stock]       207
3 NA                   22

This works great for our example variable, CONOBS. Unfortunately, though, we can’t always rely on the rule ~.val > 90 to handle missing responses. For variables like CONOUTDAY, a value above 90 could be a valid response: what if a facility experienced a stockout lasting 94 days? For this reason, the non-response values for CONOUTDAY are padded with additional digits:

sdp %>% count(CONOUTDAY)
# A tibble: 7 x 2
                                   CONOUTDAY     n
                                   <int+lbl> <int>
1    1                                           1
2    3                                           1
3   10                                           1
4   15                                           1
5   60                                           1
6 9994 [Not interviewed (SDP questionnaire)]     4
7 9999 [NIU (not in universe)]                 225

We could write a different lbl_na_if function for our OUTDAY variables, but ipumsr provides a much nicer workaround: we can specify non-response labels rather than values, as long as we make sure to use all of the different non-response labels appearing throughout our dataset:

sdp %>% 
  mutate(
    CONOBS = lbl_relabel(
      CONOBS,
      lbl(1, "in-stock") ~ .val %in% 1:2,
      lbl(0, "out of stock") ~ .val == 3
    ),
    CONOBS = lbl_na_if(
      CONOBS,
      ~.lbl %in% c(
        "Not interviewed (SDP questionnaire)",
        "Don't know",
        "No response or missing",
        "NIU (not in universe)"
      )
    )
  ) %>% 
  count(CONOBS)
# A tibble: 3 x 2
             CONOBS     n
          <dbl+lbl> <int>
1  0 [out of stock]     5
2  1 [in-stock]       207
3 NA                   22

Now, we’ll be able to recode all of our variables with the same pair of functions! To do that, we’ll first need to take a look at the column-wise workflow made available by dplyr::across.

Introducing dplyr::across

While there are several ways to apply a function across a set of variables in R, the simplest method comes from a new addition to the dplyr package that’s loaded when you run library(tidyverse). The function dplyr::across takes two arguments: a function, and a selection of columns where you want that function to be applied.

Remember that we want collapse the values 1 - In-stock and observed and 2 - In-stock but not observed for all of the variables ending with OBS, not just CONOBS. Using across and a selection of variables ending with OBS, we’ll apply the same lbl_relabel function we used on CONOBS above:

sdp %>% 
  mutate(
    across(ends_with("OBS"), ~lbl_relabel(
      .x,
      lbl(1, "in-stock") ~ .val %in% 1:2,
      lbl(0, "out of stock") ~ .val == 3
    )) 
  ) %>% 
  count(CONOBS)
# A tibble: 4 x 2
                                    CONOBS     n
                                 <dbl+lbl> <int>
1  0 [out of stock]                            5
2  1 [in-stock]                              207
3 94 [Not interviewed (SDP questionnaire)]     4
4 99 [NIU (not in universe)]                  18

Here, we stick lbl_relabel inside a lambda function with syntax from purrr: the ~ designates a tidy lambda function, which in turn uses .x as a kind of pronoun referencing each of the variables returned by ends_with("OBS"). We’re showing that CONOBS still gets recoded as before, but so do all of the other variables in its group!

We’ll use across again with lbl_na_if, but this time we want to produce NA values for all of the variables in our dataset. In place of ends_with("OBS"), we’ll use the selection function everything(). This will take care of all the recoding we want to do, so we’ll also reassign our data with sdp <- sdp:

sdp <- sdp %>% 
  mutate(
    across(ends_with("OBS"), ~lbl_relabel(
      .x,
      lbl(1, "in-stock") ~ .val %in% 1:2,
      lbl(0, "out of stock") ~ .val == 3
    )),
    across(everything(), ~lbl_na_if(
      .x,
      ~.lbl %in% c(
        "Not interviewed (SDP questionnaire)",
        "Don't know",
        "No response or missing",
        "NIU (not in universe)"
      )
    ))
  )

Let’s pick a few variables to check out work:

sdp %>% count(CONOBS)
# A tibble: 3 x 2
             CONOBS     n
          <dbl+lbl> <int>
1  0 [out of stock]     5
2  1 [in-stock]       207
3 NA                   22
sdp %>% count(IMPOBS)
# A tibble: 3 x 2
             IMPOBS     n
          <dbl+lbl> <int>
1  0 [out of stock]     3
2  1 [in-stock]       204
3 NA                   27
sdp %>% count(CONOUTDAY)
# A tibble: 6 x 2
  CONOUTDAY     n
  <int+lbl> <int>
1         1     1
2         3     1
3        10     1
4        15     1
5        60     1
6        NA   229
sdp %>% count(IMPOUTDAY)
# A tibble: 4 x 2
  IMPOUTDAY     n
  <int+lbl> <int>
1         1     1
2        14     1
3        90     1
4        NA   231

Summarize Variable Groups by Facility

Everything looks great! Now that we’ve finished reformatting the data, remember that our ultimate goal is to get some sense of the scope of methods available at a particular facility.

We’d like to use something like across again here, but this time we’ll only want to apply our function to a selection of columns within the same row (because each row of our dataset represents one facility). To do this, we’ll divide the dataset rowwise, and then use the related function c_across to apply a calculation across columns within each row.

For instance, suppose we want to create NUM_METHODS_PROV to show the total number of methods provided at each facility. Let’s look at the PROV variables for the first few facilities:

sdp %>% select(ends_with("PROV"))
# A tibble: 234 x 13
    CONPROV  CYCBPROV DEPOPROV DIAPROV EMRGPROV  FCPROV FSTPROV FJPROV
   <int+lb> <int+lbl> <int+lb> <int+l> <int+lb> <int+l> <int+l> <int+>
 1  1 [Yes]   1 [Yes]  1 [Yes]  0 [No]  0 [No]  1 [Yes] 1 [Yes] 0 [No]
 2  1 [Yes]   1 [Yes]  1 [Yes]  0 [No]  0 [No]  1 [Yes] 1 [Yes] 0 [No]
 3  1 [Yes]   1 [Yes]  1 [Yes]  0 [No]  0 [No]  1 [Yes] 0 [No]  0 [No]
 4  1 [Yes]   1 [Yes]  1 [Yes]  0 [No]  0 [No]  1 [Yes] 0 [No]  0 [No]
 5  1 [Yes]   1 [Yes]  1 [Yes]  0 [No]  1 [Yes] 1 [Yes] 1 [Yes] 0 [No]
 6  1 [Yes]   0 [No]   1 [Yes]  0 [No]  0 [No]  1 [Yes] 1 [Yes] 0 [No]
 7  1 [Yes]   1 [Yes]  1 [Yes]  0 [No]  0 [No]  1 [Yes] 0 [No]  0 [No]
 8  1 [Yes]   1 [Yes]  1 [Yes]  0 [No]  0 [No]  1 [Yes] 1 [Yes] 0 [No]
 9  1 [Yes]   1 [Yes]  1 [Yes]  0 [No]  0 [No]  1 [Yes] 0 [No]  0 [No]
10  1 [Yes]   1 [Yes]  1 [Yes]  0 [No]  0 [No]  1 [Yes] 0 [No]  0 [No]
# … with 224 more rows, and 5 more variables: IMPPROV <int+lbl>,
#   IUDPROV <int+lbl>, MSTPROV <int+lbl>, PILLPROV <int+lbl>,
#   SAYPROV <int+lbl>

To calculate NUM_METHODS_PROV, we can just find the sum of values across all of the PROV variables (thanks to our recoding work, the only possible values here are 1 for “yes”, or 0 for “no”). Notice that c_across takes only one argument: a selection function like ends_with("PROV"). That’s because c_across works like the familiar concatenate function c() used to provide a vector of values to a function like sum(c(1,2,3)).

First, use rowwise() to signal that we’ll only calculate the sum across variables in the same row. Then, use c_across() to find the sum() of PROV variables in each row:

sdp %>% 
  rowwise() %>% 
  transmute(NUM_METHODS_PROV = sum(c_across(ends_with("PROV")), na.rm = T))
# A tibble: 234 x 1
# Rowwise: 
   NUM_METHODS_PROV
              <int>
 1               10
 2               10
 3                8
 4                8
 5               10
 6                9
 7                8
 8               10
 9                8
10                8
# … with 224 more rows

We can now create a summary variable for each of the four variable groups. Let’s create:

sdp %>% 
  rowwise() %>% 
  transmute(
    NUM_METHODS_PROV = sum(c_across(ends_with("PROV")), na.rm = T),
    NUM_METHODS_INSTOCK = sum(c_across(ends_with("OBS")), na.rm = T),
    NUM_METHODS_OUT3MO = sum(c_across(ends_with("OUT3MO")), na.rm = T),
    MEAN_OUTDAY = mean(c_across(ends_with("OUTDAY")), na.rm = T)
  )
# A tibble: 234 x 4
# Rowwise: 
   NUM_METHODS_PROV NUM_METHODS_INSTOCK NUM_METHODS_OUT3MO MEAN_OUTDAY
              <int>               <dbl>              <int>       <dbl>
 1               10                   8                  0         NaN
 2               10                   7                  0           8
 3                8                   8                  0         NaN
 4                8                   8                  0         NaN
 5               10                   9                  0         NaN
 6                9                   7                  0         NaN
 7                8                   7                  0         365
 8               10                   8                  1         NaN
 9                8                   8                  1         NaN
10                8                   7                  0          30
# … with 224 more rows

Summarize Variable Groups by EAID

In our last post, we mentioned that the best use case for SDP data is to aggregate information collected from facilities working in the same geographic sampling units - or enumeration areas - used to select individuals for PMA Household and Female samples. In our next post, we’ll take a close look at the variable group EASERVED, which lists all of the enumeration area codes where a facility is known to provide health services. We’ll then introduce a strategy using tidyr::pivot_longer to summarize the full scope of services available to women living in a particular enumeration area.

For now, let’s simply consider all of the sampled facilities located in a particular enumeration area. That is, rather than calculating the number of methods provided by one facility NUM_METHODS_PROV, let’s create one variable for each method indicating whether the method was provided by at least one facility in a given enumeration area EAID in a given SAMPLE.

For instance, look at the number of facilities providing IUDs in enumeration area 7111 for the Burkina Faso sample collected in 2017:

sdp %>% 
  filter(EAID == 7111, SAMPLE == 85405) %>% 
  select(EAID, SAMPLE, FACILITYID, PILLPROV)
# A tibble: 4 x 4
   EAID                            SAMPLE FACILITYID  PILLPROV
  <dbl>                         <int+lbl>  <int+lbl> <int+lbl>
1  7111 85405 [Burkina Faso 2017 Round 5]       7210   1 [Yes]
2  7111 85405 [Burkina Faso 2017 Round 5]       7029   0 [No] 
3  7111 85405 [Burkina Faso 2017 Round 5]       7441   1 [Yes]
4  7111 85405 [Burkina Faso 2017 Round 5]       7403   1 [Yes]

We want to use a summarize function to create a variable like ANY_PILLPROV, which should simply indicate whether any of these four facilities provide contraceptive pills. Three of them do provide pills, so we want ANY_PILLPROV to be TRUE.

sdp %>% 
  filter(EAID == 7111, SAMPLE == 85405) %>% 
  summarize(ANY_PILLPROV = any(PILLPROV == 1))
# A tibble: 1 x 1
  ANY_PILLPROV
  <lgl>       
1 TRUE        

Now that we’re familiar with across, we should be able to do the same thing to all PROV variables for this particular group of facilities. Let’s also introduce a naming convention where we glue the prefix ANY_ to the column name referenced by the pronoun .x:

sdp %>% 
  filter(EAID == 7111, SAMPLE == 85405) %>% 
  summarize(across(ends_with("PROV"), ~any(.x == 1), .names = "ANY_{.col}"))
# A tibble: 1 x 13
  ANY_CONPROV ANY_CYCBPROV ANY_DEPOPROV ANY_DIAPROV ANY_EMRGPROV
  <lgl>       <lgl>        <lgl>        <lgl>       <lgl>       
1 TRUE        TRUE         TRUE         FALSE       FALSE       
# … with 8 more variables: ANY_FCPROV <lgl>, ANY_FSTPROV <lgl>,
#   ANY_FJPROV <lgl>, ANY_IMPPROV <lgl>, ANY_IUDPROV <lgl>,
#   ANY_MSTPROV <lgl>, ANY_PILLPROV <lgl>, ANY_SAYPROV <lgl>

Let’s repeat the same procedure for every enumeration area in each of our samples. Rather than using a filter to select one EAID in one SAMPLE, we’ll use group_by to work with each EAID in each SAMPLE.

sdp %>% 
  group_by(EAID, SAMPLE) %>% 
  summarize(
    .groups = "keep",
    across(ends_with("PROV"), ~any(.x == 1), .names = "ANY_{.col}")
  )
# A tibble: 142 x 15
# Groups:   EAID, SAMPLE [142]
    EAID      SAMPLE ANY_CONPROV ANY_CYCBPROV ANY_DEPOPROV ANY_DIAPROV
   <dbl>   <int+lbl> <lgl>       <lgl>        <lgl>        <lgl>      
 1  7003 85405 [Bur… TRUE        TRUE         TRUE         FALSE      
 2  7003 85408 [Bur… TRUE        TRUE         TRUE         FALSE      
 3  7006 85405 [Bur… TRUE        TRUE         TRUE         FALSE      
 4  7006 85408 [Bur… TRUE        TRUE         TRUE         FALSE      
 5  7009 85405 [Bur… TRUE        FALSE        TRUE         FALSE      
 6  7009 85408 [Bur… TRUE        TRUE         TRUE         FALSE      
 7  7016 85405 [Bur… TRUE        TRUE         TRUE         FALSE      
 8  7016 85408 [Bur… TRUE        TRUE         TRUE         FALSE      
 9  7026 85405 [Bur… TRUE        TRUE         TRUE         FALSE      
10  7042 85405 [Bur… TRUE        TRUE         TRUE         FALSE      
# … with 132 more rows, and 9 more variables: ANY_EMRGPROV <lgl>,
#   ANY_FCPROV <lgl>, ANY_FSTPROV <lgl>, ANY_FJPROV <lgl>,
#   ANY_IMPPROV <lgl>, ANY_IUDPROV <lgl>, ANY_MSTPROV <lgl>,
#   ANY_PILLPROV <lgl>, ANY_SAYPROV <lgl>

This is still quite a bit of information! Suppose we want to summarize it even further: let’s calculate NUM_METHODS_PROV again with our summary output. This time, NUM_METHODS_PROV will count the number of methods provided by at least one facility in each group.

sdp %>% 
  group_by(EAID, SAMPLE) %>% 
  summarize(
    .groups = "keep",
    across(ends_with("PROV"), ~any(.x == 1), .names = "ANY_{.col}")
  ) %>% 
  transmute(NUM_METHODS_PROV= sum(c_across(ends_with("PROV")), na.rm = T))
# A tibble: 142 x 3
# Groups:   EAID, SAMPLE [142]
    EAID                            SAMPLE NUM_METHODS_PROV
   <dbl>                         <int+lbl>            <int>
 1  7003 85405 [Burkina Faso 2017 Round 5]                8
 2  7003 85408 [Burkina Faso 2018 Round 6]                8
 3  7006 85405 [Burkina Faso 2017 Round 5]                8
 4  7006 85408 [Burkina Faso 2018 Round 6]                8
 5  7009 85405 [Burkina Faso 2017 Round 5]                8
 6  7009 85408 [Burkina Faso 2018 Round 6]               10
 7  7016 85405 [Burkina Faso 2017 Round 5]                8
 8  7016 85408 [Burkina Faso 2018 Round 6]                8
 9  7026 85405 [Burkina Faso 2017 Round 5]                8
10  7042 85405 [Burkina Faso 2017 Round 5]               10
# … with 132 more rows

These summaries are exactly the type of SDP data we’d like to attach to a Household and Female dataset! Watch for our next post, where we’ll show how to create summaries by both EAID and EASERVED, and then match them to records from female respondents sampled from Burkina Faso in 2017 and 2018.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.