Create aggregate measures for women living the areas served by SDPs
Welcome to the third post in a series all about using PMA Service Delivery Point (SDP) data to better understand Individuals in Context. In our last post, we discussed a few of the variable groups related to contraceptive availability, and we showed how to use functions like dplyr::across
to recode and summarize these variable groups in preparation for merging with Household and Female data.
Before we dive in, let’s quickly revisit the geographic sampling units - or enumeration areas - we’ll be using to link SDPs with their counterparts in the Household and Female data.
Remember: the SDP sample design selects facilities meant to reflect the health service environment experienced by individuals included in Household and Female samples. If you were designing survey with this goal in mind, how would you select facilities?
Well, you might target a sample of facilities located within the same geographic sampling units PMA used to define Household and Female samples from the same country in the same year. Presumably, the health services available to a woman living in enumeration area X
would be captured pretty well if we surveyed a list of facilities also located in enumeration area X
.
But what happens if a lot of women living in enumeration area X
travel to enumeration area Y
to receive family planning services? In that case, you’d want to know as much as possible about the service catchment areas for facilities in that country. Then, you could select facilities based on whether they provide services to enumeration area X
, rather than relying simply to those that are located there.
In fact, PMA partners with government health agencies to obtain information about the service catchment area for all of the public-sector health facilities in each participating country. As a result, public SDPs are sampled if one of the enumeration areas used in a corresponding Household and Female sample appears in their service catchment area.
Because service catchment data are only available for public facilities, PMA uses a different method to select private-sector facilities. A private facility will be selected for a SDP sample only if it is located inside the boundaries of an enumeration area included in a corresponding Household and Female sample.
Let’s take a look at an example SDP dataset to see how all of this information gets reported. We’ll use the same data we highlighted in our last post, which includes facilities sampled from Burkina Faso in 2017 and 2018. First, load the following packages into R:
Again in this post, we’ll be working with all of the available contraceptive services and stock variables ending with the suffixes PROV
, OBS
, OUT3MO
, and OUTDAY
. We’ll also add the variable group EASERVED, which - as we’ll see - stores information about the service catchment area for facilities where that information was available. Finally, we’ll add a few more variables that we’ll explore a bit later: AUTHORITY, FACILITYTYPE, and FACILITYTYPEGEN.
We’ll first load the data using ipumsr::read_ipums_micro
:
sdp <- ipumsr::read_ipums_micro(
ddi = "data/pma_00008.xml",
data = "data/pma_00008.dat.gz")
Then, following the steps outlined in our last post, we’ll apply a couple of recoding functions from ipumsr.
sdp <- sdp %>%
select(-EASERVED) %>% # error from extract system
mutate(
across(ends_with("OBS"), ~lbl_relabel(
.x,
lbl(1, "in stock") ~ .val %in% 1:2,
lbl(0, "out of stock") ~ .val == 3
)),
across(everything(), ~lbl_na_if(
.x,
~.lbl %in% c(
"Not interviewed (SDP questionnaire)",
"Don't know",
"No response or missing",
"NIU (not in universe)"
)
))
)
For the moment, let’s just take a look at the basic structure of our data, selecting only the variables FACILITYID, SAMPLE, AUTHORITY
, CONSENTSQ, and EAID. For this preview, we’ll also arrange the data in ascending order of FACILITYID
and SAMPLE
:
sdp %>%
select(FACILITYID, SAMPLE, AUTHORITY, CONSENTSQ, EAID) %>%
arrange(FACILITYID, SAMPLE)
# A tibble: 234 x 5
FACILITYID SAMPLE AUTHORITY CONSENTSQ EAID
<int+lbl> <int+lbl> <int+lbl> <int+lbl> <dbl>
1 7006 85405 [Burkina Faso 2017 R… 1 [Governme… 1 [Yes] 7390
2 7006 85408 [Burkina Faso 2018 R… 1 [Governme… 1 [Yes] 7390
3 7027 85405 [Burkina Faso 2017 R… 1 [Governme… 1 [Yes] 7332
4 7027 85408 [Burkina Faso 2018 R… 1 [Governme… 1 [Yes] 7332
5 7029 85405 [Burkina Faso 2017 R… 4 [Private] 1 [Yes] 7111
6 7029 85408 [Burkina Faso 2018 R… 4 [Private] 0 [No] 7111
7 7036 85408 [Burkina Faso 2018 R… 1 [Governme… 1 [Yes] 7412
8 7046 85408 [Burkina Faso 2018 R… 4 [Private] 1 [Yes] 7798
9 7048 85405 [Burkina Faso 2017 R… 1 [Governme… 1 [Yes] 7009
10 7051 85408 [Burkina Faso 2018 R… 1 [Governme… 1 [Yes] 7798
# … with 224 more rows
Each row in our data represents one facility from one sample. Notice that some - but not all - facilities appear once in sample 85405
(from 2017), and again in sample 85408
(from 2018).
The variable AUTHORITY
shows the managing authority for each facility. Following the discussion above, we’ll expect to find information about the service catchment area for each facility where the managing authority is 1 - Government
.
Also notice CONSENTSQ
, which indicates whether a respondent at each facility consented to be interviewed. When you first obtain a data extract, you should expect most variables to be marked Not interviewed (SDP questionnaire)
for facilities where CONSENTSQ
shows 0 - No
. However, we’ve already taken the extra step of marking all non-response values NA
: we should now expect to see NA
substituted for Not interviewed (SDP questionnaire)
.
Lastly, take particular note of the variable EAID
: in SDP data, EAID
shows the identification code associated with the enumeration area where a facility is located.
We’ll find information about the service catchment area for each facility in a different set of variables, each starting with with prefix EASERVED
:
sdp %>%
select(starts_with("EASERVED"))
# A tibble: 234 x 18
EASERVED1 EASERVED2 EASERVED3 EASERVED4 EASERVED5 EASERVED6
<int+lbl> <int+lbl> <int+lbl> <int+lbl> <int+lbl> <int+lbl>
1 7380 7323 7491 7605 7142 7279
2 7879 7516 7111 7554 7934 7791
3 7483 NA NA NA NA NA
4 7185 NA NA NA NA NA
5 7725 7859 7472 7175 NA NA
6 7082 NA NA NA NA NA
7 7650 NA NA NA NA NA
8 7955 NA NA NA NA NA
9 7323 NA NA NA NA NA
10 7774 NA NA NA NA NA
# … with 224 more rows, and 12 more variables: EASERVED7 <int+lbl>,
# EASERVED8 <int+lbl>, EASERVED9 <int+lbl>, EASERVED10 <int+lbl>,
# EASERVED11 <int+lbl>, …
You’ll notice that our extract contains 18 EASERVED variables. Why 18? If you created your own data extract, you’ll remember that you only selected one variable called EASERVED
: once you’ve selected samples, the IPUMS extract system automatically determines the correct number of EASERVED variables for your dataset based on the facility with the largest service catchment list.
As we’ve discussed, PMA only receives service catchment information about public-sector facilities. In their case, each EASERVED
variable contains an ID code for one of the enumeration areas in its service catchment list, or else it’s NA
. We’ll look at these public-sector facilities first:
sdp %>% count(AUTHORITY)
# A tibble: 3 x 2
AUTHORITY n
* <int+lbl> <int>
1 1 [Government] 202
2 3 [Faith-based] 3
3 4 [Private] 29
The vast majority of SDPs in our sample are public-sector facilities. They comprise 202 of the 234 facilities in our sample.
sdp %>%
filter(AUTHORITY == 1) %>%
select(starts_with("EASERVED"))
# A tibble: 202 x 18
EASERVED1 EASERVED2 EASERVED3 EASERVED4 EASERVED5 EASERVED6
<int+lbl> <int+lbl> <int+lbl> <int+lbl> <int+lbl> <int+lbl>
1 7380 7323 7491 7605 7142 7279
2 7879 7516 7111 7554 7934 7791
3 7483 NA NA NA NA NA
4 7185 NA NA NA NA NA
5 7725 7859 7472 7175 NA NA
6 7082 NA NA NA NA NA
7 7650 NA NA NA NA NA
8 7955 NA NA NA NA NA
9 7323 NA NA NA NA NA
10 7774 NA NA NA NA NA
# … with 192 more rows, and 12 more variables: EASERVED7 <int+lbl>,
# EASERVED8 <int+lbl>, EASERVED9 <int+lbl>, EASERVED10 <int+lbl>,
# EASERVED11 <int+lbl>, …
Using two of the dplyr
functions discussed in our last post - summarize and across - we’ll get a better sense of the catchment areas for our public-sector SDPs. Let’s see how many missing values exist for each of these EASERVED
variables:
sdp %>%
filter(AUTHORITY == 1) %>%
summarise(across(starts_with("EASERVED"), ~sum(is.na(.x))))
# A tibble: 1 x 18
EASERVED1 EASERVED2 EASERVED3 EASERVED4 EASERVED5 EASERVED6
<int> <int> <int> <int> <int> <int>
1 0 156 173 181 190 192
# … with 12 more variables: EASERVED7 <int>, EASERVED8 <int>,
# EASERVED9 <int>, EASERVED10 <int>, EASERVED11 <int>, …
We see that every public facility serves at least one enumeration area (there are no missing values for EASERVED1
). However, there are 156 missing values for EASERVED2
, which tells us that 156 public facilities only serve one enumeration area. Likewise: 173 facilities serve 2 enumeration areas or fewer, 181 serve 3 or fewer, and so forth.
What about the 32 non-public facilities?
sdp %>%
filter(AUTHORITY != 1) %>%
summarise(across(starts_with("EASERVED"), ~sum(is.na(.x))))
# A tibble: 1 x 18
EASERVED1 EASERVED2 EASERVED3 EASERVED4 EASERVED5 EASERVED6
<int> <int> <int> <int> <int> <int>
1 4 32 32 32 32 32
# … with 12 more variables: EASERVED7 <int>, EASERVED8 <int>,
# EASERVED9 <int>, EASERVED10 <int>, EASERVED11 <int>, …
PMA receives no information about the service catchment areas for these facilities, so - as you might expect - there are 32 missing values for EASERVED2
onward. Note, however, that there are only 4 missing values for EASERVED1
: for non-public facilities, EASERVED1
usually contains that same enumeration area code shown in EAID
(this is the enumeration area where the facility is, itself, located).
The exception to this rule comes from facilities where CONSENTSQ
shows that no respondent provided consent to be interviewed. If we’d like, we can copy EAID
to EASERVED1
for these facilities using dplyr::case_when:
Now, every SDP has at least one enumeration area included in the EASERVED
group. This will be important in our next step, where we’ll see how to summarize the SDP data by groups of facilities serving the same enumeration area.
Now that we’re familiar with EASERVED
variables, let’s take a look at the kinds of summary statistics we might want to construct from variables related to contraceptive service availability. For example, consider EMRGPROV, which indicates whether a facility provides emergency contraceptives to clients.
Remember that, right now, each row of our SDP dataset represents responses from one facility per sample. We’ll ultimately want to count the number of facilities providing emergency contraceptives to clients in each enumeration area, so we should use the tidyr
function pivot_longer to reshape the data in a way that repeats each facility’s response to EMRGPROV
once for every enumeration area that it serves.
Take, for example, the first 5 facilities in our dataset: for now, let’s just look at the first two EASERVED
variables, along with each facility’s FACILITYID
, EAID
, and EMRGPROV
response:
sdp %>%
slice(1:5) %>%
select(FACILITYID, EASERVED1, EASERVED2, EMRGPROV)
# A tibble: 5 x 4
FACILITYID EASERVED1 EASERVED2 EMRGPROV
<int+lbl> <dbl> <int+lbl> <int+lbl>
1 7250 7380 7323 0 [No]
2 7399 7879 7516 0 [No]
3 7506 7483 NA 0 [No]
4 7982 7185 NA 0 [No]
5 7065 7725 7859 1 [Yes]
Among these 5 facilities, only facility 7065
provides emergency contraceptives. This facility happens to provide services to 2 enumeration areas: 7725
and 7859
. When we use pivot_longer
, we’ll reshape the data to emphasize a different conclusion: our example shows two enumeration areas where individuals can access emergency contraceptives. We convey this information by placing each enumeration area from EASERVED1
or EASERVED2
in its own row:
sdp %>%
slice(1:5) %>%
select(FACILITYID, EASERVED1, EASERVED2, EMRGPROV) %>%
pivot_longer(
cols = starts_with("EASERVED"),
values_to = "EASERVED",
names_to = NULL
)
# A tibble: 10 x 3
FACILITYID EMRGPROV EASERVED
<int+lbl> <int+lbl> <dbl+lbl>
1 7250 0 [No] 7380
2 7250 0 [No] 7323
3 7399 0 [No] 7879
4 7399 0 [No] 7516
5 7506 0 [No] 7483
6 7506 0 [No] NA
7 7982 0 [No] 7185
8 7982 0 [No] NA
9 7065 1 [Yes] 7725
10 7065 1 [Yes] 7859
Now, we find that each of the values previously stored in EASERVED1
and EASERVED2
appear in a new column, EASERVED
. Each facility occupies two rows: one for each of the enumeration areas that it serves.
What about the rows where EASERVED
contains NA
? These rows are meaningless: we’re repeating each facility’s response to EMRGPROV
twice to represent two enumeration areas, but facilities 7506
and 7982
only serve one enumeration area apiece. We should include the argument values_drop_na = T
to drop these rows when we use pivot_longer()
:
sdp %>%
slice(1:5) %>%
select(FACILITYID, EASERVED1, EASERVED2, EMRGPROV) %>%
pivot_longer(
cols = starts_with("EASERVED"),
values_to = "EASERVED",
names_to = NULL,
values_drop_na = T
)
# A tibble: 8 x 3
FACILITYID EMRGPROV EASERVED
<int+lbl> <int+lbl> <dbl+lbl>
1 7250 0 [No] 7380
2 7250 0 [No] 7323
3 7399 0 [No] 7879
4 7399 0 [No] 7516
5 7506 0 [No] 7483
6 7982 0 [No] 7185
7 7065 1 [Yes] 7725
8 7065 1 [Yes] 7859
Now that we know how to pivot_longer
, let’s apply the function to our full dataset:
sdp <- sdp %>%
pivot_longer(
cols = starts_with("EASERVED"),
values_to = "EASERVED",
values_drop_na = T,
names_to = NULL
) %>%
distinct() # in case any facility listed the same EASERVED twice
Dropping each row where EASERVED
is missing, we’re left with 372 rows where information about each SDP gets repeated once for every enumeration area that it serves. (Remember: our original dataset contained only 234 rows because SDPs occupied just one row apiece).
sdp %>% select(FACILITYID, EASERVED, everything())
# A tibble: 372 x 58
FACILITYID EASERVED SAMPLE COUNTRY YEAR ROUND EAID CONSENTSQ
<int+lbl> <dbl+lb> <int+lbl> <int+l> <int> <dbl> <dbl> <int+lbl>
1 7250 7380 85405 [Bur… 1 [Bur… 2017 5 7142 1 [Yes]
2 7250 7323 85405 [Bur… 1 [Bur… 2017 5 7142 1 [Yes]
3 7250 7491 85405 [Bur… 1 [Bur… 2017 5 7142 1 [Yes]
4 7250 7605 85405 [Bur… 1 [Bur… 2017 5 7142 1 [Yes]
5 7250 7142 85405 [Bur… 1 [Bur… 2017 5 7142 1 [Yes]
6 7250 7279 85405 [Bur… 1 [Bur… 2017 5 7142 1 [Yes]
7 7250 7370 85405 [Bur… 1 [Bur… 2017 5 7142 1 [Yes]
8 7250 7725 85405 [Bur… 1 [Bur… 2017 5 7142 1 [Yes]
9 7250 7811 85405 [Bur… 1 [Bur… 2017 5 7142 1 [Yes]
10 7250 7859 85405 [Bur… 1 [Bur… 2017 5 7142 1 [Yes]
# … with 362 more rows, and 50 more variables: STRATA <int+lbl>,
# FACILITYTYPE <int+lbl>, FACILITYTYPEGEN <int+lbl>,
# AUTHORITY <int+lbl>, CONPROV <int+lbl>, …
Now that we’ve reshaped our data, we’ll be able to create some simple summary statistics about each of the enumeration areas served by the facilities in our sample. First, let’s group_by(EASERVED, SAMPLE)
and count()
the number of facilities providing services to each enumeration area in each of our samples:
sdp %>%
group_by(EASERVED, SAMPLE) %>%
summarize(
.groups = "keep",
N_SDP = n()
)
# A tibble: 149 x 3
# Groups: EASERVED, SAMPLE [149]
EASERVED SAMPLE N_SDP
<dbl+lbl> <int+lbl> <int>
1 7003 85405 [Burkina Faso 2017 Round 5] 3
2 7003 85408 [Burkina Faso 2018 Round 6] 2
3 7006 85405 [Burkina Faso 2017 Round 5] 2
4 7006 85408 [Burkina Faso 2018 Round 6] 1
5 7009 85405 [Burkina Faso 2017 Round 5] 3
6 7009 85408 [Burkina Faso 2018 Round 6] 2
7 7016 85405 [Burkina Faso 2017 Round 5] 3
8 7016 85408 [Burkina Faso 2018 Round 6] 2
9 7026 85405 [Burkina Faso 2017 Round 5] 3
10 7026 85408 [Burkina Faso 2018 Round 6] 3
# … with 139 more rows
Continuing with the variable EMRGPROV
, we can now also count the number of sampled facilities providing emergency contraception to each EASERVED
:
sdp %>%
group_by(EASERVED, SAMPLE) %>%
summarize(
.groups = "keep",
N_EMRGPROV = sum(EMRGPROV)
)
# A tibble: 149 x 3
# Groups: EASERVED, SAMPLE [149]
EASERVED SAMPLE N_EMRGPROV
<dbl+lbl> <int+lbl> <int>
1 7003 85405 [Burkina Faso 2017 Round 5] 0
2 7003 85408 [Burkina Faso 2018 Round 6] 0
3 7006 85405 [Burkina Faso 2017 Round 5] 0
4 7006 85408 [Burkina Faso 2018 Round 6] 0
5 7009 85405 [Burkina Faso 2017 Round 5] 2
6 7009 85408 [Burkina Faso 2018 Round 6] 1
7 7016 85405 [Burkina Faso 2017 Round 5] 0
8 7016 85408 [Burkina Faso 2018 Round 6] 0
9 7026 85405 [Burkina Faso 2017 Round 5] 0
10 7026 85408 [Burkina Faso 2018 Round 6] 0
# … with 139 more rows
What if we want to include a count of the facilities providing each of the different contraceptive methods in our data? Building on a technique showcased in our last post, we could use dplyr::across
to iterate over all variables ending with the suffix PROV
:
sdp %>%
group_by(EASERVED, SAMPLE) %>%
summarize(
.groups = "keep",
across(ends_with("PROV"), ~sum(.x), .names = "N_{.col}")
)
# A tibble: 149 x 15
# Groups: EASERVED, SAMPLE [149]
EASERVED SAMPLE N_CONPROV N_CYCBPROV N_DEPOPROV N_DIAPROV
<dbl+lb> <int+lbl> <int> <int> <int> <int>
1 7003 85405 [Bur… 3 3 3 0
2 7003 85408 [Bur… 2 2 2 0
3 7006 85405 [Bur… 2 2 2 0
4 7006 85408 [Bur… 1 1 1 0
5 7009 85405 [Bur… 3 1 3 0
6 7009 85408 [Bur… 2 2 2 0
7 7016 85405 [Bur… 3 3 3 0
8 7016 85408 [Bur… 2 1 2 0
9 7026 85405 [Bur… 3 3 3 0
10 7026 85408 [Bur… 3 3 3 0
# … with 139 more rows, and 9 more variables: N_EMRGPROV <int>,
# N_FCPROV <int>, N_FSTPROV <int>, N_FJPROV <int>, N_IMPPROV <int>,
# …
We’ll reduce this information even further, creating a variable NUM_METHODS_PROV
indicating the number of methods provided by at least one sampled facility:
sdp %>%
group_by(EASERVED, SAMPLE) %>%
summarize(
.groups = "keep",
across(ends_with("PROV"), ~sum(.x), .names = "N_{.col}")
) %>%
transmute(
EASERVED,
SAMPLE,
NUM_METHODS_PROV = sum(c_across(ends_with("PROV")) > 0, na.rm = T)
)
# A tibble: 149 x 3
# Groups: EASERVED, SAMPLE [149]
EASERVED SAMPLE NUM_METHODS_PROV
<dbl+lbl> <int+lbl> <int>
1 7003 85405 [Burkina Faso 2017 Round 5] 10
2 7003 85408 [Burkina Faso 2018 Round 6] 8
3 7006 85405 [Burkina Faso 2017 Round 5] 10
4 7006 85408 [Burkina Faso 2018 Round 6] 8
5 7009 85405 [Burkina Faso 2017 Round 5] 9
6 7009 85408 [Burkina Faso 2018 Round 6] 10
7 7016 85405 [Burkina Faso 2017 Round 5] 9
8 7016 85408 [Burkina Faso 2018 Round 6] 8
9 7026 85405 [Burkina Faso 2017 Round 5] 10
10 7026 85408 [Burkina Faso 2018 Round 6] 10
# … with 139 more rows
In our last post, we introduced 4 variable groups related to the availability of different contraceptive methods. We’ll now create a summary variable for each one, and then show how to attach our new variables to a Household and Female dataset:
N_SDP
- number of SDPsNUM_METHODS_PROV
- number of methods provided by at least one SDPNUM_METHODS_INSTOCK
- number of methods in-stock with at least one SDPNUM_METHODS_OUT3MO
- number of methods out of stock in the last 3 months with at least one SDPMEAN_OUTDAY
- the mean length of a stockout for all out of stock methodssdp <- sdp %>%
group_by(EASERVED, SAMPLE) %>%
summarize(
.groups = "keep",
N_SDP = n(),
across(ends_with("PROV"), ~sum(.x, na.rm = T), .names = "N_{.col}"),
across(ends_with("OBS"), ~sum(.x, na.rm = T), .names = "N_{.col}"),
across(ends_with("OUT3MO"), ~sum(.x, na.rm = T), .names = "N_{.col}"),
across(ends_with("OUTDAY"), ~mean(.x, na.rm = T), .names = "N_{.col}"),
) %>%
transmute(
EASERVED,
SAMPLE,
N_SDP,
NUM_METHODS_PROV = sum(c_across(ends_with("PROV")) > 0, na.rm = T),
NUM_METHODS_INSTOCK = sum(c_across(ends_with("OBS")) > 0, na.rm = T),
NUM_METHODS_OUT3MO = sum(c_across(ends_with("OUT3MO")) > 0, na.rm = T),
MEAN_OUTDAY = mean(c_across(ends_with("OUTDAY")), na.rm = T)
) %>%
ungroup()
Consider the following female respondent dataset collected from Burkina Faso in 2017 and 2018. It contains a variable FPCURRUSE indicating whether the woman is currently using a method of family planning:
hhf <- read_ipums_micro(
ddi = "data/pma_00011.xml",
data = "data/pma_00011.dat.gz"
) %>%
select(PERSONID, EAID, URBAN, SAMPLE, FPCURRUSE) %>%
mutate(
across(everything(), ~lbl_na_if(
.x,
~.lbl %in% c(
"No response or missing",
"NIU (not in universe)"
)
))
)
hhf
# A tibble: 6,944 x 5
PERSONID EAID URBAN SAMPLE FPCURRUSE
<chr> <dbl> <int+lb> <int+lbl> <int+lbl>
1 0762000000029022… 7620 1 [Urba… 85405 [Burkina Faso 201… NA
2 0735800000017142… 7358 0 [Rura… 85405 [Burkina Faso 201… 0 [No]
3 0710400000020992… 7104 0 [Rura… 85405 [Burkina Faso 201… 0 [No]
4 0704800000014092… 7048 1 [Urba… 85405 [Burkina Faso 201… 0 [No]
5 0715600000020782… 7156 0 [Rura… 85405 [Burkina Faso 201… 0 [No]
6 0727900000021452… 7279 1 [Urba… 85405 [Burkina Faso 201… 0 [No]
7 0743100000024642… 7431 0 [Rura… 85405 [Burkina Faso 201… 1 [Yes]
8 0721200000025792… 7212 0 [Rura… 85405 [Burkina Faso 201… 0 [No]
9 0704200000014542… 7042 1 [Urba… 85405 [Burkina Faso 201… 0 [No]
10 0797200000013032… 7972 1 [Urba… 85405 [Burkina Faso 201… 0 [No]
# … with 6,934 more rows
You’ll notice that each row represents one female respondent with a unique PERSONID
(non-respondents and other household members have been removed beforehand). We’ve also got EAID
, which represents the enumeration area where each respondent resides; the variable URBAN
indicates whether the enumeration area is primarily “urban” or “rural”.
The variable SAMPLE
contains the same values seen in our SDP data:
When we merge, we’ll want to match each woman to both a SAMPLE
and an EASERVED
from the SDP data. We’ll rename EASERVED
to match the variable EAID
in the HHF data:
bf_merged <- sdp %>%
rename(EAID = EASERVED) %>%
right_join(hhf, by = c("EAID", "SAMPLE"))
Now, each woman’s record contains all of the variables we created above summarizing the SDPs that serve her enumeration area. For example, for all sampled women living in EAID == 7003
in 2017, the value in NUM_METHODS_OUT3MO
shows the number of family planning methods that were out of stock with any SDP serving the woman’s enumeration area within three months prior to the survey:
bf_merged %>%
filter(EAID == 7003, SAMPLE == 85405) %>%
select(PERSONID, EAID, SAMPLE, NUM_METHODS_OUT3MO)
# A tibble: 55 x 4
PERSONID EAID SAMPLE NUM_METHODS_OUT3…
<chr> <dbl+lb> <int+lbl> <int>
1 070030000001973… 7003 85405 [Burkina Faso 20… 0
2 070030000001973… 7003 85405 [Burkina Faso 20… 0
3 070030000002640… 7003 85405 [Burkina Faso 20… 0
4 070030000001075… 7003 85405 [Burkina Faso 20… 0
5 070030000001609… 7003 85405 [Burkina Faso 20… 0
6 070030000000835… 7003 85405 [Burkina Faso 20… 0
7 070030000001273… 7003 85405 [Burkina Faso 20… 0
8 070030000000527… 7003 85405 [Burkina Faso 20… 0
9 070030000002561… 7003 85405 [Burkina Faso 20… 0
10 070030000002391… 7003 85405 [Burkina Faso 20… 0
# … with 45 more rows
You’ll notice that 55 women were surveyed in EAID
7003 in 2017, and each one has the same value (0) for NUM_METHODS_OUT3MO
.
We’ll dig deeper into the types of research questions that our new combined dataset can answer in our upcoming Data Analysis post. For now, take a look at the apparent relationship between FPCURRUSE
and NUM_METHODS_OUT3MO
for all of the women with non-missing responses for both variables:
bf_merged %>%
filter(!is.na(FPCURRUSE) & !is.na(NUM_METHODS_OUT3MO)) %>%
group_by(NUM_METHODS_OUT3MO > 0) %>%
count(FPCURRUSE) %>%
mutate(pct = n/sum(n))
# A tibble: 4 x 4
# Groups: NUM_METHODS_OUT3MO > 0 [2]
`NUM_METHODS_OUT3MO > 0` FPCURRUSE n pct
<lgl> <int+lbl> <int> <dbl>
1 FALSE 0 [No] 2721 0.648
2 FALSE 1 [Yes] 1475 0.352
3 TRUE 0 [No] 1124 0.700
4 TRUE 1 [Yes] 482 0.300
Notably, among those respondents living in an enumeration area that experienced zero stockouts within the 3 months prior to the SDP survey, 35% indicated that they were actively using a family planning method. Compare that to the set of respondents living in an area where at least one method was out of stock during the same time period: only 30% were using a family planning method.
While a 5% difference may or may not prove to be statistically significant under further analysis, it’s not entirely surprising that the reliable availability of contraceptive methods from service providers might influence the contraceptive prevalence rate among women in a given area.
As always, let us know what kinds of questions about fertility and family planning you’re answering with data merged from service providers!
If you see mistakes or want to suggest changes, please create an issue on the source repository.