Droplets of Insights for Integrating DHS and Rainfall Data

Some factors to consider when obtaining and preparing precipitation data for use with DHS surveys

Research concepts

Audrey Dorélien

Associate Professor, Humphrey School of Public Affairs, University of Minnesota

Molly Brown

Research Professor, Department of Geographical Sciences, University of Maryland College Park


March 7, 2024

One barrier to climate change and health research is that many surveys, including The Demographic and Health Surveys Program (DHS), do not collect detailed information on local environmental conditions in survey areas. Fortunately, organizations like IPUMS DHS are now taking steps to use DHS enumeration cluster coordinates to link a limited set of climate variables, including monthly and long-term average precipitation, to individual and household records within the DHS.1

However, in many cases, these pre-calculated time series may not meet researchers’ needs because they reflect only average conditions and not the experience of the specific surveyed households. To obtain a more nuanced understanding of a survey response’s environmental context, some researchers may choose to calculate and link climate variables themselves. We introduced some methods for doing so in our recent post, where we demonstrated how to integrate raw precipitation data from the Climate Hazards Center InfraRed Precipitation with Station Dataset (CHIRPS) with survey data from IPUMS DHS.

This post aims to serve as a guide for new researchers wading into research on the health impacts of climate events. While we focus on rainfall, it is important to note that there are significant interactions between rainfall and other variables such as temperature and vegetation. Furthermore, rainfall alone may be a crude measure; for instance, Grace et al. (2021) highlight multiple and nuanced ways in which precipitation can impact health.2 Nevertheless, this blog post outlines key concepts to consider when thinking through the health impacts of precipitation patterns, selecting a data source, and operationalizing precipitation data. We draw examples from recent publications that utilize DHS data; however, their inclusion does not constitute an endorsement of the publications themselves.

Impacts of rainfall on human health

Rainfall can impact human health in both direct and indirect ways, with important consequences when linking precipitation data to survey responses. For instance, researchers interested in indirect mechanisms may need to incorporate a time lag between climate events and survey response dates, since the impact of rainfall events in one period may not appear until later.3

Below, we outline several of the most common pathways through which precipitation impacts health outcomes.

Disasters and precipitation extremes

Direct and immediate impacts on human health

Extreme rainfall can result in flooding or landslides, which may lead to death or injury. Furthermore, extreme events often contribute to indirect health impacts through the other pathways described below.

Agriculture and nutrition

Indirect impacts on human health

Rainfall directly impacts crop growth and agricultural productivity, which then affects food security and nutrition. It takes time for plants to grow, so the impacts on health and well-being may not be immediate.

Precipitation is not the only variable that can be used to explore these pathways; Normalized Difference Vegetation Index (NDVI) can also be used as a proxy for food security.2

Infectious diseases

Indirect impacts on human health

Rainfall can impact the transmission of enteric diseases like cholera (through its effects on sanitation practices, water quality, and hygiene) as well as vector-borne diseases like malaria and dengue (through its effects on water supply and vector habitats).

Ideally, researchers would be able to explore these pathways through more proximate measures of transmission risks like disease incidence. However, in resource-poor settings these data may not be available at fine temporal and spatial resolutions.

Crime and violence

Indirect impacts on human health

Rainfall shocks may be linked to crime and violence through their impacts on economic activities, stress, and competition for limited resources. For instance, rainfall impacts on economic security may lead to herder and farmer conflicts.

Choosing a rainfall data source

The specific research question and estimation strategy should play a central role in determining the most suitable rainfall data source. Researchers have various factors to consider when making this choice, including the available time span4,5, the manner in which data were collected (for instance, accurate ground-based station data versus satellite data),6 and the extent of coverage for the geographic location of interest.7

Data accuracy

It’s important for researchers to recognize the trade-offs between data accuracy and coverage. In many cases, more accurate products will have more limited geographic coverage, so researchers with a small area of interest may be able to use a more accurate data product.

Temporal resolution

Researchers must also consider the temporal resolution of the data (hourly, daily, monthly, and so on). An appropriate temporal resolution is likely to depend on the characteristics of the survey data that will be joined to the environmental data. For instance, it may not be necessary to use hourly precipitation data when linking to the DHS, since survey response dates are often provided only at the monthly level. However, it is always possible to aggregate fine-grained data to a coarser temporal scale, and fine-grained data may provide more flexibility in the way aggregation is carried out.
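
As a minimal sketch of that kind of aggregation, the Python snippet below sums daily rainfall into calendar-month totals. The function name `monthly_totals` and the daily values are hypothetical and for illustration only; a real workflow would read daily values from a product such as CHIRPS.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(daily_mm):
    """Sum daily rainfall (mm) into calendar-month totals.

    daily_mm maps datetime.date -> rainfall in mm for one location.
    Returns a dict keyed by (year, month).
    """
    totals = defaultdict(float)
    for day, mm in daily_mm.items():
        totals[(day.year, day.month)] += mm
    return dict(totals)


# Hypothetical daily values for a single DHS cluster
daily = {
    date(2020, 1, 30): 4.0,
    date(2020, 1, 31): 0.0,
    date(2020, 2, 1): 12.5,
    date(2020, 2, 2): 3.5,
}
monthly = monthly_totals(daily)
```

Starting from daily data leaves the choice of aggregation open: the same input could instead be grouped into dekads, seasons, or a window defined relative to each interview date.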

Spatial resolution

The majority of rainfall data is available in raster format, wherein precipitation data are stored in a grid of cells, each with a particular precipitation value. If the grid cells are very large and encompass many DHS clusters, there won’t be enough variation to exploit in an analysis. On the other hand, if the grid cells are less than 10 kilometers across (a plausible buffer size around DHS cluster coordinates), then their values will need to be aggregated within each DHS cluster region.
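
The sketch below illustrates one simple way to aggregate fine grid cells around a cluster: average every cell whose center lies within a buffer radius of the cluster coordinates. It is an assumption-laden illustration, not a recommended workflow. The helper names, the 10 km default, and the cell values are hypothetical, and a production analysis would more likely use a raster library with area-weighted extraction.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def buffer_mean(cluster, cells, radius_km=10.0):
    """Mean rainfall of grid cells whose centers fall within radius_km.

    cluster is (lat, lon); cells is a list of (lat, lon, rainfall_mm).
    Returns None if no cell center falls inside the buffer.
    """
    lat0, lon0 = cluster
    inside = [
        value
        for lat, lon, value in cells
        if haversine_km(lat0, lon0, lat, lon) <= radius_km
    ]
    return sum(inside) / len(inside) if inside else None


# Hypothetical 0.05-degree cell centers near a cluster on the equator
cells = [
    (0.00, 30.00, 100.0),
    (0.05, 30.00, 120.0),  # about 5.6 km away: inside the buffer
    (0.00, 30.05, 80.0),   # about 5.6 km away: inside the buffer
    (0.20, 30.00, 300.0),  # about 22 km away: outside the buffer
]
cluster_rain = buffer_mean((0.0, 30.0), cells)
```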

Data accessibility

Of course, data availability and accessibility often drive data source decisions. Researchers may opt to use data that they or collaborators have previously used or have easy access to in order to reduce the difficulty of setting up an analysis with a new source.

Comparison of Selected Precipitation Data Sources

| Name | Finest Resolution | Geographic Range | Temporal Range | Time Step | Rainfall Data Source |
| --- | --- | --- | --- | --- | --- |
| CHIRPS | 0.05° | 50°S-50°N (all longitudes) | 1981-near present | Daily, pentad, dekad, monthly, 2-monthly, 3-monthly, annual | Weather station records and geostationary satellite observations |
| CRU TS | 0.5° | All land except Antarctica | 1901-2022 | Monthly | Weather station records |
| UDEL-TS | 0.5° | 89.75°N-89.75°S, 0.25°E-359.75°E | 1901-2014 | Monthly | Weather station records |
| GPCC | 0.25° | 90.0°N-90.0°S, 0.0°E-360.0°E | 1891-2019 | Monthly | Weather station records |
| ERA5 | 0.25° | Global | 1940-present | Hourly, monthly | Reanalysis (model estimates from satellite data assimilation) |

Measuring rainfall

Scholars frequently aggregate total rainfall over an interval (for instance, to calculate annual, seasonal, or monthly precipitation summaries). Additionally, researchers often generate long-term rainfall averages for use in identifying anomalies and extreme rainfall events.3 It is imperative that researchers justify their choice of temporal aggregation.

In this section, we summarize several publications that highlight different techniques for measuring rainfall.

Total precipitation

Studies that simply use raw precipitation totals are able to test whether additional rainfall has a negative or positive impact on the outcome of interest.8

Total precipitation during the growing season

Some studies restrict their focus to a region’s growing season to better assess the impacts of climate conditions on agricultural production and—indirectly—nutrition and food security.9

Total rainy season precipitation

Specific details of the geographical area of interest may inform the manner in which data are aggregated. For instance, Randell, Gray, and Shayo (2022) take region-specific rainfall patterns into account in research on Tanzania, where depending on the region there might be one (Msimu) or two rainy seasons (Masika and Vuli) during the year.10

Rainfall variability and anomalies

Measures of rainfall variability consider how rainfall at a specific time and location compares to the long-term average rainfall in that same location. However, several measures of variability exist, and information should be provided to explain how the reference period was chosen and how results change if the reference period changes.

Annual variability

  • Rainfall deviation percentile can be used to quantify how a specific rainfall event compares to historical data for that same location. For instance, in Epstein et al. (2023), rainfall is ranked against the 29 previous years on a scale from 0 to 1: a value of 0.5 (the 50th percentile) represents a year with median rainfall, values closer to 0 represent drier-than-average years, and values closer to 1 represent wetter years.11

  • Rainfall Z-scores quantify how many standard deviations a rainfall event differs from the mean. Negative Z-scores indicate below-average rainfall while positive Z-scores indicate above-average rainfall.12

  • Standardized Precipitation Index (SPI) is similar to the Z-score but first corrects for the skew found in rainfall distributions by transforming the data using a gamma distribution.5
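
To make the first two measures concrete, here is a small Python sketch of a rainfall Z-score and an empirical percentile rank against a reference period. The function names and the five-year reference series are hypothetical, and the exact conventions (sample versus population standard deviation, "at or below" versus "strictly below") are our assumptions; individual studies may define them differently. SPI is omitted because it additionally requires fitting a gamma distribution.

```python
import statistics


def rainfall_z_score(value, reference):
    """Standard deviations between value and the reference-period mean.

    Uses the sample standard deviation of the reference years.
    """
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)
    return (value - mean) / sd


def percentile_rank(value, reference):
    """Share of reference years with rainfall at or below value (0 to 1)."""
    return sum(1 for r in reference if r <= value) / len(reference)


# Hypothetical annual totals (mm) for one location
reference = [800.0, 900.0, 1000.0, 1100.0, 1200.0]

z_dry = rainfall_z_score(700.0, reference)   # negative: below-average year
p_dry = percentile_rank(700.0, reference)    # 0.0: drier than every reference year
p_mid = percentile_rank(1000.0, reference)   # 0.6: three of five years at or below
```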

Seasonal variability

Rainfall is often seasonal, and depending on the health pathway of interest, researchers should think critically about how seasons may impact their analyses. Rainfall during the dry season is likely to have a different effect than rainfall in the wet season, for instance.13

  • Randell et al. (2021) calculate Z-scores for 2015 monsoon rainfall based on monsoon rainfall during the 1980-2015 reference period.14

  • Omiat and Shively (2020) calculate deviation during the main rainfall season of the survey year and of the previous year.15

  • Abiona (2017) calculates the percentage deviation from the mean agricultural season average. Specifically, this was generated using the natural logarithm of the current agricultural season’s rainfall minus the 30-year historical average for the same locality.4
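
A short derivation may help show why a difference in logarithms can be read as a percentage deviation. We assume here that the measure is the log of current-season rainfall minus the log of the historical mean; this is our reading of the description above, not a formula quoted from the paper, and the symbols are ours. Let R_{jt} be agricultural-season rainfall in locality j and year t, and let \bar{R}_j be the 30-year average for that locality.

```latex
\[
D_{jt} \;=\; \ln R_{jt} - \ln \bar{R}_{j}
       \;=\; \ln\!\left(1 + \frac{R_{jt} - \bar{R}_{j}}{\bar{R}_{j}}\right)
       \;\approx\; \frac{R_{jt} - \bar{R}_{j}}{\bar{R}_{j}}
\]
```

The approximation holds for small deviations. For example, a season with rainfall 10% above the long-run average gives ln(1.1) ≈ 0.095, close to 0.10; the gap widens as deviations grow.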

Climate variability during specific windows in the respondent’s life course

Environmental analyses should be tied to survey responses during specific exposure windows to determine the impact of extremes.3 For instance, research on the impact of precipitation on weight at birth may want to focus on precipitation anomalies during the months of gestation. Or, if gestational length is unknown, researchers may choose to use anomalies during the 12 months prior to birth. It is often necessary to make approximations like this based on the available data.

  • In studying in-utero rainfall variability, Le and Nguyen (2021) use “the deviation of the nine-month in-utero rainfall from the long run average of total rainfall during those nine months”.6 In this case, the long-run average was based on data from 1981-2018. The authors further dichotomized their results in order to more easily summarize the impact of wet versus dry shocks.
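
The sketch below shows one way such an exposure window might be operationalized: sum rainfall over the n months before the birth month, then subtract the long-run average for those same calendar months. The function names, the choice to exclude the birth month, and the toy data are all our assumptions for illustration; they are not taken from the studies cited above.

```python
import statistics


def months_before(year, month, n):
    """Return the n (year, month) pairs preceding the given month, oldest first."""
    idx = year * 12 + (month - 1)
    return [(i // 12, i % 12 + 1) for i in range(idx - n, idx)]


def window_deviation(monthly_mm, birth_year, birth_month, n_months, ref_years):
    """Window rainfall total minus the long-run average for the same calendar months.

    monthly_mm maps (year, month) -> rainfall in mm for one location.
    ref_years is an iterable of years defining the reference period.
    """
    ref_years = list(ref_years)
    window = months_before(birth_year, birth_month, n_months)
    exposure = sum(monthly_mm[ym] for ym in window)
    long_run = sum(
        statistics.mean([monthly_mm[(y, m)] for y in ref_years])
        for _, m in window
    )
    return exposure - long_run


# Hypothetical data: 100 mm every month, except a dry final quarter of 2002
monthly = {(y, m): 100.0 for y in range(2000, 2003) for m in range(1, 13)}
for m in (10, 11, 12):
    monthly[(2002, m)] = 40.0

# Three-month window before a January 2003 birth, reference period 2000-2002
dev = window_deviation(monthly, 2003, 1, 3, range(2000, 2003))
```

A negative result indicates a drier-than-usual window. Dichotomizing, as some authors do, would amount to checking the sign of this deviation or comparing it against a threshold.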

Precipitation extremes: Floods and droughts

Researchers rarely define a flood event from raw rainfall totals alone. Instead, a flood indicator is usually generated from precipitation anomalies. Consequently, we recommend using the term extreme rainfall rather than flood event when relying solely on rainfall data.

Researchers often use rainfall data alone to identify droughts, though in these instances they are really capturing meteorological droughts.16 Depending on the type of drought of interest, researchers may want to use a combination of temperature, rainfall, and other data sources to identify droughts associated with low soil moisture (agricultural droughts), low ground water, or surface runoff. As with all environmental metrics, researchers should think critically about time scales when operationalizing drought; a drought lasting 1 month is likely to have a very different effect than a drought lasting 6 or 12 months.

Specific drought indices are also available to identify drought conditions. The Standardized Precipitation Evapotranspiration Index (SPEI) takes into account rainfall as well as evapotranspiration, and the Palmer Drought Severity Index (PDSI) includes rainfall, evapotranspiration, and runoff.

Below we highlight some ways researchers using DHS data have used rainfall data to quantify the impacts of extreme precipitation. These examples illustrate that there is a wide range of cutoffs used to identify droughts. While this makes comparisons difficult, it is appropriate to define drought in terms specific to the place, time, and water use patterns of a given region.

  • Extreme rainfall: rainfall deviation ≥ 90th percentile [17]

  • Flood: rainfall deviation > 75th percentile [4]

  • Drought: rainfall deviation < 25th percentile [4]

  • Drought: binary variable for rainfall ≤ 15th percentile [18]

  • Drought: SPI values < -1.5 [5]

  • Drought: classified as an ordinal categorical variable: [19]

    • Severe (≤ 10th percentile)
    • Mild/moderate (> 10th percentile to ≤ 30th percentile)
    • None (> 30th percentile)
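
As an illustration, the ordinal classification and the extreme-rainfall cutoff listed above could be coded as follows. This is a minimal sketch: the function names are hypothetical, and we assume the input is a rainfall percentile on a 0 to 100 scale that has already been computed against a reference period.

```python
def drought_category(percentile):
    """Ordinal drought class from a rainfall percentile (0 to 100)."""
    if percentile <= 10:
        return "severe"
    if percentile <= 30:
        return "mild/moderate"
    return "none"


def is_extreme_rainfall(percentile):
    """Binary indicator for rainfall at or above the 90th percentile."""
    return percentile >= 90


# Hypothetical percentiles for four cluster-years
labels = [drought_category(p) for p in (5, 10, 25, 60)]
```

Because the cutoffs differ so much across studies, it may be worth keeping them as explicit parameters and reporting how results change when they move.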

Context matters

How might the same shock have different effects in different regions?

Studies that span geographic areas with heterogeneous climatic zones and other characteristics often stratify or interact rainfall with regional characteristics. This is because the same rainfall event most likely does not have the same impact on outcomes in areas with different geographic features (for instance, in arid versus non-arid areas,20 rural versus urban areas,21 or on individuals with different livelihoods22).
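
To illustrate what interacting rainfall with a regional characteristic means, consider a stylized linear model. It is a generic sketch with our own notation, not the specification used in any of the studies cited here. Let Y_{ic} be the outcome for individual i in cluster c, R_c a rainfall measure for that cluster, and A_c an indicator equal to 1 for arid areas.

```latex
\[
Y_{ic} \;=\; \beta_0 + \beta_1 R_{c} + \beta_2 A_{c} + \beta_3 \,(R_{c} \times A_{c}) + \varepsilon_{ic}
\]
```

Under this setup, the association between rainfall and the outcome is \beta_1 in non-arid areas and \beta_1 + \beta_3 in arid areas, so \beta_3 captures how much the relationship differs between the two. Stratifying instead means estimating the model separately for each group.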

This highlights the most important point when working with climate data: no one-size-fits-all approach exists. The manner in which data are selected, operationalized, and analyzed must be consistently informed by the physical and cultural specifics of the geographic region under consideration.

We hope that these drops of wisdom are a good starting point for your research.

Getting Help

Questions or comments? Check out the IPUMS User Forum or reach out to IPUMS User Support at ipums@umn.edu.


1. Boyle, E. H., King, M. L., Garcia, S., Culver, C., & Bourdeaux, J. (2020). Contextual data in IPUMS DHS: Physical and social environment variables linked to the Demographic and Health Surveys. Population and Environment, 41(4), 529–549. https://doi.org/10.1007/s11111-020-00348-4
2. Grace, K., Verdin, A., Dorélien, A., Davenport, F., Funk, C., & Husak, G. (2021). Exploring strategies for investigating the mechanisms linking climate and individual-level child health outcomes: An analysis of birth weight in Mali. Demography, 58(2), 499–526. https://doi.org/10.1215/00703370-8977484
3. Grace, K., Billingsley, S., & Van Riper, D. (2020). Building an interdisciplinary framework to advance conceptual and technical aspects of population-environment research focused on women’s and children’s health. Social Science & Medicine, 250. https://doi.org/10.1016/j.socscimed.2020.112857
4. Abiona, O. (2017). Adverse effects of early life extreme precipitation shocks on short-term health and adulthood welfare outcomes. Review of Development Economics, 21(4), 1229–1254. https://doi.org/10.1111/rode.12310
5. Hyland, M., & Russ, J. (2019). Water as destiny–The long-term impacts of drought in sub-Saharan Africa. World Development, 115, 30–45. https://doi.org/10.1016/j.worlddev.2018.11.002
6. Le, K., & Nguyen, M. (2021). In-utero exposure to rainfall variability and early childhood health. World Development, 144. https://doi.org/10.1016/j.worlddev.2021.105485
7. Randell, H., & Gray, C. (2019). Climate change and educational attainment in the global tropics. Proceedings of the National Academy of Sciences, 116(18), 8840–8845. https://doi.org/10.1073/pnas.1817480116
8. Mukabutera, A., Thomson, D., Murray, M., Basinga, P., Nyirazinyoye, L., Atwood, S., Savage, K. P., Ngirimana, A., & Hedt-Gauthier, B. L. (2016). Rainfall variation and child health: Effect of rainfall on diarrhea among under 5 children in Rwanda, 2010. BMC Public Health, 16, 1–9. https://doi.org/10.1186/s12889-016-3435-9
9. Grace, K., Davenport, F., Hanson, H., Funk, C., & Shukla, S. (2015). Linking climate change and health outcomes: Examining the relationship between temperature, precipitation and birth weight in Africa. Global Environmental Change, 35, 125–137. https://doi.org/10.1016/j.gloenvcha.2015.06.010
10. Randell, H., Gray, C., & Shayo, E. H. (2022). Climatic conditions and household food security: Evidence from Tanzania. Food Policy, 112. https://doi.org/10.1016/j.foodpol.2022.102362
11. Epstein, A., Harris, O. O., Benmarhnia, T., Camlin, C. S., & Weiser, S. D. (2023). Do precipitation anomalies influence short-term mobility in sub-Saharan Africa? An observational study from 23 countries. BMC Public Health, 23(1), 377. https://doi.org/10.1186/s12889-023-15264-z
12. Thiede, B. C., & Strube, J. (2020). Climate variability and child nutrition: Findings from sub-Saharan Africa. Global Environmental Change, 65. https://doi.org/10.1016/j.gloenvcha.2020.102192
13. Bandyopadhyay, S., Kanji, S., & Wang, L. (2012). The impact of rainfall and temperature variation on diarrheal prevalence in sub-Saharan Africa. Applied Geography, 33, 63–72. https://doi.org/10.1016/j.apgeog.2011.07.017
14. Randell, H., Jiang, C., Liang, X., Murtugudde, R., & Sapkota, A. (2021). Food insecurity and compound environmental shocks in Nepal: Implications for a changing climate. World Development, 145. https://doi.org/10.1016/j.worlddev.2021.105511
15. Omiat, G., & Shively, G. (2020). Rainfall and child weight in Uganda. Economics & Human Biology, 38. https://doi.org/10.1016/j.ehb.2020.100877
16. Mishra, A. K., & Singh, V. P. (2010). A review of drought concepts. Journal of Hydrology, 391(1), 202–216. https://doi.org/10.1016/j.jhydrol.2010.07.012
17. Uttajug, A., Ueda, K., Seposo, X., & Francis, J. M. (2023). Association between extreme rainfall and acute respiratory infection among children under-5 years in sub-Saharan Africa: An analysis of Demographic and Health Survey data, 2006–2020. BMJ Open, 13(4). https://doi.org/10.1136/bmjopen-2023-071874
18. Nagata, J. M., Epstein, A., Ganson, K. T., Benmarhnia, T., & Weiser, S. D. (2021). Drought and child vaccination coverage in 22 countries in sub-Saharan Africa: A retrospective analysis of national survey data from 2011 to 2019. PLOS Medicine, 18(9). https://doi.org/10.1371/journal.pmed.1003678
19. Epstein, A., Bendavid, E., Nash, D., Charlebois, E. D., & Weiser, S. D. (2020). Drought and intimate partner violence towards women in 19 countries in sub-Saharan Africa during 2011-2018: A population-based study. PLOS Medicine, 17(3). https://doi.org/10.1371/journal.pmed.1003064
20. Kudamatsu, M., Persson, T., & Strömberg, D. (2012). Weather and infant mortality in Africa. CEPR Discussion Paper No. DP9222. https://papers.ssrn.com/abstract=2210191
21. Marteleto, L. J., Maia, A. G., & Rodrigues, C. G. (2023). Climate and fertility amid a public health crisis. Population Studies, 77(3), 437–458. https://doi.org/10.1080/00324728.2023.2228288
22. Davenport, F., Dorélien, A., & Grace, K. (2020). Investigating the linkages between pregnancy outcomes and climate in sub-Saharan Africa. Population and Environment, 41(4), 397–421. https://doi.org/10.1007/s11111-020-00342-w