Measuring Population Density

An introduction to gridded data sources used to measure human presence across the globe

Author: Grace Cooper

Affiliation: Senior Data Analyst, IPUMS

Published: August 30, 2024

Researchers interested in understanding population distribution in the Global South are faced with a dilemma: data identifying where communities are and how many people live in them often does not exist. National censuses commonly report populations in aggregates, at province, district, or county levels—spatial resolutions much too coarse for studies of climate change or health.

Datasets that estimate where and how many people live (population density layers) do exist, but they differ in fundamental ways, and the right choice depends on the scale and scope of the research question. For instance, some datasets are better at identifying built-up areas, while others excel at measuring rural populations. Similarly, researchers who need to understand how many people live in peri-urban areas would not benefit from a dataset that uses a dichotomous urban/rural variable to define built-up areas.

Furthermore, researchers studying change in population over time should be aware of comparability issues since some data providers update their methods regularly, making their datasets less comparable from year to year. Still, gridded population data are often the most effective tool for researchers to understand large-scale population distribution and explore human-environment relationships. Given the nuance of these datasets, how can we effectively include population density in our analyses?

As a start, this post introduces four of the most widely used gridded population datasets:

  • Global Rural-Urban Mapping Project (GRUMP)
  • Global Human Settlement Layer (GHSL)
  • LandScan
  • WorldPop

Below, we discuss each source’s history, input data layers, and the methods used in its development. This can serve as a starting point for researchers to explore each dataset in more depth and determine which may be most appropriate for their specific research area.

Background

As fields of study related to climate, health, and population grew in the 1970s and 80s, researchers began to recognize that a deeper understanding of population distribution was needed. Censuses are costly and populations are dynamic; it became apparent that datasets estimating where people live needed to be combined with census data.

The need to fill this gap in population data led to the development of the first global gridded population dataset in the 1990s (see Figure 1): the Gridded Population of the World (GPW).1–3 By combining population counts from censuses at second-administrative levels (i.e., counties and districts) with spatially explicit administrative boundary data, researchers created a raster dataset that depicts population counts at 5-minute grid resolution (roughly 10 km at the equator) on a continuous surface of the earth.4 These data enabled researchers studying climate, which is commonly reported as a gridded surface, to incorporate population into their analyses without aggregating climate data to subnational administrative units.
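Grid resolutions in this post are quoted in angular units with a rough ground distance "at the equator". As a minimal sketch of where those rough figures come from, the Python snippet below converts an angular cell size to an approximate east-west width. It assumes a spherical Earth with the WGS84 equatorial radius; the function name `cell_width_km` is ours, for illustration only.

```python
import math

EARTH_EQUATORIAL_RADIUS_KM = 6378.137  # WGS84 semi-major axis (spherical approximation)


def cell_width_km(arc_seconds: float, latitude_deg: float = 0.0) -> float:
    """Approximate east-west width of a grid cell, in km, at a given latitude."""
    angle_rad = math.radians(arc_seconds / 3600.0)
    # East-west distances shrink with the cosine of latitude.
    return EARTH_EQUATORIAL_RADIUS_KM * angle_rad * math.cos(math.radians(latitude_deg))


# Resolutions mentioned in this post: 5 arc-minutes, 30 arc-seconds, 3 arc-seconds.
for label, arc_sec in [("5 arc-minutes", 300), ("30 arc-seconds", 30), ("3 arc-seconds", 3)]:
    print(
        f"{label}: {cell_width_km(arc_sec):.3f} km at the equator, "
        f"{cell_width_km(arc_sec, 45):.3f} km at 45 degrees latitude"
    )
```

Because cells narrow toward the poles, a "1 km" or "100 m" grid only has that ground width near the equator. This matters when converting population counts per cell into densities per square kilometer.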

Figure 1: Data publication timeline

Since then, advancements in the technology used to collect remote sensing data, such as topography and land cover, have improved identification of urban areas.5 Increased computing power, improved spatial accuracy, and use of additional datasets such as night-time light satellite imagery, land cover, and digital elevation models (called ancillary data) have enabled researchers to advance the methods used to redistribute census population counts to grid cells at high resolution and accuracy.6 As a result, several gridded population datasets have been developed using different methods and input data, leading to datasets with varying spatial and population accuracy depending on the context. Below, we’ll highlight some of the similarities and differences between the prominent gridded population datasets that have been developed and identify appropriate use-case scenarios.
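To make the idea of redistributing census counts more concrete, here is a minimal sketch in Python that contrasts simple areal weighting with dasymetric weighting for a single census unit. The district, its four grid cells, and the weights are all hypothetical, and the `allocate` helper is ours. None of the datasets discussed here uses weights this simple.

```python
def allocate(total_population, weights):
    """Split a census unit's population across its grid cells in proportion to the weights."""
    weight_sum = sum(weights)
    if weight_sum == 0:
        raise ValueError("at least one cell must have a positive weight")
    return [total_population * w / weight_sum for w in weights]


# Hypothetical district with 1,000 residents covering four grid cells.
district_population = 1000

# Areal weighting: each cell's weight is its share of the district's area.
cell_area_share = [0.25, 0.25, 0.25, 0.25]

# Dasymetric weighting: weights come from ancillary data
# (made-up scores for, say, water, cropland, a village, and a town).
ancillary_weight = [0.0, 1.0, 3.0, 6.0]

areal = allocate(district_population, cell_area_share)
dasymetric = allocate(district_population, ancillary_weight)

print(areal)       # every cell receives the same share
print(dasymetric)  # population concentrates where the ancillary data suggest settlement
```

Both approaches preserve the census total for the district. They differ only in how that total is spread across cells, which is why the choice of ancillary data and weighting method shapes the final product.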

Common population density data sources

Table 1 below compares the spatial resolution, temporal availability, data and methods used, and best use cases for each of the four datasets highlighted in this post, while Figure 1 (above) outlines the date of first publication for each dataset. Without a proper understanding of how these data products are developed and what they are best suited for, researchers are more likely to choose a dataset that is a poor fit for their project.7 In the subsequent sections, we introduce each dataset in more detail.

Global Rural-Urban Mapping Project (GRUMP)

GRUMP is a series of global gridded population counts and densities created by Columbia University’s Center for International Earth Science Information Network (CIESIN) in 2004, available for the years 1990, 1995, and 2000.8 Using lightly modeled dasymetric methods based on GPW population and NOAA night-time lights, it indicates the locations of urban settlements and delineates the spatial extents of urban areas at 30 arc-second (roughly 1 kilometer at the equator) resolution.6 In fact, it was the first global gridded population dataset to define urban areas, and it was widely used for two reasons:

  • its urban footprint is based on stable city lights, which is an inclusive measure of urban areas
  • the dichotomous definition of the urban footprint (urban or rural) is simple for researchers to use9

GRUMP’s popularity waned upon the creation of higher-resolution datasets that became available for more recent years, including the Global Human Settlement Layer (GHSL) created by the European Commission Joint Research Centre (JRC) in 2010.

Global Human Settlement Layer (GHSL)

Available in 5-year intervals from 1975 to 2020, GHSL data products are provided at a spatial resolution of 3 arc-seconds (roughly 100 meters at the equator). GHSL uses Landsat imagery (before 2014) and Sentinel-2 composite imagery (2014 onward) in an image classification method called Symbolic Machine Learning (SML) to map urban land cover and classify population and land area into seven classes along an urban-rural continuum.9,10

The GHSL population data product is made up of three datasets: GHS-POP (population size), GHS-BUILT (built-up areas), and GHS-SMOD (a Degree of Urbanization model grid that delineates settlement types using GHS-POP and GHS-BUILT).11

GHSL has been widely used to study the interactions between climate, health, and populations. For example, Pinchoff and colleagues12 used GHS-BUILT to examine the impact of urbanicity on health in Tanzania. McGranahan and colleagues13 used GHS-SMOD to calculate urbanization rates in low-elevation coastal zones and estimate the effect of urbanization on climate change in deltaic regions. In another study, Florio and colleagues14 used GHS-SMOD to examine spatial accessibility to healthcare in sub-Saharan Africa (SSA).

LandScan

LandScan is another prominent global population distribution dataset, developed in 1998 by the Oak Ridge National Laboratory in Tennessee. Although its resolution of 30 arc-seconds (roughly 1 kilometer at the equator) is lower than that of GHSL, it provides ambient (24-hour average) population distribution.15,16

In contrast to GRUMP and GHSL—which provide data at 5-year intervals—LandScan is available annually from 2000 through 2022. LandScan utilizes night-time light satellite imagery along with spatial information about elevation (Digital Elevation Models), slope (Digital Terrain Models), land cover, and populated place vector data within a multivariable dasymetric model that uses machine learning to assign the likelihood of population occurrence to each cell.

LandScan differs from other datasets in that it provides ambient population distribution, depicting not only where people sleep but where they travel, work, and socialize over a 24-hour period. While this may be an important distinction for some researchers, it also makes LandScan less comparable with datasets that estimate residential population. Researchers should also exercise caution when comparing multiple versions of LandScan data: regular updates to the input data and distribution algorithm can cause comparability issues between versions.

WorldPop

WorldPop is another global gridded population distribution dataset that provides annual estimates from 2000 through 2020; however, WorldPop is provided at a higher resolution of 3 arc-seconds (roughly 100 meters at the equator). Developed by the University of Southampton in 2013, WorldPop uses Sentinel-1 and Sentinel-2 imagery in combination with spatial information on impervious surfaces (SRTM) and slope (DEM) in a dasymetric model with machine learning to estimate population distribution.17

WorldPop was created by combining three continental-scale population datasets:

  • AfriPop (developed in 2009)
  • AsiaPop (developed in 2012)
  • AmeriPop (developed in 2013)

WorldPop’s population estimates are fractional (not integer), which can make them difficult to interpret, since individual people cannot be divided up over an area. Its prediction model is good at estimating activity space, with the caveat that it can overestimate population density in peri-urban areas while underestimating population density in urban centers.
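Fractional cell values are easier to work with than they may first appear. One way to read them (our interpretation, not a statement from the WorldPop documentation) is as expected counts that become meaningful once summed over a larger area. The Python sketch below sums a fine grid into coarser cells, as one might do to bring a 3 arc-second layer closer to a 30 arc-second one. The `block_sum` helper and the 4 x 4 grid of values are hypothetical. A real workflow would use a factor of 10 and a raster library rather than nested lists.

```python
def block_sum(grid, factor):
    """Aggregate a 2-D grid of population counts by summing factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of the factor")
    return [
        [
            sum(grid[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


# Hypothetical fine-resolution population counts (people per cell, fractional).
fine = [
    [0.25, 0.5, 1.0, 2.0],
    [0.25, 1.0, 2.0, 4.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]

coarse = block_sum(fine, 2)
print(coarse)

# The total population is unchanged by aggregation.
print(sum(map(sum, fine)), sum(map(sum, coarse)))
```

Note that this applies to population counts. A density layer (people per square kilometer) would need to be averaged, or converted to counts first, rather than summed.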

Table 1: Comparison of Gridded Population Datasets
Datasets compared: Global Rural-Urban Mapping Project (GRUMP), Global Human Settlement Layer (GHSL), LandScan, and WorldPop.

Spatial Resolution
  • GRUMP: 30 arc-seconds (~1 km)
  • GHSL: 3 arc-seconds (~100 m)
  • LandScan: 30 arc-seconds (~1 km)
  • WorldPop: 3 arc-seconds (~100 m)

Temporal Availability
  • GRUMP: 1990, 1995, 2000
  • GHSL: 1975-2020 (5-year intervals)
  • LandScan: 2000-2022 (annual)
  • WorldPop: 2000-2020 (annual)

Organization
  • GRUMP: CIESIN, Columbia University
  • GHSL: European Commission
  • LandScan: Oak Ridge National Laboratory
  • WorldPop: University of Southampton

Ancillary Data
  • GRUMP: DMSP-OLS night-time light imagery; tactical pilotage charts
  • GHSL: Sentinel-2 composite and Landsat imagery; topography (DEM & SRTM); road surfaces (OpenStreetMap)
  • LandScan: DMSP-OLS night-time light imagery; Advanced Very High Resolution Radiometry (AVHRR) satellite imagery; building characteristics (DEM); slope (NIMA Digital Terrain Elevation); Global Land Cover Characteristics Database; populated places vector (NIMA VMAP)
  • WorldPop: Sentinel-1 & Sentinel-2 combination (ESA CCI land cover 300m annual global land cover time-series); impervious surface (SRTM); slope (DEM)

Methods
  • GRUMP: Area-weighted reallocation.
  • GHSL: Dasymetric modeling with Symbolic Machine Learning (SML) is used to combine built-up areas (GHS-BUILT) and population size (GHS-POP) to create a settlement model (GHS-SMOD) based on Degree of Urbanization (DoU), which includes seven classes along an urban-rural continuum.
  • LandScan: Dasymetric modeling with machine learning (ML). Likelihood of population occurrence in a particular cell is modeled based on probability coefficients, including roads (weighted by distance from cells to roads), slope (weighted by favorable slope categories), land cover (weighted by type and applying exclusions), and night-time lights (weighted by frequency).
  • WorldPop: Dasymetric modeling with machine learning (ML). A random forest prediction model is used to create a weight layer.

Advantages
  • GRUMP: The first of its kind, GRUMP was the ideal population distribution dataset used by researchers prior to 2010.
  • GHSL: High resolution. Very accurate in areas with higher development. Because it is not a dichotomous urban/rural variable, it allows researchers to identify peri-urban areas.
  • LandScan: Depicts ambient (24-hour average) population distribution, which captures not only where people sleep but where they travel, work, and socialize. Data available annually.
  • WorldPop: High resolution. Population density raster data allows researchers to define urban, peri-urban, and rural areas based on density percentages that are appropriate for that area.

Limitations
  • GRUMP: Data only available at 5-year intervals and not updated for recent years. Since it is based on proportional reallocation, it relies on accurate subnational population projections and a reliable night-time lights dataset.
  • GHSL: Low degrees of accuracy in rural areas. Data only available at 5-year intervals.
  • LandScan: Not comparable with other datasets due to its ambient nature. Regular updates to distribution algorithms make it inadvisable to compare different versions of the dataset.
  • WorldPop: Population estimates are fractional (not integer), although individual people cannot be divided up over an area. Potential overestimates of population density in peri-urban areas and underestimates in urban centers.

Comparing sources

Representation of global population distribution is an increasingly important topic for researchers interested in human-environment interactions. It is vital that studies consider the advantages and limitations of the dataset they choose, because that choice affects statistical model outputs, which in turn can influence the conclusions drawn from the results. Researchers may benefit from downloading multiple datasets and comparing population density estimates from each for their study area. An example comparison is shown in Figure 2, a population density map of Nairobi, Kenya, depicting the spatial extent of the Nairobi urban center based on the GHS settlement model (GHS-SMOD) in an overlay with WorldPop population density. As the map shows, the two datasets do not completely agree on the spatial extent of the urban area.

Figure 2: Map comparing the delineation of urban population density in Nairobi, Kenya using GHSL and WorldPop
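A visual overlay like this can be complemented with a simple agreement score. The Python sketch below derives an urban mask from a density grid using a threshold, then measures its overlap with a second, categorical urban mask using the Jaccard index (cells flagged by both masks divided by cells flagged by either). Everything here is hypothetical: the 3 x 3 grids, the 1,500 persons per square kilometer threshold, and the `jaccard` helper. It also assumes both layers have already been aligned to the same grid.

```python
def jaccard(mask_a, mask_b):
    """Cells flagged urban by both masks, as a share of cells flagged by either."""
    both = either = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            both += int(bool(a) and bool(b))
            either += int(bool(a) or bool(b))
    # Two empty masks agree perfectly by convention.
    return both / either if either else 1.0


# Hypothetical urban mask from a settlement classification (1 = urban center).
settlement_urban = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]

# Hypothetical population density (persons per square km) on the same grid.
density = [
    [2100, 1800, 1600],
    [1700, 1200, 300],
    [400, 200, 50],
]

URBAN_DENSITY_THRESHOLD = 1500  # assumed cut-off, chosen for illustration only
density_urban = [[d >= URBAN_DENSITY_THRESHOLD for d in row] for row in density]

agreement = jaccard(settlement_urban, density_urban)
print(f"Jaccard agreement between the two urban masks: {agreement:.2f}")
```

A score of 1 would mean the two masks delineate exactly the same cells, and lower values indicate more disagreement. Trying several thresholds is one way to see how sensitive a study's urban definition is to the dataset behind it.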

In future posts, we’ll demonstrate methods to work with some of these datasets in R!

Getting Help

Questions or comments? Check out the IPUMS User Forum or reach out to IPUMS User Support at ipums@umn.edu.

References

1. Tobler, W., Deichmann, U., Gottsegen, J., & Maloy, K. (1995). The global demography project (Technical Report 95-6). National Center for Geographic Information and Analysis. https://escholarship.org/uc/item/0kt69058
2. Deichmann, U. (1996). A review of spatial population database design and modeling (Technical Report 96-3). National Center for Geographic Information and Analysis.
3. Center for International Earth Science Information Network - CIESIN - Columbia University. (2018). Gridded population of the world, version 4.11 (GPWv4): Population count. Revision 11. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). https://doi.org/10.7927/H4JW8BX5
4. Balk, D. L., Deichmann, U., Yetman, G., Pozzi, F., Hay, S. I., & Nelson, A. (2006). Determining global population distribution: Methods, applications and data. Advances in Parasitology, 62, 119–156. https://doi.org/10.1016/S0065-308X(05)62004-0
5. Pesaresi, M., Huadong, G., Blaes, X., Ehrlich, D., Ferri, S., Gueguen, L., Halkia, M., Kauffmann, M., Kemper, T., Lu, L., Marin-Herrera, M. A., Ouzounis, G. K., Scavazzon, M., Soille, P., Syrris, V., & Zanchetta, L. (2013). A global human settlement layer from optical HR/VHR RS data: Concept and first results. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 6(5), 2102–2131. https://doi.org/10.1109/JSTARS.2013.2271445
6. Leyk, S., Gaughan, A. E., Adamo, S. B., Sherbinin, A. D., Balk, D., Freire, S., Rose, A., Stevens, F. R., Blankespoor, B., Frye, C., Comenetz, J., Sorichetta, A., Macmanus, K., Pistolesi, L., Levy, M., Tatem, A. J., & Pesaresi, M. (2019). The spatial allocation of population: A review of large-scale gridded population data products and their fitness for use. Earth System Science Data, 11(3), 1385–1409. https://doi.org/10.5194/ESSD-11-1385-2019
7. Leyk, S., Uhl, J. H., Balk, D., & Jones, B. (2018). Assessing the accuracy of multi-temporal built-up land layers across rural-urban trajectories in the United States. Remote Sensing of Environment, 204, 898–917. https://doi.org/10.1016/J.RSE.2017.08.035
8. Center for International Earth Science Information Network - CIESIN - Columbia University, International Food Policy Research Institute - IFPRI, The World Bank and Centro Internacional de Agricultura Tropical - CIAT. (2011). Global rural–urban mapping project, version 1 (GRUMPv1): Urban extents grid. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). https://doi.org/10.7927/H4GH9FVG
9. Macmanus, K., Balk, D., Engin, H., Mcgranahan, G., & Inman, R. (2021). Estimating population and urban areas at risk of coastal hazards, 1990-2015: How data choices matter. Earth System Science Data, 13(12), 5747–5801. https://doi.org/10.5194/ESSD-13-5747-2021
10. Pesaresi, M., Corbane, C., Julea, A., Florczyk, A. J., Syrris, V., & Soille, P. (2016). Assessment of the added-value of Sentinel-2 for detecting built-up areas. Remote Sensing, 8(4), 299. https://doi.org/10.3390/RS8040299
11. Florczyk, A. J., Corbane, C., Ehrlich, D., Freire, S., Kemper, T., Maffenini, L., Melchiorri, M., Pesaresi, M., Politis, P., Schiavina, M., Sabo, F., & Zanchetta, L. (2019). GHSL data package 2019. Publications Office of the European Union, Luxembourg, 2019. https://doi.org/10.2760/290498
12. Pinchoff, J., Mills, C. W., & Balk, D. (2020). Urbanization and health: The effects of the built environment on chronic disease risk factors among women in Tanzania. PLOS ONE, 15(11), e0241810. https://doi.org/10.1371/journal.pone.0241810
13. McGranahan, G., Balk, D., Colenbrander, S., Engin, H., & MacManus, K. (2023). Is rapid urbanization of low-elevation deltas undermining adaptation to climate change? A global review. Environment and Urbanization, 35(2), 527–559. https://doi.org/10.1177/09562478231192176
14. Florio, P., Freire, S., & Melchiorri, M. (2023). Estimating geographic access to healthcare facilities in Sub-Saharan Africa by degree of urbanisation. Applied Geography, 160, 103118. https://doi.org/10.1016/J.APGEOG.2023.103118
15. Dobson, J. E. (2000). LandScan: A global population database for estimating populations at risk. Photogrammetric Engineering & Remote Sensing, 66(7), 849–857. https://www.researchgate.net/publication/267450852
16. Rose, A. N., & Bright, E. (2014). The LandScan global population distribution project: Current state of the art and prospective innovation. Oak Ridge National Laboratory.
17. Thomson, D. R., Leasure, D. R., Bird, T., Tzavidis, N., & Tatem, A. J. (2022). How accurate are WorldPop-Global-Unconstrained gridded population data at the cell-level?: A simulation analysis in urban Namibia. PLOS ONE, 17(7), e0271504. https://doi.org/10.1371/journal.pone.0271504