These functions are analogous to dplyr's joins, except that:
- They operate on a data frame and an sf object
- They retain the variable attributes provided in IPUMS files and loaded by ipumsr data-reading functions
- They handle minor incompatibilities between attributes in spatial and tabular data that emerge in some IPUMS files
Usage
ipums_shape_left_join(
  data,
  shape_data,
  by,
  suffix = c("", "SHAPE"),
  verbose = TRUE
)

ipums_shape_right_join(
  data,
  shape_data,
  by,
  suffix = c("", "SHAPE"),
  verbose = TRUE
)

ipums_shape_inner_join(
  data,
  shape_data,
  by,
  suffix = c("", "SHAPE"),
  verbose = TRUE
)

ipums_shape_full_join(
  data,
  shape_data,
  by,
  suffix = c("", "SHAPE"),
  verbose = TRUE
)
Arguments
- data
A tibble or data frame. Typically, this will contain data that has been aggregated to a specific geographic level.
- shape_data
An sf object loaded with read_ipums_sf().
- by
Character vector of variables to join by. See dplyr::left_join() for syntax.
- suffix
If there are non-joined duplicate variables in the two data sources, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
Defaults to adding the "SHAPE" suffix to duplicated variables in shape_data.
- verbose
If TRUE, display information about any geometries that were unmatched during the join.
Examples
data <- read_nhgis(
  ipums_example("nhgis0972_csv.zip"),
  verbose = FALSE
)
sf_data <- read_ipums_sf(ipums_example("nhgis0972_shape_small.zip"))
joined_data <- ipums_shape_inner_join(data, sf_data, by = "GISJOIN")
colnames(joined_data)
#> [1] "GISJOIN" "YEAR" "STUSAB" "CMSA" "DIVISIONA"
#> [6] "MSA_CMSAA" "PMSA" "PMSAA" "REGIONA" "STATEA"
#> [11] "AREALAND" "AREAWAT" "ANPSADPI" "FUNCSTAT" "INTPTLAT"
#> [16] "INTPTLNG" "PSADC" "D6Z001" "D6Z002" "D6Z003"
#> [21] "D6Z004" "D6Z005" "D6Z006" "D6Z007" "D6Z008"
#> [26] "PMSASHAPE" "MSACMSA" "ALTCMSA" "GISJOIN2" "SHAPE_AREA"
#> [31] "SHAPE_LEN" "GISJOIN3" "geometry"
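
The other join variants take the same arguments. As a minimal sketch, reusing the example files above and assuming the left join follows dplyr's left-join semantics (every row of data is kept, whether or not a geometry matches), a left join might look like the following. The object name left_joined is only illustrative, and no output is shown because the row counts depend on the example files.

```r
library(ipumsr)

# Same example files as in the example above
data <- read_nhgis(
  ipums_example("nhgis0972_csv.zip"),
  verbose = FALSE
)
sf_data <- read_ipums_sf(ipums_example("nhgis0972_shape_small.zip"))

# Left join: `data` is the left-hand table, so its rows are retained.
# verbose = FALSE suppresses the report on unmatched geometries.
left_joined <- ipums_shape_left_join(
  data,
  sf_data,
  by = "GISJOIN",
  verbose = FALSE
)

# Compare row counts before and after the join
nrow(data)
nrow(left_joined)
```

Leaving verbose at its default of TRUE would instead display information about any geometries that were unmatched during the join, which can help when deciding between the inner and left variants.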