Define the parameters of an IPUMS NHGIS extract request to be submitted via the IPUMS API.

Use get_metadata_nhgis() to browse and identify data sources for use in NHGIS extract definitions. For general information, see the NHGIS data source overview and the FAQ.

Learn more about the IPUMS API in vignette("ipums-api") and NHGIS extract definitions in vignette("ipums-api-nhgis").

Usage

define_extract_nhgis(
  description = "",
  datasets = NULL,
  time_series_tables = NULL,
  shapefiles = NULL,
  geographic_extents = NULL,
  breakdown_and_data_type_layout = NULL,
  tst_layout = NULL,
  data_format = NULL
)

Arguments

description

Description of the extract.

datasets

List of dataset specifications for any datasets to include in the extract request. Use ds_spec() to create a ds_spec object containing a dataset specification. See examples.

time_series_tables

List of time series table specifications for any time series tables to include in the extract request. Use tst_spec() to create a tst_spec object containing a time series table specification. See examples.

shapefiles

Names of any shapefiles to include in the extract request.

geographic_extents

Vector of geographic extents to use for all of the datasets in the extract definition (for instance, to obtain data within a particular state). Use "*" to select all available extents.

Required when any dataset in the extract definition includes geog_levels that require extent selection. See get_metadata_nhgis() to determine if a geographic level requires extent selection. At the time of writing, NHGIS supports extent selection only for blocks and block groups.
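As a minimal sketch of selecting every available extent, the definition below passes "*" for a block group request. The dataset and table names are borrowed from the examples on this page; treat them as placeholders and confirm availability with get_metadata_nhgis().

```r
library(ipumsr)

# "*" requests all available extents for every dataset in the definition
all_extents <- define_extract_nhgis(
  description = "Block groups for all extents",
  datasets = ds_spec(
    "2018_2022_ACS5a",
    data_tables = "B01001",
    geog_levels = "blck_grp"
  ),
  geographic_extents = "*"
)

# Inspect the extents stored in the definition
all_extents$geographic_extents
```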

breakdown_and_data_type_layout

The desired layout of any datasets that have multiple data types or breakdown values.

  • "single_file" (default) keeps all data types and breakdown values in one file

  • "separate_files" splits each data type or breakdown value into its own file

Required if any datasets included in the extract definition consist of multiple data types (for instance, estimates and margins of error) or have multiple breakdown values specified. See get_metadata_nhgis() to determine whether a requested dataset has multiple data types.
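The sketch below shows one way to request a separate file per data type. It assumes the ACS dataset used elsewhere on this page provides both estimates and margins of error; check get_metadata_nhgis() for the dataset you actually need.

```r
library(ipumsr)

# Assumption: this ACS dataset has multiple data types (estimates and
# margins of error). Confirm with get_metadata_nhgis() before relying on it.
acs_extract <- define_extract_nhgis(
  description = "ACS data types in separate files",
  datasets = ds_spec(
    "2014_2018_ACS5a",
    data_tables = "B01001",
    geog_levels = "county"
  ),
  breakdown_and_data_type_layout = "separate_files"
)

# Inspect the layout stored in the definition
acs_extract$breakdown_and_data_type_layout
```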

tst_layout

The desired layout of all time_series_tables included in the extract definition.

  • "time_by_column_layout" (wide format, default): rows correspond to geographic units, columns correspond to different times in the time series

  • "time_by_row_layout" (long format): rows correspond to a single geographic unit at a single point in time

  • "time_by_file_layout": data for different times are provided in separate files

Required when an extract definition includes any time_series_tables.

data_format

The desired format of the extract data file.

  • "csv_no_header" (default) includes only a minimal header in the first row

  • "csv_header" includes a second, more descriptive header row

  • "fixed_width" provides data in a fixed width format

Note that by default, read_nhgis() removes the additional header row in "csv_header" files.

Required when an extract definition includes any datasets or time_series_tables.
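For illustration, the following sketch sets data_format explicitly rather than relying on the default. The dataset and table are taken from the examples on this page and serve only as placeholders.

```r
library(ipumsr)

# Request CSV output with the additional descriptive header row
header_extract <- define_extract_nhgis(
  description = "CSV with descriptive header row",
  datasets = ds_spec(
    "1990_STF3",
    data_tables = "NP57",
    geog_levels = "county"
  ),
  data_format = "csv_header"
)

# Inspect the format stored in the definition
header_extract$data_format
```

As noted above, read_nhgis() drops the extra header row by default when loading such files.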

Value

An object of class nhgis_extract containing the extract definition.

Details

An NHGIS extract definition must include at least one dataset, time series table, or shapefile specification.

Create an NHGIS dataset specification with ds_spec(). Each dataset must be associated with a selection of data_tables and geog_levels. Some datasets also support the selection of years and breakdown_values.

Create an NHGIS time series table specification with tst_spec(). Each time series table must be associated with a selection of geog_levels and may optionally be associated with a selection of years.

See examples or vignette("ipums-api-nhgis") for more details about specifying datasets and time series tables in an NHGIS extract definition.

See also

get_metadata_nhgis() to find data to include in an extract definition.

submit_extract() to submit an extract request for processing.

save_extract_as_json() and define_extract_from_json() to share an extract definition.
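A minimal sketch of sharing a definition through JSON follows. The temporary file path is a hypothetical location chosen for illustration; in practice you would write to a path that collaborators can access.

```r
library(ipumsr)

nhgis_extract <- define_extract_nhgis(
  description = "Example NHGIS extract",
  datasets = ds_spec(
    "1990_STF3",
    data_tables = "NP57",
    geog_levels = c("county", "tract")
  )
)

# Hypothetical location: a temporary file used only for this illustration
json_path <- tempfile(fileext = ".json")
save_extract_as_json(nhgis_extract, file = json_path)

# Rebuild the definition from the JSON file
restored <- define_extract_from_json(json_path)
names(restored$datasets)
```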

Examples

# Extract definition for tables from an NHGIS dataset
# Use `ds_spec()` to create an NHGIS dataset specification
nhgis_extract <- define_extract_nhgis(
  description = "Example NHGIS extract",
  datasets = ds_spec(
    "1990_STF3",
    data_tables = "NP57",
    geog_levels = c("county", "tract")
  )
)

nhgis_extract
#> Unsubmitted IPUMS NHGIS extract 
#> Description: Example NHGIS extract
#> 
#> Dataset: 1990_STF3
#>   Tables: NP57
#>   Geog Levels: county, tract

# Use `tst_spec()` to create an NHGIS time series table specification
define_extract_nhgis(
  description = "Example NHGIS extract",
  time_series_tables = tst_spec("CL8", geog_levels = "county"),
  tst_layout = "time_by_row_layout"
)
#> Unsubmitted IPUMS NHGIS extract 
#> Description: Example NHGIS extract
#> 
#> Time Series Table: CL8
#>   Geog Levels: county

# To request multiple datasets, provide a list of `ds_spec` objects
define_extract_nhgis(
  description = "Extract definition with multiple datasets",
  datasets = list(
    ds_spec("2014_2018_ACS5a", "B01001", c("state", "county")),
    ds_spec("2015_2019_ACS5a", "B01001", c("state", "county"))
  )
)
#> Unsubmitted IPUMS NHGIS extract 
#> Description: Extract definition with multiple datasets
#> 
#> Dataset: 2014_2018_ACS5a
#>   Tables: B01001
#>   Geog Levels: state, county
#> 
#> Dataset: 2015_2019_ACS5a
#>   Tables: B01001
#>   Geog Levels: state, county

# If you need to specify the same table or geographic level for
# many datasets, you may want to build the list of dataset specifications
# before defining your extract request:
dataset_names <- c("2014_2018_ACS5a", "2015_2019_ACS5a")

dataset_spec <- purrr::map(
  dataset_names,
  ~ ds_spec(
    .x,
    data_tables = "B01001",
    geog_levels = c("state", "county")
  )
)

define_extract_nhgis(
  description = "Extract definition with multiple datasets",
  datasets = dataset_spec
)
#> Unsubmitted IPUMS NHGIS extract 
#> Description: Extract definition with multiple datasets
#> 
#> Dataset: 2014_2018_ACS5a
#>   Tables: B01001
#>   Geog Levels: state, county
#> 
#> Dataset: 2015_2019_ACS5a
#>   Tables: B01001
#>   Geog Levels: state, county

# You can request datasets, time series tables, and shapefiles in the same
# definition:
define_extract_nhgis(
  description = "Extract with datasets and time series tables",
  datasets = ds_spec("1990_STF1", c("NP1", "NP2"), "county"),
  time_series_tables = tst_spec("CL6", "state"),
  shapefiles = "us_county_1990_tl2008"
)
#> Unsubmitted IPUMS NHGIS extract 
#> Description: Extract with datasets and time series tables
#> 
#> Dataset: 1990_STF1
#>   Tables: NP1, NP2
#>   Geog Levels: county
#> 
#> Time Series Table: CL6
#>   Geog Levels: state
#> 
#> Shapefiles: us_county_1990_tl2008

# Geographic extents are applied to all datasets in the definition
define_extract_nhgis(
  description = "Extent selection",
  datasets = list(
    ds_spec("2018_2022_ACS5a", "B01001", "blck_grp"),
    ds_spec("2017_2021_ACS5a", "B01001", "blck_grp")
  ),
  geographic_extents = c("010", "050")
)
#> Unsubmitted IPUMS NHGIS extract 
#> Description: Extent selection
#> 
#> Dataset: 2018_2022_ACS5a
#>   Tables: B01001
#>   Geog Levels: blck_grp
#> 
#> Dataset: 2017_2021_ACS5a
#>   Tables: B01001
#>   Geog Levels: blck_grp
#> 
#> Geographic extents: 010, 050

# Extract specifications can be indexed by name
names(nhgis_extract$datasets)
#> [1] "1990_STF3"

nhgis_extract$datasets[["1990_STF3"]]
#> $name
#> [1] "1990_STF3"
#> 
#> $data_tables
#> [1] "NP57"
#> 
#> $geog_levels
#> [1] "county" "tract" 
#> 
#> attr(,"class")
#> [1] "ds_spec"    "ipums_spec" "list"      

# Not run: submitting a request requires an IPUMS API key
if (FALSE) {
  # Use the extract definition to submit an extract request to the API
  submit_extract(nhgis_extract)
}