
Define the parameters of an IPUMS microdata extract request to be submitted via the IPUMS API.

Currently supported microdata collections include:

  • IPUMS USA: define_extract_usa()

  • IPUMS CPS: define_extract_cps()

  • IPUMS International: define_extract_ipumsi()

Learn more about the IPUMS API in vignette("ipums-api") and microdata extract definitions in vignette("ipums-api-micro").

Usage

define_extract_usa(
  description,
  samples,
  variables,
  data_format = "fixed_width",
  data_structure = "rectangular",
  rectangular_on = NULL,
  case_select_who = "individuals",
  data_quality_flags = NULL
)

define_extract_cps(
  description,
  samples,
  variables,
  data_format = "fixed_width",
  data_structure = "rectangular",
  rectangular_on = NULL,
  case_select_who = "individuals",
  data_quality_flags = NULL
)

define_extract_ipumsi(
  description,
  samples,
  variables,
  data_format = "fixed_width",
  data_structure = "rectangular",
  rectangular_on = NULL,
  case_select_who = "individuals",
  data_quality_flags = NULL
)

Arguments

description

Description of the extract.

samples

Vector of samples to include in the extract request. Use get_sample_info() to identify sample IDs for a given collection.

variables

Vector of variable names or a list of detailed variable specifications to include in the extract request. Use var_spec() to create a var_spec object containing a detailed variable specification. See examples.

data_format

Format for the output extract data file. Either "fixed_width" or "csv".

Note that while "stata", "spss", or "sas9" are also accepted, these file formats are not supported by ipumsr data-reading functions.

Defaults to "fixed_width".
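As a sketch, CSV output can be requested by setting data_format explicitly. The description, samples, and variables below are reused from the examples on this page. Reading the chosen format back from `$data_format` assumes that the extract object stores the setting under the same name as the argument.

```r
library(ipumsr)

# Request CSV output instead of the default fixed-width file
csv_extract <- define_extract_usa(
  description = "2013-2014 ACS Data (CSV)",
  samples = c("us2013a", "us2014a"),
  variables = c("SEX", "AGE", "YEAR"),
  data_format = "csv"
)

# The requested format is stored on the extract definition (assumed field name)
csv_extract$data_format
```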

data_structure

Data structure for the output extract data.

  • "rectangular" provides person records with all requested household information attached to respective household members.

  • "hierarchical" provides household records followed by person records.

Defaults to "rectangular".

rectangular_on

If data_structure is "rectangular", the record type on which to rectangularize. Currently only "P" (person records) is supported.

Defaults to "P" if data_structure is "rectangular" and NULL otherwise.
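A minimal sketch of a hierarchical extract follows. Because rectangular_on is left at its default, it should remain NULL, as described above. Checking `$data_structure` and `$rectangular_on` assumes that the extract object stores these settings under the argument names.

```r
library(ipumsr)

# Household records followed by person records
hier_extract <- define_extract_usa(
  description = "Hierarchical ACS extract",
  samples = "us2014a",
  variables = c("SEX", "AGE"),
  data_structure = "hierarchical"
)

# rectangular_on is not filled in for hierarchical extracts
is.null(hier_extract$rectangular_on)
```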

case_select_who

How to interpret any case selections included for variables in the extract definition.

  • "individuals" includes records for all individuals who match the specified case selections.

  • "households" includes records for all members of each household that contains an individual who matches the specified case selections.

Defaults to "individuals". Use var_spec() to add case selections for specific variables.
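As a sketch, the case selection on `SEX` below (code "2", female in the IPUMS `SEX` coding, as used in the examples on this page) is combined with `case_select_who = "households"`. With this setting, the extract should keep every member of any household that contains a matching person, not only the matching individuals. Reading the setting back from `$case_select_who` assumes that the field name matches the argument name.

```r
library(ipumsr)

# Keep all members of households that contain at least one person with SEX == "2"
hh_extract <- define_extract_usa(
  description = "Households with at least one female member",
  samples = "us2014a",
  variables = list(
    var_spec("SEX", case_selections = "2"),
    "AGE"
  ),
  case_select_who = "households"
)

hh_extract$case_select_who
```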

data_quality_flags

Set to TRUE to include data quality flags for all applicable variables in the extract definition. This will override the data_quality_flags specification for individual variables in the definition.

Use var_spec() to add data quality flags for specific variables.
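A minimal sketch of setting data quality flags for the whole extract, rather than for each variable through var_spec(). The samples are taken from the CPS example on this page. The check on `$data_quality_flags` assumes that the extract object stores the setting under the argument name.

```r
library(ipumsr)

# Request data quality flags for every applicable variable at once
flagged_extract <- define_extract_cps(
  description = "CPS extract with data quality flags",
  samples = c("cps2020_02s", "cps2020_03s"),
  variables = c("AGE", "SEX", "RACE"),
  data_quality_flags = TRUE
)

isTRUE(flagged_extract$data_quality_flags)
```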

Value

An object of class micro_extract containing the extract definition.

See also

submit_extract() to submit an extract request for processing.

save_extract_as_json() and define_extract_from_json() to share an extract definition.

Examples

usa_extract <- define_extract_usa(
  description = "2013-2014 ACS Data",
  samples = c("us2013a", "us2014a"),
  variables = c("SEX", "AGE", "YEAR")
)

usa_extract
#> Unsubmitted IPUMS USA extract 
#> Description: 2013-2014 ACS Data
#> 
#> Samples: (2 total) us2013a, us2014a
#> Variables: (3 total) SEX, AGE, YEAR

# Use `var_spec()` to create detailed variable specifications:
usa_extract <- define_extract_usa(
  description = "Example USA extract definition",
  samples = c("us2013a", "us2014a"),
  variables = var_spec(
    "SEX",
    case_selections = "2",
    attached_characteristics = c("mother", "father")
  )
)

# For multiple variables, provide a list of `var_spec` objects and/or
# variable names.
cps_extract <- define_extract_cps(
  description = "Example CPS extract definition",
  samples = c("cps2020_02s", "cps2020_03s"),
  variables = list(
    var_spec("AGE", data_quality_flags = TRUE),
    var_spec("SEX", case_selections = "2"),
    "RACE"
  )
)

cps_extract
#> Unsubmitted IPUMS CPS extract 
#> Description: Example CPS extract definition
#> 
#> Samples: (2 total) cps2020_02s, cps2020_03s
#> Variables: (3 total) AGE, SEX, RACE

# To apply the same specifications to many variables, it may be useful to
# create the variable specifications before defining the extract:
var_names <- c("AGE", "SEX")

my_vars <- purrr::map(
  var_names,
  ~ var_spec(.x, attached_characteristics = "mother")
)

ipumsi_extract <- define_extract_ipumsi(
  description = "Extract definition with predefined variables",
  samples = c("br2010a", "cl2017a"),
  variables = my_vars
)

# Extract specifications can be indexed by name
names(ipumsi_extract$samples)
#> [1] "br2010a" "cl2017a"

names(ipumsi_extract$variables)
#> [1] "AGE" "SEX"

ipumsi_extract$variables$AGE
#> $name
#> [1] "AGE"
#> 
#> $attached_characteristics
#> [1] "mother"
#> 
#> attr(,"class")
#> [1] "var_spec"   "ipums_spec" "list"      

if (FALSE) {
  # Use the extract definition to submit an extract request to the API
  submit_extract(usa_extract)
}