
Define an extract request for an IPUMS microdata collection
Source: R/api_define_extract.R, define_extract_micro.Rd
Define the parameters of an IPUMS microdata extract request to be submitted via the IPUMS API.
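The IPUMS API currently supports several microdata collections, including IPUMS USA ("usa"), IPUMS CPS ("cps"), IPUMS International ("ipumsi"), and the IPUMS Time Use collections ("atus", "ahtus", and "mtus"). See ipums_data_collections() for the full list of supported collection codes.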
Note that not all extract request parameters and options apply to all collections. For a summary of supported features by collection, see the IPUMS API documentation.
Learn more about the IPUMS API in vignette("ipums-api") and
microdata extract definitions in vignette("ipums-api-micro").
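To check which codes can be passed to collection, the sketch below filters the table returned by ipums_data_collections(). It assumes that table has collection_name, collection_type, code_for_api, and api_support columns, with microdata collections labeled "microdata". The exact rows depend on the installed ipumsr version.

```r
library(ipumsr)

# Table of IPUMS collections known to this version of ipumsr
# (assumed columns: collection_name, collection_type, code_for_api, api_support)
collections <- ipums_data_collections()

# Keep only microdata collections that the IPUMS API supports;
# which() drops any rows where the condition is NA
keep <- which(
  collections$collection_type == "microdata" & collections$api_support
)
micro_api <- collections[keep, c("collection_name", "code_for_api")]

# Values in code_for_api are what define_extract_micro() expects in `collection`
micro_api
```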
Usage
define_extract_micro(
  collection,
  description,
  samples,
  variables = NULL,
  time_use_variables = NULL,
  sample_members = NULL,
  data_format = "fixed_width",
  data_structure = "rectangular",
  rectangular_on = NULL,
  case_select_who = "individuals",
  data_quality_flags = NULL
)
Arguments
- collection: Code for the IPUMS collection represented by this extract request. See ipums_data_collections() for supported microdata collection codes.
- description: Description of the extract.
- samples: Vector of samples to include in the extract request. Use get_sample_info() to identify sample IDs for a given collection.
- variables: Vector of variable names or a list of detailed variable specifications to include in the extract request. Use var_spec() to create a var_spec object containing a detailed variable specification. See examples.
- time_use_variables: Vector of names of IPUMS-defined time use variables or a list of specifications for user-defined time use variables to include in the extract request. Use tu_var_spec() to create a tu_var_spec object containing a time use variable specification. See examples. Time use variables are only available for IPUMS Time Use collections ("atus", "ahtus", and "mtus").
- sample_members: Additional sample members to include in the extract request. If provided, must be "include_non_respondents", "include_household_members", or both. Sample member selection is only available for the IPUMS ATUS collection ("atus").
- data_format: Format for the output extract data file, either "fixed_width" or "csv". "stata", "spss", and "sas9" are also accepted, but ipumsr data-reading functions do not support these file formats. Defaults to "fixed_width".
- data_structure: Data structure for the output extract data.
  - "rectangular" provides data in which every row has the same record type (determined by rectangular_on), with variables from other record types written onto associated records of the chosen type (e.g. household variables written onto person records).
  - "hierarchical" provides data that include rows of differing record types, with records ordered according to their hierarchical structure (e.g. each person record is followed by the activity records for that person).
  - "household_only" provides household records only. This data structure is only available for the IPUMS USA collection ("usa").
  Defaults to "rectangular".
- rectangular_on: If data_structure is "rectangular", the record type on which to rectangularize. One of "P" (person), "A" (activity), "I" (injury), or "R" (round). Defaults to "P" if data_structure is "rectangular" and NULL otherwise.
- case_select_who: How to interpret any case selections included for variables in the extract definition.
  - "individuals" includes records for all individuals who match the specified case selections.
  - "households" includes records for all members of each household that contains an individual who matches the specified case selections.
  Defaults to "individuals". Use var_spec() to add case selections for specific variables.
- data_quality_flags: Set to TRUE to include data quality flags for all applicable variables in the extract definition. This overrides the data_quality_flags specification for individual variables in the definition. Use var_spec() to add data quality flags for specific variables.
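To see how data_format, data_structure, rectangular_on, and case_select_who interact, the sketch below defines three IPUMS USA extracts that differ only in those settings. It assumes, as described above, that rectangular_on is filled in as "P" for rectangular extracts and left NULL otherwise, and that each setting is stored under its argument name in the returned object. The sample and variable names are reused from the examples on this page, and nothing is submitted to the API.

```r
library(ipumsr)

# Default structure: rectangular on person records, written as CSV
usa_rect <- define_extract_micro(
  collection = "usa",
  description = "Rectangular ACS extract in CSV format",
  samples = "us2014a",
  variables = c("AGE", "SEX"),
  data_format = "csv"
)

# Hierarchical structure: rows of differing record types, so no
# rectangular_on value is supplied
usa_hier <- define_extract_micro(
  collection = "usa",
  description = "Hierarchical ACS extract",
  samples = "us2014a",
  variables = c("AGE", "SEX"),
  data_structure = "hierarchical"
)

# Case selection on SEX code "2", expanded to every member of any
# household that contains a matching individual
usa_hh <- define_extract_micro(
  collection = "usa",
  description = "Members of households that contain a woman",
  samples = "us2014a",
  variables = list(var_spec("SEX", case_selections = "2"), "AGE"),
  case_select_who = "households"
)

usa_rect$rectangular_on  # expected "P", the default for rectangular data
usa_hier$rectangular_on  # expected NULL for hierarchical data
usa_hh$case_select_who
```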
Value
An object of class micro_extract containing
the extract definition.
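A quick way to confirm what define_extract_micro() returns, assuming the defaults listed under Usage and that each setting is stored under the corresponding argument name (as the samples and variables fields in the examples below are):

```r
library(ipumsr)

cps_extract <- define_extract_micro(
  collection = "cps",
  description = "Example CPS extract definition",
  samples = c("cps2020_02s", "cps2020_03s"),
  variables = c("AGE", "SEX")
)

# The definition is a micro_extract object
inherits(cps_extract, "micro_extract")

# Settings stored on the object, with defaults filled in
cps_extract$collection
cps_extract$data_format     # default "fixed_width"
cps_extract$data_structure  # default "rectangular"
names(cps_extract$samples)
```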
See also
submit_extract() to submit an extract request for processing.
save_extract_as_json() and define_extract_from_json() to share an
extract definition.
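The sharing workflow can be sketched as a round trip through a JSON file. This assumes save_extract_as_json() takes the extract and a file path, and that define_extract_from_json() accepts that path. The temporary file here stands in for a file you would actually send to a collaborator.

```r
library(ipumsr)

usa_extract <- define_extract_micro(
  collection = "usa",
  description = "2013-2014 ACS Data",
  samples = c("us2013a", "us2014a"),
  variables = c("SEX", "AGE", "YEAR")
)

# Write the definition to a JSON file
json_file <- tempfile(fileext = ".json")
save_extract_as_json(usa_extract, file = json_file)

# Rebuild an extract definition from that file
usa_copy <- define_extract_from_json(json_file)

names(usa_copy$variables)
```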
Examples
usa_extract <- define_extract_micro(
  collection = "usa",
  description = "2013-2014 ACS Data",
  samples = c("us2013a", "us2014a"),
  variables = c("SEX", "AGE", "YEAR")
)
usa_extract
#> Unsubmitted IPUMS USA extract 
#> Description: 2013-2014 ACS Data
#> 
#> Samples: (2 total) us2013a, us2014a
#> Variables: (3 total) SEX, AGE, YEAR
# Use `var_spec()` to create detailed variable specifications:
usa_extract <- define_extract_micro(
  collection = "usa",
  description = "Example USA extract definition",
  samples = c("us2013a", "us2014a"),
  variables = var_spec(
    "SEX",
    case_selections = "2",
    attached_characteristics = c("mother", "father")
  )
)
# For multiple variables, provide a list of `var_spec` objects and/or
# variable names.
cps_extract <- define_extract_micro(
  collection = "cps",
  description = "Example CPS extract definition",
  samples = c("cps2020_02s", "cps2020_03s"),
  variables = list(
    var_spec("AGE", data_quality_flags = TRUE),
    var_spec("SEX", case_selections = "2"),
    "RACE"
  )
)
cps_extract
#> Unsubmitted IPUMS CPS extract 
#> Description: Example CPS extract definition
#> 
#> Samples: (2 total) cps2020_02s, cps2020_03s
#> Variables: (3 total) AGE, SEX, RACE
# To reuse the same specification for many variables, it may be useful to
# create the variable specifications before defining the extract:
var_names <- c("AGE", "SEX")
my_vars <- purrr::map(
  var_names,
  ~ var_spec(.x, attached_characteristics = "mother")
)
ipumsi_extract <- define_extract_micro(
  collection = "ipumsi",
  description = "Extract definition with predefined variables",
  samples = c("br2010a", "cl2017a"),
  variables = my_vars
)
# Extract specifications can be indexed by name
names(ipumsi_extract$samples)
#> [1] "br2010a" "cl2017a"
names(ipumsi_extract$variables)
#> [1] "AGE" "SEX"
ipumsi_extract$variables$AGE
#> $name
#> [1] "AGE"
#> 
#> $attached_characteristics
#> [1] "mother"
#> 
#> attr(,"class")
#> [1] "var_spec"   "ipums_spec" "list"      
# IPUMS Time Use collections allow selection of IPUMS-defined and
# user-defined time use variables:
define_extract_micro(
  collection = "atus",
  description = "ATUS extract with time use variables",
  samples = "at2007",
  time_use_variables = list(
    "ACT_PCARE",
    tu_var_spec(
      "MYTIMEUSEVAR",
      owner = "example@example.com"
    )
  )
)
#> Unsubmitted IPUMS ATUS extract 
#> Description: ATUS extract with time use variables
#> 
#> Samples: (1 total) at2007
#> Time Use Variables: (2 total) ACT_PCARE, MYTIMEUSEVAR
if (FALSE) { # \dontrun{
# Use the extract definition to submit an extract request to the API
submit_extract(usa_extract)
} # }
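Two arguments not covered by the examples above are sample_members and the extract-wide data_quality_flags. The sketch below reuses the ATUS sample and CPS samples from the earlier examples and only defines extracts locally. Whether the API accepts a given combination of samples and variables is presumably checked when the request is submitted, not here.

```r
library(ipumsr)

# ATUS only: add non-respondents and household members of respondents
atus_extract <- define_extract_micro(
  collection = "atus",
  description = "ATUS extract with additional sample members",
  samples = "at2007",
  variables = c("AGE", "SEX"),
  sample_members = c("include_non_respondents", "include_household_members")
)

# Request data quality flags for every applicable variable at once,
# overriding any data_quality_flags set in individual var_spec() calls
cps_flags <- define_extract_micro(
  collection = "cps",
  description = "CPS extract with data quality flags",
  samples = c("cps2020_02s", "cps2020_03s"),
  variables = c("AGE", "SEX", "RACE"),
  data_quality_flags = TRUE
)

cps_flags$data_quality_flags
```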