
Define an extract request for an IPUMS microdata collection
Source: R/api_define_extract.R
define_extract-micro.Rd
Define the parameters of an IPUMS microdata extract request to be submitted via the IPUMS API.
Currently supported microdata collections include:
- IPUMS USA: define_extract_usa()
- IPUMS CPS: define_extract_cps()
- IPUMS International: define_extract_ipumsi()
Learn more about the IPUMS API in vignette("ipums-api") and microdata extract definitions in vignette("ipums-api-micro").
Usage
define_extract_usa(
  description,
  samples,
  variables,
  data_format = "fixed_width",
  data_structure = "rectangular",
  rectangular_on = NULL,
  case_select_who = "individuals",
  data_quality_flags = NULL
)

define_extract_cps(
  description,
  samples,
  variables,
  data_format = "fixed_width",
  data_structure = "rectangular",
  rectangular_on = NULL,
  case_select_who = "individuals",
  data_quality_flags = NULL
)

define_extract_ipumsi(
  description,
  samples,
  variables,
  data_format = "fixed_width",
  data_structure = "rectangular",
  rectangular_on = NULL,
  case_select_who = "individuals",
  data_quality_flags = NULL
)
Arguments
- description
Description of the extract.
- samples
Vector of samples to include in the extract request. Use get_sample_info() to identify sample IDs for a given collection.
- variables
Vector of variable names or a list of detailed variable specifications to include in the extract request. Use var_spec() to create a var_spec object containing a detailed variable specification. See examples.
- data_format
Format for the output extract data file. Either "fixed_width" or "csv".
Note that while "stata", "spss", or "sas9" are also accepted, these file formats are not supported by ipumsr data-reading functions.
Defaults to "fixed_width".
- data_structure
Data structure for the output extract data.
"rectangular" provides person records with all requested household information attached to respective household members.
"hierarchical" provides household records followed by person records.
Defaults to "rectangular".
- rectangular_on
If data_structure is "rectangular", records on which to rectangularize. Currently only "P" (person records) is supported.
Defaults to "P" if data_structure is "rectangular" and NULL otherwise.
- case_select_who
Indication of how to interpret any case selections included for variables in the extract definition.
"individuals" includes records for all individuals who match the specified case selections.
"households" includes records for all members of each household that contains an individual who matches the specified case selections.
Defaults to "individuals". Use var_spec() to add case selections for specific variables.
- data_quality_flags
Set to TRUE to include data quality flags for all applicable variables in the extract definition. This will override the data_quality_flags specification for individual variables in the definition.
Use var_spec() to add data quality flags for specific variables.
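As a rough illustration of how the non-default arguments above combine, the sketch below requests a hierarchical file and applies the case selection at the household level. The description text and the object name hh_extract are illustrative choices, not part of the package; the sample ID and the case selection code are borrowed from the examples on this page. Defining an extract does not contact the API.

```r
library(ipumsr)

# Hypothetical extract: keep every member of any household that contains
# at least one person matching the SEX case selection.
hh_extract <- define_extract_usa(
  description = "Households with at least one matching member",
  samples = "us2013a",
  variables = list(
    var_spec("SEX", case_selections = "2"),
    "AGE"
  ),
  data_structure = "hierarchical",   # household records followed by person records
  case_select_who = "households",    # apply case selections at the household level
  data_quality_flags = TRUE          # request flags for all applicable variables
)

# The chosen settings are stored on the extract definition
hh_extract$data_structure
hh_extract$case_select_who
```

Because data_structure is "hierarchical" here, rectangular_on is left at its default and remains NULL.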
Value
An object of class micro_extract containing the extract definition.
See also
submit_extract() to submit an extract request for processing.
save_extract_as_json() and define_extract_from_json() to share an extract definition.
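A minimal sketch of sharing a definition through JSON, assuming save_extract_as_json() takes the extract and a file path and define_extract_from_json() reads that path back. The temporary file and the object names are illustrative only.

```r
library(ipumsr)

extract <- define_extract_usa(
  description = "2013-2014 ACS Data",
  samples = c("us2013a", "us2014a"),
  variables = c("SEX", "AGE", "YEAR")
)

# Write the definition to a temporary JSON file (path chosen for illustration)
json_path <- tempfile(fileext = ".json")
save_extract_as_json(extract, file = json_path)

# Rebuild an extract definition from the saved file
restored <- define_extract_from_json(json_path)

# Inspect the restored samples and variables
names(restored$samples)
names(restored$variables)
```

The restored object is an unsubmitted definition, so a collaborator would still need to pass it to submit_extract() to request the data.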
Examples
usa_extract <- define_extract_usa(
  description = "2013-2014 ACS Data",
  samples = c("us2013a", "us2014a"),
  variables = c("SEX", "AGE", "YEAR")
)
usa_extract
#> Unsubmitted IPUMS USA extract
#> Description: 2013-2014 ACS Data
#>
#> Samples: (2 total) us2013a, us2014a
#> Variables: (3 total) SEX, AGE, YEAR
# Use `var_spec()` to create detailed variable specifications:
usa_extract <- define_extract_usa(
  description = "Example USA extract definition",
  samples = c("us2013a", "us2014a"),
  variables = var_spec(
    "SEX",
    case_selections = "2",
    attached_characteristics = c("mother", "father")
  )
)
# For multiple variables, provide a list of `var_spec` objects and/or
# variable names.
cps_extract <- define_extract_cps(
  description = "Example CPS extract definition",
  samples = c("cps2020_02s", "cps2020_03s"),
  variables = list(
    var_spec("AGE", data_quality_flags = TRUE),
    var_spec("SEX", case_selections = "2"),
    "RACE"
  )
)
cps_extract
#> Unsubmitted IPUMS CPS extract
#> Description: Example CPS extract definition
#>
#> Samples: (2 total) cps2020_02s, cps2020_03s
#> Variables: (3 total) AGE, SEX, RACE
# To recycle specifications to many variables, it may be useful to
# create variables prior to defining the extract:
var_names <- c("AGE", "SEX")
my_vars <- purrr::map(
  var_names,
  ~ var_spec(.x, attached_characteristics = "mother")
)
ipumsi_extract <- define_extract_ipumsi(
  description = "Extract definition with predefined variables",
  samples = c("br2010a", "cl2017a"),
  variables = my_vars
)
# Extract specifications can be indexed by name
names(ipumsi_extract$samples)
#> [1] "br2010a" "cl2017a"
names(ipumsi_extract$variables)
#> [1] "AGE" "SEX"
ipumsi_extract$variables$AGE
#> $name
#> [1] "AGE"
#>
#> $attached_characteristics
#> [1] "mother"
#>
#> attr(,"class")
#> [1] "var_spec" "ipums_spec" "list"
if (FALSE) {
  # Use the extract definition to submit an extract request to the API
  submit_extract(usa_extract)
}