Define an extract request for an IPUMS microdata collection
Source: R/api_define_extract.R
define_extract_micro.Rd
Define the parameters of an IPUMS microdata extract request to be submitted via the IPUMS API.
The IPUMS API currently supports the following microdata collections:
IPUMS USA
IPUMS CPS
IPUMS International
IPUMS Time Use (ATUS, AHTUS, MTUS)
IPUMS Health Surveys (NHIS, MEPS)
Note that not all extract request parameters and options apply to all collections. For a summary of supported features by collection, see the IPUMS API documentation.
Learn more about the IPUMS API in vignette("ipums-api") and microdata extract definitions in vignette("ipums-api-micro").
Usage
define_extract_micro(
  collection,
  description,
  samples,
  variables = NULL,
  time_use_variables = NULL,
  sample_members = NULL,
  data_format = "fixed_width",
  data_structure = "rectangular",
  rectangular_on = NULL,
  case_select_who = "individuals",
  data_quality_flags = NULL
)
Arguments
- collection
Code for the IPUMS collection represented by this extract request. See ipums_data_collections() for supported microdata collection codes.
- description
Description of the extract.
- samples
Vector of samples to include in the extract request. Use get_sample_info() to identify sample IDs for a given collection.
- variables
Vector of variable names or a list of detailed variable specifications to include in the extract request. Use var_spec() to create a var_spec object containing a detailed variable specification. See examples.
- time_use_variables
Vector of names of IPUMS-defined time use variables or a list of specifications for user-defined time use variables to include in the extract request. Use tu_var_spec() to create a tu_var_spec object containing a time use variable specification. See examples.
Time use variables are only available for IPUMS Time Use collections ("atus", "ahtus", and "mtus").
- sample_members
Indication of whether to include additional sample members in the extract request. If provided, must be one of "include_non_respondents", "include_household_members", or both.
Sample member selection is only available for the IPUMS ATUS collection ("atus").
- data_format
Format for the output extract data file. Either "fixed_width" or "csv".
Note that while "stata", "spss", and "sas9" are also accepted, these file formats are not supported by ipumsr data-reading functions.
Defaults to "fixed_width".
- data_structure
Data structure for the output extract data.
"rectangular" provides data in which every row has the same record type (determined by "rectangular_on"), with variables from other record types written onto associated records of the chosen type (e.g. household variables written onto person records).
"hierarchical" provides data that include rows of differing record types, with records ordered according to their hierarchical structure (e.g. each person record is followed by the activity records for that person).
"household_only" provides household records only. This data structure is only available for the IPUMS USA collection ("usa").
Defaults to "rectangular".
- rectangular_on
If data_structure is "rectangular", records on which to rectangularize. One of "P" (person), "A" (activity), "I" (injury) or "R" (round).
Defaults to "P" if data_structure is "rectangular" and NULL otherwise.
- case_select_who
Indication of how to interpret any case selections included for variables in the extract definition.
"individuals" includes records for all individuals who match the specified case selections.
"households" includes records for all members of each household that contains an individual who matches the specified case selections.
Defaults to "individuals". Use var_spec() to add case selections for specific variables.
- data_quality_flags
Set to TRUE to include data quality flags for all applicable variables in the extract definition. This will override the data_quality_flags specification for individual variables in the definition.
Use var_spec() to add data quality flags for specific variables.
Value
An object of class micro_extract containing the extract definition.
See also
submit_extract() to submit an extract request for processing.
save_extract_as_json() and define_extract_from_json() to share an extract definition.
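As a rough illustration of sharing a definition, the sketch below writes an extract definition to a JSON file and reads it back. It assumes the ipumsr package is installed; the temporary file path and the object names are placeholders chosen for this example.

```r
library(ipumsr)

extract <- define_extract_micro(
  collection = "usa",
  description = "Extract to share as JSON",
  samples = "us2013a",
  variables = c("SEX", "AGE")
)

# Write the definition to a temporary JSON file (placeholder path)
json_path <- tempfile(fileext = ".json")
save_extract_as_json(extract, file = json_path)

# Recreate the definition from the file, as a collaborator might
shared_extract <- define_extract_from_json(json_path)

shared_extract$description
names(shared_extract$variables)
```

The JSON file holds only the extract definition, so the recipient still needs to submit the recreated definition themselves to obtain data.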
Examples
usa_extract <- define_extract_micro(
  collection = "usa",
  description = "2013-2014 ACS Data",
  samples = c("us2013a", "us2014a"),
  variables = c("SEX", "AGE", "YEAR")
)
usa_extract
#> Unsubmitted IPUMS USA extract
#> Description: 2013-2014 ACS Data
#>
#> Samples: (2 total) us2013a, us2014a
#> Variables: (3 total) SEX, AGE, YEAR
# Use `var_spec()` to create detailed variable specifications:
usa_extract <- define_extract_micro(
  collection = "usa",
  description = "Example USA extract definition",
  samples = c("us2013a", "us2014a"),
  variables = var_spec(
    "SEX",
    case_selections = "2",
    attached_characteristics = c("mother", "father")
  )
)
# For multiple variables, provide a list of `var_spec` objects and/or
# variable names.
cps_extract <- define_extract_micro(
  collection = "cps",
  description = "Example CPS extract definition",
  samples = c("cps2020_02s", "cps2020_03s"),
  variables = list(
    var_spec("AGE", data_quality_flags = TRUE),
    var_spec("SEX", case_selections = "2"),
    "RACE"
  )
)
cps_extract
#> Unsubmitted IPUMS CPS extract
#> Description: Example CPS extract definition
#>
#> Samples: (2 total) cps2020_02s, cps2020_03s
#> Variables: (3 total) AGE, SEX, RACE
# To apply the same specification to many variables, it may be useful to
# create the variable specifications before defining the extract:
var_names <- c("AGE", "SEX")
my_vars <- purrr::map(
  var_names,
  ~ var_spec(.x, attached_characteristics = "mother")
)
ipumsi_extract <- define_extract_micro(
  collection = "ipumsi",
  description = "Extract definition with predefined variables",
  samples = c("br2010a", "cl2017a"),
  variables = my_vars
)
# Extract specifications can be indexed by name
names(ipumsi_extract$samples)
#> [1] "br2010a" "cl2017a"
names(ipumsi_extract$variables)
#> [1] "AGE" "SEX"
ipumsi_extract$variables$AGE
#> $name
#> [1] "AGE"
#>
#> $attached_characteristics
#> [1] "mother"
#>
#> attr(,"class")
#> [1] "var_spec" "ipums_spec" "list"
# IPUMS Time Use collections allow selection of IPUMS-defined and
# user-defined time use variables:
define_extract_micro(
  collection = "atus",
  description = "ATUS extract with time use variables",
  samples = "at2007",
  time_use_variables = list(
    "ACT_PCARE",
    tu_var_spec(
      "MYTIMEUSEVAR",
      owner = "example@example.com"
    )
  )
)
#> Unsubmitted IPUMS ATUS extract
#> Description: ATUS extract with time use variables
#>
#> Samples: (1 total) at2007
#> Time Use Variables: (2 total) ACT_PCARE, MYTIMEUSEVAR
if (FALSE) { # \dontrun{
# Use the extract definition to submit an extract request to the API
submit_extract(usa_extract)
} # }