Reads a dataset downloaded from the IPUMS extract system. For IPUMS projects with microdata, it relies on a downloaded DDI codebook and a fixed-width data file. The data are loaded with value labels (as labelled vectors) and variable labels. See 'Details' for more information on how the ipumsr package handles record types.

read_ipums_micro(
  ddi,
  vars = NULL,
  n_max = Inf,
  data_file = NULL,
  verbose = TRUE,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  lower_vars = FALSE
)

read_ipums_micro_list(
  ddi,
  vars = NULL,
  n_max = Inf,
  data_file = NULL,
  verbose = TRUE,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  lower_vars = FALSE
)

Arguments

ddi

Either a file path to a DDI XML file downloaded from the website, or an ipums_ddi object parsed by read_ipums_ddi.

vars

Names of variables to load. Accepts a character vector of names or dplyr select-style conventions. For hierarchical data, the record type identifier variable is added even if it is not specified.

n_max

The maximum number of records to load.

data_file

A directory in which to look for the data file. If left empty (NULL), the data file is expected in the same directory as the DDI file.

verbose

Logical indicating whether to print progress information to the console.

var_attrs

Variable attributes to add from the DDI. Defaults to adding all of them (val_labels, var_label, and var_desc). See set_ipums_var_attributes for more details.

lower_vars

Logical indicating whether to convert variable names to lowercase (default is FALSE, in line with IPUMS conventions). This argument only applies when ddi is a file path; it is ignored if ddi is an ipums_ddi object. See read_ipums_ddi for converting variable names to lowercase when reading in the DDI.
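As a sketch of how vars, n_max, and lower_vars fit together, the calls below read a few variables and rows from the hierarchical example file. This assumes the ipumsr version documented on this page is installed, with its bundled cps_00010.xml example. Following the vars description above, RECTYPE should appear in the result even though it is not requested.

```r
library(ipumsr)

cps_hier_ddi_file <- ipums_example("cps_00010.xml")

# Read three variables and at most 10 records. Because the file is
# hierarchical, the record type variable (RECTYPE) is added automatically.
cps_small <- read_ipums_micro(
  cps_hier_ddi_file,
  vars = c("YEAR", "SERIAL", "INCTOT"),
  n_max = 10,
  verbose = FALSE
)
names(cps_small)

# lower_vars takes effect here because ddi is passed as a file path,
# not as an ipums_ddi object.
cps_lower <- read_ipums_micro(
  cps_hier_ddi_file,
  n_max = 10,
  verbose = FALSE,
  lower_vars = TRUE
)
names(cps_lower)
```

If you already have an ipums_ddi object from read_ipums_ddi, lower_vars is ignored, so lowercase the names when reading the DDI instead.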

Value

read_ipums_micro returns a single tbl_df data frame, and read_ipums_micro_list returns a list of data frames named by record type. See 'Details' for more information.

Details

Some IPUMS projects have data for multiple types of records (e.g., household and person). When downloading data from many of these projects, you can have the IPUMS extract system "rectangularize" the data, meaning that the data are transformed so that every row represents the same type of record.

There is also the option to download "hierarchical" extracts, which consist of a single file in which rows of different record types are interleaved. The ipumsr package offers two methods for importing these data.

read_ipums_micro loads these data into a "long" format in which the record types are mixed in the rows and each variable is NA for the record types it does not apply to.

read_ipums_micro_list loads the data into a list of data frames, where each data frame contains only one record type. The list elements are named after the record type value labels with 'Record' removed (often 'HOUSEHOLD' for Household and 'PERSON' for Person).
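To make the relationship between the two formats concrete, here is a base R sketch that turns a small, made-up long-format data frame into list-style output. It splits rows on RECTYPE and drops columns that are entirely NA within a record type. This is an illustration only, not how ipumsr implements read_ipums_micro_list: the package works out which variables belong to each record type from the DDI and names list elements by record type label, whereas this heuristic uses the codes and would wrongly drop a variable that just happens to be all NA. The helper to_list_format() is hypothetical.

```r
# Toy "long" hierarchical data: household (H) and person (P) rows mixed,
# with NA wherever a variable does not apply to that record type.
long <- data.frame(
  RECTYPE  = c("H", "P", "P", "H", "P"),
  SERIAL   = c(1, 1, 1, 2, 2),
  STATEFIP = c(55, NA, NA, 27, NA),
  PERNUM   = c(NA, 1, 2, NA, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split rows by record type, then keep only the
# columns with at least one non-missing value within each record type.
to_list_format <- function(df, rectype_var = "RECTYPE") {
  pieces <- split(df, df[[rectype_var]])
  lapply(pieces, function(piece) {
    keep <- vapply(piece, function(col) any(!is.na(col)), logical(1))
    piece <- piece[, keep, drop = FALSE]
    rownames(piece) <- NULL
    piece
  })
}

lst <- to_list_format(long)
names(lst)  # "H" "P"
lst$H       # RECTYPE, SERIAL, STATEFIP for the two household rows
lst$P       # RECTYPE, SERIAL, PERNUM for the three person rows
```

SERIAL appears in both pieces, which is what lets person rows be matched back to their household.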

Examples

  # Rectangular example file
  cps_rect_ddi_file <- ipums_example("cps_00006.xml")

  cps <- read_ipums_micro(cps_rect_ddi_file)
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.
  # Or load DDI separately to keep the metadata
  ddi <- read_ipums_ddi(cps_rect_ddi_file)
  cps <- read_ipums_micro(ddi)
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.

  # Hierarchical example file
  cps_hier_ddi_file <- ipums_example("cps_00010.xml")

  # Read in "long" format and you get 1 data frame
  cps_long <- read_ipums_micro(cps_hier_ddi_file)
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.
  head(cps_long)
#> # A tibble: 6 × 9
#>   RECTYPE      YEAR SERIAL HWTSUPP STATEFIP    MONTH PERNUM WTSUPP        INCTOT
#>   <chr+lbl>   <dbl>  <dbl>   <dbl> <int+lb> <int+lb>  <dbl>  <dbl>     <dbl+lbl>
#> 1 H [Househo…  1962     80   1476. 55 [Wis…  3 [Mar…     NA    NA  NA           
#> 2 P [Person …  1962     80     NA  NA       NA            1  1476.  4.88e3      
#> 3 P [Person …  1962     80     NA  NA       NA            2  1471.  5.8 e3      
#> 4 P [Person …  1962     80     NA  NA       NA            3  1579.  1.00e8 [Mis…
#> 5 H [Househo…  1962     82   1598. 27 [Min…  3 [Mar…     NA    NA  NA           
#> 6 P [Person …  1962     82     NA  NA       NA            1  1598.  1.40e4      

  # Read in "list" format and you get a list of multiple data frames
  cps_list <- read_ipums_micro_list(cps_hier_ddi_file)
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.
  head(cps_list$PERSON)
#> # A tibble: 6 × 6
#>   RECTYPE            YEAR SERIAL PERNUM WTSUPP              INCTOT
#>   <chr+lbl>         <dbl>  <dbl>  <dbl>  <dbl>           <dbl+lbl>
#> 1 P [Person Record]  1962     80      1  1476.     4883           
#> 2 P [Person Record]  1962     80      2  1471.     5800           
#> 3 P [Person Record]  1962     80      3  1579. 99999998 [Missing.]
#> 4 P [Person Record]  1962     82      1  1598.    14015           
#> 5 P [Person Record]  1962     83      1  1707.    16552           
#> 6 P [Person Record]  1962     84      1  1790.     6375           
  head(cps_list$HOUSEHOLD)
#> # A tibble: 6 × 6
#>   RECTYPE               YEAR SERIAL HWTSUPP       STATEFIP     MONTH
#>   <chr+lbl>            <dbl>  <dbl>   <dbl>      <int+lbl> <int+lbl>
#> 1 H [Household Record]  1962     80   1476. 55 [Wisconsin] 3 [March]
#> 2 H [Household Record]  1962     82   1598. 27 [Minnesota] 3 [March]
#> 3 H [Household Record]  1962     83   1707. 27 [Minnesota] 3 [March]
#> 4 H [Household Record]  1962     84   1790. 27 [Minnesota] 3 [March]
#> 5 H [Household Record]  1962    107   4355. 19 [Iowa]      3 [March]
#> 6 H [Household Record]  1962    108   1479. 19 [Iowa]      3 [March]

  # Or use the %<-% operator from zeallot to unpack the list
  c(household, person) %<-% read_ipums_micro_list(cps_hier_ddi_file)
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.
  head(person)
#> # A tibble: 6 × 6
#>   RECTYPE            YEAR SERIAL PERNUM WTSUPP              INCTOT
#>   <chr+lbl>         <dbl>  <dbl>  <dbl>  <dbl>           <dbl+lbl>
#> 1 P [Person Record]  1962     80      1  1476.     4883           
#> 2 P [Person Record]  1962     80      2  1471.     5800           
#> 3 P [Person Record]  1962     80      3  1579. 99999998 [Missing.]
#> 4 P [Person Record]  1962     82      1  1598.    14015           
#> 5 P [Person Record]  1962     83      1  1707.    16552           
#> 6 P [Person Record]  1962     84      1  1790.     6375           
  head(household)
#> # A tibble: 6 × 6
#>   RECTYPE               YEAR SERIAL HWTSUPP       STATEFIP     MONTH
#>   <chr+lbl>            <dbl>  <dbl>   <dbl>      <int+lbl> <int+lbl>
#> 1 H [Household Record]  1962     80   1476. 55 [Wisconsin] 3 [March]
#> 2 H [Household Record]  1962     82   1598. 27 [Minnesota] 3 [March]
#> 3 H [Household Record]  1962     83   1707. 27 [Minnesota] 3 [March]
#> 4 H [Household Record]  1962     84   1790. 27 [Minnesota] 3 [March]
#> 5 H [Household Record]  1962    107   4355. 19 [Iowa]      3 [March]
#> 6 H [Household Record]  1962    108   1479. 19 [Iowa]      3 [March]