Reads a dataset downloaded from the IPUMS extract system, but does so by returning an object that can read a group of lines at a time. This is a more flexible way to read data in chunks than functions like read_ipums_micro_chunked. It lets you do things like read parts of multiple files at the same time and reset to the beginning more easily than with the chunked functions. Note that while other read_ipums_micro* functions can read from .csv(.gz) or .dat(.gz) files, these functions can only read from .dat(.gz) files.
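A minimal sketch of the "multiple readers" idea: two yield objects keep independent positions, so reading from or resetting one does not move the other. The bundled example extract is opened twice here only because it is the one file assumed to be available via ipums_example(); with real extracts, each reader would point at its own DDI. The variable names reader_a, reader_b, and ddi_path are illustrative.

```r
library(ipumsr)

# Two independent readers over the example extract (same file, for illustration)
ddi_path <- ipums_example("cps_00006.xml")
reader_a <- read_ipums_micro_yield(ddi_path, verbose = FALSE)
reader_b <- read_ipums_micro_yield(ddi_path, verbose = FALSE)

chunk_a <- reader_a$yield(5)  # reader_a now points at row 6
chunk_b <- reader_b$yield(2)  # reader_b now points at row 3

reader_a$cur_pos
reader_b$cur_pos

# Rewind only reader_a; reader_b keeps its place
reader_a$reset()
reader_a$cur_pos
```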

read_ipums_micro_yield(
  ddi,
  vars = NULL,
  data_file = NULL,
  verbose = TRUE,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  lower_vars = FALSE
)

read_ipums_micro_list_yield(
  ddi,
  vars = NULL,
  data_file = NULL,
  verbose = TRUE,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  lower_vars = FALSE
)

Arguments

ddi

Either a filepath to a DDI XML file downloaded from the website, or an ipums_ddi object parsed by read_ipums_ddi.

vars

Names of variables to load. Accepts a character vector of names or a selection written with dplyr_select_style conventions. For hierarchical data, the rectype id variable will be added even if it is not specified.
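A short sketch of the two ways to pass vars, using the bundled example extract. The unquoted form assumes the yield readers apply the same select-style handling described above; since this example file is rectangular, no rectype variable should be added.

```r
library(ipumsr)

ddi_path <- ipums_example("cps_00006.xml")

# Selection as a character vector of names
yield_chr <- read_ipums_micro_yield(
  ddi_path,
  vars = c("YEAR", "STATEFIP", "INCTOT"),
  verbose = FALSE
)
chunk_chr <- yield_chr$yield(10)
names(chunk_chr)

# The same selection in dplyr select style (unquoted names)
yield_sel <- read_ipums_micro_yield(
  ddi_path,
  vars = c(YEAR, STATEFIP, INCTOT),
  verbose = FALSE
)
chunk_sel <- yield_sel$yield(10)
names(chunk_sel)
```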

data_file

Directory in which to look for the data file. If NULL (the default), the function looks in the same directory as the DDI file.

verbose

Logical, indicating whether to print progress information to console.

var_attrs

Variable attributes to add from the DDI, defaults to adding all (val_labels, var_label and var_desc). See set_ipums_var_attributes for more details.

lower_vars

Logical indicating whether to convert variable names to lowercase (default is FALSE, in line with IPUMS conventions). This argument only applies when ddi is a file path; it is ignored if ddi is an ipums_ddi object. See read_ipums_ddi for converting variable names to lowercase when reading in the DDI.
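A sketch of both routes to lowercase names: passing lower_vars = TRUE when ddi is a file path, and lowercasing at parse time with read_ipums_ddi when passing a parsed ipums_ddi object (where lower_vars here would be ignored).

```r
library(ipumsr)

ddi_path <- ipums_example("cps_00006.xml")

# ddi is a file path, so lower_vars takes effect here
lower_from_path <- read_ipums_micro_yield(ddi_path, lower_vars = TRUE, verbose = FALSE)
chunk_path <- lower_from_path$yield(5)
names(chunk_path)

# ddi is a parsed object, so lowercase the names when parsing the DDI instead
ddi_lower <- read_ipums_ddi(ddi_path, lower_vars = TRUE)
lower_from_ddi <- read_ipums_micro_yield(ddi_lower, verbose = FALSE)
chunk_ddi <- lower_from_ddi$yield(5)
names(chunk_ddi)
```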

Value

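An R6 object inheriting from hipread's HipYield class: IpumsLongYield for read_ipums_micro_yield() or IpumsListYield for read_ipums_micro_list_yield() (see 'Details' for more information).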

Details

These functions return an R6 object (IpumsLongYield or IpumsListYield) with the following methods and fields:

  • yield(n = 10000) A function to read the next 'yield' from the data. It returns a `tbl_df` (or a list of `tbl_df` for `read_ipums_micro_list_yield()`) with up to n rows: all remaining rows if fewer than n are left, or NULL if no rows are left.

  • reset() A function to reset the data so that the next yield will read data from the start.

  • is_done() A function that returns TRUE if the file has been completely read and FALSE otherwise.

  • cur_pos A property that contains the next row number that will be read (1-indexed).
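To make the contract above concrete, here is a minimal base R sketch of an object that follows the same yield()/reset()/is_done()/cur_pos rules over an in-memory data frame. make_toy_yield() is hypothetical and not part of ipumsr or hipread; the real classes are R6 objects that read from disk, and this sketch uses a plain environment instead.

```r
# make_toy_yield() is a hypothetical illustration of the yield contract,
# not part of ipumsr or hipread. It wraps an in-memory data frame.
make_toy_yield <- function(df) {
  self <- new.env()
  self$cur_pos <- 1L  # next row to read (1-indexed)

  # TRUE once every row has been returned
  self$is_done <- function() self$cur_pos > nrow(df)

  # Rewind so the next yield() starts at row 1 again
  self$reset <- function() {
    self$cur_pos <- 1L
    invisible(NULL)
  }

  # Return up to n rows: all remaining rows if fewer than n are left,
  # or NULL when nothing is left
  self$yield <- function(n = 10000) {
    if (self$is_done()) return(NULL)
    last <- as.integer(min(self$cur_pos + n - 1, nrow(df)))
    out <- df[self$cur_pos:last, , drop = FALSE]
    self$cur_pos <- last + 1L
    out
  }

  self
}

toy <- make_toy_yield(data.frame(id = 1:25))
first <- toy$yield(10)  # rows 1-10; cur_pos becomes 11
rest <- toy$yield(100)  # only 15 rows remain, so 15 are returned
toy$is_done()           # TRUE
toy$yield(10)           # NULL
toy$reset()             # cur_pos is back to 1
```

The environment gives the closures shared, mutable state, which is what lets cur_pos persist between yield() calls, much as an R6 field does.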

Super classes

hipread::HipYield -> hipread::HipLongYield -> IpumsLongYield

Methods


Method new()

Usage

IpumsLongYield$new(
  ddi,
  vars = NULL,
  data_file = NULL,
  verbose = TRUE,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  lower_vars = FALSE
)


Method yield()

Usage

IpumsLongYield$yield(n = 10000)

Super classes

hipread::HipYield -> hipread::HipListYield -> IpumsListYield

Methods


Method new()

Usage

IpumsListYield$new(
  ddi,
  vars = NULL,
  data_file = NULL,
  verbose = TRUE,
  var_attrs = c("val_labels", "var_label", "var_desc"),
  lower_vars = FALSE
)


Method yield()

Usage

IpumsListYield$yield(n = 10000)

Examples

library(ipumsr)
# An example using "long" data
long_yield <- read_ipums_micro_yield(ipums_example("cps_00006.xml"))
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.
# Get first 10 rows
long_yield$yield(10)
#> # A tibble: 10 × 8
#>     YEAR SERIAL HWTSUPP       STATEFIP     MONTH PERNUM WTSUPP            INCTOT
#>    <dbl>  <dbl>   <dbl>      <int+lbl> <int+lbl>  <dbl>  <dbl>         <dbl+lbl>
#>  1  1962     80   1476. 55 [Wisconsin] 3 [March]      1  1476.     4883         
#>  2  1962     80   1476. 55 [Wisconsin] 3 [March]      2  1471.     5800         
#>  3  1962     80   1476. 55 [Wisconsin] 3 [March]      3  1579. 99999998 [Missin…
#>  4  1962     82   1598. 27 [Minnesota] 3 [March]      1  1598.    14015         
#>  5  1962     83   1707. 27 [Minnesota] 3 [March]      1  1707.    16552         
#>  6  1962     84   1790. 27 [Minnesota] 3 [March]      1  1790.     6375         
#>  7  1962    107   4355. 19 [Iowa]      3 [March]      1  4355. 99999999 [N.I.U.…
#>  8  1962    107   4355. 19 [Iowa]      3 [March]      2  1386.        0         
#>  9  1962    107   4355. 19 [Iowa]      3 [March]      3  1629.      600         
#> 10  1962    107   4355. 19 [Iowa]      3 [March]      4  1432. 99999999 [N.I.U.…
# Get 20 more rows now
long_yield$yield(20)
#> # A tibble: 20 × 8
#>     YEAR SERIAL HWTSUPP       STATEFIP     MONTH PERNUM WTSUPP            INCTOT
#>    <dbl>  <dbl>   <dbl>      <int+lbl> <int+lbl>  <dbl>  <dbl>         <dbl+lbl>
#>  1  1962    108   1479. 19 [Iowa]      3 [March]      1  1479.    12300         
#>  2  1962    108   1479. 19 [Iowa]      3 [March]      2  1482.        0         
#>  3  1962    122   3603. 27 [Minnesota] 3 [March]      1  3603.    15550         
#>  4  1962    122   3603. 27 [Minnesota] 3 [March]      2  3603.        0         
#>  5  1962    122   3603. 27 [Minnesota] 3 [March]      3  4243.     3443         
#>  6  1962    122   3603. 27 [Minnesota] 3 [March]      4  3920.      255         
#>  7  1962    122   3603. 27 [Minnesota] 3 [March]      5  3689.      135         
#>  8  1962    124   4104. 55 [Wisconsin] 3 [March]      1  4104.    15000         
#>  9  1962    124   4104. 55 [Wisconsin] 3 [March]      2  1487.     3550         
#> 10  1962    124   4104. 55 [Wisconsin] 3 [March]      3  1450.      692         
#> 11  1962    124   4104. 55 [Wisconsin] 3 [March]      4  1441.        0         
#> 12  1962    125   2182. 55 [Wisconsin] 3 [March]      1  2182.     4470         
#> 13  1962    126   1826. 55 [Wisconsin] 3 [March]      1  1826. 99999999 [N.I.U.…
#> 14  1962    126   1826. 55 [Wisconsin] 3 [March]      2  1629.        0         
#> 15  1962    761   1751. 19 [Iowa]      3 [March]      1  1751.     7300         
#> 16  1962    761   1751. 19 [Iowa]      3 [March]      2  1751.     3700         
#> 17  1962    762   1874. 19 [Iowa]      3 [March]      1  1874.     2534         
#> 18  1962    762   1874. 19 [Iowa]      3 [March]      2  1874.        0         
#> 19  1962    763   1874. 19 [Iowa]      3 [March]      1  1874.     1591         
#> 20  1962    764   1724. 19 [Iowa]      3 [March]      1  1724.     8002         
# See what row we're on now
long_yield$cur_pos
#> [1] 31
# Reset to beginning
long_yield$reset()
# Read the whole thing in chunks and count Minnesotans
total_mn <- 0
while (!long_yield$is_done()) {
  cur_data <- long_yield$yield(1000)
  total_mn <- total_mn + sum(as_factor(cur_data$STATEFIP) == "Minnesota")
}
total_mn
#> [1] 2362

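# Can also read data as a list with one tibble per record type (most useful
# for hierarchical data). This example file is rectangular, so the list has
# a single element, P: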
list_yield <- read_ipums_micro_list_yield(ipums_example("cps_00006.xml"))
#> Use of data from IPUMS-CPS is subject to conditions including that users should
#> cite the data appropriately. Use command `ipums_conditions()` for more details.
#> Assuming data rectangularized to 'P' record type
list_yield$yield(10)
#> $P
#> # A tibble: 10 × 8
#>     YEAR SERIAL HWTSUPP       STATEFIP     MONTH PERNUM WTSUPP            INCTOT
#>    <dbl>  <dbl>   <dbl>      <int+lbl> <int+lbl>  <dbl>  <dbl>         <dbl+lbl>
#>  1  1962     80   1476. 55 [Wisconsin] 3 [March]      1  1476.     4883         
#>  2  1962     80   1476. 55 [Wisconsin] 3 [March]      2  1471.     5800         
#>  3  1962     80   1476. 55 [Wisconsin] 3 [March]      3  1579. 99999998 [Missin…
#>  4  1962     82   1598. 27 [Minnesota] 3 [March]      1  1598.    14015         
#>  5  1962     83   1707. 27 [Minnesota] 3 [March]      1  1707.    16552         
#>  6  1962     84   1790. 27 [Minnesota] 3 [March]      1  1790.     6375         
#>  7  1962    107   4355. 19 [Iowa]      3 [March]      1  4355. 99999999 [N.I.U.…
#>  8  1962    107   4355. 19 [Iowa]      3 [March]      2  1386.        0         
#>  9  1962    107   4355. 19 [Iowa]      3 [March]      3  1629.      600         
#> 10  1962    107   4355. 19 [Iowa]      3 [March]      4  1432. 99999999 [N.I.U.…
#>
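A sketch that reads the whole example file in chunks with the list reader and checks that it sees the same number of rows as the long reader. Because this extract is rectangular, each chunk holds a single P element; for a hierarchical extract, each chunk would presumably carry one element per record type, and a loop like this would need to handle each one.

```r
library(ipumsr)

ddi_path <- ipums_example("cps_00006.xml")
list_reader <- read_ipums_micro_list_yield(ddi_path, verbose = FALSE)
long_reader <- read_ipums_micro_yield(ddi_path, verbose = FALSE)

# Count person (P) records chunk by chunk with the list reader
n_person <- 0
while (!list_reader$is_done()) {
  chunk <- list_reader$yield(1000)
  n_person <- n_person + nrow(chunk$P)
}

# Count all rows with the long reader for comparison
n_long <- 0
while (!long_reader$is_done()) {
  n_long <- n_long + nrow(long_reader$yield(1000))
}

c(list = n_person, long = n_long)
```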