IPUMS - CPS Extraction and Analysis

Exercise 1

OBJECTIVE: Gain an understanding of how the IPUMS dataset is structured and how it can be leveraged to explore your research interests. This exercise will use the IPUMS dataset to explore associations between health and work status and to create basic frequencies of food stamp usage.

This vignette is adapted from the CPS Data Training Exercise available here: https://pop.umn.edu/sites/pop.umn.edu/files/final_review_-_cps_spss_exercise_1_0.pdf

Research Questions

What is the frequency of food stamp recipiency in the US? Are health and work statuses related?

Objectives

  • Create and download an IPUMS data extract
  • Decompress data file and read data into R
  • Analyze the data using sample code
  • Validate data analysis work using answer key

IPUMS Variables

  • PERNUM: Person number in sample unit
  • FOODSTMP: Food stamp receipt
  • AGE: Age
  • EMPSTAT: Employment status
  • AHRSWORKT: Hours worked last week
  • HEALTH: Health status

Download Extract from IPUMS Website

  1. Register with IPUMS - Go to http://cps.ipums.org, click on CPS Registration and apply for access. On the login screen, enter your email address and password and submit.

  2. Make an Extract

  • Go back to the homepage and click Select Data
  • Click the Select Samples box, check the box for the 2011 ASEC sample, and click the Submit sample selections box
  • Using the drop down menu or search feature, select the following variables:
    • PERNUM: Person number in sample unit (under Person > Core > Technical in the drop down)
    • FOODSTMP: Food stamp receipt (under Household > Core > Economic Characteristics; note that we do not want “FOODSTAMP: Family market value of food stamps”)
    • AGE: Age (under Person > Core > Demographics)
    • EMPSTAT: Employment status (under Person > Core > Work)
    • AHRSWORKT: Hours worked last week (under Person > Core > Work)
    • HEALTH: Health status (under Person > Annual Social & Economic Supplement (ASEC) > Disability)
  3. Request the Data
  • Click the orange VIEW CART button under your data cart
  • Review variable selection. Click the orange Create Data Extract button
  • Review the ‘Extract Request Summary’ screen, describe your extract and click Submit Extract
  • You will get an email when the data is available to download
  • To get to the download page, follow the link in the email, or follow the Download or Revise Extracts link on the homepage
  4. Download the Data
  • Go to http://cps.ipums.org and click on Download or Revise Extracts
  • Right-click on the data link next to the extract you created
  • Choose “Save Target As…” (or “Save Link As…”)
  • Save into “Documents” (that should pop up as the default location)
  • Do the same thing for the DDI link next to the extract

Getting the data into R

You will need to change the file paths in the code to the location where you saved the extract files.

Note that the data_file argument is optional if you didn’t change the data file name and have it saved in your working directory; read_ipums_micro can use information from the DDI file to locate the corresponding data file.
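A minimal sketch of the read step, assuming ipumsr is installed and that the extract files are named `cps_00001.xml` and `cps_00001.dat` (hypothetical names; yours will likely differ). The `file.exists()` guard is only there so the snippet does nothing, rather than fail, when the files are not present.

```r
library(ipumsr)

# Hypothetical file names: replace with the paths where you saved your extract
cps_ddi_file <- "cps_00001.xml"
cps_data_file <- "cps_00001.dat"

if (file.exists(cps_ddi_file)) {
  # The DDI (codebook) file holds variable labels, value labels, and column positions
  cps_ddi <- read_ipums_ddi(cps_ddi_file)
  # data_file can be omitted if the data file keeps its original name and location
  cps_data <- read_ipums_micro(cps_ddi, data_file = cps_data_file)
}
```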

Exercises

These exercises include example code written in the “tidyverse” style, meaning that they use the dplyr package. This package provides easy-to-use functions for data analysis, including mutate(), select(), arrange(), slice() and the pipe (%>%). There are numerous other ways you could answer these questions, including base R, the data.table package, and others.

library(dplyr, warn.conflicts = FALSE)

Analyze the Sample – Part I: Frequencies of FOODSTMP

  1. On the website, find the codes page for the FOODSTMP variable and write down each code value and the category it represents.
  2. What is the universe for FOODSTMP in 2011 (under the Universe tab on the website)?
  3. How many people received food stamps in 2011?
  4. What proportion of the population received food stamps in 2011?
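Once the data are loaded, `ipums_val_labels(cps_ddi, FOODSTMP)` prints the code/label pairs stored in the DDI, and `ipums_website(cps_ddi, "FOODSTMP")` opens the variable's page, where the Universe tab is. For the counts, a sketch of a weighted tabulation follows. Because FOODSTMP is a household variable attached to each person record, weighting by the person weight counts people living in households that received food stamps. The sketch runs on a small made-up tibble so it is self-contained; the values are placeholders, not CPS estimates. It assumes the person-level ASEC weight is named `WTSUPP` (newer extracts may call it `ASECWT`); substitute `cps_data` and whichever weight your extract contains.

```r
library(dplyr, warn.conflicts = FALSE)

# Made-up person-level records standing in for cps_data (placeholders, not CPS values)
persons <- tibble(
  FOODSTMP = c(1, 2, 1, 2, 0),
  WTSUPP   = c(100, 50, 200, 25, 10)
)

# Weighted number of people in each FOODSTMP category, plus each category's share
foodstmp_freq <- persons %>%
  count(FOODSTMP, wt = WTSUPP, name = "n_weighted") %>%
  mutate(share = n_weighted / sum(n_weighted))

foodstmp_freq
```

Read the row for whichever code the codes page lists as receipt. The `share` column uses every record as its denominator; if you want the share within the FOODSTMP universe, filter out the not-in-universe code before calling `count()`.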

Using household weights (HWTSUPP)

Suppose you were interested not in the number of people living in homes that received food stamps, but in the number of households that were food stamp participants. To get this statistic you would need to use the household weight.

To use the household weight, keep only one person from each household to represent that household’s characteristics (for example, the person with PERNUM == 1), so that each household is counted once. Then apply the household weight (HWTSUPP).

  1. How many households received food stamps in 2011?
  2. What proportion of households received food stamps in 2011?
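A sketch of the household-level tabulation, assuming every household contains exactly one person with `PERNUM == 1` and using the household weight `HWTSUPP`. The tibble is made up; `SERIAL` (a household identifier) is included only to make the household structure visible and is not used in the calculation.

```r
library(dplyr, warn.conflicts = FALSE)

# Made-up records: three households, six people (placeholders, not CPS values)
persons <- tibble(
  SERIAL   = c(1, 1, 2, 3, 3, 3),
  PERNUM   = c(1, 2, 1, 1, 2, 3),
  FOODSTMP = c(2, 2, 1, 1, 1, 1),
  HWTSUPP  = c(500, 500, 800, 300, 300, 300)
)

# Keep one person per household so each household's weight is counted once
households <- persons %>%
  filter(PERNUM == 1)

# Weighted number of households in each FOODSTMP category, plus shares
hh_freq <- households %>%
  count(FOODSTMP, wt = HWTSUPP, name = "n_households") %>%
  mutate(share = n_households / sum(n_households))

hh_freq
```

Without the `PERNUM == 1` filter, the three-person household in this example would contribute its weight three times.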

Analyze the Sample – Part II: Relationships in the Data

  1. What is the universe for EMPSTAT in 2011?
ipums_website(cps_ddi, "EMPSTAT")

#    A: Age 15+
  2. What are the possible responses and codes for the self-reported HEALTH variable?
  3. What percent of people with ‘poor’ self-reported health are at work?
  4. What percent of people with ‘very good’ self-reported health are at work?
  5. In the EMPSTAT universe, what percent of people:
    • self-report ‘poor’ health and are at work?
    • self-report ‘very good’ health and are at work?
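A sketch of both kinds of percentage, on made-up data. It assumes `EMPSTAT == 10` is the “at work” code, that `HEALTH` uses 2 for ‘very good’ and 5 for ‘poor’, that the person weight is named `WTSUPP`, and that the EMPSTAT universe (age 15+) can be selected with `AGE >= 15`; confirm the codes on the website (`ipums_val_labels(cps_ddi, HEALTH)` prints the HEALTH codes). The first summary uses everyone in a health category as the denominator. The second uses everyone in the EMPSTAT universe, so it gives the share of that universe who both report a given health status and are at work.

```r
library(dplyr, warn.conflicts = FALSE)

# Made-up records (placeholders, not CPS values); codes assumed, see lead-in
persons <- tibble(
  AGE     = c(10, 30, 45, 60, 25, 70),
  HEALTH  = c(2, 2, 2, 5, 5, 5),
  EMPSTAT = c(0, 10, 20, 10, 30, 36),
  WTSUPP  = c(50, 300, 50, 100, 50, 50)
)

# Conditional share: percent at work among everyone in each HEALTH category
pct_at_work_by_health <- persons %>%
  group_by(HEALTH) %>%
  summarize(pct_at_work = 100 * sum(WTSUPP[EMPSTAT == 10]) / sum(WTSUPP))

# Joint share: percent of the EMPSTAT universe (assumed AGE >= 15) who are
# in each HEALTH category AND at work
pct_joint <- persons %>%
  filter(AGE >= 15) %>%
  mutate(pct = 100 * WTSUPP / sum(WTSUPP)) %>%
  filter(EMPSTAT == 10) %>%
  group_by(HEALTH) %>%
  summarize(pct_health_and_at_work = sum(pct))
```

To restrict the conditional percentages to the EMPSTAT universe as well, add `filter(AGE >= 15)` before `group_by(HEALTH)` in the first pipeline.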

Analyze the Sample – Part III: Relationships in the Data

  1. What is the universe for AHRSWORKT?
  2. What are the average hours worked last week for each self-reported health category?
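A sketch of the weighted average, assuming that `AHRSWORKT` marks out-of-universe records with the code 999 (check the codes page before relying on this) and that the person weight is named `WTSUPP`; the data are made up. Dropping the not-in-universe code matters: left in, a value like 999 would swamp the average.

```r
library(dplyr, warn.conflicts = FALSE)

# Made-up records (placeholders, not CPS values); 999 assumed to be the NIU code
persons <- tibble(
  HEALTH    = c(1, 1, 5, 5, 5),
  AHRSWORKT = c(40, 20, 999, 30, 10),
  WTSUPP    = c(1, 3, 2, 1, 1)
)

# Weighted mean hours worked last week, within the AHRSWORKT universe
mean_hours_by_health <- persons %>%
  filter(AHRSWORKT != 999) %>%
  group_by(HEALTH) %>%
  summarize(mean_hours = weighted.mean(AHRSWORKT, WTSUPP))
```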

Bonus

  1. Use the ipumsr package metadata functions (like ipums_var_label() and ipums_file_info()) and ggplot2 to make a graph of the relationship between HEALTH and percent at work (from Part II above).

  2. Are there any variables that might be confounding this relationship? How might you explore them?
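For the first bonus item, one possible sketch with ggplot2. It plots a small summary table with one row per HEALTH category. The category labels are assumed, and every percentage is set to 50 purely as a placeholder, not a result. With a real extract you would build this table from `cps_data`, and `ipums_var_label(cps_ddi, HEALTH)` returns the variable label from the DDI, which could replace the hard-coded axis title.

```r
library(ggplot2)

# Placeholder summary: one row per HEALTH category (values are NOT CPS results)
health_levels <- c("Excellent", "Very good", "Good", "Fair", "Poor")
health_summary <- data.frame(
  health = factor(health_levels, levels = health_levels),
  pct_at_work = rep(50, 5)
)

# Bar chart of percent at work by self-reported health
p <- ggplot(health_summary, aes(x = health, y = pct_at_work)) +
  geom_col() +
  labs(
    x = "Health status",  # with a real extract: ipums_var_label(cps_ddi, HEALTH)
    y = "Percent at work"
  )

p
```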