Getting Started with R

How to download R for free and install some of the R packages used on this blog

Matt Gunther (IPUMS PMA Senior Data Analyst)
11-2-2020

Why analyze PMA data with R?

Like all IPUMS data projects, IPUMS PMA data is available free of charge to users who agree to our terms of use. That’s because we believe that cost and institutional affiliation should not be barriers to answering pressing concerns around women’s health. (You can register here for a free IPUMS PMA user account.)

In fact, users can analyze IPUMS PMA data with any software they like! We’ve chosen to highlight R in particular because it is also free and popular with data analysts throughout the world. It’s available for Windows, macOS, and a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux).

Getting started with R

To get a copy of R for yourself, visit the Comprehensive R Archive Network (CRAN) and choose the right download link for your operating system.

If you’re new to R (or want to refresh your skills), we recommend the excellent, free introductory text R for Data Science. It also introduces tidyverse conventions, which we use throughout this blog.

Our favorite resources

Do I really need statistical software?

If you’re new to data analysis, you might wonder exactly what you’re going to find in a toolkit like R.

Plenty of people come to R after working with more common types of data analysis software, like Microsoft Excel or other spreadsheet programs. If you wanted to, you could absolutely download a CSV file from the IPUMS PMA extract system and open it in Excel. You would find individual respondents in rows and their responses for variables in columns, and you could make use of built-in spreadsheet functions to do things like:

However, you might also notice that a spreadsheet comes with certain limitations:

Statistical software is designed specifically to address these and other issues related to data cleaning and analysis. Learning a program like R takes a lot of practice, but doing so will almost certainly make your work much more efficient!

Are there alternatives?

Yes! Many data analysts use proprietary statistical software like Stata, SAS, or SPSS. These tools are also powerful, and you may even find them easier to use than R.

Beyond price, R has a few additional advantages that make it a particularly useful tool for working with PMA data:

If you’re a beginner, learning R can be a daunting task. Keep at it! And never hesitate to ask questions.

RStudio

We strongly recommend running R within RStudio, an integrated development environment (IDE) designed to make your experience with R much easier. Some of the reasons we use it, ourselves:

R packages

An R package is a collection of functions created by other R users that you can download and install for yourself. Packages can be distributed in many ways, but all of the packages we highlight on this blog can be downloaded from CRAN (the same resource used to download “base” R). A package like ipumsr can be installed from CRAN by running the following function in the R console:

install.packages("ipumsr")

Packages also come with help files detailing the purpose and possible inputs (or “arguments”) of each included function. Other included metadata explains what version of R you’ll need to use the package, and also whether the package borrows functions from any other packages that should also be installed (usually these are called “dependencies”).
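As a rough sketch of how you might look at these help files and metadata yourself, the lines below use only functions that ship with R. The choice of install.packages as the help topic is just for illustration, and the metadata lookup assumes ipumsr is already installed (it is skipped otherwise).

```r
# The version of R you are currently running
print(R.version.string)

# Open the help file for a function (these two lines are equivalent)
?install.packages
help("install.packages")

# Inspect a package's metadata, but only if it is already installed
if (requireNamespace("ipumsr", quietly = TRUE)) {
  meta <- utils::packageDescription("ipumsr")
  print(meta$Depends)  # packages and R version that ipumsr depends on
  print(meta$Imports)  # other packages whose functions ipumsr borrows
}
```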

In order to access the functions and help files for a package, you need to load it after installation with library(), for example library(ipumsr).

On this blog, we will often show functions together with their package like this:

ipumsr::read_ipums_micro(
  ddi = "~/Downloads/pma_00001.xml",
  data_file = "~/Downloads/pma_00001.dat.gz"
)

The function read_ipums_micro comes from the package ipumsr. It is not necessary for you to include the package name each time you call a function (as long as you’ve already loaded the package with library()); we’re using this notation simply as a reminder of where each function comes from, in case you want to consult the original package documentation.
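To see the two calling styles side by side without needing a data extract, here is a small sketch that swaps in the tools package, which ships with R but is not loaded by default. The function toTitleCase and the input string are stand-ins chosen for illustration; the same pattern applies to ipumsr::read_ipums_micro.

```r
# tools is installed with R but not loaded at startup,
# so the package prefix is required for this first call
with_prefix <- tools::toTitleCase("getting started with r")

# After library(), the bare function name works as well
library(tools)
without_prefix <- toTitleCase("getting started with r")

# Both calls run the same function, so the results match
print(identical(with_prefix, without_prefix))
```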

Essentials

Here are the packages you’ll need to install to reproduce the code on this blog:

ipumsr

The ipumsr package contains functions that make it easy to load IPUMS PMA data into R (mainly read_ipums_micro).

It also contains functions that will return variable metadata (like the variable descriptions you see while browsing for data on pma.ipums.org).
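A minimal sketch of what those metadata lookups might look like, reusing the DDI file path from the read_ipums_micro example earlier in this post. The variable name AGE is an assumption about what your extract contains, and the block does nothing if ipumsr or the file is not available on your machine.

```r
# Path reused from the example above; change it to match your own extract
ddi_path <- "~/Downloads/pma_00001.xml"

if (requireNamespace("ipumsr", quietly = TRUE) && file.exists(ddi_path)) {
  # Read only the metadata (the DDI codebook), not the data file itself
  ddi <- ipumsr::read_ipums_ddi(ddi_path)

  # One row of metadata per variable in the extract
  print(ipumsr::ipums_var_info(ddi))

  # Full description of a single variable (AGE is assumed to be in the extract)
  print(ipumsr::ipums_var_desc(ddi, AGE))
}
```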

tidyverse

The tidyverse package actually installs a family of related packages, including ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats.

This blog uses tidyverse functions and syntax wherever possible because so-called “tidy” conventions are designed with the express purpose of making code and console output more human-readable. Sometimes human readability imposes a performance cost, but in our experience IPUMS PMA datasets are small enough that this is not an issue.
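To give a feel for what “human-readable” means here, the sketch below computes the same average two ways: once in base R and once as a dplyr pipeline. The data frame is a toy example invented for illustration, not real PMA data, and the column names age and urban are hypothetical.

```r
library(dplyr)

# Toy data invented for illustration; not real PMA responses
respondents <- data.frame(
  age   = c(19, 24, 31, 42),
  urban = c(TRUE, FALSE, TRUE, TRUE)
)

# Base R: a single expression that is read from the inside out
base_result <- mean(respondents$age[respondents$urban])

# Tidyverse: read top to bottom, one step per line
tidy_result <- respondents %>%
  filter(urban) %>%                 # keep urban respondents only
  summarise(mean_age = mean(age))   # average their ages

print(base_result)
print(tidy_result)
```

Both approaches return the same number; the pipeline simply spells out each step in the order it happens.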

shiny

Interactive graphics shown throughout this blog are built with the shiny package.
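If you would like to install all three packages in one step, something like the following sketch should work. The helper missing_packages is a hypothetical name written for this example (it is not part of any package); it checks which packages are absent so that nothing already installed is downloaded again.

```r
# The three packages listed above
needed <- c("ipumsr", "tidyverse", "shiny")

# Hypothetical helper: return the subset of pkgs that is not yet installed
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

to_install <- missing_packages(needed)

# Only install when someone is at the console, so running this
# script non-interactively never downloads anything
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```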

Watch for updates here

We may add more package suggestions for future posts!

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.