Blog Category: Data

Ingesting the 2020 U.S. Census with DuckDB by Colin Davis · November 27, 2023
DuckDB made it possible for us to reshape the U.S. Census with pure SQL and a sprinkling of classic UNIX utilities.

Exploring the Weird World of IPUMS Data Availability by Colin Davis · October 3, 2019
IPUMS aims to harmonize all the world's demographic data, but sometimes the data is just not there, leaving us with interesting user experience challenges as we help users discover when data is and isn't available.

Big Data on a Laptop: Tools and Strategies - Part 3 by Colin Davis · September 19, 2018
In our final installment of this series, we show how to harness all the compute cores available on your local system, turning it into a personal cluster for parallel computing.

Big Data on a Laptop: Tools and Strategies - Part 2 by Colin Davis · July 5, 2018
When you've hit the memory or storage limits of your local machine, it's time to look at more efficient data storage formats. Today, we explore Parquet.

Big Data on a Laptop: Tools and Strategies - Part 1 by Colin Davis · May 25, 2018
Doing analysis on big data doesn’t have to mean mastering enterprise Big Data tools or accessing hundred-node compute clusters in the cloud. In this series, we’ll first find out what your humble laptop can do with the help of modern tools.

Exploring Nicollet Island by MPC IT · December 21, 2015
Using MPC Data to Research and Visualize the Demographic History of Your Neighborhood.

High Performance Analysis of Big Spatial Data by MPC IT · November 18, 2015
Our own HPC specialist Ankit Soni and the TerraPop team presented their published article at the IEEE Big Data 2015 conference in Santa Clara earlier this month.

Data Product Spotlight: IHIS by Colin Davis · November 10, 2015
In this edition of the data product spotlight, Colin introduces us to IHIS, our health survey data product.

Data Product Spotlight: IPUMS USA by Colin Davis · October 21, 2015
This edition of the Data Product Spotlight goes wayyy back to the very beginnings of the MPC and takes a look at our first and still most widely known data product, IPUMS USA.

Data Product Spotlight: Current Population Survey by Colin Davis · September 30, 2015
Our newest data product spotlight shines on CPS, the federal government's monthly survey on employment in America, among other topics.

Data Product Spotlight: Series Intro and the American Time Use Survey by Colin Davis · September 16, 2015
Colin Davis launches a new series that provides quick summaries of our microdata products from a developer's perspective. In this first post, he introduces the concept, then takes a look at the American Time Use Survey.

Keeping it Simple: Exploiting CSV and csvkit at the MPC by Ben Klaas · November 21, 2014
How we use csvkit to wrangle data around here.

Introduction to Terra Populus by Alex Jokela · November 8, 2014

Harmonizing Data at the MPC by Colin Davis · September 23, 2014