Blog Archive

Check out our blog archive. Click on a headline to read the teaser.

Ingesting the 2020 U.S. Census with DuckDB by Colin Davis · November 27, 2023

DuckDB made it possible for us to reshape the U.S. Census with pure SQL and a sprinkling of classic UNIX utilities.
Read More ›

Reading the Parquet Data Format in Rust by Colin Davis · July 1, 2022

Move Beyond the Basic Examples
Read More ›

Implementing The Fastest (Pseudo) Jaro-Winkler Algorithm in Rust by Jake Wellington · June 24, 2022

Creating one of the fastest Jaro-Winkler implementations available.
Read More ›

Upgrading the Shared Conda Installation at ISRDI by Kelly Thompson · November 28, 2020

A shared conda installation provides a multitude of benefits for an organization, but upgrading it can be a challenge. This is that journey.
Read More ›

GPL: Do We Need to Wait for the Courts? by Jimm Domingo · November 2, 2020

Is there another way to resolve the ambiguity with the GPL that doesn't involve the courts?
Read More ›

More Better Python: 10 Cool New (to me) Python Things by Ben Klaas · January 8, 2020

One of the sublime pleasures of Python is discovering newer, better ways of doing Python. Here are 10 I've discovered recently.
Read More ›

Reflections on Creating an API Program by Fran Fabrizio · January 6, 2020

As the IPUMS NHGIS Metadata and Data Extract APIs reach v1.0, this is a great time to look back and reflect on how we got here.
Read More ›

Exploring the Weird World of IPUMS Data Availability by Colin Davis · October 3, 2019

IPUMS aims to harmonize all the world's demographic data, but sometimes the data is just not there, leaving us with interesting user experience challenges as we help users discover when data is and isn't available.
Read More ›

The GPL License and Linking: Still Unclear After 30 Years by Fran Fabrizio · October 4, 2018

The prevalence of GPL licensing for R libraries in CRAN, the challenge that creates for users, and the reckoning that the R community might not see coming.
Read More ›

Big Data on a Laptop: Tools and Strategies - Part 3 by Colin Davis · September 19, 2018

In our final installment of this series, we show how to harness all the compute cores available on your local system, turning it into a personal cluster for parallel computing.
Read More ›

Big Data on a Laptop: Tools and Strategies - Part 2 by Colin Davis · July 5, 2018

When you've hit the memory or storage limits of your local machine, it's time to look at more efficient data storage formats. Today, we explore Parquet.
Read More ›

Big Data on a Laptop: Tools and Strategies - Part 1 by Colin Davis · May 25, 2018

Doing analysis on big data doesn’t have to mean mastering enterprise Big Data tools or accessing hundred-node compute clusters in the cloud. In this series, we’ll first find out what your humble laptop can do with the help of modern tools.
Read More ›

ipumsr - Integrating IPUMS Data with R by Greg Freedman Ellis · November 17, 2017

We are excited to announce the ipumsr R package, which helps make importing IPUMS data into R easy.
Read More ›

Flame Graphs: Making the Opaque Obvious by Colin Davis · August 9, 2017

With a flame-graph-style profile of your application, you can spot performance hotspots at a glance.
Read More ›

Open Sourcing Code with Intention by Fran Fabrizio · February 17, 2017

A quick guide to what you need to think about before you open source your code.
Read More ›

Slurping Up Excel Data on the Quick: Python, Pandas, and Pickle by Ben Klaas · February 14, 2017

If you have very large tables of data imprisoned in a vendor-locked Excel jail, consider setting them free by caching worksheets using Python+Pandas+Pickle.
Read More ›

Towards a Sustainable Excel by Ben Klaas · February 3, 2017

Building Excel Macros With Python, part 3 of a series on reinventing our metadata management environment.
Read More ›

MPC IT Shark Tank - Cycles 4 & 5 Results! by Fran Fabrizio · January 3, 2017

We closed out 2016 with two more rounds of Shark Tank! Read on for details.
Read More ›

Automated Analysis of a Data Workflow - Part 2 by Jesse Erdmann · September 14, 2016

The conclusion of the story of how we created DCP Analytics - our in-house automated, web-based analysis tool using Pandas, Bokeh, Jupyter and Conda to help our researchers quickly find data anomalies and processing errors in our data production pipelines.
Read More ›

Automated Analysis of a Data Workflow - Part 1 by Jesse Erdmann · August 24, 2016

The story of how we created DCP Analytics - our in-house automated, web-based analysis tool using Pandas, Bokeh, Jupyter and Conda to help our researchers quickly find data anomalies and processing errors in our data production pipelines.
Read More ›

MPC IT Shark Tank - Cycle 3 Results by Fran Fabrizio · July 14, 2016

Cycle 3 of MPC IT Shark Tank wrapped up last month. Here's the scoop!
Read More ›

Excel VBA and Version Control by Jimm Domingo · May 19, 2016

The second post in the series about Team Unicorn Rainbows' work in the first round of MPC IT Shark Tank.
Read More ›

MPC IT Shark Tank - Cycle 2 Results by Fran Fabrizio · March 15, 2016

Cycle 2 of MPC IT Shark Tank has concluded, and once again we have some fantastic results!
Read More ›

Embracing the Uncertainty of Software Development by Fran Fabrizio · February 29, 2016

Thoughts on how to avoid being fooled by the false certainty of deadlines and estimates and approach our work more honestly.
Read More ›

Improving Menu Creation in Excel with VBA by Jimm Domingo · February 25, 2016

In this series, we present some highlights from Team Unicorn Rainbows' work in the first round of MPC IT Shark Tank. This first post describes how we improved menu creation in Excel.
Read More ›

Exploring Nicollet Island by MPC IT · December 21, 2015

Using MPC Data to Research and Visualize the Demographic History of Your Neighborhood.
Read More ›

MPC IT Shark Tank - Cycle 1 Results by Fran Fabrizio · December 21, 2015

The results of our first-ever MPC IT Shark Tank round are in, and a winner has been crowned!
Read More ›

High Performance Analysis of Big Spatial Data by MPC IT · November 18, 2015

Our own HPC specialist Ankit Soni and the TerraPop team presented their published article at the IEEE Big Data 2015 conference in Santa Clara earlier this month.
Read More ›

Data Product Spotlight: IHIS by Colin Davis · November 10, 2015

In this edition of the data product spotlight, Colin introduces us to IHIS, our health survey data product.
Read More ›

Data Product Spotlight: IPUMS USA by Colin Davis · October 21, 2015

This edition of the Data Product Spotlight goes wayyy back to the very beginnings of the MPC and takes a look at our first and still most widely known data product, IPUMS USA.
Read More ›

IT Core Spotlight: Jayandra Pokharel by MPC IT · October 8, 2015

Introducing Jayandra Pokharel, one of our IPUMS developers.
Read More ›

Data Product Spotlight: Current Population Survey by Colin Davis · September 30, 2015

Our newest data product spotlight shines on CPS, the federal government's monthly survey on employment in America, among other topics.
Read More ›

IT Core Spotlight: Jim Young by MPC IT · September 22, 2015

Meet Jim Young, who joined us this year on a part-time basis as a way to stay involved in software development after retirement.
Read More ›

Data Product Spotlight: Series Intro and the American Time Use Survey by Colin Davis · September 16, 2015

Colin Davis launches a new series that will provide quick summaries of our microdata products from a developer perspective. In this first post, he introduces the concept, then takes a look at the American Time Use Survey.
Read More ›

MPC IT Shark Tank! by Fran Fabrizio · August 24, 2015

Our latest initiative to keep things fun and solve those pesky challenges that live in the shadows of our main projects.
Read More ›

IT Core Spotlight: Colin Davis by MPC IT · July 8, 2015

Meet Colin, the longest tenured member of the IT Core.
Read More ›

New Blog Platform, New Blog Look by Fran Fabrizio · July 2, 2015

We've updated the blog platform from WordPress to Jekyll (GitHub Pages)! We also took the opportunity to refresh the blog's look. Read more about the motivations for the switch.
Read More ›

Our IT Hiring Process: How and Why by Fran Fabrizio · June 5, 2015

An overview of our IT hiring process and why we designed it this way.
Read More ›

Affirmatively Insecure: Chrome and SHA-1 Certificates by Brian Gottreu · June 1, 2015

SHA-1 is on the way out and being helped through the door by Google. Brian will show a way to find which SSL certificates still use SHA-1 signatures and how to prioritize their replacement.
Read More ›

Importing Fixed Length Data Using Ruby (Part Two) by Colin Davis · May 7, 2015

In this follow-up to my post discussing my 'hflr' Ruby gem for reading hierarchical data in FLR format, I'll demonstrate how to combine 'hflr' with a simple importer class to load a database with the data.
Read More ›

RailsConf 2015 Wrap-Up by Marcus Peterson · April 29, 2015

The entire IPUMS web development team saddled up and headed to Atlanta for RailsConf 2015. Here's our summary.
Read More ›

IT Core Spotlight: Jon Renner by MPC IT · April 10, 2015

The IT Core Spotlight shines on Jon Renner this week. Jon is one of our newest hires, working in our Data Production group on historical USA census data.
Read More ›

IT Core Spotlight: Jason Goray by MPC IT · March 16, 2015

This week, the IT Core Spotlight takes a look at Jason Goray, our UX/UI Developer and long time member of the MPC IT Core.
Read More ›

Migrating a Very Large Project from SVN to Git by Jayandra Pokharel · March 10, 2015

Jayandra shows us how to migrate a large, complex project from Subversion to Git.
Read More ›

A Recipe for a New TerraPop UI: Part 1 by Fabio Trabucchi · February 19, 2015

A look at our newest project, Terra Populus, and the collaborative process we're using to design and develop the new TerraPop UI.
Read More ›

Docker: Ignoring the Whale in the Room by Dan Elbert · February 12, 2015

Dealing with fixed-length record (FLR) data is a reality for us at the MPC. Colin introduces readers to his Ruby Gem, HFLR, which makes processing hierarchical fixed-length record data a bit easier.
Read More ›

MPC IT New Year's Resolutions - Technical Debt Edition by Fran Fabrizio · January 21, 2015

In the spirit of the new year, we thought we’d share our technical debt reduction resolutions, 2015 edition!
Read More ›

Introducing Team Spotlights by MPC IT · January 7, 2015

An introduction to Ember.js for devs who are used to thinking in Rails.
Read More ›

IT Core Spotlight: Fran Fabrizio by MPC IT · December 7, 2014

A short conversation with Fran Fabrizio, the IT Core Director.
Read More ›

Take It and Run: A Tale of Risk, Failure, and the Beginning of a Javascript Journey by Jake Wellington · December 1, 2014

How we use csvkit to wrangle data around here.
Read More ›

Introduction to Terra Populus by Alex Jokela · November 8, 2014

The MPC's data has been cited thousands of times. In this article, we explore how we connected those citations with our user accounts using fuzzy name matching.
Read More ›

Data Duplication Detection by Jesse Erdmann · October 8, 2014

The folks at the NYT's Upshot recently did a piece charting how Americans have moved between states since 1900, using Census microdata obtained from ipums.org, the MPC's longest running data project.
Read More ›

MPC IN THE NEWS: Top 1 Percent: What Jobs Do They Have? by MPC IT · September 12, 2014

In this visualization from the Business section of the New York Times, IPUMS data was used to explore the professions of America's top 1%.
Read More ›

Optimizing for Developer Happiness: Migrating from Java to Ruby Without Missing a Beat by Marcus Peterson · September 9, 2014

Read More ›