The prevalence of GPL licensing among R packages on CRAN, the challenge that creates for users, and the reckoning that the R community might not see coming. Read More ›
In our final installment of this series, we show how to harness all the compute cores available on your local system, turning it into a personal cluster for parallel computing. Read More ›
When you've hit the memory or storage limits of your local machine, it's time to look at more efficient data storage formats. Today, we explore Parquet. Read More ›
Doing analysis on big data doesn’t have to mean mastering enterprise Big Data tools or accessing hundred-node compute clusters in the cloud. In this series, we’ll first find out what your humble laptop can do with the help of modern tools. Read More ›
If you have very large tables of data imprisoned in a vendor-locked Excel jail, consider setting them free by caching worksheets using Python+Pandas+Pickle. Read More ›
The conclusion of the story of how we created DCP Analytics - our in-house automated, web-based analysis tool using Pandas, Bokeh, Jupyter and Conda to help our researchers quickly find data anomalies and processing errors in our data production pipelines. Read More ›
The story of how we created DCP Analytics - our in-house automated, web-based analysis tool using Pandas, Bokeh, Jupyter and Conda to help our researchers quickly find data anomalies and processing errors in our data production pipelines. Read More ›
In this series, we present some highlights from Team Unicorn Rainbows' work in the first round of MPC IT Shark Tank. This first post describes how we improved menu creation in Excel. Read More ›
Our own HPC specialist Ankit Soni and the TerraPop team presented their published article at the IEEE Big Data 2015 conference in Santa Clara earlier this month. Read More ›
This edition of the Data Product Spotlight goes wayyy back to the very beginnings of the MPC and takes a look at our first and still most widely known data product, IPUMS USA. Read More ›
Colin Davis launches a new series which will provide quick summaries of our microdata products from a developer perspective. In this first post, he introduces the concept then takes a look at the American Time-Use Survey. Read More ›
We've updated the blog platform from WordPress to Jekyll (GitHub Pages)! We also took the opportunity to refresh the blog's look. Read more about the motivations for the switch. Read More ›
SHA-1 is on the way out and being helped through the door by Google. Brian will show a way to find which SSL certificates still use SHA-1 signatures and how to prioritize their replacement. Read More ›
In this follow-up to my post discussing my 'hflr' Ruby gem for reading hierarchical data in FLR format, I'll demonstrate how to combine 'hflr' with a simple importer class to load a database with the data. Read More ›
The IT Core Spotlight shines on Jon Renner this week. Jon is one of our newest hires, working in our Data Production group on historical USA census data. Read More ›
Alex Jokela - Lead TerraPop Developer, beekeeper, photographer, chicken raiser. The IT Core Spotlight visits with the Core's renaissance man. Read More ›
Dealing with fixed-length record (FLR) data is a reality for us at the MPC. Colin introduces readers to his Ruby Gem, HFLR, which makes processing hierarchical fixed-length record data a bit easier. Read More ›
The MPC's data has been cited thousands of times. In this article, we explore how we connected those citations with our user accounts using fuzzy name matching. Read More ›
The folks at the NYT's Upshot recently did a piece charting how Americans have moved between states since 1900, using Census microdata obtained from ipums.org, the MPC's longest running data project. Read More ›