A shared conda installation provides a multitude of benefits for an organization, but upgrading it can be a challenge. This is that journey. Read More ›
The prevalence of GPL licensing for R libraries in CRAN, the challenge that creates for users, and the reckoning that the R community might not see coming. Read More ›
In our final installment of this series, we show how to harness all the compute cores available on your local system, turning it into a personal cluster for parallel computing. Read More ›
When you've hit the memory or storage limits of your local machine, it's time to look at more efficient data storage formats. Today, we explore Parquet. Read More ›
Doing analysis on big data doesn’t have to mean mastering enterprise Big Data tools or accessing hundred-node compute clusters in the cloud. In this series, we’ll first find out what your humble laptop can do with the help of modern tools. Read More ›
If you have very large tables of data imprisoned in a vendor-locked Excel jail, consider setting them free by caching worksheets using Python+Pandas+Pickle. Read More ›
The conclusion of the story of how we created DCP Analytics - our in-house automated, web-based analysis tool using Pandas, Bokeh, Jupyter and Conda to help our researchers quickly find data anomalies and processing errors in our data production pipelines. Read More ›
The story of how we created DCP Analytics - our in-house automated, web-based analysis tool using Pandas, Bokeh, Jupyter and Conda to help our researchers quickly find data anomalies and processing errors in our data production pipelines. Read More ›
In this series, we present some highlights from Team Unicorn Rainbows' work in the first round of MPC IT Shark Tank. This first post describes how we improved menu creation in Excel. Read More ›
Our own HPC specialist Ankit Soni and the TerraPop team presented their published article at the IEEE Big Data 2015 conference in Santa Clara earlier this month. Read More ›
In this follow-up to my post discussing my 'hflr' Ruby gem for reading hierarchical data in FLR format, I demonstrate how to combine 'hflr' with a simple importer class to load the data into a database. Read More ›
Dealing with fixed-length record (FLR) data is a reality for us at the MPC. Colin introduces readers to his Ruby Gem, HFLR, which makes processing hierarchical fixed-length record data a bit easier. Read More ›
The MPC's data has been cited thousands of times. In this article, we explore how we connected those citations with our user accounts using fuzzy name matching. Read More ›