MPC IT Shark Tank - Cycle 2 Results

Cycle 2 of MPC IT Shark Tank has concluded, and once again we have some fantastic results!

Last week we wrapped up our second round of MPC IT Shark Tank, and the teams wildly exceeded my expectations yet again! This cycle we had three teams present their work from the past three months:

  • Sharkbook - The project explored the options for deploying JupyterHub and Notebooks at the MPC, with an emphasis on how to best integrate with existing researcher workflows.
  • IPUMS Mobile - The team’s goal was to create a mobile-friendly application for accessing IPUMS USA metadata and servicing simple questions that can be answered using variable code frequencies (e.g. “How many married people in 1910?”).
  • Cluster ALL The Things - This project aimed to create a framework for quickly launching a wide variety of jobs and applications on our shared cluster environment.

First up was the Sharkbook team. In their presentation, they demonstrated how they used a Jupyter Notebook to capture a complicated, multi-step data production process for the IPUMS CPS project. This process involves a large number of shell scripts spread across many carefully-named directories which indicate the order in which they need to be run. Researchers feed data files through this directory structure step-by-step via the command line, monitoring the job output as they go and performing QA checks along the way. The team was able to capture this entire workflow within a single interactive web-based Notebook, placing all of the input, code, output, documentation, and even the resulting data analysis in one place. Researchers can traverse and run the documented steps, observing the output of each step directly within the Notebook. Even better, the team demoed some visualizations of the data after key steps, which are integrated directly within the Notebook and allow the researcher to do immediate visual checks of data quality. Some early adopter researchers are already transitioning their work to Notebooks.

Next was the IPUMS Mobile team. They demonstrated a beautiful, responsive webapp that looked great on our phones and allowed users to quickly browse the metadata for our USA samples either by variable search or sample search. Even better, the team showed how they integrated frequency counts. For each variable, the app includes the number of respondents with each possible answer - either straight values for variables like AGE or coded values like Marital Status (MARST), where the coded values are things like Married, Single, Widowed, and so on. This is a great tool for answering quick demographic questions on the fly without needing to do an online analysis or extract using the full IPUMS USA webapp. While still only available internally for now, we hope to release it publicly in the future.
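For readers who have not worked with code frequencies, here is a minimal sketch of the idea in Python. It is not the IPUMS Mobile team's implementation; the function name, the sample records, and the code-to-label mapping are all made up for illustration and do not reflect the actual IPUMS coding of MARST.

```python
from collections import Counter

# Hypothetical code-to-label mapping, for illustration only.
# These are NOT the real IPUMS USA codes for MARST.
MARST_LABELS = {1: "Married", 2: "Single", 3: "Widowed"}


def code_frequencies(values, labels=None):
    """Count how many respondents gave each answer.

    If a labels mapping is supplied, coded values are reported under
    their labels; otherwise the raw values are used as keys.
    """
    counts = Counter(values)
    if labels is None:
        return dict(counts)
    return {labels.get(code, str(code)): n for code, n in counts.items()}


# Six invented respondent records holding a coded marital status.
records = [1, 1, 2, 3, 1, 2]
marst_freq = code_frequencies(records, MARST_LABELS)
print(marst_freq)

# A variable like AGE has no labels, so the raw values are counted directly.
age_freq = code_frequencies([30, 30, 41])
print(age_freq)
```

With a table like this precomputed per variable and sample, a question such as “How many married people in 1910?” becomes a single lookup rather than a full extract.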

We wrapped up the presentations with the Cluster ALL The Things “team” (quotes because it ended up being a team of one). The presentation showed how Mesos could be used as a cluster scheduler to allow all sorts of things to be concurrently scheduled on our cluster, from traditional HPC jobs that simply launch Python scripts, to Apache Spark jobs, Flask webapps, Docker containers and a full Elasticsearch environment. This flexibility and self-service capability could dramatically improve the development teams' ability to spin dev and test environments up and down. Ops, in turn, can better manage our production workflows and respond more nimbly when adjusting capacity or provisioning new services.

The energy in the room was high throughout all three presentations - each of them has the potential to impact a large segment of the MPC community. It was interesting that each targeted a different segment of that community: Sharkbook -> MPC researchers, IPUMS Mobile -> IPUMS users, and Cluster -> MPC IT staff. This also made it very difficult to pick a champion – honestly, all of these are wins for us. Nevertheless, we once again relied on our time-honored (read: we did it this way the only other time) tradition of “cheering loudness” voting. Multiple rounds were required to determine the champ, but in the end, Cluster ALL The Things prevailed in a one-woman-team upset! Congrats to June Taylor, the brains of the Cluster ALL The Things operation. Even more impressive, June is the MPC IT rookie, having just joined the team in February. What a way to hit the ground running!

Congrats also to the other two teams. It looks like all three projects will make it to production, which is impressive. Once again I am floored by the talent and creativity of this group of folks. Shark Tank Cycle 3 has launched and we’ll have more results to report in another three months. See you then!
