Docker: Ignoring the Whale in the Room
Docker is a new and interesting technology. I’d read about it and spent some time creating images and containers at home, but I always struggled to understand what it would look like to use Docker every day on a non-trivial application. I wondered whether it could be useful at the MPC: could we use it to run tests more consistently, to spin up development environments faster, or even, realistically, to run containers in production?
So It’s Like a VM?
What is Docker, then? Why is it interesting? I’ve often heard (and said myself a few times) that Docker containers are like ultra-lightweight VMs, but that’s not quite accurate. Docker uses Linux kernel features (specifically, namespaces and cgroups) to isolate container processes from the host OS. Containers have their own process space, networking, and root filesystem. A container must include all the software and dependencies it needs, but there’s no virtualized hardware.
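A quick way to see this isolation in practice, assuming a stock ubuntu image pulled from the registry:

```sh
# The container has its own PID namespace: ps inside it sees only the
# container's processes (ps itself shows up as PID 1).
docker run --rm ubuntu ps aux

# It also has its own root filesystem, separate from the host's.
docker run --rm ubuntu ls /
```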
In addition to the containerization technology, Docker makes use of AUFS, a union filesystem that stacks layers of differences. This allows images to be built on top of one another in layers, simplifying container development and deployment.
Finally, there’s the Docker Registry. Similar in spirit to public source repositories like GitHub or Bitbucket, the Docker Registry lets users push, pull, and share Docker images. Thanks to the registry, many simple, fully functional images can be obtained for free, and more complex ones can be built using them as a starting point.
A Small Docker Experiment
I set out to answer some of my own lingering questions, scoped to a developer’s perspective. I’m not an ops person, so it would be difficult (if not naive) for me to go into the details of how we might structure a Docker deployment in production.
First, an overview of the application I used as a case study: our IPUMS codebase. It’s a jruby/rails application that powers the front end of many of our data projects: IPUMS International, IPUMS USA, and the American Time Use Survey X, among others. Each project has different features toggled, different CSS, and other customizations. The Docker solution will have to support running the application as any one project and, in the best case, as multiple projects at once.
The application has a few moving parts:
- A rails web application
- An extract engine daemon
- A shared user database
- A database per project
- Several cron jobs and rake tasks that are run manually from time to time
Both the web application and the extract engine need access to the databases. The web application needs to be able to serve the files generated by the extract engine. The MySQL databases should be persistent, even if the containers are re-created. Since this is going to focus on development, I’ll skip the cron jobs and rake tasks.
The Technical Nitty Gritty
Given those pieces, how might they be divided into Docker images and containers? It’s generally advocated that a Docker container run only a single process or service. I’m also going to use data volume containers (a pattern described in detail at https://docs.docker.com/userguide/dockervolumes/). We’re probably looking at six containers:
- MySQL Data Container: exposes a shared volume for SQL data
- Extracts Data Container: exposes a shared volume for extracts
- Web Container: runs the rails app
- Extract Container: runs the extract engine
- DB Container: hosts MySQL
- Reverse Proxy Container: when running multiple projects at once, proxies requests to the correct application/container
I’ll need at least three images: one for the IPUMS code, one for the DB, and one for the proxy. It’s tempting to build two IPUMS images, one for the web application and one for the extract engine. For a production deployment, that might even be the best way to go. However, I want to keep this simple, so I’ll build a single IPUMS image that runs the web application or the extract engine based on environment variables.
There are already many MySQL and nginx images available (including officially supported images from Docker), so I’ll use those for the database and proxy containers. I’ll have to build my own IPUMS image, however. After reading some of Phusion’s Docker-related articles, I’ve chosen phusion/baseimage as the base. The rationale is described in excellent detail here: https://github.com/phusion/baseimage-docker
I’ll start by adding a Dockerfile to the project root. The only system dependencies I need to add on top of the baseimage are Java and jruby. Then I’ll need to add the IPUMS application itself, its gems, and scripts to start the webserver or extract engine.
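A minimal sketch of that Dockerfile, assuming phusion/baseimage and a tarball install of jruby (the image tag, jruby version, download URL, and paths below are illustrative, not our exact file):

```dockerfile
# Sketch only: the base image tag, versions, URLs, and paths are assumptions.
FROM phusion/baseimage:0.9.16

# System dependencies: curl for the download, Java for jruby
RUN apt-get update && \
    apt-get install -y curl openjdk-7-jre-headless && \
    apt-get clean && rm -rf /var/lib/apt/lists/*

# Install jruby (version and download URL assumed)
RUN curl -sL https://s3.amazonaws.com/jruby.org/downloads/1.7.19/jruby-bin-1.7.19.tar.gz \
    | tar xz -C /opt
ENV PATH /opt/jruby-1.7.19/bin:$PATH

# Add the application and install its gems
WORKDIR /opt/ipums
COPY Gemfile Gemfile.lock ./
RUN jruby -S gem install bundler && jruby -S bundle install
COPY . .

# Install the runit service scripts; baseimage's init runs
# everything it finds under /etc/service
COPY docker/webserver.sh /etc/service/webserver/run
COPY docker/extract_engine.sh /etc/service/extract_engine/run
RUN chmod +x /etc/service/webserver/run /etc/service/extract_engine/run

EXPOSE 3000

# Use baseimage's init system as PID 1
CMD ["/sbin/my_init"]
```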
To control whether the container starts the webserver, the extract engine, or both, the RUN_WEBSERVER and RUN_EXTRACT_ENGINE environment variables are read by the run scripts installed to /etc/service. The webserver’s run script looks roughly like this sketch (the application path and rails invocation are assumptions):
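```sh
#!/bin/sh
# /etc/service/webserver/run (sketch; the app path and rails invocation
# are assumptions). baseimage's my_init passes variables given to
# docker run -e through to its runit services.

# Don't exit when the webserver is disabled for this container; runit
# would simply restart the script in a loop. Sleep forever instead.
if [ "$RUN_WEBSERVER" != "true" ]; then
    exec sleep infinity
fi

cd /opt/ipums
exec jruby -S bundle exec rails server -p 3000
```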
Before I can run a container, though, I’ll have to create and seed a MySQL volume container. Our IPUMS codebase already has a script to help developers seed their local DBs, so I’ll co-opt it to seed a MySQL instance in a container via a new script called docker_init.sh. Here’s how it works: it creates (but doesn’t run) two volume containers if they don’t already exist, ipums_db_data and ipums_extract_data. These exist outside the rest of the configuration and persist (along with the data they store) until manually removed. The script then spins up a MySQL container that stores its data in ipums_db_data, bound to a local port. Our seed script is run against that MySQL instance, and then the MySQL container is stopped and removed.
Here’s a sketch of the script (the image tags, root password, and seed script invocation are placeholders):
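```sh
#!/bin/sh
# docker_init.sh (sketch). The container names come from the text;
# image tags, the root password, and the seed invocation are placeholders.
set -e

# Create (but don't run) the volume containers if they don't already
# exist. They live outside the rest of the configuration and persist,
# along with their data, until manually removed.
docker inspect ipums_db_data > /dev/null 2>&1 || \
  docker create -v /var/lib/mysql --name ipums_db_data busybox true
docker inspect ipums_extract_data > /dev/null 2>&1 || \
  docker create -v /ipums/extracts --name ipums_extract_data busybox true

# Spin up a throwaway MySQL container that stores its data in
# ipums_db_data, bound to a local port so the seed script can reach it.
docker run -d --name ipums_db_seed --volumes-from ipums_db_data \
  -e MYSQL_ROOT_PASSWORD=password -p 3306:3306 mysql

# Give MySQL time to come up, then run our existing seed script
# against it (the script name and arguments here are hypothetical).
sleep 20
./script/seed_databases.sh --host 127.0.0.1 --port 3306

# Tear down the throwaway container; the data lives on in ipums_db_data.
docker stop ipums_db_seed
docker rm ipums_db_seed
```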
At this point, I can actually test my IPUMS container with three commands from the project root, along these lines (the container names and flags are illustrative):
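```sh
# Build the IPUMS image, start MySQL on the seeded volume, start the app
docker build -t ipums .
docker run -d --name mysql --volumes-from ipums_db_data mysql
docker run -d --name web --link mysql:mysql --volumes-from ipums_extract_data \
  -e RUN_WEBSERVER=true -e RUN_EXTRACT_ENGINE=false -p 3000:3000 ipums
```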
With any luck, opening http://localhost:3000 will render the homepage.
Now that I’ve got images and containers, how should I wire them together? One option is a handful of shell scripts, but thankfully there’s Fig (http://www.fig.sh), a tool that can orchestrate running many containers for an application.
As a proof of concept, I want Fig configured to automatically start two different project websites and extract engines, a mysql server, and a reverse proxy. Fig allows grouping a set of containers and all their run parameters into a YAML file. Then it’s just a matter of running “fig up”.
Here’s a sketch of my fig.yml (the service names, the PROJECT toggle, ports, and the nginx config mount are assumptions):
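```yaml
# Sketch of a fig.yml for two projects ("usa" and "ipumsi"), a MySQL
# server, and an nginx reverse proxy. Service names, the PROJECT toggle,
# ports, and the nginx config mount are assumptions. The data volume
# containers are created separately by docker_init.sh.
db:
  image: mysql
  volumes_from:
    - ipums_db_data

usa:
  build: .
  links:
    - db
  volumes_from:
    - ipums_extract_data
  environment:
    - PROJECT=usa
    - RUN_WEBSERVER=true
    - RUN_EXTRACT_ENGINE=true

ipumsi:
  build: .
  links:
    - db
  volumes_from:
    - ipums_extract_data
  environment:
    - PROJECT=ipumsi
    - RUN_WEBSERVER=true
    - RUN_EXTRACT_ENGINE=true

proxy:
  image: nginx
  links:
    - usa
    - ipumsi
  ports:
    - "80:80"
  volumes:
    - ./docker/nginx.conf:/etc/nginx/nginx.conf:ro
```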
Now that everything is in place, running the application locally is extremely simple. It involves four steps:
- Install Docker and Fig
- Pull a copy of our code
- ./docker_init.sh
- fig up
Conclusions
In the end, I have mixed feelings about the process and the technology. I’ll say first off that I did most of this work under OS X with boot2docker. Running the Docker engine in a VM is not ideal: there’s a very significant performance penalty, Docker’s networking configuration becomes more complex, and VirtualBox’s shared-folder performance makes mounting volumes from the local machine unusable.
There are other technologies that would allow similar automated bootstrapping. Vagrant would be an obvious choice, and one we’ve explored on other projects.
Docker has a few clear advantages, however. Compared to automating a set of VMs, Docker uses much, much less disk space and requires significantly less overhead (even with boot2docker, I’m running one VM instead of six). There’s a very active Docker community and a great number of images on Docker Hub to use or learn from.
Our IPUMS codebase is fairly complex and has a history of requiring an embarrassing amount of time to configure a development environment. Being able to bootstrap that entire process with “./docker_init.sh && fig up” is hard to ignore.
Further reading:
- Docker Docs: https://docs.docker.com/
- Fig: http://www.fig.sh/
- Docker Docs, Understanding Docker: https://docs.docker.com/introduction/understanding-docker/
- Phusion Blog: http://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
- Matthew Miner Blog Post on Docker: http://matthewminer.com/2015/01/25/docker-dev-environment-for-web-app.html
- DevOpsU Blog Post on Docker Misconceptions: https://devopsu.com/blog/docker-misconceptions/
Authored by Dan Elbert