Lessons from launching billions of Docker containers

The Iron.io Platform is an enterprise job processing system for building powerful, job-based, asynchronous software. Simply put, developers write jobs in any language using familiar tools like Docker, then trigger the code to run using Iron.io’s REST API, webhooks, or the built-in scheduler. Whether the job runs once or millions of times per minute, the work is distributed across clusters of “workers” that can be easily deployed to any public or private cloud, with each worker running in a Docker container.
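
To make that workflow concrete, here is a rough sketch of queuing a job over a REST API of this kind from Python. The endpoint, project ID, token, task name, and request body below are illustrative placeholders modeled on IronWorker’s task-queuing pattern, not a definitive reference for the current API.

```python
import requests

# Illustrative placeholders -- substitute your own cluster host, project ID, and token.
IRON_API = "https://worker-aws-us-east-1.iron.io/2"
PROJECT_ID = "YOUR_PROJECT_ID"
TOKEN = "YOUR_OAUTH_TOKEN"


def queue_task(code_name: str, payload: str) -> dict:
    """Queue a single run of a previously uploaded worker."""
    resp = requests.post(
        f"{IRON_API}/projects/{PROJECT_ID}/tasks",
        headers={
            "Authorization": f"OAuth {TOKEN}",
            "Content-Type": "application/json",
        },
        # One task per request here; the API pattern accepts a list of tasks.
        json={"tasks": [{"code_name": code_name, "payload": payload}]},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    # "image-resizer" is a hypothetical worker name used only for this example.
    print(queue_task("image-resizer", '{"image_url": "https://example.com/cat.jpg"}'))
```

The same call could just as easily come from a webhook handler or a cron-style scheduler entry; the point is that the trigger is a small HTTP request, while the heavy lifting happens inside a container on the worker cluster.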

At Iron.io we use Docker both to serve our internal infrastructure needs and to execute customers’ workloads on our platform. For example, our IronWorker product has more than 15 stacks of Docker images in block storage that provide language and library environments for running code. IronWorker customers write their code against only the libraries they need and upload it to Iron.io’s S3 file storage. From there, our message queuing service merges the appropriate base Docker image with the user’s code in a new container, runs the process, and then destroys the container.
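
The snippet below is a minimal sketch of that merge-run-destroy lifecycle using the Docker SDK for Python. The base image, code path, command, and resource limit are assumptions for illustration; our internal scheduler does considerably more (queuing, retries, metrics, isolation).

```python
import docker

client = docker.from_env()


def run_job(base_image: str, code_dir: str, command: str) -> str:
    """Run user code in a fresh container on a base stack image,
    then remove the container once the process exits."""
    output = client.containers.run(
        image=base_image,                                      # language/library stack image
        command=command,                                       # entry point for the user's job
        volumes={code_dir: {"bind": "/task", "mode": "ro"}},   # user code pulled from storage
        working_dir="/task",
        mem_limit="512m",                                      # keep one job from starving its neighbors
        remove=True,                                           # destroy the container after the run
    )
    # With the default blocking call, run() returns the container's logs as bytes.
    return output.decode("utf-8")


if __name__ == "__main__":
    # Hypothetical stack image and code directory, for illustration only.
    print(run_job("python:3.11-slim", "/tmp/user_code", "python main.py"))
```

Because the container is created per job and removed immediately afterward, each run starts from a clean, known environment, which is exactly the property that makes this model predictable at scale.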

In short, we at Iron.io have launched several billion Docker containers to date, and we continue to run Docker containers by the thousands. It’s safe to say we have more than a little experience with Docker. Running billions of containers for our customers’ workloads has shown us (very quickly) both the excellent benefits and the frustrating aspects of Docker.

The good parts

We’ve been fortunate to work regularly with new technologies, and although they bring their own sets of problems, they have helped us achieve otherwise impossible goals. We needed to quickly execute customer code in a predictable and reproducible fashion. Docker was the answer: It gives us the ability to deploy user code and environments in a consistent and repeatable way, and it makes our operational infrastructure easier to run.