Imagine that you're building the next big thing and would like to use cloud services to bring that beautiful unicorn to life. To help you get things done and sleep soundly at night, big established players (e.g. Heroku, AWS, Azure, GAE) provide you with a black box that handles one of your non-core activities: complex system administration. However, should you accept this black box as a given?
At Futurice there is a long history of internal development of tools and procedures that let our employees focus on their work. This means making logging working hours less painful, tracking inventory (e.g. phones, laptops, gadgets, books), integrating with third-party services to avoid manually duplicating HR information, and in general creating tools for problems worth solving.
Since 2015, we’ve been containerizing our existing internal applications to make them cloud-ready. The plan has been to replace our own hardware hosted in a local data center, provide a better developer experience, and ease the maintenance burden for our IT staff. This would mean less hassle with project kickoffs, maintenance, firewall rules, backups, hardware capacity for VPSs, SSL certificate renewals, web server configurations and so forth. This kind of hassle is a sign of dated working habits.
We could not find a product on the market that would meet our minimum requirements:
Making your own Platform as a Service (PaaS) is often seen as a foolish endeavour: the complexity, combined with the rapid development and turbulence in this field, renders home-rolled solutions obsolete within months. In the past, our automation relied on tools such as Fabric, Puppet and Ansible to bring cloud or hosted servers up to a predefined state. It worked like eating soup with chopsticks: slightly messy. Docker made me begrudgingly throw all this away, package software into containers and lean on self-healing orchestration platforms such as Docker Swarm and Kubernetes.
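The contrast between pushing servers to a predefined state and relying on a self-healing orchestrator can be pictured as a reconciliation loop. The orchestrator keeps comparing the desired number of replicas per service with what is actually running, then starts or stops containers to close the gap. The sketch below is a toy model in plain Python. It assumes services are tracked as simple name-to-replica-count dicts, and `reconcile` and `apply_actions` are hypothetical helpers. They are not part of Docker Swarm's or Kubernetes' API.

```python
from typing import Dict, List, Tuple

# An action is (verb, service name, number of replicas), verb in {"start", "stop"}.
Action = Tuple[str, str, int]


def reconcile(desired: Dict[str, int], running: Dict[str, int]) -> List[Action]:
    """Compute the actions that move the running state toward the desired state.

    Hypothetical toy model of an orchestrator's control loop: services missing
    from `desired` are treated as wanting zero replicas.
    """
    actions: List[Action] = []
    for service in sorted(set(desired) | set(running)):
        want = desired.get(service, 0)
        have = running.get(service, 0)
        if have < want:
            actions.append(("start", service, want - have))
        elif have > want:
            actions.append(("stop", service, have - want))
    return actions


def apply_actions(actions: List[Action], running: Dict[str, int]) -> Dict[str, int]:
    """Simulate the cluster executing the actions; returns a new state dict."""
    state = dict(running)
    for verb, service, count in actions:
        state[service] = state.get(service, 0) + (count if verb == "start" else -count)
        if state[service] == 0:
            del state[service]
    return state


if __name__ == "__main__":
    desired = {"retirement-plan": 3, "intranet": 2}
    # A failed node took down two replicas, and a retired app is still running.
    running = {"retirement-plan": 1, "intranet": 2, "old-app": 1}
    actions = reconcile(desired, running)
    print(actions)  # [('stop', 'old-app', 1), ('start', 'retirement-plan', 2)]
    print(apply_actions(actions, running) == desired)  # True
```

Running the loop again and again is what makes such a system self-healing. After any failure, the next pass recomputes the gap, so there is no need to replay a provisioning script by hand.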
We started with a third-party solution called Deis in late 2015. Soon our containers were up and running and things looked good, except that none of us had any idea how to maintain this black box. When Deis eventually broke down a few months later, we were back at square one.
The journey to making our own platform began in early 2016 with a proof of concept (PoC) to gauge the size of this possibly massive undertaking. Much to my delight, the PoC took only two days of tinkering and showed that the approach could potentially meet our minimum requirements. This result encouraged me to continue building a PaaS from readily available open source components running on a single cloud provider.
The main software components are Docker Swarm, Docker Flow Proxy and SSSD. Amazon Web Services provides, among other things, load balancing (ELB), instances (EC2), a private network (VPC), firewalls (security groups, SG), encryption keys (KMS), object storage (S3), relational databases (RDS PostgreSQL), SSL certificates (ACM), caching (ElastiCache Redis and Memcached), persistent storage (EBS) and logging (CloudWatch). Docker Hub serves as a private container registry for image backups. Apache handles authentication against G Suite to give our employees a single sign-on experience for the intranet. Red Hat's System Security Services Daemon (SSSD) looks up SSH keys from our employee database (LDAP) for authentication. Docker Flow Proxy listens to the Docker API to keep a tally of running services and handles service discovery by routing each service's domain to its swarm service node port. Docker Swarm is the heart of the system, providing the orchestration for deploying and running containers at scale. A touch of scripting and four weeks later, the first iteration was running our internal services.
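The service discovery step can be pictured as a lookup table from domain to backend that is rebuilt whenever the set of running services changes. The sketch below is a simplified model, not Docker Flow Proxy's code. It assumes that each service advertises its domain and port through labels modeled on Docker Flow Proxy's `com.df.notify`, `com.df.serviceDomain` and `com.df.port` conventions. The service dicts, the port 8000, and the `build_routes` and `resolve` helpers are hypothetical stand-ins for what the Docker API and the proxy actually handle.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical, simplified service records: a name plus the labels set at
# deploy time (the real Docker API returns much richer objects).
Service = Dict[str, Any]
Backend = Tuple[str, int]  # (swarm service name, port)


def build_routes(services: List[Service]) -> Dict[str, Backend]:
    """Map each advertised domain to the backend that should receive its traffic.

    Label names are modeled on Docker Flow Proxy conventions; only services
    that opt in with com.df.notify=true get a route.
    """
    routes: Dict[str, Backend] = {}
    for svc in services:
        labels = svc.get("labels", {})
        if labels.get("com.df.notify") != "true":
            continue
        domain = labels.get("com.df.serviceDomain")
        port = labels.get("com.df.port")
        if domain and port:
            routes[domain.lower()] = (svc["name"], int(port))
    return routes


def resolve(routes: Dict[str, Backend], host_header: str) -> Optional[Backend]:
    """Resolve an incoming Host header (which may include :port) to a backend."""
    host = host_header.split(":", 1)[0].lower()
    return routes.get(host)


if __name__ == "__main__":
    services = [
        {"name": "retirement-plan", "labels": {
            "com.df.notify": "true",
            "com.df.serviceDomain": "retirement-plan.lb.example.com",
            "com.df.port": "8000",
        }},
        {"name": "batch-job", "labels": {}},  # not exposed through the proxy
    ]
    routes = build_routes(services)
    print(resolve(routes, "retirement-plan.lb.example.com:443"))  # ('retirement-plan', 8000)
    print(resolve(routes, "unknown.example.com"))  # None
```

Rebuilding the table from the Docker API's view of the swarm keeps routing in sync with deployments, without anyone editing proxy configuration by hand.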
The main goals for the PaaS were:
These goals were met by:
Now, after writing an image recipe (Dockerfile), deploying to https://retirement-plan.lb.example.com is as easy as:
$ futuswarm app:deploy --image bitcoin-miner --tag 1.0 --name retirement-plan
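Behind a command like this, the image, tag and name have to become a swarm service that the proxy can discover. The sketch below only builds an argument list for `docker service create` and does not run it. The `lb.example.com` suffix comes from the example URL above. The container port 8000, the network name `proxy`, the replica count and the label set are assumptions for illustration, and `service_create_argv` is a hypothetical helper, not futuswarm's actual implementation.

```python
import shlex
from typing import List


def service_create_argv(image: str, tag: str, name: str,
                        domain_suffix: str = "lb.example.com",
                        port: int = 8000,
                        network: str = "proxy",
                        replicas: int = 1) -> List[str]:
    """Build (but do not run) a `docker service create` command for an app.

    The defaults for port, network and replicas are illustrative assumptions;
    the labels follow Docker Flow Proxy conventions so the proxy can route
    <name>.<domain_suffix> to the new service.
    """
    domain = f"{name}.{domain_suffix}"
    return [
        "docker", "service", "create",
        "--name", name,
        "--network", network,
        "--replicas", str(replicas),
        "--label", "com.df.notify=true",
        "--label", f"com.df.serviceDomain={domain}",
        "--label", f"com.df.port={port}",
        f"{image}:{tag}",
    ]


if __name__ == "__main__":
    argv = service_create_argv("bitcoin-miner", "1.0", "retirement-plan")
    print(shlex.join(argv))  # a shell-ready command line, printed rather than executed
```

Deriving the domain from the app name is what lets a single `--name` flag decide the public URL in this sketch.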
The next person to update the software only needs to increment the tag (e.g. --tag 1.1) to get the new changes live. This gives us a reproducible application deployment process from local development to production, built from the best of open source and configured to our internal requirements. Our platform will be open sourced soon. This is not to say there aren’t better solutions out there; however, we’ve laid the groundwork for better internal practices and tailored the hosting environment to our needs.
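At the swarm level, shipping a new version amounts to pointing the existing service at a new image tag. Docker Swarm then replaces the running tasks according to the service's update settings. The sketch below assumes dotted numeric tags like the ones in the example, and assumes that an update maps onto `docker service update --image`. `bump_tag` and `service_update_argv` are hypothetical helpers, not futuswarm's code.

```python
from typing import List


def bump_tag(tag: str) -> str:
    """Increment the last numeric component of a dotted tag, e.g. 1.0 -> 1.1."""
    parts = tag.split(".")
    if not parts[-1].isdecimal():
        raise ValueError(f"last component of {tag!r} is not numeric")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def service_update_argv(image: str, tag: str, name: str) -> List[str]:
    """Build (but do not run) a `docker service update` that switches the image tag."""
    return ["docker", "service", "update", "--image", f"{image}:{tag}", name]


if __name__ == "__main__":
    new_tag = bump_tag("1.0")
    print(new_tag)  # 1.1
    print(service_update_argv("bitcoin-miner", new_tag, "retirement-plan"))
```

Because each release is an immutable, tagged image, rolling back is the same operation with the previous tag.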
Having dared to peek into the darkness of the rabbit hole, we found that knowledge of the boring parts, plus a bit of programming magic, was enough to create a system that ticks all the boxes for our IT staff's maintenance happiness while giving our developers a tool to craft and showcase their creations with less pain.
EDIT 28.12: We're live at https://github.com/futurice/futuswarm