Snowdrift and Keter

We can now deploy the Snowdrift.coop website with a single command! Starting from the source directory, admins can run yesod keter and have an update running live in a matter of minutes.

Streamlining deploys is good for everyone, and it's especially nice for a volunteer-heavy organization like Snowdrift. I don't want my own time to be a bottleneck for adding new features or bugfixes that come from our community!

Let's review how our new infrastructure works, how we implemented it, and what we can learn from the experience. This article will be somewhat heavy on technical details.


From Foundation to Crown

Yesod and Keter both take their names from the Kabbalah and can be translated as 'foundation' and 'crown' respectively. Yesod is the foundation, or framework, on which Snowdrift.coop is built. Keter is an operations tool that simplifies the deployment of web apps. Bundle your app appropriately, drop it in the proper directory, and Keter handles the rest.
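In practice, "drop it in the proper directory" is about as simple as it sounds. Here is a minimal sketch, assuming a default Keter install that watches /opt/keter/incoming for new bundles; the hostname and paths are illustrative:

    # Copy the bundle into the watched directory; Keter notices the new
    # file, unpacks it, and swaps in the new version.
    scp snowdrift.keter admin@snowdrift.coop:/opt/keter/incoming/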

The switch to Keter affected a number of tools in our production stack:

Amazon Web Services

All of the following systems run in Amazon's cloud. This was not necessarily our first choice, as I will discuss below. For the moment, however, our Keter setup relies on several Amazon services, notably Elastic Load Balancing and Route 53, both described below.

Postgres

Postgres is our database backend. We configured Keter to pass the pre-existing database credentials to all apps. Keter does have a plugin for auto-managing an app's database, creating it and handling its access permissions, but since we had an existing database, we chose not to use it.
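For a sense of how that looks, here is a minimal sketch of the relevant portion of the global Keter config, assuming Keter's env block (which passes environment variables to every deployed app); the variable names and values are illustrative:

    # /opt/keter/etc/keter-config.yaml (excerpt, illustrative)
    env:
      PGHOST: localhost
      PGDATABASE: snowdrift
      PGUSER: snowdrift
      PGPASS: not-the-real-password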

Cron

We use cron to run the payment-processing app, and this presented a small wrinkle. Keter creates a new working directory for each version of the app, so how would cron know where to find the executable? The suggested solution was a background task that maintains a symlink to the current version.
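A rough sketch of that approach, assuming Keter's background stanza type; the script, schedule, and paths are all illustrative:

    # config/keter.yaml (excerpt): run a helper alongside the web app
    stanzas:
      - type: background
        exec: ../link-current.sh

    # link-current.sh: keep a fixed path pointed at this bundle's directory
    #!/bin/sh
    while true; do
      ln -sfn "$(cd "$(dirname "$0")" && pwd)" /opt/keter/snowdrift-current
      sleep 60
    done

    # crontab entry: invoke the payment processor through the symlink
    0 4 * * * /opt/keter/snowdrift-current/dist/bin/SnowdriftProcessPayments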

Nginx

"But wait! Can't you use Keter without Nginx?" Yes, absolutely. We use Nginx for something else: redirecting non-encrypted traffic to the proper, encrypted port. We do this by pointing the load balancer's http endpoint to a pseudo-random port served by Nginx.

Keter

We set up Keter using the suggestions found on the Keter GitHub repo.

We have two distinct "Keter" configuration files. For Keter itself, we have an unremarkable file that lives on the server in /opt/keter/etc. The Snowdrift Keter bundle configuration, however, is more interesting. That file describes Snowdrift to Keter, telling it which executables to run and various other details, and it shows off some nice features of Keter.
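As a rough illustration, a bundle config along these lines covers the basics; the stanza layout is Keter's, but the executable path and host here are illustrative rather than our exact file:

    # config/keter.yaml (illustrative sketch)
    stanzas:
      - type: webapp
        exec: ../dist/bin/Snowdrift
        args: []
        hosts:
          - snowdrift.coop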

The yesod-bin utility uses that same Snowdrift-specific file to package the app and send it to the server. We invoke that action with yesod keter.
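Assuming the bundle config also carries a copy-to entry (a yesod-bin feature naming the destination for the bundle), the whole deploy really is one command from the project root:

    # config/keter.yaml (excerpt, illustrative destination)
    copy-to: admin@snowdrift.coop:/opt/keter/incoming/

    # Build the bundle and ship it to the watched directory
    yesod keter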

Process and Failures

Using Elastic Load Balancing (ELB) was a last-resort effort to get running with Keter. The first option was to continue using Nginx as the encryption endpoint, as we had been doing previously. This did not work immediately: as best we could tell, HTTP redirection responses were occasionally being translated into error responses somewhere in the stack.

Still looking for the fastest fix, we tried the next option: using Keter as the encryption endpoint. This failed much more reliably. Keter does not yet expose many configuration options for TLS, and for some reason it does not play well with our certificate. As best we could tell, we were hitting the same problem mentioned in a Keter issue. Since this involves a private TLS key, it may be hard to reproduce.

Thus, we finally tried Amazon, knowing that many others who use Keter do so with ELB. This required handing control of our DNS zone over to Route 53: we use a bare domain (snowdrift.coop) for our website, and standard DNS cannot point a bare domain at ELB, since CNAME records are not allowed at the zone apex. Route 53's ALIAS records work around that restriction.

After the switch, unfortunately, we were still getting errors! We finally noticed the pattern, however, and were able to root out a Keter bug that closed connections between Keter and Snowdrift after 0.3 seconds. Many of our pages take longer than that to load, so that was probably the problem all along!

There is a lesson in this tale about digging deeper when a problem is first encountered. If we had worked to understand the original issue with Nginx, we could have saved ourselves a lot of hassle. Luckily, it is still a happy tale in the end.

The Future

Now that we recognize the underlying timeout problem, we could easily switch to a different encryption endpoint, giving us the flexibility to move away from AWS. Keter itself would be preferable as the endpoint, though we would have to resolve our outstanding issue with TLS.

Most importantly, with this fix in place we can focus more effort on core development of the site! We will undoubtedly hit many other bottlenecks, but now deploying fixes for those problems will be a snap.