New Infrastructure for 2013

Posted on January 01, 2013

By Daniel Morrison

As we roll into a new year, we’re excited to also roll out our new infrastructure.

Our Pingdom-reported uptime was 99.89% for December, but we know that’s not nearly good enough. Here’s what we have planned, along with a bit of background.
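To put that number in perspective, here is a quick back-of-the-envelope calculation of the downtime a monthly uptime percentage implies. It assumes the 99.89% figure covers all 31 days of December; the function name is just for illustration.

```python
# Rough downtime implied by a monthly uptime percentage.
# Assumption: the reported figure covers the full 31 days of December.
DAYS_IN_DECEMBER = 31
UPTIME_PERCENT = 99.89  # Pingdom-reported figure quoted above


def downtime_minutes(uptime_percent, days):
    """Return the minutes of downtime implied by an uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


december_downtime = downtime_minutes(UPTIME_PERCENT, DAYS_IN_DECEMBER)
print(round(december_downtime, 1))  # about 49 minutes over the month
```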

Harmony in 2012

We’ve long been limited by one non-redundant server that controls most of Harmony. If this server had any hiccups, we felt it across all of Harmony. We were also limited in the maintenance we could perform.

In August, we added a load balancer to give us more flexibility. This let us quickly re-route traffic to a slave server when our primary server had hiccups. While we still didn’t have an ideal setup, this was a definite improvement.

In early December, we moved our MongoDB database from a single server with a slave into a proper replica set. This not only gave us faster disaster recovery, it also removed a major barrier to moving to a new server setup.

2013: Redundancy, Uptime, and Upgrades

Our new infrastructure brings us a new level of redundancy, allowing us, both automatically and manually, to re-route traffic around problems and perform maintenance without affecting customer sites. It also allows us to add capacity at any time, giving us much more flexibility in dealing with unexpected surges in traffic (like a DDoS attack).

We’re currently testing out our new production servers. This week, we’ll be moving over a few of our own sites as well as a few customer volunteers. When we move everyone over (we’ll announce it before we do), you won’t notice any downtime and should notice faster response times.

Under the hood, we’ve been able to make other improvements including upgrades to Ruby 1.9.3 and Rails 3.2, as well as making IPv6 available for all sites.

What’s next?

New hardware means we’ll be comfortable pushing the bounds of Harmony (and not worried about servers going down). For example, we can better separate admin and content code. Or we could route all API traffic to certain servers, keeping our main ones clear for page traffic.
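As a rough illustration of that routing idea (not Harmony’s actual configuration), the sketch below picks a server pool based on the request path. The pool names and the `/api` and `/admin` path prefixes are assumptions made up for this example.

```python
# Illustrative sketch only: choose a server pool from the request path.
# The pool names and URL prefixes are hypothetical, not Harmony's real layout.
API_PREFIX = "/api"
ADMIN_PREFIX = "/admin"


def has_prefix(path, prefix):
    """True if path is the prefix itself or lives underneath it."""
    return path == prefix or path.startswith(prefix + "/")


def choose_pool(path):
    """Return the name of the pool that should serve this request path."""
    if has_prefix(path, API_PREFIX):
        return "api"
    if has_prefix(path, ADMIN_PREFIX):
        return "admin"
    # Everything else is ordinary page traffic.
    return "pages"


for example in ("/api/items", "/admin/login", "/about"):
    print(example, "->", choose_pool(example))
```

Keeping the decision in one small function like this would make it easy to send API or admin requests to dedicated servers later without touching page traffic.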

Finally, we’ve added checks for the new infrastructure to our Status Page. We’re striving for zero downtime, and we want to be as transparent as possible.

Uptime Report for [Beta] Active Live Sites: Last 30 days

Want early access?

If you have a live site you’d like to get on the new infrastructure, use the Feedback link in the admin to volunteer!
