Performing a deployment requires making changes to production, and change means risk. There are many techniques available to deploy changes - some simple, others quite complex; some require downtime, others do not. Blue/green deployment is one of these techniques: it is quite simple, requires zero downtime, and, best of all, carries very little risk.
The process is this:
1. Bring up the auxiliary resources required to run your application. These are things like load balancers and security groups.
2. Deploy version 1 of your application. All of the traffic from the load balancer goes to this version.
3. Create version 2 of your application. This is usually different code (new feature, bug fix, etc.), but could also be a different instance type or key name.
4. Switch traffic from version 1 to version 2. This is your blue/green deployment. The traffic going through the load balancer no longer goes to Version 1; it goes to Version 2 instead.
5. Delete Version 1. Once you are happy with how Version 2 is performing, you can delete the resources (instances, etc.) that Version 1 was using.
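The steps above can be sketched with a toy in-memory model. This is only an illustration, not a real AWS integration: the `LoadBalancer` class, its method names, and the instance ids are all hypothetical, and a real load balancer would also run health checks before sending traffic anywhere.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LoadBalancer:
    """Toy stand-in for a load balancer: it only tracks which version gets traffic."""

    stacks: Dict[str, List[str]] = field(default_factory=dict)  # version -> instance ids
    live: Optional[str] = None  # version currently receiving traffic

    def deploy(self, version: str, instances: List[str]) -> None:
        # Bringing up a new version never touches the version that is live.
        self.stacks[version] = list(instances)
        if self.live is None:
            self.live = version

    def switch_to(self, version: str) -> Optional[str]:
        # The blue/green cut-over: repoint traffic and return the previous version.
        if version not in self.stacks:
            raise KeyError(f"{version} has not been deployed")
        previous, self.live = self.live, version
        return previous

    def delete(self, version: str) -> None:
        # Only a version that no longer receives traffic may be removed.
        if version == self.live:
            raise ValueError("refusing to delete the live version")
        del self.stacks[version]

    def route(self) -> List[str]:
        # Instances that would currently receive traffic.
        return self.stacks[self.live] if self.live else []


lb = LoadBalancer()
lb.deploy("v1", ["i-aaa", "i-bbb"])   # step 2: all traffic goes to version 1
lb.deploy("v2", ["i-ccc", "i-ddd"])   # step 3: version 2 exists, but gets no traffic
previous = lb.switch_to("v2")         # step 4: the blue/green switch
lb.delete(previous)                   # step 5: clean up once version 2 looks good
```

Note that `deploy` for version 2 leaves version 1 exactly as it was; only `switch_to` changes where traffic goes.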
This is the nice, normal flow of a blue/green deployment. During the normal flow, your application is always online because, for a short period of time, Version 1 and Version 2 are both running simultaneously. This is how zero downtime is achieved. Once Version 2 is fully in service, Version 1 is disabled. This happens as fast as the load balancer can put the Version 2 instances into service, which depends on your health check settings.
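To get a feel for how health check settings affect the switch, here is a rough estimate, assuming the load balancer marks an instance as in service after a fixed number of consecutive successful checks at a fixed interval. The function name and parameters are hypothetical, and the estimate ignores instance boot time and check timeouts.

```python
def seconds_until_in_service(interval_s: int, healthy_threshold: int) -> int:
    """Approximate time for a new instance to pass enough consecutive health checks."""
    if interval_s <= 0 or healthy_threshold < 1:
        raise ValueError("interval must be positive and threshold at least 1")
    return interval_s * healthy_threshold


# A check every 30 seconds that must pass 10 times in a row:
print(seconds_until_in_service(30, 10))  # 300
# A check every 10 seconds that must pass twice:
print(seconds_until_in_service(10, 2))   # 20
```

Tighter settings shorten the window in which both versions run side by side, at the cost of a less thorough check before traffic arrives.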
But what happens when things don't go according to plan? What if there is a change in Version 2 that causes the navigation bar to disappear, or user login to stop functioning? This is where the true power of blue/green deployment comes in. Deploying a new version causes zero changes to Version 1. Version 1 is completely untouched during the deployment. This means it is always the last known good state.
The risk of performing a deployment is greatly reduced because you can, at any point in time, roll back to the previous version, which is always fully provisioned, in a known good state, and ready to go.
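A minimal sketch of that rollback decision, assuming you have some check that tells you whether the new version is behaving. The function name and the `is_healthy` callback are hypothetical placeholders for whatever verification you actually run.

```python
from typing import Callable


def deploy_with_rollback(live: str, candidate: str,
                         is_healthy: Callable[[str], bool]) -> str:
    """Switch traffic to the candidate; return the version that ends up live."""
    previous = live
    live = candidate          # switch traffic to the new version
    if not is_healthy(live):
        live = previous       # roll back: the old version was never modified
    return live


# The new version passes its check, so it stays live.
print(deploy_with_rollback("v1", "v2", lambda version: True))             # v2
# The new version fails its check, so traffic returns to the old one.
print(deploy_with_rollback("v1", "v2", lambda version: version != "v2"))  # v1
```

The rollback is nothing more than pointing traffic back, which is only possible because the previous version is still fully provisioned.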
Another approach to performing a deployment is a rolling deployment. This is where you deploy a new version of code by taking one or more servers out of service, performing the update on them, and then putting them back in service - repeating the process until all instances in the cluster have been updated.
This is also something AWS has support for at the ELB level.
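For comparison, a rolling deployment can be sketched as a loop over batches of servers. This is a simplified simulation with hypothetical server names; a real process would also remove each batch from the load balancer, update it, and wait for health checks before moving on.

```python
from typing import Dict, Iterator


def rolling_update(servers: Dict[str, str], new_version: str,
                   batch_size: int = 1) -> Iterator[Dict[str, str]]:
    """Update servers in place, batch by batch, yielding the cluster state after each batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    names = list(servers)
    for start in range(0, len(names), batch_size):
        # These servers would be out of service while they are being updated.
        for name in names[start:start + batch_size]:
            servers[name] = new_version
        yield dict(servers)


cluster = {"web1": "v1", "web2": "v1", "web3": "v1"}
for state in rolling_update(cluster, "v2"):
    print(state)
```

Every intermediate state is a mix of versions, and each updated server has been modified in place, so there is no untouched copy of the old version left to switch back to.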
Let's quickly go over the pros of a rolling deployment over blue/green deployment
Sounds appealing, right? The success case usually is. It's the error cases where things get interesting. What happens when the new version 2 has issues? Here is where the cons really add up.
While rolling deployments sound nice in theory, you are taking on an incredible amount of risk every time you deploy. If you are using Auto Scaling groups, changing the instance type or user data requires changing the Launch Configuration, which is yet another process that needs to be managed. This is potentially a different process than you use for rolling updates to existing servers.
It's not all rainbows and unicorns with blue/green deployments. There are a few things that need to be taken into consideration to make deployments truly risk-free.
When switching from version 1 to version 2, nothing else should happen. For those Ruby on Rails developers, this means deploying a new version of code does not run any database migrations. All schema migrations should be performed on a separate schedule.
Consider this: if Version 2 requires a DB schema change to function, when do you perform that migration? Whether you are doing a blue/green deployment or a rolling update, at some point in time, both version 1 and version 2 need to be running simultaneously. Given that this is always the case, make sure Version 2 can run on Version 1's schema.
This also brings us back to the error case. If you have Version 2 running, with a different DB schema, and need to roll back to Version 1, what will happen? How will Version 1 run on Version 2's schema?
Having both Version 1 and Version 2 of your code running on the same DB schema is not something that comes naturally to the Rails community, which is unfortunate. You will need to be aware of this.
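One common way to keep both versions working on the same schema is to make schema changes additive, for example by adding a nullable column that the old code simply ignores. The sketch below uses an in-memory SQLite database to illustrate the idea; the table, column, and function names are hypothetical, and this is one possible approach rather than a prescribed method.

```python
import sqlite3


def v1_create_user(db: sqlite3.Connection, email: str) -> None:
    # Version 1 names its columns explicitly and knows nothing about newer ones.
    db.execute("INSERT INTO users (email) VALUES (?)", (email,))


def v2_create_user(db: sqlite3.Connection, email: str, display_name: str) -> None:
    # Version 2 uses the column that was added by the separate schema change.
    db.execute(
        "INSERT INTO users (email, display_name) VALUES (?, ?)",
        (email, display_name),
    )


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
v1_create_user(db, "a@example.com")

# The schema change, run on its own schedule: additive and nullable.
db.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

v1_create_user(db, "b@example.com")          # Version 1 still runs on the new schema
v2_create_user(db, "c@example.com", "Cee")   # Version 2 uses the new column

rows = db.execute("SELECT email, display_name FROM users ORDER BY id").fetchall()
print(rows)
```

Because the change only adds something optional, rolling back from Version 2 to Version 1 does not require undoing the migration.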
Etsy have a thing called "Schema Change Thursday". They perform all database schema changes once a week only, yet deploy to production 50 times per day. How do they do this? A few things:
This is what is considered the Best Practice for deploying new features and handling DB changes.
Since both Version 1 and Version 2 are running simultaneously during a deployment, for some period of time you will be running double the number of instances that the application normally requires. There are real costs involved with this, so please be aware of it, and once you are certain Version 2 is working perfectly well, terminate Version 1. Another thing to consider: if you are getting close to the limit of EC2 instances you can run in your AWS account, you may need to increase this limit so that deployments can occur seamlessly.
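The capacity question comes down to simple arithmetic, sketched below. The function names and the numbers are hypothetical; substitute your own instance counts and account limit.

```python
def peak_instances(normal_count: int) -> int:
    """Instances running while both versions are up at the same time."""
    return normal_count * 2


def fits_within_limit(normal_count: int, other_running: int, account_limit: int) -> bool:
    """True if a deployment can bring up the second version without hitting the limit."""
    return other_running + peak_instances(normal_count) <= account_limit


# An application that normally runs 8 instances, in an account limited to 20:
print(fits_within_limit(8, other_running=2, account_limit=20))  # True  (2 + 16 = 18)
print(fits_within_limit(8, other_running=6, account_limit=20))  # False (6 + 16 = 22)
```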
Since all of the instances for version 1 are terminated when they are no longer required, do not store anything on the local disk (even EBS volumes). Any data will be lost forever after version 1 no longer exists. This is pretty standard for any auto scaling system, so it should not be a surprise.