This is the second post in the series of "Cloud Best Practices". Last time we reviewed AWS IAM Best Practices to cover essentials of your AWS security barriers.
Now, let's get to basics: EC2 performance. When we "cloudify" our applications we tend to think about how to scale them and make them robust, fault-tolerant and cloud-compatible in general. However, before multiplying your EC2 instances with Auto Scaling groups you may need to look at that single EC2 box and ask yourself... "Why so slow?". Yes, that dreaded performance tuning task that most of us never know enough about to feel confident.
Fear not, for when talking about Linux performance there is one person who knows it all. Brendan Gregg, Senior Performance Architect at Netflix and author of "Systems Performance", gave a "Performance Tuning EC2 Instances" session at AWS re:Invent 2014.
What do we learn from this session?
Performance tuning of an EC2 instance is a process that can be split into four activities:
- Selecting EC2 instance
- Selecting EC2 features
- Tuning Linux kernel
- Observing performance
Selecting EC2 instance
The list of currently available EC2 instance types may not be immediately obvious. Some find it confusing. But here's what's important to remember:
- There are new generation instances, each type with its own specialization: T2 - burstable CPU power, M3 - standard instances, C3 - CPU (C4 is now available), I2 - local instance storage, R3 - memory, G2 - GPU.
- There are old generation instances: T1, M1, M2, C1, HI1, CR1. Obviously, it's best to migrate away from them: T1/M1 => T2/M3, C1 => C3/C4, HI1 => I2.
- In the past, workloads had to match the hardware purchased, but on AWS one can find the best matching instance type for every type of workload. So we get to choose new "hardware" for every new application.
So how does one select a proper instance type?
- Match the workload, disk and caching requirements with instance specialization (storage/memory/CPU). Find out your bounding resource and choose an instance type that potentially does the best job in addressing it.
- Utilize Blue/Green deployment with Elastic Load Balancer and Auto Scaling groups to deploy your application to a new group built entirely with new instance types. Benchmark the group with a real load rather than micro-benchmarking a single instance.
- Alternatively, "brute-force" it: run a load test on all possible instance types while measuring the application's throughput and latency, then calculate the most effective price/performance ratio. However, when boosting throughput remember to check that the latency distribution is acceptable and that your 99th percentile stays within your target.
- It makes sense to reconsider the selection if bounding resources evolve and lose their significance, AWS prices change or new instance types come into existence.
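The "brute-force" comparison above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method from the session: the `LoadTestResult` fields, the nearest-rank percentile and the "requests per dollar" metric are all choices made here for the example, and any prices or latencies you feed it are your own measurements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoadTestResult:
    """One load-test run (hypothetical structure, for illustration only)."""
    instance_type: str
    hourly_price_usd: float     # price you pay per instance hour
    throughput_rps: float       # sustained requests per second
    latencies_ms: List[float]   # per-request latencies from the test


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    # Integer arithmetic for ceil(pct * n / 100) avoids float rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def requests_per_dollar(result: LoadTestResult) -> float:
    """Requests served per dollar spent, assuming throughput holds for an hour."""
    return result.throughput_rps * 3600.0 / result.hourly_price_usd


def best_instance(results: List[LoadTestResult],
                  max_p99_ms: float) -> Optional[LoadTestResult]:
    """Pick the best price/performance among runs whose p99 latency is acceptable."""
    acceptable = [r for r in results
                  if percentile(r.latencies_ms, 99) <= max_p99_ms]
    if not acceptable:
        return None
    return max(acceptable, key=requests_per_dollar)
```

Filtering on the 99th percentile first and ranking by cost second means the cheapest instance type can still lose when its tail latency is out of bounds.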
I2 instance type optimizations
A number of additional optimizations need to be considered on I2 instance types using local SSD instance storage.
Make sure you use Linux kernel 3.8 or higher (available with Amazon Linux 2014.03, Ubuntu 14.04, RHEL 7 and CentOS 7) as it contains important optimizations in how local instance storage is accessed in virtualized environments.
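A quick way to verify the kernel requirement is to parse the release string. The sketch below is only an illustration: the function names are made up for this example, and it assumes the release string starts with "major.minor", as Linux release strings normally do.

```python
import platform
import re
from typing import Tuple

MIN_KERNEL = (3, 8)  # minimum version recommended above for I2 instances


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '3.13.0-44-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kernel_is_recent_enough(release: str,
                            minimum: Tuple[int, int] = MIN_KERNEL) -> bool:
    """True if the release is at least the minimum (tuples compare field by field)."""
    return parse_kernel_release(release) >= minimum


def running_kernel_ok() -> bool:
    """Check the kernel of the machine this script runs on."""
    return kernel_is_recent_enough(platform.release())
```

Comparing tuples rather than strings matters here: as strings, "3.10" would sort before "3.8".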
Issue a TRIM command ("fstrim -a") on a daily or weekly basis to discard unused blocks on mounted SSD drives, so that the drive knows it no longer has to preserve them. Preserving stale blocks is normally an expensive and wearing operation.
Even with all of these advances, choosing the correct EC2 instance type is not enough. The next step is to choose between several EC2 features.
Selecting EC2 features
Speaking of EC2 features there are just three to keep in mind:
- Virtualization type - HVM and PV
- Placement Groups
- Single Root I/O Virtualization (SR-IOV) or, in Amazon parlance, Enhanced Networking
Recently we wrote about Paravirtual vs HVM images and here's another blog post on some PV/HVM aspects. To make a long story short, simply remember that you should use the HVM virtualization type when selecting an AMI from now on. To learn more about HVM and PV check out Brendan's posts on Xen modes (1, 2).
Placement Groups let you logically group related EC2 instances within a single Availability Zone. Once instances are assigned to the same Placement Group, AWS will attempt to place them in close proximity to each other so they enjoy better network performance (low latency and higher throughput, up to 10 Gbps). Note that this is a "best-effort" feature that cannot always be guaranteed, especially when new instances are added to a group that has already been running for an extended period of time. Therefore, the best way to use Placement Groups is to launch all required EC2 instances of an identical type in a single request. Naturally, this feature assumes static allocation of EC2 instances, contrary to what we normally recommend with Auto Scaling groups.
And finally, SR-IOV is a specification that optimizes how network cards interact with hypervisors, such as Xen used at AWS. Enabling enhanced networking results in lower CPU utilization and higher PPS (packets per second) networking throughput, while reducing network latencies and jitter at the same time. But in order to use it you have to:
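As a rough sketch of the "single request" advice, the helper below builds the parameters for one launch call. It assumes you use the boto3 SDK (our choice for this example, not something the session prescribes); the helper name and all argument values are placeholders, and the Placement Group is assumed to exist already.

```python
from typing import Any, Dict


def placement_group_launch_params(image_id: str, instance_type: str,
                                  count: int, group_name: str) -> Dict[str, Any]:
    """Build keyword arguments for a single EC2 run_instances request.

    MinCount == MaxCount asks EC2 to launch all instances or none,
    so the whole group is requested in one go.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": image_id,            # placeholder: an HVM AMI of your choice
        "InstanceType": instance_type,  # identical type for every instance
        "MinCount": count,
        "MaxCount": count,
        "Placement": {"GroupName": group_name},
    }


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   params = placement_group_launch_params("ami-xxxxxxxx", "c3.large", 4, "my-group")
#   boto3.client("ec2").run_instances(**params)
```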
- Launch a C3, C4, I2 or R3 instance type
- .. from an HVM AMI
- .. in VPC
- .. and follow the instructions to enable it on your Linux instance (here is a friendly step-by-step guide for Amazon Linux and Windows)
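Once the steps above are done, you can check which driver a network interface uses by running "ethtool -i eth0": on these instance types, enhanced networking shows up as the "ixgbevf" driver. The sketch below parses that output in Python; the function names and the "eth0" default are assumptions made for this example.

```python
import subprocess
from typing import Dict


def parse_ethtool_info(output: str) -> Dict[str, str]:
    """Turn 'key: value' lines printed by `ethtool -i` into a dictionary."""
    info = {}
    for line in output.splitlines():
        # Split on the first colon only; values such as bus-info contain colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def uses_sriov_driver(output: str) -> bool:
    """True if the interface is driven by ixgbevf, the SR-IOV virtual function driver."""
    return parse_ethtool_info(output).get("driver") == "ixgbevf"


def interface_has_enhanced_networking(interface: str = "eth0") -> bool:
    """Run `ethtool -i` on the local machine (requires ethtool to be installed)."""
    result = subprocess.run(["ethtool", "-i", interface],
                            check=True, capture_output=True, text=True)
    return uses_sriov_driver(result.stdout)
```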
Having dealt with the EC2 instance type and its features we can now face the dragon: the Linux kernel.
Tuning the Linux kernel
As you might expect, the subject of Linux kernel tuning requires a lot of understanding and expertise. And since pretending to be kernel experts isn't something we can comfortably do, we'll let Brendan speak for himself and walk you through a number of possible "tunables".
Now, the process of kernel tuning can either be inefficient or efficient.
Inefficient tuning starts when you play the guessing game and perform various random tunings (worse, several at once) until the problem mysteriously goes away. Or when you take a simplified view of the system, trying out only familiar or popular tools while looking for obvious issues. Or even when you blame some other part of the system you're less familiar with, like the EC2 instance itself or "the network". While these approaches may occasionally work, they hardly teach us much about the real state of a system or provide any useful information. Well, it's called inefficient for a reason :)
Efficient tuning is data-driven. It is built on observing the workload and resource usage, analyzing the results and carefully tuning the relevant tunables. A good example of this would be Brendan's USE method (with recommended checklists) where for every hardware and software resource one examines its Utilization, Saturation and Errors. Resource constraints usually show up as saturation or high utilization, which may lead to further actions or questions.
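To make the USE idea concrete, here is a minimal sketch for a single resource, the CPU, based on the standard Linux /proc/stat and /proc/loadavg files. It is an illustration rather than one of Brendan's checklists: the function names are made up, load average per CPU is only a rough saturation signal, and the Errors part is not covered.

```python
from typing import List


def cpu_times(stat_text: str) -> List[int]:
    """Return the counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def cpu_utilization(before: str, after: str) -> float:
    """Fraction of CPU time spent non-idle between two /proc/stat snapshots."""
    start, end = cpu_times(before), cpu_times(after)
    # Keep user, nice, system, idle, iowait, irq, softirq, steal;
    # guest times are already counted inside user and nice.
    deltas = [b - a for a, b in zip(start, end)][:8]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)  # idle + iowait
    return 1.0 - idle / total


def cpu_saturation(loadavg_text: str, cpu_count: int) -> float:
    """1-minute load average per CPU; values above 1.0 hint at queued work."""
    return float(loadavg_text.split()[0]) / cpu_count
```

In practice you would read /proc/stat twice, a few seconds apart, and pass both snapshots to `cpu_utilization`.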
If the efficient process of EC2 tuning is data-driven and relies on careful examination of system state, which tools can we use?
Standard Linux performance tools could be categorized as statistical, profiling, and tracing. You can view a list of tools at Brendan's Linux Performance page and here is another compilation of Linux performance tools and articles.
Observing and troubleshooting performance of Java applications can be done with a myriad of tools. We'll just mention a few you may not be familiar with:
- lightweight-java-profiler is a sampling and asynchronous Java profiler, minimizing the traditional runtime profiler overhead.
- jHiccup, HdrHistogram, and LatencyUtils from Gil Tene (CTO of Azul) record and visualize an application's latency with high accuracy and minimal cost, taking JVM and platform hiccups into account so that no outliers go unnoticed.
Netflix recently open-sourced Atlas (not to be confused with Atlas by HashiCorp), a backend system for cloud-wide monitoring. A scalable telemetry platform, it manages dimensional time-series data for near real-time operational insights and intelligence. Among many other things, it lets you see application latencies for each instance type, which further helps in selecting the proper instance type.
Brendan has also mentioned Vector (a work-in-progress at the moment), a "real-time per-second performance-monitoring tool for on-demand profiling". Given an EC2 instance it analyzes its low-level performance, making resource Utilization, Saturation, and Errors (mentioned above) immediately observable.
We can only go so far in a single blog post and have only scratched the surface of what you need to know in order to become knowledgeable, confident and efficient in tuning the performance of your EC2 instances. Nevertheless, we believe following these best practices will put you way ahead!
Our next blog posts will dive into AWS EBS and AWS S3 performance tuning. Meanwhile, here are some additional EC2 and tuning-related resources:
- AWS re:Invent 2014
- Gil Tene - Understanding Latency, How Not to Measure Latency
- Brendan Gregg - "Systems Performance" from Prentice Hall