This is another installment in CloudNative's series of "Cloud Best Practices". Previous posts covered AWS IAM Best Practices, AWS EC2 Performance Tuning and AWS EBS Best Practices and Performance Tuning.
This time we're going to talk about one of the most essential (and earliest!) AWS services: Simple Storage Service, or S3. How do you go about optimizing the performance of your S3 buckets? Is it infinitely scalable out of the box?
What do we learn from this session?
There are several areas to focus on when tuning S3 performance:
- Choosing a region
- Choosing an object's key
- Optimizing PUT operations
- Optimizing GET operations
- Optimizing LIST operations
So let's dive in!
Choosing a region
When choosing a region for your S3 bucket, consider the bucket's proximity to its future clients. By clients I don't necessarily mean human users: these may be EC2 instances or other AWS services uploading and downloading bucket objects. Obviously, the closer they are to your S3 buckets, the better. Co-locating all relevant AWS services and S3 buckets in the same region helps reduce both costs and latency. Simply put, keep the data as close as possible to where it's consumed or processed.
Choosing an object's key
You may be surprised to learn that object naming is critical when you're aiming for a processing rate higher than 100 TPS (transactions per second) on your S3 bucket.
If you're familiar with hash table data structures in Java, you know that a key's hashCode() method is of paramount importance, as it's responsible for spreading objects evenly among all of the hash table's buckets. A bad hashCode() may store most objects in the same bucket, degrading the hash table's access time complexity from O(1) to O(n) and effectively turning it into a linked list.
S3 buckets are somewhat similar. Each bucket is backed by multiple partitions that store object keys, which helps spread the transaction load. Keys partition the key name space of a bucket, so a key determines the partition it's stored in. The most significant part is the first 3-4 characters of a key (this number may increase as the number of objects stored in the bucket grows). So the best practice for coming up with good S3 keys is to randomize their prefixes as much as possible, so that they're better distributed across the bucket's partitions.
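To make the idea concrete, here's a toy model in Python. It assumes, purely for illustration, that a key's partition is decided by its first three characters (S3's actual partitioning is internal and adaptive), and counts how many distinct prefixes 1,000 timestamp-named keys produce with and without a hashed prefix. The names `partitions_touched` and `PREFIX_LEN` are hypothetical.

```python
import hashlib
from collections import Counter
from datetime import datetime, timedelta

# TOY MODEL (assumption): a key's partition is decided by its first
# PREFIX_LEN characters. S3's real partitioning is internal and can change.
PREFIX_LEN = 3

def partitions_touched(keys, prefix_len=PREFIX_LEN):
    """Group keys by their leading characters and count keys per group."""
    return Counter(key[:prefix_len] for key in keys)

# 1,000 objects written one second apart, named after their timestamp.
start = datetime(2015, 1, 18, 17, 0, 0)
stamps = [(start + timedelta(seconds=i)).strftime("%m-%d-%y-%H-%M-%S")
          for i in range(1000)]

sequential = [f"{s}.jpg" for s in stamps]
# Same names with 5 hex characters of their MD5 digest prepended.
hashed = [f"{hashlib.md5(s.encode()).hexdigest()[:5]}-{s}.jpg" for s in stamps]

print(len(partitions_touched(sequential)))  # 1: every key starts with "01-"
print(len(partitions_touched(hashed)))      # hundreds of distinct prefixes
```

In this model all sequential keys pile onto a single "partition", while the hashed keys spread across hundreds of them.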
What makes a good or a bad S3 key, then? Plain GUIDs, timestamps and running counters would be terrible candidates due to their repeating prefixes. But it all changes if you reverse them. A key that starts with a reversed timestamp, reversed epoch time or reversed counter would be a good one:
<bucket>/42-37-17-15-18-01.jpg
<bucket>/18-20-14-15-18-01.jpg
<bucket>/28-30-18-15-16-01.jpg
<bucket>/03-30-18-15-16-01.jpg
Or one that starts with part of the object's hexadecimal MD5 hash. In short, anything that makes the key's prefix random:
<bucket>/52af3-01-18-15-17-37-42.jpg
<bucket>/1f4c6-01-18-15-14-20-18.jpg
<bucket>/4bc10-01-16-15-18-30-28.jpg
<bucket>/f6e23-01-16-15-18-30-03.jpg
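Both naming schemes are easy to generate. The helpers below are a minimal sketch with hypothetical function names; the MD5 variant hashes the timestamp string (hashing the object's content would work just as well), so its prefix won't match the illustrative ones above.

```python
import hashlib
from datetime import datetime

def reversed_timestamp_key(ts: datetime, ext: str = "jpg") -> str:
    """Put the fastest-changing field (seconds) first by reversing the timestamp."""
    fields = ts.strftime("%m-%d-%y-%H-%M-%S").split("-")
    return "-".join(reversed(fields)) + "." + ext

def md5_prefixed_key(ts: datetime, ext: str = "jpg", prefix_len: int = 5) -> str:
    """Prepend the first hex characters of an MD5 digest to the timestamp."""
    name = ts.strftime("%m-%d-%y-%H-%M-%S")
    # Assumption: we hash the name itself; hashing the object's bytes also works.
    prefix = hashlib.md5(name.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{prefix}-{name}.{ext}"

ts = datetime(2015, 1, 18, 17, 37, 42)
print(reversed_timestamp_key(ts))  # 42-37-17-15-18-01.jpg
print(md5_prefixed_key(ts))        # <5 hex chars>-01-18-15-17-37-42.jpg
```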
Now, when object keys start with a random prefix, what happens when you list them later? The S3 LIST operation returns object keys sorted alphabetically, so they'll appear in completely random order. If there's a logical grouping of a bucket's keys, you can simply put the group's name in front of the key:
<bucket>/A/52af3-01-18-15-17-37-42.jpg
<bucket>/B/1f4c6-01-18-15-14-20-18.jpg
<bucket>/B/4bc10-01-16-15-18-30-28.jpg
<bucket>/A/f6e23-01-16-15-18-30-03.jpg
This way, the sorted list that comes back will be grouped by these prefixes.
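Here's a local simulation of that behaviour. `list_keys` is a hypothetical stand-in for a LIST request: Python's `sorted()` reproduces S3's lexicographic ordering for these ASCII keys, and the `prefix` argument mirrors the `Prefix` parameter a real request (for example boto3's `list_objects_v2`) accepts, which lets you fetch a single group.

```python
# Keys from the example above, in upload order.
keys = [
    "A/52af3-01-18-15-17-37-42.jpg",
    "B/1f4c6-01-18-15-14-20-18.jpg",
    "B/4bc10-01-16-15-18-30-28.jpg",
    "A/f6e23-01-16-15-18-30-03.jpg",
]

def list_keys(keys, prefix=""):
    """Mimic a LIST call: keys starting with `prefix`, in lexicographic order."""
    return sorted(k for k in keys if k.startswith(prefix))

for k in list_keys(keys):       # all "A/" keys come before all "B/" keys
    print(k)

print(list_keys(keys, prefix="A/"))  # only group A
```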
By following these best practices, you can achieve S3 performance of thousands of TPS. So once you have good S3 object keys, what can you do to improve the performance of writing data to a bucket?
Optimizing PUT operations
PUT operations can and should be optimized with multipart uploads: on the sending side, the original file is split into multiple parts that are uploaded in parallel, and on the receiving side, S3 assembles the parts back into a single object. This brings several performance-related advantages:
- The operation's bottleneck becomes your upload bandwidth, making it run significantly faster. A single-threaded upload hardly utilizes your bandwidth to its full potential.
- The upload is much more resilient to network errors. Failing to upload a single part means re-uploading only that part, not the entire file.
- Uploads can be paused and resumed at will, and can even begin before the entire file is ready and its size is known, which works nicely with streaming data.
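The parallelism and per-part retry described above can be sketched with a thread pool. `send_part` is a hypothetical callable standing in for a single S3 UploadPart request, and the 10-byte parts are for demonstration only (real S3 parts, except the last one, must be at least 5 MB).

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def split_parts(data: bytes, part_size: int):
    """Split data into (part_number, chunk) pairs; S3 part numbers start at 1."""
    return [(offset // part_size + 1, data[offset:offset + part_size])
            for offset in range(0, len(data), part_size)]

def upload_parallel(data: bytes, part_size: int, send_part, max_workers=4, max_attempts=3):
    """Send all parts concurrently, retrying only the part that failed.

    `send_part(part_number, chunk)` is a hypothetical stand-in for one S3
    UploadPart request: it returns an ETag-like string or raises ConnectionError.
    """
    def send_with_retry(part):
        number, chunk = part
        for attempt in range(1, max_attempts + 1):
            try:
                return number, send_part(number, chunk)
            except ConnectionError:
                if attempt == max_attempts:
                    raise  # give up on this part after max_attempts tries

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, i.e. ascending part numbers,
        # which is the order S3 expects when the upload is completed.
        return list(pool.map(send_with_retry, split_parts(data, part_size)))

# Usage with a fake sender that drops part 2 on its first attempt.
calls = []
failed_once = set()

def flaky_send(number, chunk):
    calls.append(number)
    if number == 2 and number not in failed_once:
        failed_once.add(number)
        raise ConnectionError("simulated network error")
    return hashlib.md5(chunk).hexdigest()

parts = upload_parallel(b"x" * 25, part_size=10, send_part=flaky_send)
print(parts)
```

Only part 2 is sent twice; parts 1 and 3 are never re-sent.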
It's important to pick the right size for your parts, though. Sending too many small parts or too few large parts will be less efficient, so benchmarking your use case will give you an idea of the optimal part size. A good starting point would be 25-50 MB parts for regular networks and 10 MB parts for mobile networks.
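When picking a part size, S3's hard limits also come into play: every part except the last must be at least 5 MiB, and a single upload can have at most 10,000 parts. Here's a minimal sketch of that arithmetic, with hypothetical helper names and the 25 MB default taken from the suggestion above:

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum size for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts in one upload

def choose_part_size(file_size: int, preferred: int = 25 * MIB) -> int:
    """Use the preferred size unless S3's limits force bigger parts."""
    smallest_allowed = -(-file_size // MAX_PARTS)  # ceiling division
    return max(preferred, MIN_PART_SIZE, smallest_allowed)

def part_ranges(file_size: int, part_size: int):
    """Inclusive (first_byte, last_byte) ranges, one per part."""
    return [(start, min(start + part_size, file_size) - 1)
            for start in range(0, file_size, part_size)]

print(choose_part_size(100 * MIB) // MIB)                         # 25
print(len(part_ranges(100 * MIB, choose_part_size(100 * MIB))))   # 4
print(choose_part_size(100 * MIB, preferred=10 * MIB) // MIB)     # 10 (mobile)
```

For very large files (say 1 TiB), the 10,000-part limit, not your preference, ends up dictating the part size.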
Multipart uploads are supported by the AWS SDKs for Java, .NET, Ruby and PHP, as well as by the REST API, so take advantage of that!
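For reference, this is roughly what the low-level call sequence looks like with boto3 (the AWS SDK for Python): create the upload, send each part, then complete it, or abort it on failure so orphaned parts don't linger. The function sends parts sequentially for brevity, and `FakeS3` is a hypothetical in-memory stand-in so the sketch runs without AWS credentials; the tiny part size would be rejected by real S3.

```python
def multipart_upload(s3, bucket: str, key: str, data: bytes, part_size: int) -> dict:
    """Upload `data` through the multipart API of a boto3-style S3 client.

    `s3` is expected to behave like boto3.client("s3").
    """
    upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)["UploadId"]
    try:
        parts = []
        for number, start in enumerate(range(0, len(data), part_size), start=1):
            resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                                  PartNumber=number, Body=data[start:start + part_size])
            parts.append({"ETag": resp["ETag"], "PartNumber": number})
        # S3 stitches the parts together only when the upload is completed.
        return s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                            MultipartUpload={"Parts": parts})
    except Exception:
        # Abort so S3 doesn't keep (and bill for) the already uploaded parts.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise

class FakeS3:
    """Hypothetical in-memory stand-in for boto3.client("s3"), for local runs."""
    def __init__(self):
        self.parts, self.aborted = {}, False
    def create_multipart_upload(self, Bucket, Key):
        return {"UploadId": "demo-upload"}
    def upload_part(self, Bucket, Key, UploadId, PartNumber, Body):
        self.parts[PartNumber] = Body
        return {"ETag": f'"etag-{PartNumber}"'}
    def complete_multipart_upload(self, Bucket, Key, UploadId, MultipartUpload):
        return {"Bucket": Bucket, "Key": Key, "Parts": MultipartUpload["Parts"]}
    def abort_multipart_upload(self, Bucket, Key, UploadId):
        self.aborted = True

fake = FakeS3()
result = multipart_upload(fake, "my-bucket", "backups/big.bin", b"a" * 12, part_size=5)
print([p["PartNumber"] for p in result["Parts"]])  # [1, 2, 3]
```

In practice, boto3's higher-level `upload_file` method, configured with a `boto3.s3.transfer.TransferConfig` (`multipart_chunksize`, `max_concurrency`), takes care of splitting, parallelism and retries for you.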
OK, once objects are written to S3, what read optimizations are possible?
Optimizing GET operations
The most obvious optimization when reading objects from S3 is using AWS CloudFront. Being a CDN service it plays the role of a local cache for your users and brings S3 objects to 50+ edge locations around the world, providing low latencies and high transfer rates for read operations due to proximity of the data. Using CloudFront reduces the number of GET requests hitting the original bucket, reducing in turn the cost and the load incurred.
Also, both S3 and CloudFront support partial requests for an object (Range GETs), where the object is downloaded in smaller units in parallel, similar to multipart uploads. Most download managers do exactly that to minimize the time it takes to fetch a file.
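A parallel Range GET boils down to splitting the object into byte ranges and stitching the responses back together in order. In this sketch `fetch_range` is a hypothetical callable; with boto3 it could wrap `s3.get_object(Bucket=..., Key=..., Range=...)`, and the object size could come from a `head_object` call. A local byte string stands in for the S3 object.

```python
from concurrent.futures import ThreadPoolExecutor

def byte_ranges(size: int, chunk: int):
    """HTTP Range header values covering bytes [0, size) in `chunk`-sized pieces."""
    return [f"bytes={start}-{min(start + chunk, size) - 1}"
            for start in range(0, size, chunk)]

def parallel_download(size: int, chunk: int, fetch_range, max_workers=4) -> bytes:
    """Fetch all ranges concurrently and join them back together in order.

    `fetch_range(range_header)` is hypothetical; with boto3 it might be
    lambda r: s3.get_object(Bucket=b, Key=k, Range=r)["Body"].read()
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so concatenation is safe.
        return b"".join(pool.map(fetch_range, byte_ranges(size, chunk)))

# Local stand-in for a 1,024-byte S3 object.
blob = bytes(range(256)) * 4

def fake_fetch(range_header: str) -> bytes:
    first, last = map(int, range_header[len("bytes="):].split("-"))
    return blob[first:last + 1]  # HTTP byte ranges are inclusive

downloaded = parallel_download(len(blob), 300, fake_fetch)
print(byte_ranges(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```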
An alternative optimization for efficient S3 reads is using S3's BitTorrent support. This way, all downloading clients share the content among themselves rather than requesting the same object from an S3 bucket over and over again. Similarly to CloudFront, this reduces the S3 load and the costs involved, as significantly less data is transferred out of the S3 bucket to the Internet at the end of the day. One important thing to note, though: BitTorrent only works for S3 objects smaller than 5 GB.
As you can see, CloudFront or BitTorrent is a "must have" technique if you're about to distribute S3 objects publicly. Handing out a bare S3 download link is not the way to go for potentially popular content, as it can only go so far when the load increases.
Now, after we've optimized PUT and GET operations, what's left? The LIST operation, which reads the list of a bucket's objects.
Optimizing LIST operations
One thing to know about the LIST operation is that it's expensive and heavy. Consequently, frequent LIST requests are not recommended. To overcome this, you're advised to maintain a secondary index of keys. AWS DynamoDB, RDS, the recently announced Aurora, CloudSearch or any other data store can host the information about your S3 objects and their keys. This lets you reduce the number of expensive LIST requests, with the additional advantage of being able to send specialized queries to the secondary index, such as fetching keys by timestamp or by any additional metadata attached to them.
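A secondary index can be as simple as one table that's updated after every successful PUT. The sketch below uses an in-memory SQLite database as a stand-in for DynamoDB, RDS or Aurora; the table layout and the `record_put`/`keys_between` helpers are hypothetical.

```python
import sqlite3

# Stand-in for DynamoDB/RDS/Aurora: an in-memory table describing S3 objects.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE s3_index (
    s3_key     TEXT PRIMARY KEY,
    size       INTEGER NOT NULL,
    created_at TEXT NOT NULL,  -- ISO-8601, so string order equals time order
    category   TEXT
)""")
db.execute("CREATE INDEX idx_created ON s3_index (created_at)")

def record_put(s3_key, size, created_at, category=None):
    """Call right after a successful PUT so the index mirrors the bucket."""
    db.execute("INSERT OR REPLACE INTO s3_index VALUES (?, ?, ?, ?)",
               (s3_key, size, created_at, category))

def keys_between(start, end):
    """Keys created in [start, end), answered without any S3 LIST request."""
    rows = db.execute("SELECT s3_key FROM s3_index "
                      "WHERE created_at >= ? AND created_at < ? ORDER BY created_at",
                      (start, end))
    return [row[0] for row in rows]

record_put("52af3-01-18-15-17-37-42.jpg", 1024, "2015-01-18T17:37:42", "A")
record_put("1f4c6-01-18-15-14-20-18.jpg", 2048, "2015-01-18T14:20:18", "B")
record_put("4bc10-01-16-15-18-30-28.jpg", 512, "2015-01-16T18:30:28", "B")

jan18 = keys_between("2015-01-18T00:00:00", "2015-01-19T00:00:00")
print(jan18)  # ['1f4c6-01-18-15-14-20-18.jpg', '52af3-01-18-15-17-37-42.jpg']
```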
To learn more about S3, you can watch another excellent AWS re:Invent 2014 session, "Amazon S3 Deep Dive and Best Practices" (slides), where Tim Hunt shows the latest advances in S3 event notifications, gives a brilliant demo of the new AWS Lambda service and goes into the gory details of S3 bucket versioning and lifecycle policies.
And keep following our "Cloud Best Practices" series: we always have more content and summaries to share with you!