All I have to say is "Wow". I just switched our boot drives in us-west-2 over, and our initialization time was cut in half (~7 minutes to ~3 minutes).
As someone who launches a lot of instances based on user demand, I'm very excited EC2 has finally addressed the glaring speed issues with EBS volumes. This brings boot times in line with the original GCE boot times, which were stellar.
Of course, time will tell if the new General Purpose volumes can hold up as more users come onto the system, but for now I'm impressed.
Is there any information about the expected durability of the new EBS SSD volumes compared to the old Magnetic ones?
FAQ is still showing the same "Magnetic-era" numbers:
"
most recent Amazon EBS Snapshot can expect an annual failure rate (AFR) of between 0.1% – 0.5%, where failure refers to a complete loss of the volume. This compares with commodity hard disks that typically fail with an AFR of around 4%, making EBS volumes 10 times more reliable than typical commodity disk drives.
"
Really neat announcement. This opens up a bunch of interesting possibilities for applications that are cpu-bound once you're on SSD, but for which the delta between SSD-backed EBS and ephemeral storage doesn't change the bottleneck.
In particular, in that sort of scenario, you might be able to mix-and-match and get a compute-optimized instance with loads of cpu power and a healthy amount of sufficiently performant storage on EBS.
Interesting quote from his post "Our testing indicates that a typical Linux boot requires about 7,000 I/O operations and a typical Windows boot requires about 70,000."
I had assumed Linux was lighter-weight from casually observing the difference in boot times, but that's a 10x difference!
I think that's a bit of a weird comparison: a "typical" Linux boot on a server usually doesn't involve bringing up a GUI, while on Windows you're always bringing the GUI up (right? I haven't touched Windows in years; maybe new Windows Server versions have a true headless option?). If you were to bring up X11 with a full-featured desktop environment on a Linux server, I imagine the I/O operation count would go up quite a bit.
Windows Server has offered "Core" editions going back to Server 2008, where the UI is simply a login screen and a command prompt, for remotely-managed/headless operation. In my experience most Windows admins still prefer the GUI. And while it does obviously save some resources not to load the full UI, the cost of the additional cores/memory to run it is minimal by comparison.
My actual point (which I didn't make as strongly as I could because post too long already), was that the initial allocation of IOPS was more than sufficient to handle multiple boots of either Linux or Windows.
"Switching from a Magnetic volume to a General Purpose (SSD) volume of the same size reduces the typical boot time for Windows 2008 R2 by approximately 50%."
Previously a 1TB, 4K PIOPS volume was $525/mo. With a 35% discount on PIOPS this now runs $385/mo, but you can also get 1TB, 3K PIOPS General Purpose volume for $100/mo. Pretty nice price drop!
Generally, since 1 GB of GP SSD costs as much as 1 PIOP, in most cases you should just purchase a max(DESIRED_VOLSIZE, DESIRED_PIOPS/3)GB GP SSD volume rather than a PIOPS volume. I think.
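That sizing rule can be made concrete with a quick cost comparison (a sketch; the prices are the ones quoted in this thread — $0.10/GB-mo for GP SSD, $0.125/GB-mo plus $0.065/PIOP for Provisioned IOPS volumes — so treat them as assumptions that will drift over time):

```python
# Rough monthly-cost comparison, using per-unit prices from this thread.
def gp2_cost(size_gb, desired_iops):
    # Size the GP volume up until its 3 IOPS/GB baseline covers demand:
    # max(DESIRED_VOLSIZE, DESIRED_PIOPS / 3)
    size = max(size_gb, desired_iops / 3)
    return 0.10 * size

def piops_cost(size_gb, desired_iops):
    return 0.125 * size_gb + 0.065 * desired_iops

print(gp2_cost(1_000, 3_000))    # ~$100/mo
print(piops_cost(1_000, 3_000))  # ~$320/mo
```

So for the 1TB / 3K IOPS case, the GP volume is roughly a third of the price of the equivalent PIOPS volume.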
The 3,000 IOPS figure for General Purpose SSDs comes with two caveats - it is available only in bursts for up to 30 minutes and it comes out of a capped reserve which is slowly refilled over time based on the size of the drive. In other words, it is not Provisioned IOPS.
If you wanted it to be true PIOPS, the cost of a 1TB 3K PIOPS SSD is $425 (1,000GB * $0.125 for storage plus 3,000 PIOPS * $0.1 for operations). (EDIT: these figures are wrong; please see responses below for the right math.)
EDIT: My understanding was incomplete - General Purpose SSDs can burst up to 3k IOPS, but they also provide provisioned IOPS at a rate of 3 IOPS per GB. Effectively, 1TB drive then does provide 3k PIOPS (3 * 1,000GB) and bursting limits are only a factor for smaller drives.
Thank you for correcting me. I disagreed after reading the same blog post, but after your explanation I realised I completely missed the baseline performance guarantee. Your math was spot on!
But note that the baseline performance for the new General Purpose SSD is listed as the size (in GB) times 3 IOPS (so 3K/3072 IOPS for your 1000GB/1TB example). So large volumes will have a baseline that basically matches the listed "burst" IOPS speed, and at less than 1/2 the cost of the PIOPS version.
It looks like 3 IOPS per GB guaranteed, even when throttled, and much higher when bursting. That's still 3000 IOPS (3 X 1000) worst case for a 1TB volume.
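The burst behavior described above is essentially a token bucket: credits drain while you exceed the baseline and refill at the baseline rate. Here's a sketch; the 5,400,000 initial credit figure is my assumption, derived from the stated "3,000 IOPS for up to 30 minutes" burst window (3,000 * 1,800s), so take the exact constants with a grain of salt:

```python
# Token-bucket sketch of the General Purpose SSD burst model.
# Assumed constants: baseline = 3 IOPS/GB, burst cap = 3,000 IOPS,
# initial credit balance = 5,400,000 I/Os (3,000 IOPS * 30 min).
def burst_seconds(size_gb, demand_iops, credits=5_400_000):
    """Seconds a volume can sustain demand_iops before throttling to
    its baseline. Returns None if the baseline already covers demand."""
    baseline = 3 * size_gb
    demand = min(demand_iops, 3_000)  # burst cap
    if demand <= baseline:
        return None  # never throttled; credits only ever refill
    drain_rate = demand - baseline   # net credits consumed per second
    return credits / drain_rate

# 100GB volume (300 IOPS baseline) bursting at 3,000 IOPS:
print(burst_seconds(100, 3_000))    # 5,400,000 / 2,700 = 2000 seconds
# 1TB volume: baseline already equals the 3,000 IOPS cap:
print(burst_seconds(1_000, 3_000))  # None -- worst case is the burst rate
```

This matches the point above: for a 1TB volume the bucket never matters, because the baseline is the burst cap.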
I think amazon has a pricing error on their website for the light utilization instances since the April price drop as they are more expensive than even the on demand ones (at least the ones I looked at).
I submitted a support ticket but haven't heard anything back. Once you start looking at medium reservation the prices fall significantly.
For example, the 3.75GB m3.medium is $51.10 per month on demand, vs ~$56 for the light reservation and ~$35 for the medium reservation (all running 24/7, with 1-year terms on the reservations).
EDIT: Just got a reply on my support ticket. I guess the light reservation instances are just for reserving capacity now.
"You are correct about the Light Utilization Reserved Instances. I have brought this to the attention of the EC2 pricing team to see if this is an error on our part. Unfortunately, these Reserved Instance types now only cover Capacity Reservation and no longer offer a discounted hourly rate over the On Demand hourly prices."
The previous generation had a small instance (m1.small). There is no comparable offering in the current generation (at least, not yet). The smallest m3 instance is the m3.medium. You can still provision an m1.small, though, right? You just miss out on the hardware upgrade.
GCE, Digital Ocean, etc may be a better fit than AWS for these kinds of instances.
For the best performance you'd want dedicated hardware with a battery-backed RAID array. SSDs can be used for tablespaces with the most frequently accessed data.
Note that the AWS IOPS numbers are for 16KB reads; with a max 4K IOPS volume you'll be getting a max throughput of ~62.5 MB/s. You could toss a few together with software RAID 0 to scale up, but you're limited to a 4x improvement based on the largest instance types listed below:
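The arithmetic behind those throughput ceilings, for anyone who wants to plug in different volume counts (the 16KB I/O size is the one AWS rates IOPS against, per the comment above):

```python
# Throughput implied by an IOPS figure at a given I/O size.
def throughput_mib_s(iops, io_size_kib=16):
    return iops * io_size_kib / 1024  # KiB/s -> MiB/s

print(throughput_mib_s(4_000))      # 62.5 MiB/s for one 4K-IOPS volume
print(4 * throughput_mib_s(4_000))  # 250.0 MiB/s striping 4 volumes (RAID 0)
```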
Look at the i2 instances with ephemeral storage. I use the 2xlarge one, great performance. Can write/read at 500 megabytes per second and the random seek performance is pretty good as well.
EBS is still pretty poor for IOPS, even using SSDs. (It's network attached storage, after all). RDS also uses EBS instead of native DB replication, so expect a huge slowdown from native.
If you wanted real IO speed, run two postgres instances (in ec2) - in sync replication - using local SSD storage. I'd guess order(s) of magnitude more iops.
It still doesn't come close to real hardware / single-level virtualisation.
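For anyone curious what the two-instance synchronous setup looks like, here's a minimal sketch of the primary's postgresql.conf (the standby name and connection details are placeholders, and the standby would need a matching recovery configuration pointing back at the primary):

```
# postgresql.conf on the primary -- placeholders throughout
wal_level = hot_standby
max_wal_senders = 3
synchronous_commit = on
# Commits wait for this standby to confirm; the name must match the
# application_name the standby uses in its primary_conninfo:
synchronous_standby_names = 'standby1'
```

With synchronous_commit on, a commit isn't acknowledged until the standby has it, so losing one box's local SSD doesn't lose acknowledged writes.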
3-4K IOPS is actually pretty poor. For example, I have two USB keys that can do 39K IOPS (granted, that's at 4K, not 16K) and my laptop has an SSD that does 95K.
Not really a fair comparison. If we're comparing to USB keys, you should at least compare to ephemeral SSDs rather than EBS SSDs. Unlike EBS, your USB disk isn't made for 99.999% durability, doesn't allow snapshotting, etc.
Does anyone else experience poor disk IO performance on AWS? I ran 'dd if=/dev/zero of=/tmp/file' and the speed is between 1-10 MB/s. I would expect hundreds of MB/s if they advertise SSD.
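Part of that gap may be the benchmark itself rather than EBS: with no bs= argument, dd writes in 512-byte blocks and largely measures syscall overhead, and without a sync it can also be measuring the page cache. A fairer sequential-write check might look like this (a sketch; the path and 64MB size are arbitrary):

```shell
# Write with a large block size, and flush to the device before dd
# reports its rate, so the number reflects the disk rather than cache.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=64 conv=fdatasync
rm -f /tmp/ddtest
```

Note that dd is sequential-write only either way; the IOPS figures being discussed here are about small random I/O, which dd doesn't exercise.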
Isn't your server just virtual? Your instance could be running on a physical host running a hundred others, all accessing disk at the same time. A lot of times these cloud providers don't really target super fast individual responses anyway, just decent responses across millions of requests.
You can't do online migration, but migration via a snapshot should be quick and easy. Snapshots are incremental, so the best way to get low downtime is to create a snapshot, then when that one completes, unmount the old volume and snapshot again. The second snapshot should be really quick, especially if your application isn't write-heavy.
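The two-snapshot flow above, sketched with the AWS CLI (volume/snapshot IDs, device, and AZ are all placeholders — this is the shape of the procedure, not a paste-ready script):

```shell
# 1. Warm snapshot while the volume is still in use:
aws ec2 create-snapshot --volume-id vol-11111111 --description "warm copy"

# 2. Once it completes, quiesce and take the cheap incremental:
umount /data
aws ec2 create-snapshot --volume-id vol-11111111 --description "final"
aws ec2 wait snapshot-completed --snapshot-ids snap-22222222

# 3. Create the new General Purpose (SSD) volume from the final snapshot:
aws ec2 create-volume --snapshot-id snap-22222222 \
    --volume-type gp2 --availability-zone us-west-2a
```

Downtime is only the window between the unmount and the second snapshot completing, which stays small because that snapshot only contains blocks changed since the first.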
No. DO doesn't provide a remote block storage solution, only local disks where your data is much more likely to be lost if the machine fails and also much slower to back up and awkward to manage. Local SSD storage has been around for years and hardly represents innovation. The only thing DO did was slice local SSD drives into tiny slivers for customers which means you're sharing a drive with many other customers on the same box, some of which will be furiously scrubbing their disks when they destroy the droplet.