
Amazon strongly implies that your data is stored with triple redundancy. You'd have to buy more than one disk to get triple redundancy yourself. You'd also need to "scrub" the data regularly to detect bit rot (100 bytes per TB are expected to go bad every year), and probably store the files with redundant coding to protect against it.

Plug: I am working on a startup (submitted to this YC round!) to solve this problem using user^Wcustomer-owned hardware for precisely the sort of reasons you describe. I am looking for co-founders. If anybody wants to talk, email is in my profile.

Plug #2: I wrote and submitted this article about Glacier yesterday but it sank fast: http://psranga.github.com/articles/possible-architecture-of-... Email me if you want to talk about going up against an 800 lb gorilla. :)



One external online provider really only counts as "one copy", ever. This is primarily because you cannot audit the ongoing storage architecture and processes of any given provider. You're looking for SPOFs, not how many disks may hold data replicas. One software error (or site/account hack) can wipe out all of your data. Or an entirely out-of-band error occurs: the provider goes belly-up.

Cloud storage is awesome in many ways. Yet it doesn't replace your backup strategy, it merely complements it.


We regularly run an ad campaign on reddit discussing that very notion:

http://www.reddit.com/comments/hg9oa/your_platform_is_on_aws...

... that a single provider is really just a single "copy".

It is also the reason that we build 's3cmd' into our environment and so many customers use it:

ssh user@rsync.net s3cmd put abc.txt s3://account/abc.txt


Very good points. I actually agree with you on most of them. Which is why I started my startup.


Since you're obviously experienced in this area, can you point me to a tool or article that describes this 'scrubbing'?

Is this something that people should be using on their old files/backups?


'Scrubbing' is just a fancy way of saying that files are read back from the media, their checksums recomputed and compared against stored checksums. If a checksum differs and the data was stored redundantly, recovery is carried out and the corrected data is written back to the media.

I'm too lazy and don't really do it with my multiple DVD, CD, and HDD backup directories. But ideally I should be doing it. My startup will make this sort of thing easy and automatic.
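For the curious, a minimal sketch of what a scrub pass looks like in Python, assuming you saved a manifest of checksums when the backup was made (the `checksum`/`scrub` names here are made up for illustration):

```python
import hashlib


def checksum(path, algo="sha256"):
    """Stream a file through a hash so large files don't load fully into memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def scrub(manifest):
    """manifest maps path -> hexdigest recorded at backup time.

    Returns the paths whose current contents no longer match, i.e.
    candidates for repair from a redundant copy.
    """
    return [path for path, expected in manifest.items()
            if checksum(path) != expected]
```

You'd rebuild the manifest at backup time and rerun `scrub` on a schedule; any path it returns gets restored from one of the other copies.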


Search for zfec. It lets you split a file into N chunks, any M of which are enough to reconstitute the original, so it protects against the loss of up to N-M chunks (assuming a corrupted chunk can be detected and discarded, e.g. via a checksum).
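To get a feel for the M-of-N idea without zfec's Reed-Solomon machinery, here's a toy single-parity version (N = M+1, so it survives losing any one chunk). This is an illustration of the principle, not how zfec itself works internally:

```python
def xor_blocks(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(chunks):
    """Given M equal-length data chunks, return M+1 chunks:
    the data plus one XOR parity chunk."""
    parity = chunks[0]
    for c in chunks[1:]:
        parity = xor_blocks(parity, c)
    return chunks + [parity]


def recover(chunks, lost_index):
    """Rebuild the chunk at lost_index (marked None) by XOR-ing
    the surviving M chunks together."""
    survivors = [c for i, c in enumerate(chunks)
                 if i != lost_index and c is not None]
    out = survivors[0]
    for c in survivors[1:]:
        out = xor_blocks(out, c)
    return out
```

Real erasure codes like zfec's generalize this so you can pick N and M freely (e.g. 10-of-14) instead of being stuck at "any one loss".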


> (100 bytes per TB are expected to go bad every year)

That's unsettling. Source?


Sorry, I think I should retract that statement which I seem to have recalled mistakenly. The error rate seems to be quite a bit lower than that, so I will post an article here after I research it thoroughly.


Sorry, it's 10 bytes, not 100.

1 TB = 2^40 bytes

Amazon claims 99.999999999% durability. https://aws.amazon.com/glacier/faqs/

2^40 * (1 - 99.999999999/100) = 10.99 bytes
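Spelled out in Python, using exact rationals to avoid floating-point noise (this takes the per-byte reading of the durability figure, which the replies below question):

```python
from fractions import Fraction

durability = Fraction(99999999999, 10**11)  # eleven nines, exact
tib = 2 ** 40                               # bytes in one TiB
expected_lost = tib * (1 - durability)      # expected bytes lost per year
print(float(expected_lost))                 # about 10.995
```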


Isn't it more bytes in practice? That's 88 bits, but they could be spread over different bytes, right?

In other words: shouldn't you calculate the loss over the total number of bits?


Sorry, my comment above wasn't well thought out. Amazon's durability guarantee is on a per-object basis, not on a per-byte basis. I will post an article here after I research it thoroughly.



