My understanding is that the only reliable approach to long-term digital archival storage is to refresh the media you are storing things on every few years, copying the previous archives onto the fresh storage.
Since storage constantly gets cheaper, 100GB first stored in 2001 can be stored on updated media for a fraction of that original cost in 2024.
Long term archival is successive short/middle term archival.
I think I read this quote on Tim Bray's blog[0], but I am not sure anymore. This is now my approach, my short/middle term archival is designed to be easily transferred to the next short/middle term store on a regular basis. I started with 500GB drives, now I am at 14TB.
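The transfer step itself is deliberately boring; roughly something like this sketch is all there is to it (Python, with made-up mount points, and SHA-256 picked arbitrarily as the verification hash):

    # Rough sketch: copy the old archive tree onto the new drive and verify
    # each file's hash on both sides before retiring the old media.
    # The mount points are placeholders for whatever the drives actually are.
    import hashlib, shutil
    from pathlib import Path

    OLD = Path("/mnt/old_archive")   # hypothetical old drive
    NEW = Path("/mnt/new_archive")   # hypothetical new drive

    def sha256(path: Path) -> str:
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    for src in OLD.rglob("*"):
        if not src.is_file():
            continue
        dst = NEW / src.relative_to(OLD)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)       # copy, preserving timestamps
        if sha256(src) != sha256(dst):
            raise RuntimeError(f"verification failed: {src}")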
My first hard drive was 5Mb, and I had to write my own driver for it (PDP11, c. 1982). It was a hell of a step up from 8" floppies, enough so I partitioned it into 8 separate areas.
Even the floppies were a step up from paper tape - the older guys used to have a cupboard of paper tapes on coathangers, and linked their code by feeding the tapes through the reader in the right order.
Pretty much. You see hobbyists getting data off of 30+ year old hard drives for the novelty of it, but I can’t imagine relying on that as a preservation copy. Optical media rots, magnetic media rots and loses magnetic charge, bearings seize, flash storage loses charge, etc. Entropy wins, sometimes much faster than you’d expect.
Sometimes they fail for other reasons as well, such as improper storage.
Back in the 90's to 00's a friend had a collection of cd's that he'd written, but he stored them in a big sleeved folder container. The container itself caused them to warp slightly, which made them unusable.
I took a few for testing and managed to unbend them after some time, which turned them back into a working state.
[Note: That's the most apostrophes I've ever used in a sentence, it feels dirty]
Yeah I didn't want to use heat in case that did more damage, so I just used some weights I had lying around.
If I remember right I just stacked a few starting on the floor with a protective layer in-between so as to not scratch them (a piece of paper is fine). Then add a 5kg or whatever weight on top. After a few days I turned them over and did the same again.
After that most were flat, only one or two needed some more individual time. I imagine that if they're still not flat after that, then maybe heating them slightly in the oven or even just the sun outside might do the trick.
I have been working in long term storage for many years. I never understood why we can't just 3D-print binary code on thin clay tablets and then fire them for long term storage. Clay tablets are readable for thousands of years.
Stone - and more usefully, clay - last almost forever, but they're impractical for digital storage, since there's no useful way of imprinting on them that isn't very low density, unlike paper.
Unless we could improvise something with old-school dot matrix impact heads to print on clay - I wonder if anyone has tried.
1 line of 80col text per card is pretty awful though, and then you bring back all the horrors of 60s card sequencing but for bigger files.
Some kind of 'barcode' encoding, with heavy error correction, would probably be better. I've seen attempts that claim 500kB per side using a largely unmodified QR code system, but I suspect better could be achieved - the method of scanning is probably going to be the bottleneck anyway.
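Purely as a sketch of the encoding side (this assumes the third-party Python `qrcode` package; the chunk size and filenames are arbitrary, and the real open questions are density and how you scan it all back in):

    # Sketch: split a file into chunks and render each chunk as a QR code at
    # the highest error-correction level (H, roughly 30% of a symbol can be
    # damaged and still recovered).
    import base64
    import qrcode  # third-party package: pip install qrcode[pil]

    CHUNK = 512  # bytes per symbol, deliberately conservative

    with open("archive.tar", "rb") as f:
        data = f.read()

    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    for n, chunk in enumerate(chunks):
        qr = qrcode.QRCode(error_correction=qrcode.constants.ERROR_CORRECT_H)
        # prefix each symbol with its sequence number so ordering survives scanning
        qr.add_data(f"{n:06d}:" + base64.b64encode(chunk).decode("ascii"))
        qr.make(fit=True)
        qr.make_image().save(f"page_{n:06d}.png")

Getting the data back is then just scanning the images in sequence, which, as said, is probably where the real bottleneck sits.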
Just one article discussing it. Do you have a source to back this up? M-DISCs are getting hard to purchase these days, and I have a lot of stuff I want to put on them which I likely will want to look at in 30 years.
M-DISC uses a special, very hardened layer that sets it apart from other discs. That is why it works so well for long term storage.
"Instead, the M-DISC™’s data layer is composed of rock-like materials known to last for centuries. The M-DISC READY™ Drive etches the M-DISC™’s rock-like layer creating a permanent physical data record that is immune to data rot. The stability and longevity of the M-Disc DVD has been proven in rigorous tests conducted according to the ISO/IEC 10995 test standard for determining data lifetime of optical media."
Interestingly, this is more or less how long term cold tape storage works (tapes have somewhat different failure characteristics, so it's more like a "check read" at least every "some_time", and on checksum errors you rewrite to a new tape, restoring from the "raid" duplicates, but conceptually it's kinda the same idea).
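A toy illustration of that check-read idea (the mirror paths and the manifest of known-good hashes are hypothetical, and real tape libraries obviously do this at a very different layer):

    # Sketch of a "check read" scrub: verify two independent copies of each
    # archived file against recorded hashes, and rewrite whichever copy has
    # rotted from the one that still checks out.
    import hashlib, json, shutil
    from pathlib import Path

    COPY_A = Path("/mnt/mirror_a")   # hypothetical mirror locations
    COPY_B = Path("/mnt/mirror_b")
    MANIFEST = json.loads(Path("manifest.json").read_text())  # {relpath: sha256}

    def digest(p: Path) -> str:
        return hashlib.sha256(p.read_bytes()).hexdigest()

    for rel, want in MANIFEST.items():
        a, b = COPY_A / rel, COPY_B / rel
        ok_a, ok_b = digest(a) == want, digest(b) == want
        if ok_a and not ok_b:
            shutil.copy2(a, b)       # rewrite the rotted copy
        elif ok_b and not ok_a:
            shutil.copy2(b, a)
        elif not ok_a and not ok_b:
            print(f"both copies of {rel} are bad - restore from another source")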
I don't think I'd want to trust tape for more than a decade or two though, as we've seen with audio tapes, iron separation from the substrate becomes an issue a lot sooner than we'd like.
Tape also has a problem shared with hard disks - to achieve high density we've rapidly hit a stage where the technology is too complex to enable data archeology at some point in the future; using 90s era complexity hard drives is about where the archeological limit is. LTO-1 may even be beyond that complexity compared to DLT, QIC or even Data8 (helical scan may be too much of a spanner in the works)
Modern polymers may make microfiche/microfilm a longer term solution than it has been in the past with acetate film/slides, but I'm not sure how much research has been done into which polymers might be best.
For longer than a century, our best experience is, thus far, with clay and paper (assuming good quality acid-free paper, rather than cheap modern consumer paper).
As someone who runs an archive for a 90s radio show, I have to partly contradict this. I regularly get tapes that are 30-35 years old. The quality of those audio tapes is shockingly good. The old chrome audio tapes very rarely show signs of degradation compared with audio CDs from the same time period.
In the mid to late 90s we also got metal audio tapes, which are even better. I have a few DAT tapes from the early 90s that are in the same league as the metal audio tapes.
Microfiche development for its own sake has mostly disappeared.
But we have a lot of great new substrates used in displays. A well produced acetate sheet may still be the best for the range of properties including aging, ink adhesion, etc. I no longer remember if microfiche is made by printing or by photo-development.
I have multiple 5TB external disks attached to my main tower, which (among other things) serves up Plex content. I switch each one out every year, for the equivalent of about a hundred dollars each. I try to find a compromise between the amount of read requests and availability for these disks, but in the end, if they're read often enough, they die soon enough.
What killed the last one was an experiment with installing Emby. Like many similar systems, it bewilderingly has no rate-limiting function, and will thrash a disk to within an inch of its life in order to index it. That's what did in my most recent external Plex drive, with multiple series and movies on it.
So yes, just keep refreshing the media, at reasonable intervals.
PS Yes, I know this is a poor method of content storage. NAS is looming up for me one of these days.
If it doesn't have to be offline for long durations, software RAID plus adding a new drive every once in a while and discarding failing drives is pretty foolproof.
AFAIK large data centers automate something like this.
The issue with (software) RAID is that you have no idea whether what you're copying is actually corrupted. If the filesystem isn't checksummed, there's no guarantee.
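If you're not on a checksumming filesystem like ZFS or Btrfs, a minimal workaround is to record per-file hashes yourself when the data first goes into the archive, so later copies can at least be checked against something. A rough sketch (the archive path is a placeholder):

    # Sketch: write a SHA-256 manifest alongside the archive so future copies
    # can be verified against what was originally stored.
    import hashlib, json
    from pathlib import Path

    ROOT = Path("/mnt/archive")  # hypothetical archive location

    manifest = {
        str(p.relative_to(ROOT)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in ROOT.rglob("*") if p.is_file()
    }
    (ROOT / "manifest.json").write_text(json.dumps(manifest, indent=2))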