Hacker News

LTO-9 tapes can be easily found on Amazon in many countries, made by IBM, HP, Quantum or Fuji.

The vendor does not matter, whichever happens to be cheaper at the moment is fine.

As for the tape drives, the internal drives can be around 10% cheaper, but I prefer the tabletop drives, because they are less prone to accumulating dust, especially if you switch them on only when doing a backup or a retrieval. Tape drives usually have very noisy fans, because they are expected to be used in isolated server rooms.

I believe that the cheapest tape drives from a reputable manufacturer are those from Quantum. I have been using a Quantum LTO-7 tape drive for about 7 or 8 years and I have been content with it. Looking at current prices, it should be possible to find a tabletop LTO-9 drive for no more than $5000. Unfortunately, tape drive prices have been increasing. When I bought an LTO-7 tabletop drive many years ago, it cost only slightly more than $3000.

The tapes are much cheaper and much more reliable than hard disks, but because the tape drive is very expensive, you need to store a few hundred TB before you begin to save money over hard disks. You should normally make at least two copies of any tape that is intended for long-term archiving (to be stored in different places), which shortens the time until you reach the break-even point with HDDs.
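A rough way to see where that break-even point lies: the drive's price has to be paid back by the per-TB saving of tape over disk. This is a minimal sketch with hypothetical prices (a $5000 drive, about $100 per 18 TB cartridge, $15/TB for HDDs), counting TB as total capacity stored, so two copies of 100 TB count as 200 TB, assuming the HDD alternative would hold the same number of copies.

```python
def break_even_tb(drive_cost: float, tape_usd_per_tb: float, hdd_usd_per_tb: float) -> float:
    """TB of total stored capacity at which drive + tapes cost the same as HDDs."""
    saving_per_tb = hdd_usd_per_tb - tape_usd_per_tb
    if saving_per_tb <= 0:
        raise ValueError("tape must be cheaper per TB than HDD to ever break even")
    return drive_cost / saving_per_tb


if __name__ == "__main__":
    # Hypothetical prices: $5000 drive, ~$100 per 18 TB cartridge, $15/TB for HDDs.
    tb = break_even_tb(5000, 100 / 18, 15)
    print(f"break-even after about {tb:.0f} TB of tape capacity")  # about 529 TB
    # With 2 copies of everything, this is reached after about tb / 2 TB of unique data.
```

With these assumed prices the result falls in the "few hundred TB" range; plug in current prices before relying on it.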

Although there are applications that simulate a file system on a tape, so that even a naive user can copy files to a tape as if copying between disks, they are quite slow and inefficient compared with using raw tape commands through the traditional UNIX utility "mt".

It is possible to write some very simple scripts that use "mt" to append a number of files to a tape, or to read a number of consecutive files from a tape, starting from the nth file from the beginning of the tape. So if you use only raw "mt" commands, you can identify the archived files only by their ordinal number from the beginning of the tape.
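Since files are known only by ordinal number, the bookkeeping side of such scripts can be as small as this sketch. The class and the archive names are hypothetical, and it assumes file numbers start at 0, so that "mt rewind; mt fsf N" lands on file N.

```python
from dataclasses import dataclass, field


@dataclass
class TapeCatalog:
    """Hypothetical bookkeeping: archive names in the order they were written to each tape."""
    tapes: dict = field(default_factory=dict)  # tape label -> list of archive names

    def append(self, tape: str, archives: list) -> list:
        """Record archives appended at the end of `tape` and return their file numbers.

        Numbers start at 0, so `mt rewind; mt fsf N` reaches file N.
        """
        written = self.tapes.setdefault(tape, [])
        first = len(written)
        written.extend(archives)
        return list(range(first, first + len(archives)))

    def locate(self, archive: str) -> tuple:
        """Return (tape label, file number) for an archive, or raise KeyError."""
        for tape, names in self.tapes.items():
            if archive in names:
                return tape, names.index(archive)
        raise KeyError(archive)


if __name__ == "__main__":
    cat = TapeCatalog()
    print(cat.append("063", ["photos-2019.tar.zst.gpg", "mail.tar.zst.gpg"]))  # [0, 1]
    print(cat.append("063", ["music.tar.zst.gpg"]))  # [2]
    print(cat.locate("music.tar.zst.gpg"))  # ('063', 2)
```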

This is enough for me, because I prepare the files for backup by copying them into a directory, making an index of that directory, then compressing and encrypting it. I send only compressed and encrypted archive files to the tape, so I disable the tape drive's internal compression, which would be useless.
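A minimal sketch of the "index of that directory" step, using only Python's standard library and BLAKE2b (one of the hashes mentioned later in the thread). The record field names are hypothetical; compression and encryption are left to whatever tools you already use.

```python
import hashlib
import os
from pathlib import Path


def blake2b_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """BLAKE2b-512 of a file's content, read in 1 MiB chunks."""
    h = hashlib.blake2b()  # digest_size defaults to 64 bytes (512 bits)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_index(root: Path) -> list:
    """One metadata record per regular file under `root`, with paths relative to `root`."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            p = Path(dirpath) / name
            if not p.is_file():  # skip sockets, FIFOs, dangling symlinks
                continue
            st = p.stat()
            entries.append({
                "path": p.relative_to(root).as_posix(),
                "size": st.st_size,
                "mtime": int(st.st_mtime),
                "blake2b": blake2b_hex(p),
            })
    return entries
```

The result can be saved with json.dump next to the archive before it is compressed and encrypted, or loaded into the database described below.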

I store the information about the contents of the archives on tapes in a database. It includes all relevant metadata for each file contained in the compressed archives: file name, path name, file length, modification time and a hash of the file content. Whenever I need archived data, I search the database to find where it is stored, for instance on tape 63, file 102. Then I insert the corresponding cartridge in the drive and give the command to retrieve file 102.
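A sketch of such a database with SQLite from Python's standard library. The table layout is an assumption: one row per archived file, pointing at the tape label and the ordinal number of the archive that contains it.

```python
import sqlite3


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical catalog: one row per archived file."""
    con = sqlite3.connect(path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS files (
               path TEXT NOT NULL,         -- path inside the archive
               size INTEGER, mtime INTEGER, blake2b TEXT,
               tape TEXT NOT NULL,         -- cartridge label, e.g. '063'
               tape_file INTEGER NOT NULL  -- ordinal number of the archive on that tape
           )"""
    )
    return con


def where_is(con: sqlite3.Connection, pattern: str) -> list:
    """Return (path, tape, tape_file) for every archived path matching a LIKE pattern."""
    return con.execute(
        "SELECT path, tape, tape_file FROM files WHERE path LIKE ? ORDER BY tape, tape_file",
        (pattern,),
    ).fetchall()


if __name__ == "__main__":
    con = open_catalog()
    con.execute(
        "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
        ("docs/contract.pdf", 123456, 1700000000, "placeholder-hash", "063", 102),
    )
    print(where_is(con, "%contract%"))  # [('docs/contract.pdf', '063', 102)]
```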

I consider FreeBSD's "mt" utility much better than Linux's. The Linux magnetic tape utilities have seen little maintenance for many years.

Because of that, my backups and retrievals go through a server that runs FreeBSD, in which the SAS HBA card is installed. When a tabletop drive is used, the SAS HBA card must have external SAS connectors, to allow the use of an appropriate cable. I actually reboot that server into FreeBSD to do backups or retrievals, which is easy because I boot it from Ethernet with PXE, so I can remotely select which OS is booted. One could also use a FreeBSD VM on a Linux server, with pass-through of the SAS HBA card, but I have not tried this.

My servers are connected with 10 Gb/s Ethernet links, which are not much slower than the SAS link, so the network does not slow down backups and retrievals much. I transfer the archive files with rsync over ssh. On slow computers and on internal networks, one can use rsync without ssh. I give the commands for the tape drive from the computer that is being backed up, as one-line commands executed remotely via ssh.
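Driven from the machine being backed up, the two steps could look like this sketch. The host name "tapeserver" and the RAM disk mount point are hypothetical; the dd invocation is the one given later in the thread, and /dev/nsa0 is FreeBSD's non-rewinding tape device.

```python
import posixpath
import shlex
import subprocess


def transfer_and_write(archive: str, host: str, ramdisk: str = "/mnt/ramdisk",
                       device: str = "/dev/nsa0", block_size: int = 131072) -> list:
    """Copy the archive to the tape server's RAM disk with rsync over ssh,
    then write it to tape with a one-line command executed via ssh."""
    remote = posixpath.join(ramdisk, posixpath.basename(archive))
    write = f"dd bs={block_size} if={shlex.quote(remote)} of={device}"
    return [
        ["rsync", "-e", "ssh", archive, f"{host}:{ramdisk}/"],
        ["ssh", host, write],
    ]


def run_all(cmds: list) -> None:
    for cmd in cmds:
        subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    for cmd in transfer_and_write("/backup/out/mail.tar.zst.gpg", "tapeserver"):
        print(shlex.join(cmd))
```

Building argument lists instead of shell strings keeps local quoting out of the way; only the remote half is a string, because ssh hands it to the remote shell.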

The archive being transferred is stored on a RAM disk before being written to the tape, to ensure that the tape is written at maximum speed. The archive files that I write to tape usually have a size of up to about 60 GB (I split any files bigger than that; e.g. there are BluRay movies of up to 100 GB). The server has 128 GB of memory, so I can configure a RAM disk of up to 80 GB on it without problems. This method can be used even with a slow 1 Gb/s or 2.5 Gb/s network, but then uploading a file over Ethernet would take much more time than writing it to or reading it from the tape.
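The splitting rule can be sketched as follows. Splitting into near-equal pieces is a choice made here (split -b 60G would give 60 + 40 GB instead), and the tmpfs mount line is one assumed way to create the RAM disk on FreeBSD; mdmfs(8) is another.

```python
GB = 10**9
# One possible way to create the RAM disk on FreeBSD (assumption; mdmfs(8) also works):
RAMDISK_MOUNT = ["mount", "-t", "tmpfs", "-o", "size=80g", "tmpfs", "/mnt/ramdisk"]


def split_sizes(size: int, max_piece: int = 60 * GB) -> list:
    """Split `size` bytes into the fewest near-equal pieces not exceeding `max_piece`."""
    pieces = max(1, -(-size // max_piece))  # integer ceiling division
    base, extra = divmod(size, pieces)
    return [base + 1 if i < extra else base for i in range(pieces)]


def fits_ramdisk(pieces: list, ramdisk: int = 80 * GB) -> bool:
    return all(p <= ramdisk for p in pieces)


if __name__ == "__main__":
    movie = split_sizes(100 * GB)
    print(movie, fits_ramdisk(movie))  # [50000000000, 50000000000] True
```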

There is one weird, poorly documented feature of the raw "mt" commands. It took me some time to discover it, and I wasted some tape space in the meantime.

When you append files to a partially written tape, you first give a command to go to the end of the written part of the tape. However, you must not start writing there, because the head is not positioned correctly. You must go back 2 file marks, then forward 1 file mark. Only then is the head positioned correctly, so that you can write the next archive file. Otherwise, an empty file would be inserted at each point where you finished appending a batch of files, rewound the tape and later appended more files at the end.
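As a sketch, an append script just emits the sequence given later in the thread ("mt locate -e; mt bsf 2; mt fsf") before the dd writes. It assumes the tape already holds at least one file; on a freshly rewound blank tape you would simply write from the beginning.

```python
DEVICE = "/dev/nsa0"  # FreeBSD non-rewinding tape device


def append_cmds(archives: list, device: str = DEVICE, block_size: int = 131072) -> list:
    """Commands to append archive files at the end of a partially written tape."""
    cmds = [
        ["mt", "-f", device, "locate", "-e"],  # go to the end of the written data
        ["mt", "-f", device, "bsf", "2"],      # back over 2 file marks...
        ["mt", "-f", device, "fsf", "1"],      # ...then forward over 1, as described above
    ]
    # Each dd opens and closes the non-rewinding device, so the archives become consecutive tape files.
    cmds += [["dd", f"bs={block_size}", f"if={a}", f"of={device}"] for a in archives]
    return cmds
```

Each list can be run with subprocess.run(cmd, check=True), so that a failed positioning step stops the script before anything is written.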



A lot of very interesting details in your reply, thanks. I have this question:

If you weren’t budget-constrained and had to set it all up again today, what would you do?

While I’m a Linux guy, I’ll happily run BSDs when appropriate, like for pfSense. If FreeBSD really has better mt tools or drivers for LTO-9 drives, because its culture and contributors are more old school, then I’d just grab a 1U server dedicated to it, run a BSD on it and attach the drive to that.

You seem to have extensive practical hands-on experience, and while I worked with tapes 20 years ago, this will be the first time since then that I’m hands-on with them. So I need to research the most reliable drive vendors and the state of kernel drivers and tools, just as you are alluding to.

Pretend you have $50K if needed (I doubt it). There are 2PB of existing data, a targeted rate of 1PB/year and probably 10-20%/year acceleration on that rate. We have a data center rack location with a 20Gb/s interconnect via bonded 10Gb NICs to the storage servers (45drives storinators), and then an office center cabinet/rack/desk (your choice). I will put a tape drive holding at least 8 tapes in data center, planning for a worst case of 100TB a month, and data center visits to swap in new tapes shouldn’t be too frequent. Any details on what you would do would be interesting.


Like I have said, it is not necessary to dedicate a full-time FreeBSD server for this, you can use either a Linux server that is rebooted temporarily in FreeBSD or a FreeBSD virtual machine on the Linux server.

Around $5000 to $6000 should be enough for an LTO-9 tabletop tape drive plus a suitable SAS HBA card and SAS cable. The card's SAS connectors and SAS speed must match those of the tape drive.

More money will not bring anything extra until it reaches a much higher amount, enough to buy a tape autoloader/library, which would eliminate the need for a human to insert and remove the cartridges. I am not sure whether $50K is enough for a tape autoloader.

Tape autoloaders/libraries are worthwhile only for very big organizations where the amount of data that is continuously written or read to or from the tapes is very large. For a small business or for an individual a tape autoloader is certainly not worthwhile, because the tape drive will be in use at most a small fraction of every day.

1 PB/year is less than 3 TB/day. This can be written to a single tape in a little more than 2 hours. Even with a simple non-pipelined implementation, where uploading a file and writing it to tape happen one after the other, the backup can be done in less than 4 hours. Even writing 2 copies can be done in less than 8 hours, so the backup can be done mostly or completely overnight.
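The arithmetic behind these figures, as a small sketch. It assumes the drive streams at LTO-9's specified native (uncompressed) rate of 400 MB/s; real sustained rates may be somewhat lower.

```python
TB = 10**12


def write_hours(tb: float, copies: int = 1, mb_per_s: float = 400.0) -> float:
    """Hours of tape streaming for `tb` terabytes per copy at a sustained `mb_per_s`."""
    return tb * TB * copies / (mb_per_s * 10**6) / 3600


if __name__ == "__main__":
    per_day = 1000 / 365  # 1 PB/year expressed in TB/day
    print(f"{per_day:.2f} TB/day, {write_hours(per_day):.1f} h per copy")
    print(f"3 TB: {write_hours(3):.2f} h for 1 copy, {write_hours(3, copies=2):.2f} h for 2 copies on one drive")
```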

For a much bigger amount of data one could buy several tape drives, before starting to think about an autoloader. Also it is possible to pipeline the network transfers with the tape writing, for a backup speed higher by around 50%.

If money were not a problem, and if the data needs to be archived long-term so that multiple copies are desirable, I would buy 2 tape drives, to be able to write 2 copies simultaneously.

This would also halve the time for archiving the initial 2 PB of existing data, which will take several months, so a speed-up would be desirable. Having 2 drives will also increase the reliability, as the system will continue to work if one becomes defective.

With only 3 TB written per day, an LTO-9 tape, which has a capacity of 18 TB, will last 6 days.

So unless a backup must be restored, the operator would need to change the tape only once per week.
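Projecting this forward with the growth you mentioned (10-20%/year on top of 1 PB/year) is simple arithmetic. A sketch, assuming full native 18 TB per cartridge and ignoring space lost to file marks:

```python
import math


def tapes_per_year(first_year_tb: float, growth: float, years: int,
                   copies: int = 2, tape_tb: float = 18.0) -> list:
    """Cartridges needed in each year when the yearly volume grows by `growth` (0.15 = 15%)."""
    counts, volume = [], first_year_tb
    for _ in range(years):
        counts.append(math.ceil(volume * copies / tape_tb))
        volume *= 1 + growth
    return counts


def days_per_tape(tb_per_year: float, tape_tb: float = 18.0) -> float:
    """Days until one cartridge fills at a steady yearly volume."""
    return tape_tb / (tb_per_year / 365)


if __name__ == "__main__":
    print(tapes_per_year(1000, 0.15, 5))
    print(f"{days_per_tape(1000):.1f} days per cartridge")
```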

This is a moderate amount of data, easy to handle with a single drive, even if two are preferable for redundancy and for higher speed.

I do not understand your reference to "a tape drive holding at least 8 tapes in data center". If you mean an autoloader, from what you describe it does not seem that the very big expense of an autoloader would be justified.

The LTO tapes are best stored in suitcases that can contain 20 cartridges, i.e. when using LTO-9 that is 360 TB. Therefore 3 suitcases store more than 1 PB, i.e. a year of data according to your example. The suitcases should be stored in a secure safe or cabinet. They are usually made to be stackable.

I have assumed that your 1 PB is of already compressed data. If the data is compressible, then the required drive usage time and storage volume would be much smaller.

I forgot to mention that after I compress and encrypt the archived files, I add redundancy with a Reed-Solomon code, e.g. with the par2 program. If I choose, e.g., a redundancy of 5%, then a file retrieved from the magnetic tape can have defects of up to 5% of its size and the original data can still be recovered from it.
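With par2cmdline this amounts to one command before writing and one after reading. A sketch: the wrapper names are hypothetical, and it assumes the .par2 recovery files are written to tape together with the archive.

```python
def par2_create(archive: str, redundancy_pct: int = 5) -> list:
    """par2cmdline call writing `archive`.par2 plus recovery volumes next to the archive."""
    return ["par2", "create", f"-r{redundancy_pct}", f"{archive}.par2", archive]


def par2_repair(archive: str) -> list:
    """Verify an archive read back from tape and repair it if needed."""
    return ["par2", "repair", f"{archive}.par2"]


def repair_budget(archive_size: int, redundancy_pct: int = 5) -> int:
    """Approximate bytes of damage the recovery data can fix.

    Only approximate: par2 counts damage in whole source blocks, so scattered
    errors can use up the budget faster than their byte count suggests.
    """
    return archive_size * redundancy_pct // 100
```

Each list can be passed to subprocess.run(..., check=True); for a 60 GB archive at 5%, the recovery data is about 3 GB.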


Excellent help. To clarify a few items:

- Yes, I mean drives with an autoloader, for example: https://www.backupworks.com/qualstar-Q24-LTO-9-SAS-Library.a... It’s basically a hard requirement, as we don’t have the staff time to visit data centers frequently. We are a bit unusual in being certainly not big, but not really a small business either when looking at available budgets. Unless there is something wrong with the Qualstar product linked above, perhaps autoloaders are cheaper than you believed?

- Understood your rebooting trick. However, being fully automated (apart from blank tape rotations) is also a requirement; it’s production infrastructure. If FreeBSD provides significant value, it seems safer to spec a dedicated 1U server for backups. There is currently a management node that might work, though it has to keep running Linux, and I need to check whether its SAS can be used. It currently has a bunch of SAS SSDs, and I would have assumed there is a way to cable up the Qualstar drive… but again, I’m still early in my research, and the SAS compatibility issue you raise is a perfect example of stuff I need to figure out.

- Love par2cmdline. Our MDISC burner for IP backup uses it on git repo files, and then seqbox as an outer container for the data, to guard against potential fs metadata corruption issues. There was a newer low-level tool (a Rust rewrite, I think) with many bitrot protection features whose name I can’t recall right now and which isn’t immediately coming up in my notes, but I know it exists and have been meaning to look into it. It has newer erasure coding like RaptorQ and also block metadata like seqbox, and I think it can replace the par2/seqbox combo we currently use on MDISC physical backups for IP. I don’t trust 100% cloud, as one can imagine somehow getting all accounts hacked and deleted.

- Yes on compressed. The 2PB is already very highly compressed, so it means 18TB/tape.

Do you have any vendors/distributors you can recommend? I always recommend 45drives to people, and I was planning to ask them about LTO when we order our next storinator, which is also coming up soon.

There is this interesting blog post from a couple of years ago that probably was the seed of my plan to embark on LTO. Our monthly backblaze invoice is totally out of control. But we need a full backup of our data as it’s simply not replaceable and at the heart of the business.

https://blog.benjojo.co.uk/post/lto-tape-backups-for-linux-n...

They talked me into it, not out of it, given our specific situation.


That is indeed a cheap autoloader.

If you use the full configuration with 2 tape drives, the system might cost around $15k, which is very reasonable for a tape library with an autoloader.

I think that this autoloader is a good choice, especially if the price includes "1 x IBM LTO-9 SAS Tape Drive Installed".

As I have said, I believe that it is better to choose the option of also including the second tape drive.

For the tapes, there is no reason to worry about specific distributors. I have always bought them from Amazon, but shops specialized in storage products should be OK, unless they charge a premium over what can be found at Amazon or Newegg. Although the tapes are made by Fuji or Sony, they are often easier to find, and at lower prices, as IBM, HP or Quantum branded tapes.

The prices vary, so whichever vendor is cheaper when you buy a batch of tapes should be fine. An LTO-9 cartridge should cost only slightly over $100. In time, the prices of LTO-9 cartridges should drop; for now they are more expensive than older cartridges, because they are still relatively new.

I store the tapes in Turtle cases:

https://turtlecase.com/collections/lto

You must check the tape drive requirements for the SAS HBA PCIe card that must be installed in the server, which must have compatible connectors, and you must buy an appropriate SAS cable. I believe that the LTO-9 drives require the newer 12 Gb/s SAS standard and also the newer variant of the external SAS connectors (perhaps SAS HD SFF-8644 connectors).

If you already have a 12 Gb/s SAS HBA that has only internal connectors for SSDs, it is possible to reuse it by buying a SAS internal to external adapter of the appropriate connector types, which must occupy one of the empty expansion slots of the server case and which plugs into the internal connectors, while providing external connectors. Such adapters can also be used with server motherboards that have on-board SAS controllers. If you have a SAS HBA card that has external connectors, but different from those on the tape drive, e.g. SAS SFF-8088, there are cables with mixed SAS connectors that can connect the tape drives. The HBA cards usually have at least 2 external SAS connectors, suitable for 2 tape drives.

With the autoloader, it should be easy to make the backup and retrieval process completely automatic, so that an operator would not have to visit the tape autoloader more often than every few months. The exception is the initial phase, when you would have to write 2 PB onto about 112 tapes (2 PB / 18 TB), or twice as many for improved redundancy beyond the redundancy added to each archive file; the 2 copies can be stored in 2 different geographic locations, to avoid the catastrophic loss of all tapes. For that period you would want to keep the tape autoloader in an easily accessible place.

The initial cost for writing 2 copies of 2 PB of data, i.e. 4 PB of data, would be not much less than $30k for the tapes. This, together with the autoloader with 2 tape drives, HBA card, cases, cables and maybe adapters, would be in the range of $45k to $50k, so within your estimated budget.
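The tape part of that estimate, as a sketch. The per-cartridge prices are hypothetical, and it assumes the full native 18 TB per cartridge.

```python
import math


def migration_tapes_and_cost(data_tb: float, copies: int, usd_per_tape: float,
                             tape_tb: float = 18.0) -> tuple:
    """Cartridges needed to hold `copies` copies of `data_tb`, and their total price."""
    tapes = math.ceil(data_tb * copies / tape_tb)
    return tapes, tapes * usd_per_tape


if __name__ == "__main__":
    for price in (110, 130):  # hypothetical per-cartridge prices
        print(price, migration_tapes_and_cost(2000, 2, price))
```

At about $130 per cartridge, the 223 tapes for 2 copies come to just under $29k.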

As I have said, it is convenient to have a database with the metadata (including content hashes, computed e.g. with BLAKE2b-512 or BLAKE3-256) of all the files that have ever been archived. It is used whenever information must be retrieved, and it can also be used for deduplication (for which the content hashes are handy): checking whether a file is already present in some earlier archive, so that it does not need to be backed up again.
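A sketch of the deduplication check with SQLite and hashlib's BLAKE2b (BLAKE3 is not in Python's standard library). It assumes a table files(blake2b, tape, tape_file) like the one sketched for the catalog; the table and function names are hypothetical.

```python
import hashlib
import sqlite3
from pathlib import Path
from typing import Optional


def blake2b_file(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.blake2b()  # default digest_size=64, i.e. BLAKE2b-512
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()


def previous_copy(con: sqlite3.Connection, digest: str) -> Optional[tuple]:
    """(tape, tape_file) of an earlier archive holding identical content, if any."""
    return con.execute(
        "SELECT tape, tape_file FROM files WHERE blake2b = ? LIMIT 1", (digest,)
    ).fetchone()


def files_to_back_up(con: sqlite3.Connection, paths: list) -> list:
    """Keep only the files whose content is not already in some earlier archive."""
    return [p for p in paths if previous_copy(con, blake2b_file(p)) is None]
```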


I want to add that when you start testing the tape drives, one of the first things that you need to do is to measure the exact capacity of an 18 TB LTO-9 tape cartridge.

For instance, I write the tapes with "dd bs=131072 if="$file_name" of=/dev/nsa0". This means that I am using 128 KiB blocks. I have measured that a 6 TB LTO-7 tape cartridge has a capacity of 45905860 such 128 KiB blocks.
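As a quick check of the arithmetic, the measured block count converts to slightly more than the nominal 6 TB:

```python
BLOCK_SIZE = 131072  # 128 KiB, the dd block size used for every write


def blocks_to_bytes(blocks: int, block_size: int = BLOCK_SIZE) -> int:
    return blocks * block_size


if __name__ == "__main__":
    lto7 = blocks_to_bytes(45905860)  # measured LTO-7 capacity quoted above
    print(lto7, f"= {lto7 / 10**12:.3f} TB")  # 6016972881920 = 6.017 TB
```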

The position of the read/write head, measured in blocks from the beginning of the tape, can be obtained with "mt rdspos". After you choose a block size, e.g. 128 KiB, you should stick with it forever, in all your write commands and on all your tapes, so that you always get consistent information about the position of the read/write head.

The tape capacity can be measured by writing files, preferably of the same size that you will typically use for archives (in order to write a similar number of file marks), until you get a write error.

With the exact capacity of the tape known, after writing each new file you read the current position and compute the remaining free space on the tape, to know whether you can still append data or must change the tape.
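The free-space check reduces to a subtraction in blocks. A sketch, assuming variable-block mode (dd writes one short block for the tail of a file) and a hypothetical safety `margin` for file marks; the capacity figure must come from your own measurement (the example below uses the LTO-7 figure quoted above).

```python
BLOCK = 131072  # the one block size used for every dd write


def blocks_needed(nbytes: int, block_size: int = BLOCK) -> int:
    """Tape blocks dd writes for a file: full blocks plus one short final block if needed."""
    return -(-nbytes // block_size)


def can_append(capacity: int, position: int, nbytes: int,
               margin: int = 0, block_size: int = BLOCK) -> bool:
    """True if a file of `nbytes` still fits after `position` (in blocks, as read with
    `mt rdspos`), keeping `margin` blocks in reserve for file marks and safety."""
    return blocks_needed(nbytes, block_size) + margin <= capacity - position
```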

The position in blocks can also be used to verify that the tape drive works OK. For example when after rewinding the tape you go to the end of the written part, to append new files, you must see the same position as after your last write. Or when writing a copy of a tape, you must see the same positions on both tapes for any file.
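A sketch of the comparison side of that check. It takes positions already read with "mt rdspos" and stored per file number (parsing the command's output is left out, since the format depends on the mt version); the function name is hypothetical.

```python
def mismatched_files(expected: dict, observed: dict) -> list:
    """File numbers whose block position differs between two sets of readings.

    Both dicts map file number -> block position, e.g. recorded after writing
    each file on the original tape and on its copy.
    """
    return sorted(n for n in expected.keys() | observed.keys()
                  if expected.get(n) != observed.get(n))
```

The same comparison covers the single position read after moving to the end of the written part, which should equal the position recorded after the last write.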

For retrieving files, the position in blocks does not matter, but only the ordinal number of a file. You position the read/write head to the beginning of a file with "mt rewind; mt fsf $file_number". Then you read the file, possibly in a loop if you want to read multiple consecutive files.
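The retrieval loop, as a sketch that builds the commands. Output file names are hypothetical, and it relies, as the loop described above does, on each dd stopping at the next file mark so that the following read starts at the next file.

```python
def read_cmds(first_file: int, count: int, out_prefix: str,
              device: str = "/dev/nsa0", block_size: int = 131072) -> list:
    """Commands to read `count` consecutive tape files starting at file number `first_file`."""
    cmds = [["mt", "-f", device, "rewind"]]
    if first_file > 0:
        cmds.append(["mt", "-f", device, "fsf", str(first_file)])  # skip that many file marks
    for n in range(first_file, first_file + count):
        # Each dd reads one tape file, up to the next file mark.
        cmds.append(["dd", f"bs={block_size}", f"if={device}", f"of={out_prefix}{n:05d}"])
    return cmds
```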

For going to the end of the written part of a tape, to append new files, you must use "mt locate -e; mt bsf 2; mt fsf", as I have mentioned in a previous posting. The explanation of why this is needed is buried in the documentation about how tape marks and head positioning really work.

Whenever I start using the tape drive, I use "mt comp off; mt status" and I check the status output to be as expected.

The tape is ejected with "mt -f /dev/esa0 rewind".


I really appreciate all this information you have given. It’s extremely useful for me and details I understand and can use.

https://www.cdw.com/product/quantum-superloader-3-with-model...

Perhaps I should just get equipment from CDW and do my own research.

What I wish I had was a vendor who knows this stuff well and had pre-tested Linux/FreeBSD configurations.


At the currently advertised reduced price of $7226, the Quantum SuperLoader 3 would be a good choice.

I would buy 2 of them, which together with all the other items and with 4 PB of tapes for the migration of the existing data would not exceed your estimated budget of $50k.

I assume that for this price you might get the 8-slot version. The Quantum SuperLoader 3 can be extended to 16 slots, but I assume that this requires buying an additional 8-cartridge removable active magazine. You should check its price.
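How often someone has to visit the loader follows from the slot count and the worst case of 100 TB a month you gave. A sketch, assuming 18 TB per cartridge; the reserved-slot parameter (e.g. for a cleaning cartridge) is an assumption.

```python
def months_between_visits(slots: int, tb_per_month: float, copies_in_loader: int = 1,
                          reserved_slots: int = 0, tape_tb: float = 18.0) -> float:
    """Months until all cartridges in one loader are full.

    copies_in_loader: copies of each month's data this loader writes
    (1 if each of two loaders keeps one copy, 2 if one loader writes both).
    """
    return (slots - reserved_slots) * tape_tb / (tb_per_month * copies_in_loader)
```

With 16 slots and one copy per loader this gives about 2.9 months; with 8 slots, or with both copies in one loader, it halves.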

Because I had good experience with the reliability of my Quantum tabletop tape drive, I would recommend this Quantum autoloader. Moreover, its datasheet includes all the expected reliability parameters, which suggests that they have been tested by the manufacturer.

I consider the included backup software useless. You should write your own backup scripts. This might take you a few days, depending on your previous experience and on the support provided by the utilities specific to the file systems you happen to use, but then you can be worry-free for years. That is unlike depending, for all your precious data, on a black-box proprietary program, which cannot be trusted to do the right thing and which might write data in a format that cannot be recovered with any other tool (without extensive reverse engineering).


Regarding Linux's "mt", there are two versions: the horrible, primitive version that comes with cpio, which is almost certainly the one installed by default, and "mt-st", the actually usable one.


Great post. You might be able to replace the RAM disk with the "mbuffer" command. My script uses a combination of dd | pv | mbuffer | mt. I omitted the options because I don't remember any of them. I personally dd an ext4 filesystem-in-a-file that is exactly the size of what will fit on a tape. This was simply because I couldn't figure out how to reliably advance the tape head or how to continue a write from one tape to another.
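For reference, a hedged sketch of what an mbuffer-based write might look like in place of the RAM disk. The option meanings are as I understand mbuffer(1) (-i input file, -o output, -s block size, -m buffer memory, -P start writing once the buffer is that percent full), so check the man page; the 8G buffer and 90% threshold are arbitrary assumptions.

```python
def mbuffer_to_tape(archive: str, device: str, block: int = 131072,
                    mem: str = "8G", start_pct: int = 90) -> list:
    """mbuffer reads `archive`, buffers up to `mem` in RAM and starts writing `device`
    in `block`-byte blocks only once the buffer is `start_pct` percent full."""
    return ["mbuffer", "-i", archive, "-s", str(block), "-m", mem,
            "-P", str(start_pct), "-o", device]
```

On Linux the non-rewinding device would typically be /dev/nst0, e.g. subprocess.run(mbuffer_to_tape("big.tar", "/dev/nst0"), check=True). Keeping the block size equal to the dd block size keeps tape positions consistent.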


Only Sony or Fuji actually make tapes. The rest are rebranded.


True, but the rebranded tapes are frequently cheaper than Fuji or Sony.



