One thing Hetzner doesn’t have yet (but it SHOULD!) is Object Storage, and it’s something I’m working on over at NimbusWS[0].
Another awesome thing about Hetzner is that bandwidth internally is free and automatically negotiated (i.e. if you send traffic to a Hetzner IP it will flow internally).
One of the hardest things is being able to tell how much bandwidth you're using and whether it's flowing inside or outside Hetzner.
I mount my storage boxes as CIFS volumes, but random access latency is high, and I can't do multiple heavy IO operations concurrently. It's best suited for use as a backup drive.
Is there a faster way to mount storage boxes? I suppose SSHFS would be even slower.
Oh yeah I'm aware of the storage boxes! They're actually fantastic, and what I plan on building on for Nimbus -- it's not quite as easy as just running MinIO (in fact I won't be using MinIO), but the efficiency is great.
Traditional object storage scales automatically and is very easy to integrate with (most people write apps that talk to S3 these days, far fewer that talk SFTP), so there's just that small edge! That "sort of" is what I want to get rid of.
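For a concrete picture of that S3-style integration: here's a minimal sketch with boto3 pointed at an arbitrary S3-compatible endpoint (the endpoint URL, credentials, and bucket below are placeholder values, not anything Nimbus-specific):

```python
import boto3

# Standard S3 client pointed at a generic S3-compatible endpoint.
# Endpoint URL and credentials are hypothetical placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.example-provider.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Upload and fetch an object -- plain HTTP calls, no SFTP session.
with open("db-backup.tar.gz", "rb") as f:
    s3.put_object(Bucket="my-bucket", Key="backups/db-backup.tar.gz", Body=f)

obj = s3.get_object(Bucket="my-bucket", Key="backups/db-backup.tar.gz")
print(obj["ContentLength"])
```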
Other smaller clouds have services that are a better fit (OVH and Scaleway have object storage, for example), but Hetzner doesn't yet.
I'd be super interested in hearing from someone who has set it up whether it's something an experienced but generalized sys admin could build, or if you need an expert in the matter.
Yup, Ceph is going to be the solution to power it. There are quite a few options in the space, but I was basically down to Swift (powers OVH), Ceph (powers Digital Ocean Spaces), and OpenIO (powers OVH's new experimental offering).
Ceph has much more public resources available (SwiftStack recently got acquired, seemed like they had most of the knowledge in the space), and I've actually set it up a few times now thanks to the excellent Rook[0].
It's definitely a lot easier to set up with Kubernetes (the tradeoff being you need to understand Kubernetes), but it's manageable for a generalized sys admin (albeit one with a bit more experience). I've written about the process:
I am in the process of migrating off of Rook Ceph after using it in production for two years. Setting it up is easy thanks to Rook, but wait until Ceph gets under load, then the real fun begins. If you only need object storage, I suggest looking into SeaweedFS[0]. It's a far more lightweight and performant solution.
Thanks for the suggestion -- I'm definitely aware of SeaweedFS and it was actually a really strong contender but I didn't choose it (and didn't mention it) for a couple reasons:
- Some sharp corner cases are definitely out there (issues/bug reports)
- Supported APIs aren't quite as extensive as the other options yet
- The requirements/expectations for
There's also some previous discussion from 2020[0] that was interesting. I actually planned to use SeaweedFS and dip my toes with what I'm calling the "CloseCache" feature -- on-demand nearby proxies for the data that's really in your object storage. The idea was to take advantage of seaweed's excellent proxying features and kick the tires at the same time.
Somewhat off topic but I'd love to pick your brain, would you mind if I sent you an email?
How would you define object storage vs. block storage? How do I know which one I need? This question is coming from a web dev who mostly uses databases on the backend so I'm unclear what constitutes an "object" in this case.
(Yes, I googled it, but I'm looking for a more practical example)
This is pretty spot on, so there's not much to add but some somewhat disorganized thoughts:
Object storage and block storage are similar -- the difference is usually in how they're accessed, but not necessarily (e.g. S3 exposed via FUSE projects can be thought of as block storage). Some examples of how that line blurs:
Hard drives are in the business of "block storage" -- you offer them a block of bytes (if you wanted to you could tell the hard drive exactly where to write the bytes, rather than using the file-based interfaces that are common), they put it on some sort of media (rotational, solid state, whatever), end of story.
Applications often only need a higher abstraction over storage -- not bits and bytes, but files or "objects". Most applications don't need the ability to access one or more bytes (in... a block) of an area on disk; they usually only want access to entire files (roughly a 1:1 mapping with objects, given how 99% of people use object storage).
Getting back to the question, block storage is usually interacted with via a completely different set of technologies -- iSCSI[0][1], NVMeOF[2], etc. The idea here is that you're talking to a hard drive on the other end, so it makes sense to speak the language (or a language) of hard drives. Object storage is normally interacted with via HTTP. The expected/tolerated latencies of these different technologies are different orders of magnitude.
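To make that contrast concrete, here's a rough sketch in Python (the device path and URL are made up, and a real object storage request would also carry auth headers):

```python
import requests

# Block storage: the OS (or a filesystem) addresses a raw device by byte
# offset. Requires root; /dev/sdb is a hypothetical device path.
with open("/dev/sdb", "rb") as dev:
    dev.seek(4096)           # jump to an exact offset on the device
    block = dev.read(512)    # read one 512-byte block

# Object storage: ask an HTTP endpoint for a whole named object and never
# think about offsets or devices. The URL is a placeholder endpoint.
resp = requests.get("https://objects.example.com/photos/2021/cat.jpg")
data = resp.content
```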
To rehash, object storage is similar, but seeks to raise the level of abstraction and give you consistent but opaque access to files. How is the object storage storing your files? You don't know and you don't care (probably) -- what you care about is how fast your HTTP request returns (or doesn't). The interface is similar enough to reading and writing files locally, but more importantly it enables multiple applications to share the same storage without sharing the same hard drive. You can also write certain slices of files (this requires some more complicated locking and whatnot, just as it would locally).
I want to also note that there's a conceptual step between "traditional" block storage and "new age" object storage that others might skip over: distributed file systems like NFS. It's NFS's job to present a file system that, when changed, prompts a file system on a completely different machine to change in the same fashion. It's easy to imagine a simple way to make that functionality work, but of course reality is more complicated. Object storage arises from the realization that you don't actually have to interact with a "fake"/shimmed version of the local filesystem that appears local but is actually remote -- you can just send an HTTP request over the internet to a machine asking for the file you want when you want it.
Here's a fun thing to think about -- is a database an implementation of object storage? You normally don't ship bytes to databases, you ship records (which you happen to decide the structure of) -- records are closer to files conceptually than they are to a "block" of bytes, even though at the end of the day the digital storage we're referring to is going to be bytes on a storage medium somewhere.
If you really want to get a good instinct for this, dedicate some time to skimming/reading through the Ceph documentation and you'll see how they layer the object storage (and other things) on top of a lower level "block" management[3]. The picture on the first page should be quite instructive.
Scaleway and Hetzner are cool. What I'd love to see is a lower-cost (if slightly higher-friction?) version of AWS as it was early on. Some sort of analogues for:
- S3
- VPC
- Compute instances (types don't have to be too fine-grained)
- SQS, SNS
- Some sort of Dynamo and/or RDS functionality.
- Some basic API coordinators; I guess Terraform has providers for lots of stuff these days.
Lambda- and Fargate-like things would be a plus, but not strictly necessary.
I feel like 90% of the projects I've ever built could be easily made and scaled with nothing else. Further features hit diminishing returns really fast, and serve to muddy the waters around what tooling is best, or even what exists.
Noted! So general compute is something I want to add to Nimbus, but given how thorny it can be, I'm thinking of splitting it into very specific use cases:
- Run a container (you give me a container, app.json or a standardized format, and I run it)
- Give me a bundle of files (static hosting), or specify a bucket
- Give me a single function and I'll run it (optionally hooking up a managed domain name through us to it)
Notes on the other things:
- VPC => this is actually a bit more complicated, but possible
- SQS/SNS => NATS/Kafka -- is this enough? (see the sketch after this list)
- Dynamo/RDS => Managed Postgres +/- extensions like Citus and Timescale for scale
- Some basic API coordinators => probably won't do this; writing a TF provider is probably far in the future
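On the SQS/SNS point: for a lot of queue-style workloads, a NATS queue-group subscription covers the same ground. A minimal sketch with the nats-py client (server URL and subject names are placeholders):

```python
import asyncio
import nats

async def main():
    # Connect to a NATS server (URL is a hypothetical placeholder).
    nc = await nats.connect("nats://queue.example.internal:4222")

    # Subscribers in the same queue group share the work,
    # roughly like competing SQS consumers.
    async def worker(msg):
        print("processing", msg.data.decode())

    await nc.subscribe("jobs.thumbnails", queue="workers", cb=worker)

    # Publish a job, roughly like sqs.send_message().
    await nc.publish("jobs.thumbnails", b'{"image": "cat.jpg"}')

    await nc.flush()
    await nc.drain()

asyncio.run(main())
```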
Paradoxically, Lambda- and Fargate-like functionality is one of the easiest things to deploy and manage (standing on giants like OpenFaaS/Fission/etc), and it strikes me as the easiest for people to actually get started with if I could roll it up with managed DNS.
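To give a sense of how small a "function" can be on top of something like OpenFaaS -- its stock python3 template boils down to a handler like this (the platform wraps it in an HTTP server and container for you):

```python
# handler.py -- the entire function in an OpenFaaS-style python3 template.
def handle(req):
    """Take the raw request body as a string, return the response body."""
    return req.upper()
```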
Yes, but if you factor in the traffic costs, it still comes out 10x cheaper than AWS &co. Plus, the parent was not asking about the cost vs bare metal.
Everyone has (relatively) poor support if you are a budget customer. Scaleway support replied to all my tickets and was professional. Not a one-hour response time, but you can't complain for €5/mo. Been with them for about 4 years, from the ARM days until recently.
[0]: https://nimbusws.com