
(SpiderOak / Nimbus.io cofounder here)

In addition to reflecting the founders' personal ethics about software freedom, we feel an open source backend is important simply for the sake of confidence.

Some people will want to purchase the minimum of 10 machines and host a Nimbus.io storage cluster themselves (and we are also making our hardware specs open source.) Other cloud storage providers may even do this. We hope a few people will consider the hosted option, paying Nimbus.io $0.06 per GB.

In any case, all of these are a win for us. We're already spending money every day to maintain a reliable storage backend for our encrypted Backup & Sync business at SpiderOak.com. Nimbus.io is an evolution from that. Community involvement here is most welcome. :)

Aside from that, it's just a design we are excited to share. Every other distributed storage system I could find uses replication instead of parity. A system based on parity sacrifices latency but can deliver higher throughput on individual requests (at about 1/3 the cost.) There are use cases even outside of archival storage where this is attractive.
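To make the replication-vs-parity tradeoff concrete, here is a toy illustration (not the Nimbus.io implementation; parameters are made up): with simple XOR parity, k data shares plus one parity share tolerate the loss of any single share at (k+1)/k storage overhead, versus 3x overhead for triple replication.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data_shares = [b"AAAA", b"BBBB", b"CCCC"]  # k = 3 data shares
parity = xor_blocks(data_shares)           # 1 parity share

# Lose any one data share; rebuild it from the survivors plus parity.
lost = data_shares[1]
survivors = [data_shares[0], data_shares[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == lost

# Storage overhead: 4/3x here, vs. 3x for triple replication.
```

Real systems use Reed-Solomon-style codes to tolerate multiple simultaneous losses, but the arithmetic advantage over replication is the same idea.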



I don't see how a parity based implementation can work in a meaningful way across multiple datacenters. You certainly couldn't rebuild if you lost an entire datacenter due to disaster. Replication is the only way here.

So any comparison to S3 in that regard is meaningless - Nimbus can't achieve that level of durability, correct?

Additionally, if you're just doing parity across multiple chassis in a single datacenter and lost a couple of racks due to a power outage, it would seem the network would likely shit the bed trying to rebuild, potentially bringing the whole system down. Have you guys worked through nastier failure cases that architectures like S3 can avoid?


Excellent points.

Geographic redundancy with parity complements the network topology we find in many cities: a metro area fiber ring connecting many data centers with low-cost site-to-site (not internet) bandwidth. It's even cheaper to just buy excess capacity at lower QoS.

Every archival storage provider I've talked to has a write-heavy workload. Write traffic may be more than 3x read traffic. So, for example, replicating between two sites requires a site-to-site connection with capacity equal to the incoming write rate. Since site-to-site connections are full duplex, the parity system provides bandwidth for both reads and writes at a similar price to what replication would spend on write bandwidth alone.
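A back-of-envelope sketch of that bandwidth argument, with made-up numbers (not Nimbus.io's actual provisioning, and the share placement fraction is an assumption):

```python
# Write-heavy archival workload: writes ~3x reads, per the comment above.
write_gbps = 3.0
read_gbps = 1.0

# Two-site replication: every write also crosses the site-to-site link,
# so the link must carry the full write rate in one direction.
replication_link_out = write_gbps  # 3.0 Gbps outbound

# Parity across sites: writes send only the shares stored remotely
# (assume half of each object's shares live at the far site), and reads
# pull remote shares back over the reverse direction of the same
# full-duplex link -- so read bandwidth rides the otherwise idle side.
remote_fraction = 0.5                           # assumed placement
parity_link_out = write_gbps * remote_fraction  # 1.5 Gbps outbound
parity_link_in = read_gbps * remote_fraction    # 0.5 Gbps inbound

assert parity_link_out < replication_link_out
```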

That said, the first iterations of Nimbus.io won't provide geo-redundancy beyond what creating an offsite backup inherently provides. We expect to add geo-redundant storage as an upgrade option at a slightly higher price (still well under S3).

Replying to your second point: under transient conditions, like a couple of racks losing power, the system wouldn't trigger an automatic rebuild right away. It would continue to service requests with parity and hinted handoff until the machines come back online. When the system does decide a full rebuild is needed, the rebuild rate is balanced against servicing new requests (similar to how a RAID controller gives tunable priority to rebuild vs. live traffic).
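A minimal hinted-handoff sketch, for readers unfamiliar with the term (illustrative only; the class and function names are assumptions, not the Nimbus.io design): writes meant for an offline node go to a stand-in node along with a "hint," and when the original node returns, the hinted writes are replayed to it.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.store = {}   # key -> value
        self.hints = []   # (intended_node, key, value)

    def write(self, key, value):
        self.store[key] = value

def write_with_handoff(key, value, target, standby):
    if target.online:
        target.write(key, value)
    else:
        # Store on the stand-in, remembering the intended owner.
        standby.write(key, value)
        standby.hints.append((target, key, value))

def replay_hints(standby):
    """Deliver hinted writes to owners that have come back online."""
    for target, key, value in standby.hints:
        if target.online:
            target.write(key, value)
    standby.hints = [h for h in standby.hints if not h[0].online]

a, b = Node("a"), Node("b")
a.online = False
write_with_handoff("obj1", b"data", target=a, standby=b)
a.online = True
replay_hints(b)
assert a.store["obj1"] == b"data"
```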


> I don't see how a parity based implementation can work in a meaningful way across multiple datacenters. You certainly couldn't rebuild if you lost an entire datacenter due to disaster.

Sure you can. Given a system that can tolerate loss of N shares, you need to ensure that no datacenter holds more than N shares. In practice, this means you need many smaller datacenters, not two or three; whether that is economically feasible depends on the provider.
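That placement constraint can be sketched as follows (the parameters are hypothetical): with 10 shares per object and tolerance for losing any 3, no datacenter may hold more than 3 of an object's shares, which requires at least 4 datacenters.

```python
def place_shares(num_shares, datacenters, max_loss):
    """Round-robin shares across datacenters, enforcing the cap of
    max_loss shares per datacenter so any single-site loss is survivable."""
    if num_shares > max_loss * len(datacenters):
        raise ValueError("too few datacenters to honor the loss tolerance")
    dcs = list(datacenters)
    placement = {dc: 0 for dc in dcs}
    for i in range(num_shares):
        placement[dcs[i % len(dcs)]] += 1
    assert all(count <= max_loss for count in placement.values())
    return placement

layout = place_shares(num_shares=10,
                      datacenters=["dc1", "dc2", "dc3", "dc4"],
                      max_loss=3)
# Losing any one datacenter destroys at most 3 shares, which the
# erasure code can rebuild from the surviving shares elsewhere.
```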



