Migrating to CockroachDB (openmymind.net)
163 points by latch on Feb 7, 2020 | 112 comments


> if you want password-authentication, you must configure your cluster to communicate using certificates. You can start each node in "insecure" mode, but then can't use password authentication. There's an issue about this. Like the people posting there, our network is already secured, so this requirement is just an unnecessary nuisance.

Eh, I'm not sure how I feel about this. Obviously you should secure your network as best you can, but how confident are you really that a bad actor will never find a foothold anywhere in your network? I think I would advocate for your services to all communicate securely (TLS et al), even internally, and if your database supports mutual client/server auth like it sounds like Cockroach does then you should use that as well. Particularly if you're depending on at-rest encryption handled transparently by the DBMS to protect your users' data - that won't do you much good if someone can just sniff/MITM network traffic and wait until a bunch of your data has been queried.


I roll my eyes so hard at people who say "well our network is secured, so it can be insecure on the inside" I'm surprised I don't have a repetitive stress injury by now. It shows a shocking lack of foresight and is honestly insanely unprofessional. This is the exact same kind of opinion that leads to just about every major data breach in recent memory.

Secure everything, and assume that everything is compromised. Anything less is downright negligent.


With all due respect, this is like saying the more locks you have, the more secure your building is. I mean, yes you have to lock some doors to have security, but on the other hand putting a lock on the bathroom door, the light switch and the flush handle isn't increasing security, and could be reducing security. I feel somewhat the opposite -- there should be a very good reason to add security overhead, and without a compelling story the default should be open.

This is why I've been so happy reading about wireguard in the kernel recently. It makes it possible to have a very secure, low-hassle default that is completely independent of applications.


> putting a lock on the bathroom door, the light switch and the flush handle isn't increasing security

It certainly is if the thing you're securing is the flush handle itself.

Like all security questions, understanding exactly what you need/want to secure and the threat models surrounding it is extremely important.


We put valuables in a safe even if the door locks.


The key word is "valuables". You probably wouldn't put your pants in a safe (though I acknowledge that depends on your roommates). I'm saying that going through the motions doesn't make something secure. How many passwords are in slack channels or github repos? Creating a password barrier but neglecting a secure mechanism for rotating and distributing those passwords doesn't buy any security.


The problem with this analogy is that it's not a good one. We're humans, so we tend to explain things in terms of concepts we understand, but we often latch on to incorrect ones.

The problem with locking every object in your home isn't that it's an inherently bad thing, it's that it introduces too much overhead to be worth the hassle. Now, if there were some touch ID enabled force field that unlocked things only for you, within milliseconds, and was highly reliable, that might change the dynamics a lot.

And that is what happens with cryptographic security. Modern systems impose very little overhead for securing your systems, so there is no reason not to enable it (setup may be hard, I acknowledge that's a problem, but it's a one-time cost to pay).

Taking the analogy further: if you had to repeat the authentication flow for every digital object, it would still introduce a lot of hassle. So digital cryptography gets around this by using short-lived tokens, so the cost of enabling cryptography is amortized across all your digital assets. In our analogy, every object in our home would be locked, but we'd use Face ID to get a short-lived key that is cached for e.g. 10 minutes. All locked objects check for the existence of that short-lived key.


If you take protecting your users seriously then any of their personal info counts as a "valuable" so yes, you should secure everything. And it strains credulity that you might be running any kind of actual business where it would be appropriate not to take protecting your users' data seriously. Even if you're doing something as mundane as hosting cat pictures, you have email addresses and password hashes and analytics data.


See, I do infrastructure work in companies that often don't even expose their services. They just want a database and a frontend for a very small internal app that will never get out. And yet, a lot of the tools are designed "for the web" and, rightfully so, enforce https, ssl, certs, you name it.

Now the issue is that they are a real PITA to implement in some of those networks, where you have to use tons of private registries and repositories and get everything working with certificates. Because those companies are not used to issuing certs, it often takes them months to do so, as they have a very rigorous process. So everyone wonders why this tiny app never goes to "production" and why we are blocked by such restrictions.

Now I'm not even accounting for the fact that you sometimes actually need to go online to install some of this stuff. A recent example was node-sass, a dependency of a composer package, which you would expect to just install from the composer registry; but no, it has a setup script that goes to nodejs.org or something, which of course you cannot fake because it will have to check certs as well. So again, something designed for the web that'll never work seamlessly elsewhere. (There are workarounds of course, but I'd love to spend my time elsewhere.)

To give you an idea, the cloud I'm working on right now is pretty much entirely offline, for both egress and ingress. There is a nexus mirror-proxy to dockerhub and that's it, even this is in the LAN. So really, having to configure cockroachdb certs (or anything else for that matter, like etcd...) is useless, time consuming, frustrating, and counterproductive.


> very small internal app

Is cockroachdb the right DB for this use case, though? A more traditional Postgres/MySQL or even SQLite (depending on needs) is often plenty good for a small internal app, without requiring this extra security

> the cloud I’m working on right now is pretty much entirely offline

I feel like “cloud” has lost all meaning it may have once had


> Is cockroachdb the right DB for this use case, though? A more traditional Postgres/MySQL or even SQLite (depending on needs) is often plenty good for a small internal app, without requiring this extra security

It's not, but the app I described is just an example. Some use cases involve bigger dataset and resilience.

It's not cockroachdb specifically but any of those "web" tools that, used internally in a closed environment, become a PITA to manage. Language tooling, repositories, dbs... Etcd for example is used as an internal storage for tenants (for example to store DNS information) and the team that manages it cannot upgrade right now because they have to figure out first how to implement ssl.

> I feel like “cloud” has lost all meaning it may have once had

Well, what would you call an on-prem install of OpenStack? It's their internal cloud. Cloud doesn't have much to do with the web or with the fact that something is publicly exposed; it's a tool to manage infrastructure. And to be perfectly honest, "cloud" isn't a proper term to begin with. Years ago when the word came out, one of my professors said they basically renamed "grid" to "cluster" and then to "cloud". They used to administer hundreds of machines, and many times that number of jobs, using pssh. They didn't wait for kube, and they didn't need ssl between each and every API and service.

Now don't get me wrong, the public cloud provides a great service, and tooling coming out now should be web ready and web safe. Just, please, pretty please, give us a --insecure and an --offline, even if it means having them all over our ansible code, so we can get the job done without having to spend our days working around those tools.


> Just, please, pretty please, give us a --insecure

Have you taken a look at [0]? (All caveats around running an insecure cluster apply)

[0]: https://www.cockroachlabs.com/docs/stable/deploy-cockroachdb...
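
For anyone who hasn't seen it, the insecure mode those docs describe boils down to roughly this (hostnames are placeholders; flags as documented for the 19.x/20.x CLI):

    # Run on each of node1..node3, skipping TLS entirely:
    cockroach start --insecure \
      --listen-addr=node1:26257 \
      --http-addr=node1:8080 \
      --join=node1:26257,node2:26257,node3:26257 \
      --background

    # One-time cluster bootstrap, from any machine that can reach a node:
    cockroach init --insecure --host=node1:26257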


That sounds reasonable, thanks for the detailed response.

> And to be perfectly honest, "cloud" isn't a proper term to begin with.

Completely agree, it’s just a marketing buzzword.


The built-in resilience via N nodes of something like CRDB can be useful even with super-small-scale data/traffic


You don't need a publicly signed certificate if the services are only used internally. It's not like you want people to look at your database with a browser anyway. All you need to do is create a private CA and two private keys/certificates (one for the server, one for the client), sign them with the CA, add the server certificate to the database, add the client certificate to the web server that accesses the database, and finally add the self-signed CA to both the database and the web server.

There is no cryptographic security benefit of publicly vs privately signed certificates. After all, publicly signed certificates are just privately signed certificates whose CA has been added to your CA store. So just add your own CA to whatever program needs it (do not add your own CA to the whole system). The annoyance is keeping those certificates and CAs up to date. Tools like Hashicorp Vault can automatically generate certificates for you if you want to automate this part.
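
For CockroachDB specifically, the bundled cert helper automates most of that flow. A rough sketch, with illustrative paths and hostnames:

    # Private CA, a per-node cert, and a client cert, all signed by that CA:
    cockroach cert create-ca --certs-dir=certs --ca-key=my-safe-directory/ca.key
    cockroach cert create-node db1.internal localhost \
      --certs-dir=certs --ca-key=my-safe-directory/ca.key
    cockroach cert create-client root \
      --certs-dir=certs --ca-key=my-safe-directory/ca.key

    # Nodes read certs/ at startup; clients only need ca.crt plus their own
    # client cert and key. The CA key itself stays off the database machines.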


I came here to mention this, too. It's been shown from numerous leaks that defense in depth is a much more effective approach than simply relying on network security.

You mention that query response data can be sniffed by an attacker in case networks are compromised. The much worse scenario is that the passwords themselves can be intercepted, giving attackers full access.


Relying on network security is usually a bad idea even if you don't have defense in depth. (I.e., it's less bad to rely on some other single thing.)


All of our internal communication already is secured via wireguard and communication can only flow through the wireguard interface. Does adding cockroach's own encryption on top of this add anything?


What happens when someone compromises your wireguard overlay network?

The cert business is not just about encryption, it's also about authentication over an untrusted network.


Not much if it’s set up in a standard way, because each node will only communicate directly with the node it’s expecting to. The communications are secured with the other node’s public key.


So if you have 10 boxes interconnected in a full mesh, and one of those boxes gets compromised, then it's all fine even if you have been relying on the wireguard vpn being secure?


I am unfamiliar with how setups like these usually work, but could an active attacker MITM your network?


You wouldn't be able to decrypt the wireguard packets unless you had each party's private keys


CockroachDB annoyingly requires manual certificate creation and distribution to each node, and if you don't set those up then you can't use any authorization at all. It's strange and user-hostile to disable auth completely just because you don't use certs.

Also some environments like Kubernetes might already have network encryption (provided by a service mesh) and don't need another layer.

Certificates serve both encryption and authentication, but encryption can be done with self-signed certs automatically generated by the nodes upon joining the cluster and authentication can be handled in the joining process by using a token or some other bootstrap value that's easier to manage.


> CockroachDB annoyingly requires manual certificate creation and distribution to each node, and if you don't set those up then you can't use any authorization at all. It's strange and user-hostile to disable auth completely just because you don't use certs.

This is no longer true as of CockroachDB 20.1 (upcoming in Q2 2020).

This version will enable you to set up your own transport-level security with clients using password authentication to crdb.


One part where it makes sense to me is that CRDB often talks about deploying their product in K8S as a good idea.

In that scenario, it's very easy to end up in a situation where CRDB communications are encrypted twice, with separate certificate management systems. Not just the client->db connections, but also the node->node ones. Especially with things like Istio becoming more popular.


Yep, and besides these concerns, there's the difficulty in modeling the trust relationships and dependencies in this kind of "secure internal network" world view. The relationships are complex and you have low likelihood of your mental model matching the reality, because the relationships are not explicitly configured.


I actually have a similar concern. In a previous job I tried to push for internal services communicating over TLS with a custom certificate authority. It was ultimately not adopted as too much complexity, including operationally: when something goes wrong, the developer should always be able to issue unencrypted requests from an internal network. It seemed to boil down to removing as many hoops as possible for when something terrible happens, automated tools fail, and manual recovery is needed.


I am a huge fan of mTLS but it is quite difficult to set up and I'm sort of taking a "wait and see" approach, especially with Google pushing ALTS, which sounds pretty sane. My dream is that you can ignore "VPCs" (i.e. trusted networking, firewalls, etc.) and let applications decide whether or not to allow traffic. "You have presented an incorrect TLS certificate, no response for you."

The problem I have is that I'm not sure what method you want to use to inject certificates. The current systems seem to be "something will figure it out for you at runtime, and your application will be none the wiser". (This is things like Istio and Linkerd.) I am not sure I'm convinced this adds much security. Anyone that can talk to the apiserver that injects certificates can probably convince the system to give you a certificate for your rogue container.

And, I'm not sure that applications have enough visibility into the networking layer to really add any trust. I'd like for each incoming request to have some provenance information attached with dimensions like "this container was built from trusted source code and each commit that it contains was code reviewed", "this container was launched from a manifest written and applied by a trusted engineer from a trusted machine", and of course "this request was authorized by a human user whose data is being accessed". Right now, I feel like you have to hand-wave to get the first two. Sure, maybe Istio thinks the container is legit, perhaps because it asked a container notary (very early stages of support), and so now it can open a TLS connection to anything in the cluster. And, adding a request-scoped JWT is easy enough to provide the "here is the human user that made the request".

I just don't think you can hide this from the application with a service mesh; every application needs to be aware of what's going on so it can say no to suspicious requests. And for that, I don't think we're really there yet. We had this at Google and it worked well, but I haven't found anything that works well in the outside world yet.

With that in mind, I totally understand why people trust the "well, it's all behind a firewall" model. It's not very good, but it is easy. I hope that in the next year or two we can get to where I want to be, but I'm also not sure that anyone cares. "Good is the enemy of great" and a private network seems to fit people's mental model pretty well. The only thing that automatic mTLS gets you right now is protection against "SSL added and removed here ;-)", and it's only the NSA that's really doing that. If you aren't a target of the NSA, your data is being exfiltrated by employees with access, or by simple application level bugs (XSRF, SQL injection).

This is all very rambly and I'm not sure where I'm going with this... but I wish that someone made application-level transport security work and it was trivial to set up.


> Anyone that can talk to the apiserver that injects certificates can probably convince the system to give you a certificate for your rogue container.

This is pretty infrastructure-heavy but I think my ideal solution would be to have a trusted orchestrator service on every machine which has its own certificate and accepts human-signed build artifacts to be run in a new container on that machine. It could verify the signature of each uploaded artifact and send CSRs for them to a hardened signing server, which returns the certs to be mounted (or sent over a standard initialization API) into the new containers.

Your options for compromising this are:

1. Get a malicious build artifact signed and submit it to a machine for execution. This shouldn't be possible without compromising an actual developer's credentials to sign the artifact.

2. Send your own custom CSR to the signing server to get a signed certificate. This shouldn't be possible without compromising the certificate from one of the orchestrator services to sign the CSR.

I think this should work as long as you can guarantee that services can't break out of their containers and as long as there's some hardware root of trust ensuring that your orchestrator service is genuine and the only thing that can read its CSR-signing certificate.


The lack of usable backups in the open-source version is confounding to me. They are effectively limiting adoption of both their free and paid versions.

If they had proper backups in the free version, more people would start using it, and then discover that they’ll need some of the premium features (data locality, Role-Based Access Control, or follower reads).

Also, why is there no public pricing page? How will I know that pricing won’t change every year? Or be arbitrary based on how much they can squeeze us? (less likely with public, listed prices)

Why would a company with low marginal cost (software) limit their growth like this?


This is a problem with Postgres as well (for different reasons). I currently have a server sitting without proper DB backups (only full backups and dumps) because figuring out how WAL-E (or whatever the latest tool is) works is more effort than "here's my bucket and encryption key, please start backing everything up".


WAL archiving is really helpful for (any) point-in-time restores and backing up large databases. If neither of those apply, pg_dump is great!

Also WAL-E has been replaced by WAL-G (written in Go):

https://github.com/wal-g/wal-g
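
To the parent's point, the "here's my bucket, please back everything up" setup is roughly this with WAL-G (bucket and data directory are placeholders; client-side encryption is a separate WAL-G setting):

    # Where to store backups; credentials come from the usual AWS env vars.
    export WALG_S3_PREFIX=s3://my-backup-bucket/pg

    # In postgresql.conf, ship WAL segments as they are archived:
    #   archive_mode = on
    #   archive_command = 'wal-g wal-push %p'

    # Take a base backup (e.g. nightly from cron):
    wal-g backup-push /var/lib/postgresql/12/main

    # Restore the most recent base backup into an empty data directory:
    wal-g backup-fetch /var/lib/postgresql/12/main LATEST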


FWIW, I've had a lot of success with barman (1). If you set it up using the newer streaming protocol, which doesn't require a separate ssh channel, then it's relatively straightforward to set up.

https://www.pgbarman.org/
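
A rough sketch of that streaming setup (the server name "pg", hosts, and users are placeholders, and the option names are from memory, so double-check them against the barman docs):

    # /etc/barman.d/pg.conf -- streaming-only, no separate ssh channel:
    #   [pg]
    #   description = "main cluster"
    #   conninfo = host=db1 user=barman dbname=postgres
    #   streaming_conninfo = host=db1 user=streaming_barman
    #   backup_method = postgres
    #   streaming_archiver = on
    #   slot_name = barman

    barman check pg         # verify connectivity and the replication slot
    barman backup pg        # take a base backup
    barman list-backup pg   # see what you have
    barman recover pg latest /var/lib/postgresql/12/main   # restore latest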



Still seems quite hidden compared to the usual https://sas.io/pricing


I'm glad for the honest writeup of Cockroach. I like CockroachDB, and hope they continue it in the future, but the performance and backup capabilities are something to consider. Still, I'd rather deploy it to Kubernetes whenever I can over Postgres to ensure that I can do rolling upgrades and whatever else. At least the Helm chart they provide lets you use Cert Manager with it, so the TLS isn't as much of a pain.


> But going from our single PostgreSQL instance to a 3 node cluster, performance is worse. This is true even for simple queries involving 1 node (say, getting a record by id or running locally with a single node). Still a few extra milliseconds of latency is a fair price for better availability. But the performance characteristics aren't the same as a relational database.

I think this should be true of any distributed database: you can't really beat optimized single-machine software with a networked system, no matter how hard you try.

My experience with CockroachDB showed that increasing the cluster size from 3 to 6 and more provided more benefit than I expected. 1 -> 3 seems to be a meaningless comparison for such a setup with a replication factor of 3.


There is no free lunch with databases. I often argue the position that master-master is overrated, because you have to design your application around the constraints to see the benefits and avoid the pitfalls.

Same thing with CockroachDB's PostgreSQL "wire protocol support". The fact that it only supports the "serializable" isolation level, when PostgreSQL uses Read Committed out of the box, should be a screaming red flag to just about anyone that you are in for some challenges on the performance front. Some well-worn modeling techniques are simply infeasible under typical workloads at that isolation level, for one.

Starting with bog-standard RDBMS you have to be deliberate to get great performance. Adding in extra, fancy scaling properties means even more considerations to take into account to maintain consistency while achieving gains.
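
To make the practical consequence concrete: under SERIALIZABLE, conflicting transactions are rejected with a retry error (SQLSTATE 40001) rather than quietly interleaved, so every client needs a retry loop somewhere. A bare-bones sketch (the table and statements are made up, and real code should check for 40001 specifically instead of retrying on any failure):

    # Retry the whole transaction a handful of times before giving up.
    for attempt in 1 2 3 4 5; do
      if cockroach sql --certs-dir=certs -e "
            BEGIN;
            UPDATE accounts SET balance = balance - 10 WHERE id = 1;
            UPDATE accounts SET balance = balance + 10 WHERE id = 2;
            COMMIT;"; then
        echo "committed on attempt $attempt"
        break
      fi
      sleep 0.2   # brief backoff before retrying
    done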


I've tried CockroachDB as well as YugabyteDB. A distributed ACID transactional database sounds like the holy grail, but it comes with a price. For CockroachDB the performance penalty was high, even though I'm sure I could have squeezed out more performance if I had designed my DDL specifically for CRDB.

There was an inconsistency issue when deleting a ton of (self-referencing foreign key) rows in rapid succession. A "select count(*) from table" returned different results than a "select count(id) from table". After some delay the results came back in sync again.

As I was trying to shoehorn an existing PostgreSQL database into CRDB, I had to find solutions for SELECT .. FOR UPDATE, finding alternatives for triggers, issues with the query builder I use (knex.js). In the end it was too much, not even considering lack of backups and the risk of a VC funded company going belly up. I do hope they gain a solid foothold and these NewSQL databases gain more traction.


We take our claims of correctness very seriously. The behavior you're describing, depending on exactly what you're doing, is likely an allowed serializable history, which is the isolation level we claim to support. See [0] for a deeper dive on the subject. Either way, you should show us what you found (if you haven't already).

As for your performance analysis, do share your results and methodology. We published an in-depth, reproducible comparison with YugaByte here [1] in addition to publishing our TPCC-100k numbers[2]. If you're seeing performance that doesn't line up with the above, let us know.

[0]: https://www.cockroachlabs.com/blog/consistency-model/

[1]: https://www.cockroachlabs.com/blog/unpacking-competitive-ben...

[2]: https://www.cockroachlabs.com/blog/tpcc-100k/


Thanks, "serializable history" could well have been the root cause. I did discuss this in the forum but was unable to resolve it at that time.


After your opening sentence I kept waiting for the comparison to Yugabyte but you never mentioned it again. How did it go?


My testing of YugabyteDB was not as extensive as CockroachDB's. I had issues importing my PostgreSQL schema + data (I did use ysql_dump): I got timeouts and an occasional 100% CPU crash. I was able to import after trimming the amount of data and dumbing down the schema (dropping foreign key references). CPU usage was much higher than PostgreSQL's, and I found the overall tooling and setup less polished compared to CockroachDB.


High availability is always a concern of mine when self-hosting MySQL or PostgreSQL. I always worry about the failover failing because some step in the setup was not properly configured. Let me know if someone has a magical formula for easy replication.

I still find CRDB's replication interesting, since all data is replicated 3-way on every commit, and data also auto-rebalances as needed. With some exceptions, as long as you are connected to any node, you should have access to all data.


Regarding reliability of your failover process: test it regularly and have it monitored so you get alerted at the first sign of trouble.


We used Percona's XtraDB Cluster for HA and have had very few issues.

Just fronting it with haproxy to read/write to a single node at a time (setting others to "backup") has worked well for us, since we're not doing a huge number of queries.
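
For reference, the haproxy side of that is just a TCP listener with the other nodes marked as backups (addresses are placeholders), something like:

    # Appended to haproxy.cfg; only db1 takes traffic until its check fails.
    cat >> /etc/haproxy/haproxy.cfg <<'EOF'
    listen galera
        bind *:3306
        mode tcp
        option tcpka
        server db1 10.0.0.1:3306 check
        server db2 10.0.0.2:3306 check backup
        server db3 10.0.0.3:3306 check backup
    EOF
    systemctl reload haproxy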


I concur, but consider ProxySQL instead of HAProxy. Percona with XtraDB/Galera failover fronted by ProxySQL is very resilient and performant.


Thank you! That is useful information. I will take a closer look at Percona in the future.


Is your app that critical to warrant that kind of architecture? I would wager the majority of apps people interact with use a single node database that routinely goes down for maintenance or something else.


I have an open PR for Postgres replication handled through repmgr that has been working well enough: https://github.com/ANXS/postgresql/pull/424


We use Elixir and Postgres at work. I didn't realize CockroachDB didn't have geospatial features; unfortunately, that makes it a non-starter for us. We need it to track pathing from mobile phones (iOS and Android).

Thank you for writing this up and putting your experiences out there, clear as day. It's very valuable.


What a world of difference good tests and coverage can make. Great write up.


What new class of RDBMS?

Cluster scalability has always been a thing with Oracle, SQL Server, Sybase, Informix, DB2.


Scalability is relative. To some people a 10TB database is large, but some teams I work with regularly see databases in the 10PB range. Going to that kind of scale you tend to see different design tradeoffs.


Those kinds of data loads were already typical 15 years ago in the telecommunications domain, for example Nokia NetAct report processing for a whole operator network, based on HP-UX with Oracle OLAP support.

Or the CERN infrastructure to deal with non-stop handling from accelerator data and storage for posterior data analysis.

Sometimes this industry feels like a continuous reboot.


I'm not sure how OLAP relates to CockroachDB. The use case for CockroachDB is quite simple. You have millions of applications that run Postgres. A large chunk of them have to process more and more data than a single master can handle, and there is no easy way of scaling. Any option you pick requires a lot of changes to the codebase. This is where CockroachDB should come in handy. The dream is to have a drop-in replacement for Postgres that would scale, so all SELECT .. JOIN still works.


Because that specific example made use of Oracle distributed cluster scalability, already 15 years ago.

Postgres still has a couple of tricks to learn from the big boys' databases.


I was thinking of OLTP databases, with transactions, joins, random writes, and everything operating at scale. Running reports is much easier.

I doubt CERN is dumping all of their experiment data into a single SQL database.


Back in 2000-2005 it was a cluster of Oracle ones.

Nowadays no idea.


Does CockroachDB work with Ecto in Elixir? Can it be used as a drop-in replacement for PostgreSQL?


The Postgrex driver works with cockroach as-is, and they're open to making tweaks as necessary (I made a small patch to Postgrex to support the way cockroachdb encoded empty arrays and it was quickly accepted).

I don't think Ecto works as-is, you'd have to check.

For simple apps, it might be a drop-in replacement, or require very few tweaks. In addition to the issues mentioned in the OP, the cockroachdb documentation has a detailed list of differences (1).

(1) https://www.cockroachlabs.com/docs/stable/detailed-sql-suppo...


Would you mind providing a link to the empty arrays patch? Is this something CRDB should fix?



Thanks for the quick response.


Our goal is to make switching over from PostgreSQL easy. For Ecto in particular, we have a tracking issue that shows the items in our backlog that would improve compatibility: https://github.com/cockroachdb/cockroach/issues/33441


Anyone have experience with rqlite and how it compares to cockroachDB?


Has anyone tried to run CockroachDB with GCP TrueTime?


TrueTime is not a publicly available GCP service.


Seems like overkill, and just a shiny new toy to play with for this use case. Stackoverflow.com runs on SQL Server. You aren't exceeding Stack Overflow. In a couple of years they'll migrate back, or to something else.

The whole point of CRDB is to enable things like Google scale. But all I ever see on it are random devs spinning up three nodes and being so proud.

Which is too bad because I'd love to see someone pushing its limits, and betting mission critical services on it that nothing else could handle.


Sorry if I wasn't clear, but the goal of the migration wasn't for scale, it was for HA.

I find it difficult to set up PostgreSQL without having a single point of failure and introducing downtime for things such as upgrades.

We aren't doing life changing stuff, true, but a downtime in our system can result in employees (often at the bottom of the economic ladder) not being able to get to work. That gnaws at me. I guess the better alternative is to go for a hosted approach, but that isn't without its own complications and challenges.


Out of curiosity, what do you find so hard to set up with PostgreSQL?

Having a master with a read replica and then making the read replica the master in case the master goes down seems to be a very well known methodology. You can use a virtual IP with both servers behind it (keepalived) or a 0 TTL DNS solution (consul). Is your case more complex than that? With master-to-master replication I feel there are a couple of gotchas, and you need to design your data in a way that avoids conflicts as much as you can. A solution like BDR seems to be an off-the-shelf one that is proven.

I haven't really upgraded postgres in a rapid fashion; usually I just keep whatever version comes with the OS package manager forever... or whatever docker image I started it with. I guess the same approach for HA can be used to upgrade one node at a time; I'm assuming postgres doesn't break backwards compatibility very often.

You can either automate the failover or have a system SMS you when it happens. I personally like to do it manually, since I have seen automated failover fail more times than the manual procedure.
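
For context, the manual promotion itself is a one-liner (data directory is a placeholder); the rest is moving the virtual IP or DNS entry over:

    # On the surviving replica, promote it to primary:
    pg_ctl promote -D /var/lib/postgresql/12/main

    # Then shift traffic, e.g. let keepalived move the virtual IP,
    # or flip the DNS/consul record the applications resolve.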

I have probably managed over 30 postgres instances across my services, with data that has grown up to 1/2 PB, and I've never had a single issue... postgres, redis, rabbitmq, and consul are technologies where you RTFM, set them up, and they just work!

I wouldn't trust a technology that isn't established and widely adopted, which is how I see cockroachdb at the moment.


>Having a master with a read replica and then making the read replica the master in case the master goes down seems to be a very well known methodology. You can use a virtual ip with both servers behind it (keepalived) or a 0 ttl DNS solution (consul).

>You can either automate the failover or have a system to SMS you when it happens.

I don't want to be writing code to make things do automatic failover. I'm not a database engineer. I want a database where its standard setup does this stuff for me.

(I would find it really interesting if there was a tightly-integrated "distribution" of Postgresql plus related services that had a setup process as straight-forward as CockroachDB and had automatic read-replicas and failover.)


You usually don’t have to write stuff; it is all pretty much yum install and editing a config file


Postgres is fine but that master/replica/failover connection is fragile, requires a whole bunch of 3rd-party tooling to get right, has downtime while the master is switched over, and is hard to test and be confident in.

Compared to all that, CRDB is just a binary on each server pointing to each other. Operational simplicity matters and that's where CRDB is a great alternative.
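
Concretely, with node certs already distributed, "pointing the binaries at each other" is roughly (hostnames are placeholders):

    # On each of node1..node3:
    cockroach start --certs-dir=certs \
      --advertise-addr=node1:26257 \
      --join=node1:26257,node2:26257,node3:26257 \
      --background

    # One-time bootstrap, from any node:
    cockroach init --certs-dir=certs --host=node1:26257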


Why do you say it is fragile? Do you have any experiences to share? Honest question, as I never ran into anything


As stated, it's not natively integrated. You need 3rd-party tools to ensure replication is set up correctly, to monitor the connection status and replication lag, monitor schema changes or anything else that might need a reset, monitor the master, initiate the failover, recover or reset the state after the failover happens, handle changes for WAL archival and backups, and handle any network changes to point clients back to the master/replica correctly.

None of that is automatic, and every step introduces problems compounded by the number of tools used to handle each step. Because of this, the setup is intensive and usually custom to every environment. This is hard to test and be confident in, and it is the source of much downtime or data loss.

Meanwhile CRDB handles all of this seamlessly. As long as you have a quorum of healthy nodes, your database is alive and working. Point a load balancer at all of the nodes and you will always have a database to contact with no other overhead. Postgres could get there, but it isn't anywhere there today.


That's a fair point, thanks for the reply!


No, our setup is basic. We use barman, so I guess repmgr by the same company (2ndQuadrant) would have been the route to go. I could have figured it out, set it up, and been done with it. But, without having done it before, I didn't feel very confident. There seem to be a lot of moving parts, a lot of options, it's changing quite a bit (12.0 introduced additional changes), and I'm still not sure how to do upgrades of major releases without downtime.

We still use Postgresql, and I agree, it's setup and forget. But those aren't HA (and don't need to be).


Citus released this last year: https://github.com/citusdata/pg_auto_failover It looks interesting, although I haven't used it; I tried doing it with corosync + pacemaker.


Nah, you're good. I didn't notice that HA was the goal. Saw yet another migrate-to-alt-db post, which is usually about scale, when said scale could easily be handled by SQL, and made the assumption and an unnecessarily snarky comment. That was my fault.


> I find it difficult to setup PostgreSQL without having a single point of failure and introducing downtime for things such as upgrades.

Amazon does this with RDS, and so can you.

This really isn't very tough to achieve with replicas. And for upgrades you just add a new replica with the latest version and promote it to master and replace the remaining nodes one at a time.


Man I love technical advice that has the word "just" in it. It's so simple!


RDS absolutely has downtime for upgrades, and even for master switchover


As someone working for crdb, I disagree with you re: "the whole point is that it enables Google scale". There's a wide gulf up to that point where cockroachdb is still eminently useful, especially considering the alternatives.


Does StackOverflow use a single SQL Server database for all writes for the entire platform?

They must at least have read replicas given the traffic for the site.



This is a surprisingly small amount of infrastructure considering its size


It just highlights how overcomplicated most setups are, really.

I feel like you need more hardware than that for a hello world kubernetes tutorial...


Some more info if interested https://news.ycombinator.com/item?id=15427124

Bit outdated but probably still holds true.

It's pretty amazing what SQL Server can do...just wish it wasn't so expensive


If devs don't play with a database in an unimportant project to gain some experience, how would they be confident in "betting mission critical services on it" then?


I could see the benefits though. If you get popular, you can just throw more hardware at it instead of feeling rushed to switch databases or start sharding at the application level.

One downside I didn't like though is no incremental backups unless you go enterprise. There's pg_dump, but I'm not sure if it'd work with a bunch of threads and stuff. If using MySQL there's the MyDumper project that looked interesting.

Then the lack of full-text search is a downside, but it seems people recommend running a separate search solution anyway, so that can be a little slow to get started with. It would be easier if you could just run it in one system in the beginning; and who knows, maybe search isn't a bottleneck anyway depending on the implementation, or maybe you want to still be able to full-text search while debugging.

Then no geospatial support can be a downside for some types of businesses or use cases, say a place lookup, factoring it into a game matching algo, etc. I've been really interested in MongoDB lately, but was playing with a few newer SQL-based databases to decide what to use for my own project. I know some companies actually run multiple databases too (some stuff might be NoSQL while other workloads run on SQL databases), but I know some would rather have just a single database so there's only one source of truth. Then I know PostgreSQL has the idea of foreign data wrappers [0], so I guess you could treat a MongoDB collection as a table. So maybe if you had a users collection, you could join on it using this, I think, but I've never played with it. However this seems to be mostly Postgres itself; other Postgres-compatible databases don't implement this as far as I can tell.

However I believe even MongoDB with its sharding can't handle geo queries unless you separate that data into an unsharded collection, which is then placed on any replica set in the cluster. So maybe you could split geo results up at the city level in different collections, but then if you wanted to find nearby things that could spill into other cities based on the radius, you'd have to write your own logic. I'm unsure how larger sites like, say, Yelp handle geo lookups.

The only big downside I see with MongoDB though is no joins. So say you had a social networking site and wanted to hide posts by suspended users from the results: you'd have to write your own logic to do so. I think it'd be so cool if you could, like, symlink a value from one document to another. I think OrientDB has something like that though. I was asking a friend who uses Mongo for advice; apparently in that case you'd just update every single post when you suspend that user. Not sure if that's really efficient or not, but I doubt a single user would have millions of posts... probably less than 10K posts at the most. However, with social feeds themselves, it seems the recommendation is to fan out on write instead of fanning out on read, so in a way you can sometimes precompute things to help the database. All about making decisions and tradeoffs I guess.

[0] https://wiki.postgresql.org/wiki/Foreign_data_wrappers
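
On the foreign data wrapper idea: the general DDL pattern looks like the sketch below (shown with mongo_fdw; the extension and option names are from memory and may differ, so treat them as illustrative only):

    psql <<'SQL'
    CREATE EXTENSION mongo_fdw;

    CREATE SERVER mongo_srv
      FOREIGN DATA WRAPPER mongo_fdw
      OPTIONS (address '127.0.0.1', port '27017');

    CREATE USER MAPPING FOR CURRENT_USER SERVER mongo_srv;

    -- Expose a Mongo collection as a table you can join against:
    CREATE FOREIGN TABLE users (_id name, username text, suspended boolean)
      SERVER mongo_srv
      OPTIONS (database 'app', collection 'users');
    SQL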


> I could see the benefits though. If you get popular

You won't get that popular. And if you do, replacing the database layer would be a party with cases of champagne.


Yep, I've heard people throw parties after successful migrations, but it sounds like once you pick a database you are sorta married to it, and rewriting things could be a pain for a large or complex application. Live migrating could also be a challenge unless you are OK with some downtime. So I feel like if it's possible to get it right from the beginning by picking the right tech for scaling, it's worth it.


That's premature optimization.


sounds like a 70's IBM COBOL VSAM batch job JCL script


CRDB is also a relational database, and often used for highly-available and durable deployments rather than purely for scaling.


I’m sorry but the name of the database needs to change. It’s like if I named my company “Disgusting Bug Company” then expected everyone to ignore the fact I named my company after a gross creature. Call it “RockSolidDB” and you get the same point across without invoking gross bugs.


I used to agree, but the name actually seems to have an ironic marketing benefit: Everyone ascribes Cockroach's success to its technical merit. I've never heard Cockroach criticized as another dumb web dev trend caused by marketing and fads.


I propose that they further strengthen this effect by dropping the word "roach" from their name, and rebrand as just "Cock DB". /s


I don’t know anyone except the non-engineer non-programmer management types that gives a damn what anything is called. I respect the CockroachDB team for sticking to their guns on this.

Anyway, just call it CrDB, it’s an accepted alias. And everyone needs to stop complaining about the name each and every single time.


Actually I think the name is about the myth that a cockroach would survive a nuclear blast, as a metaphor for the resiliency of your cockroach database.


It is not a myth though that they have been around for 320 million years; that is resilience.


What makes it funny is they can survive everything but can’t manage to turn themselves back over if they get flipped... kind of ironic...


But they're resilient as a group! Unless you could flip them all at once.


I believe their choice of name is actually very clever as it sticks into your mind.

If you need to list the names of the databases you know, especially if you are not an expert in the domain, and you have heard of it, it will probably be on your list just because it has a memorable name (regardless of its merits).


OTOH they monopolize the market of people who do like cockroaches


This comes up in every single thread about CockroachDB. It's beyond a dead horse at this point.


It's because cockroaches can be hard to get rid of once you're infested. So it's supposedly named that since the goal is for the database to be reliable and available, with replicas that can survive failures.


I get your point, but I personally really like the name. It really sticks with me.


does the name... BUG you?

It's a nod at resiliency, clever IMO.


Obligatory link to BikeShedDB - https://github.com/tbg/bikesheddb

Problem solved!



