Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> It sounds like Uber is using MySQL as just a data bucket with primary keys

They have a couple posts about "Schemaless", but I still don't understand why they used MySQL as the data store instead of something like Cassandra. ( https://eng.uber.com/schemaless-part-one/ ) From that post it looks like they basically built a no-sql database on top of a relational database.

The only reason given was operational trust ( "If we get paged at 3 am when the datastore is not answering queries and takes down the business, would we have the operational knowledge to quickly fix it?" ). The project took nearly a year to roll out, and in that time the operation knowledge could surely be trained, hired, or contracted.



Operating Cassandra at the scale that Uber is going to require is going to be painful and as operationally draining as MySQL if not more.

There are really not a large number of options here anymore with the departure of FoundationDB from the market. CockroachDB might be an option in a few years, though I'm still confused why they are moving towards a SQL-ish vs key-value interface...


"departure of FoundationDB from the market"

Pissed me off so much. Only thing close to Google's F0 RDBMS on the market, at a reasonable rate, and the beginning of a good offer to enterprises. Then, "poof!" It's a good example of why I tell companies to not put anything critical into something from a startup. If they do, better have a synchronized, backup option tested and ready to go.

"why they are moving towards a SQL-ish vs key-value interface..."

That's easy: most databases and buyers use SQL. Key-value is preferred by startups & non-critical, side projects in big companies you see here a lot but aren't representative of most of the market. Need first-rate, SQL support. I think EnterpriseDB shows that it's also a good idea to clone a market leader's features onto alternative database.


> Only thing close to Google's F0 RDBMS

Did you mean F1 (instead of F0)?


Yeah, yeah. I keep getting the 0 and 1 mixed up. Thank you.


What dbs would you suggest in their scale ? that are easier operationally than cassandra ?


I was at MesosCon and ended up talking to some Uber people. They are currently using Cassandra in prod. I can't speak as to why they use MySQL the way they do though.


I gave a talk at MesosCon about how we (are starting to) run Cassandra across multiple datacenters at Uber (https://www.youtube.com/watch?v=U2jFLx8NNro, https://schd.ws/hosted_files/mesosconna2016/60/mesoscon-uber...).


Thanks for sharing this. I was wondering - the limitations they have listed could be easily overcome with cassandra


So arguably, they are using mysql as a storage engine rather than as a database.

They don't explicitly answer the question "Why didn't you use InnoDB/WiredTiger/etc. for your dataplane?", but you get the idea that they were very happy with the specific characteristics of MySQL for their use case and so they built on top of it. It also sounds like they had some deadlines (specifically, the death of their datastore) that they had to meet :).


I had that same thought, that the time spent rolling their own system could be better spent just learning some existing good-enough thing.

A great way to get familiar with something is to be the folks who write it. It's also much more fun to design and implement something new than to just learn some other fella's software. I'm guilty of this myself.

But I've started to remind myself that "somebody else has had this problem" and there's probably a good enough solution out there already.

Put another way, is what you are trying to do really so novel? In the case of Uber's infrastructure, you would have to talk for awhile to convince me that they really really need something not-off-the-shelf.


IMHO some possible reasons of not using Cassandra could be the following.

You can't use Cassandra if you need atomic increments (yes, they're included but painfully slow due to several trips required to satisfy PAXOS).

Also there are no transaction rollbacks (atomic batches always go one way - forward).

You may hit GC pauses if the JVM is not tuned properly.

If the use case involves its of deletes then tombstone related issues need to be considered.


I wouldn't have trusted Cassandra back then either. 0.9, 1.0 or maybe 1.2 was reaching sufficient maturity to actually be recommended. Modern Cassandra has come leaps and bounds, with the 2.x series finally becoming stable this year and just recently 3.0.x finally getting blessed by the community as stable enough for production. And ScyllaDB hot on their heels.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: