> It sounds like Uber is using MySQL as just a data bucket with primary keys
They have a couple posts about "Schemaless", but I still don't understand why they used MySQL as the data store instead of something like Cassandra. ( https://eng.uber.com/schemaless-part-one/ ) From that post it looks like they basically built a no-sql database on top of a relational database.
The only reason given was operational trust ( "If we get paged at 3 am when the datastore is not answering queries and takes down the business, would we have the operational knowledge to quickly fix it?" ). The project took nearly a year to roll out, and in that time the operational knowledge could surely have been trained, hired, or contracted.
Operating Cassandra at the scale Uber requires is going to be painful, and at least as operationally draining as MySQL, if not more.
There are really not a large number of options here anymore with the departure of FoundationDB from the market. CockroachDB might be an option in a few years, though I'm still confused why they are moving towards a SQL-ish vs key-value interface...
Pissed me off so much. It was the only thing close to Google's F1 RDBMS on the market, at a reasonable rate, and the beginning of a good offer to enterprises. Then, "poof!" It's a good example of why I tell companies not to put anything critical into something from a startup. If they do, they had better have a synchronized backup option tested and ready to go.
"why they are moving towards a SQL-ish vs key-value interface..."
That's easy: most databases and buyers use SQL. Key-value is preferred by startups and by non-critical side projects in big companies. You see those here a lot, but they aren't representative of most of the market. You need first-rate SQL support. I think EnterpriseDB shows that it's also a good idea to clone a market leader's features onto an alternative database.
I was at MesosCon and ended up talking to some Uber people. They are currently using Cassandra in prod. I can't speak as to why they use MySQL the way they do though.
So arguably, they are using MySQL as a storage engine rather than as a database.
They don't explicitly answer the question "Why didn't you use InnoDB/WiredTiger/etc. for your dataplane?", but you get the idea that they were very happy with the specific characteristics of MySQL for their use case and so they built on top of it. It also sounds like they had some deadlines (specifically, the death of their datastore) that they had to meet :).
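To make "storage engine rather than database" concrete, here is a minimal sketch of what a key-value layer over a relational table can look like. It is only loosely modeled on the cell idea described in the Schemaless posts and is not Uber's actual schema or code: the table layout and the names `cells`, `put_cell`, and `get_latest` are my own assumptions, and `sqlite3` stands in for MySQL so the example runs anywhere.

```python
import json
import sqlite3

# Assumption: sqlite3 is a stand-in for MySQL so this sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cells (
        added_id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order
        row_key     TEXT NOT NULL,                      -- entity identifier
        column_name TEXT NOT NULL,                      -- logical column
        ref_key     INTEGER NOT NULL,                   -- version number
        body        TEXT NOT NULL,                      -- opaque JSON blob
        UNIQUE (row_key, column_name, ref_key)
    )
    """
)


def put_cell(row_key, column_name, ref_key, body):
    """Append a new cell version; existing versions are never updated."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO cells (row_key, column_name, ref_key, body) "
            "VALUES (?, ?, ?, ?)",
            (row_key, column_name, ref_key, json.dumps(body)),
        )


def get_latest(row_key, column_name):
    """Return the body with the highest ref_key, or None if nothing exists."""
    row = conn.execute(
        "SELECT body FROM cells WHERE row_key = ? AND column_name = ? "
        "ORDER BY ref_key DESC LIMIT 1",
        (row_key, column_name),
    ).fetchone()
    return json.loads(row[0]) if row else None


# Hypothetical usage: two versions of the same cell, newest wins on read.
put_cell("trip-1", "BASE", 1, {"status": "requested"})
put_cell("trip-1", "BASE", 2, {"status": "completed"})
latest = get_latest("trip-1", "BASE")
```

The relational engine only sees keys, a version number, and a blob. It never interprets the JSON, which is roughly what people mean by using MySQL as a data bucket with primary keys.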
I had that same thought, that the time spent rolling their own system could be better spent just learning some existing good-enough thing.
A great way to get familiar with something is to be the folks who write it. It's also much more fun to design and implement something new than to just learn some other fella's software. I'm guilty of this myself.
But I've started to remind myself that "somebody else has had this problem" and there's probably a good enough solution out there already.
Put another way, is what you are trying to do really so novel? In the case of Uber's infrastructure, you would have to talk for a while to convince me that they really need something that isn't off the shelf.
I wouldn't have trusted Cassandra back then either. 0.9, 1.0 or maybe 1.2 was reaching sufficient maturity to actually be recommended. Modern Cassandra has come leaps and bounds, with the 2.x series finally becoming stable this year and just recently 3.0.x finally getting blessed by the community as stable enough for production. And ScyllaDB hot on their heels.