"...a Byzantine quorum system with 20 nodes would be more decentralized than Bitcoin or Ethereum with significantly fewer resource costs. Of course, the design of a quorum protocol that provides open participation, while fairly selecting 20 nodes to sequence transactions, is non-trivial."
From my experience (and from what this looks like), there isn't a ton of novel IP to open source here. A lot of the time, ML "platforms" are composed of a number of open source components glued together in an elegant way (which I'm not implying is easy; it's very difficult).
Looking at this from the outside, the only things in this platform that may be open-sourceable are the job scheduling and the visualizations, and there are already open source tools that could be repurposed for those tasks (or may even be powering those components).
The main purpose of this post seems to be to describe Uber's way of standardizing their workflows, plus a little extra glue (which they're calling their platform). It still provides a lot of value. Also, Uber does have a few cool open source projects: https://uber.github.io/ (but could admittedly have more).
You have a point. Describing their workflow has value to it.
With that said, the title of the post is not "Meet our workflow", it's "Meet X: bla bla". In the world of software, one expects to actually see a product named X that bla blas.
> You have a point. Describing their workflow has value to it.
This is a really valuable thing to acknowledge. There is some sharing of company philosophies, but seldom do I see companies fully "open sourcing" their workflows and strategies. Perhaps because the people at the top see that as the real value their company brings - that knowledge. Nevertheless, it's extremely valuable and I wish I could see more things like that. Basecamp's book "Getting Real" is close to what that might look like, I think.
I don't think you would get an incredible amount of use out of an open-sourced Michelangelo. The biggest advantage that Michelangelo has for Uber is that it is easy to integrate into all of Uber's other tools.
Depending on what your machine learning needs are, you could get pretty far with just Spark + MLlib, and wouldn't need any of the customization that Michelangelo has on top.
This is the sense that I got. I am a one-person data science team for my startup, and I basically cobbled together most of the automation described in Michelangelo over the course of a few months, mostly by spinning off Spark ML jobs on EMR and saving metadata to a database.
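To make the "saving metadata to a database" part concrete, here is a minimal sketch of what such a run registry might look like. It is an illustration only, not what Uber or the commenter actually built: it assumes SQLite as the database, and the `model_runs` table, the `record_run` and `best_run` helpers, and the `s3://example-bucket/...` artifact paths are all hypothetical names. The Spark job itself is left out; the idea is that each job would call `record_run` once it finishes.

```python
import json
import sqlite3
import time


def init_db(conn):
    """Create the (hypothetical) run-metadata table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS model_runs (
            id           INTEGER PRIMARY KEY AUTOINCREMENT,
            model_name   TEXT NOT NULL,
            params_json  TEXT NOT NULL,
            metrics_json TEXT NOT NULL,
            artifact_uri TEXT NOT NULL,
            created_at   REAL NOT NULL
        )
        """
    )
    conn.commit()


def record_run(conn, model_name, params, metrics, artifact_uri):
    """Store one finished training run and return its row id."""
    cur = conn.execute(
        "INSERT INTO model_runs "
        "(model_name, params_json, metrics_json, artifact_uri, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            model_name,
            json.dumps(params, sort_keys=True),
            json.dumps(metrics, sort_keys=True),
            artifact_uri,
            time.time(),
        ),
    )
    conn.commit()
    return cur.lastrowid


def best_run(conn, model_name, metric):
    """Return the id of the run with the highest value of `metric`, or None."""
    rows = conn.execute(
        "SELECT id, metrics_json FROM model_runs WHERE model_name = ?",
        (model_name,),
    ).fetchall()
    scored = []
    for run_id, metrics_json in rows:
        metrics = json.loads(metrics_json)
        if metric in metrics:
            scored.append((metrics[metric], run_id))
    # On a tie in the metric, the most recently inserted run wins.
    return max(scored)[1] if scored else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # assumption: swap in a real database
    init_db(conn)
    # Hypothetical values a finished training job might report.
    record_run(conn, "eta", {"max_depth": 5}, {"auc": 0.81},
               "s3://example-bucket/models/eta/1")
    record_run(conn, "eta", {"max_depth": 8}, {"auc": 0.86},
               "s3://example-bucket/models/eta/2")
    print(best_run(conn, "eta", "auc"))
```

Storing parameters and metrics as JSON keeps the schema stable while models and features change, at the cost of doing the metric comparison in Python rather than in SQL.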
Why make all that noise with a detailed blog post then? If it's a custom-fit internal tool, then good for you, the rest of the world doesn't care. Each company has internal tools and stuff.
There is the sharing of ideas. Maybe they couldn't open source it, but were given permission to publish about it. Google never open sourced some of their greatest contributions, just the ideas behind them.
I think blog posts like these are an interesting way to show off what goes on in a large company like Uber.
If you're a tiny startup, then Spark + MLlib is more than enough. Even that would be overkill if your data fits on a single machine.
But if you're at a young, but quickly-growing company with:
- terabytes of data
- tens of thousands of features extracted from the data
- dozens or hundreds of unique machine learning models being tweaked over time
then hopefully a blog post like this is helpful. It shows off various effective patterns for solving machine learning problems at scale. Presumably, you'll want to build your own internal system with its own set of hooks, but the best practices and lessons learned should be roughly the same.
The fact that their example includes using a cljsjs package for externs is troubling. For the time being, to avoid worrying about extern files, I'd stick to using a separate build process for npm packages.
Yeah, but without externs this breaks on advanced compilation, right? So, while using webpack to bundle npm deps is a good idea (one that might also become unnecessary in the future with things like node module resolution landing in the cljs compiler), you still need externs for the functions you're going to call from your cljs code. Luckily the latest cljs compiler will help you get those right.
That's exactly what the Stellar Consensus Protocol (SCP) is: https://www.stellar.org/papers/stellar-consensus-protocol.pd...
(Full disclosure: I work at Stellar)