Scale (YC S16) Raises $100M from Accel and Founders Fund at $1B Valuation (bloomberg.com)
125 points by cristinacordova on Aug 5, 2019 | 60 comments


Hey everyone! I'm Alex, CEO/founder of Scale!

I just wanted to chime in that we're a YC company as well (S16), and I'm thankful to the HN community for having been supportive through our whole journey.


It's a great idea, but I can't believe that the market is that large for this kind of data for 2 reasons: 1 - there's certainly a point of diminishing returns; and 2 - having good, clean data that's proprietary is a _huge_ differentiator. If I am the leader in autonomous driving, I doubt I'd want to pay someone else to help them train models that will help my competitors.

The problem I see with wading into other subfields (like my own) that need high quality training datasets, is that the datasets may be proprietary, and may not really overlap that much between companies in the same industry. For example, assembly line datasets for companies making almost the same product may be vastly different. I'm really struggling to see how you can possibly achieve the same scale in other industries.


Congrats! Two quick questions...

Is it weird sharing the same name as a fashion icon ;)

And I'm curious about your ML "stack", particularly the chicken-and-egg problem. Are you using something like TensorFlow with pre-trained binaries, perhaps from a vendor? Or is it 100% proprietary? Thanks!


Re 1: It has been a bit of an annoyance growing up (for example, Google autocorrects "Alexandr Wang" to "Alexander Wang"), but we run in different circles ;)

Re 2: As with most companies working on ML these days, our stack is not fully proprietary. We don't take too strong an opinion on ML frameworks and currently use both TensorFlow and PyTorch. We generally use neural network architectures from the literature and then iterate on top of them to suit our unique problem requirements.


Astounding. Congratulations, Alex.

If I may, can you please tell us:

As your business has grown, what has changed the most in terms of how you run it?

What were some of the biggest challenges you've overcome and any major obstacles you see in the near future for the business?

Who are your mentors?

Thanks.


The biggest change is that your job goes from doing things (which makes sense) to building an incredible team that can do things (which is a more unintuitive job). In the limit, it’s always a people business.

We've overcome many challenges, but per my last answer, building a team of the best people has been the most important and most challenging. That, and learning how to do sales ;)

Too many mentors. People in Silicon Valley are incredibly helpful. To name a few: Dan Levine, Mike Volpi, Nat Friedman, Adam D’Angelo, Ilya Sukhar, Jonathan Swanson, Albert Ni, Jeff Arnold, Charlie Cheever, and Drew Houston. I’m very, very lucky.


Hi Alex! Congrats on your success so far!

What principles/rules did you stick to when growing your company that you thought helped improve the culture/profits?

Thanks again for acknowledging the Hacker News community!


We have a rule when hiring people—we look for people with an internal locus of control. Roughly speaking, this means people who believe they have control over outcomes in their life, as opposed to external forces beyond their control.

It’s a small thing, but it’s surprisingly easy to spot once you look for it. And it really matters: startups are the business of building something from nothing. You need people who believe they can bend the earth.


How do you test for this?


Congratulations on your fast growth! It is always great to see examples of companies like yours actually solving real-world problems in AI with original ideas and obtaining large clients that rely on your work.

I'm really looking forward to more of what Scale will do in the future!


Thank you! We have some exciting stuff cooking that we can’t wait to share with everyone.

In the meantime, check out our open source datasets:

https://scale.com/open-datasets/nuscenes
https://scale.com/open-datasets/pandaset
https://level5.lyft.com/dataset/


How do you plan to make money once the self-driving hype dies down or becomes a solved problem?


Self-driving is one of many applications of AI/ML to the real world, each of which likely requires high-quality labeled data to truly be production-ready. This includes other robotics, self-checkout like Amazon Go, natural language understanding, and more.

Second, self-driving as a problem space will need labels for a very long time. In an application where (1) verifiable model performance is paramount, and (2) the models need to be extremely robust for cars to be safe, the need for labeled data is only magnified.


I see. I saw that you guys were doing a huge amount of value-add with things like segmentation for self-driving, but didn't know you were differentiating yourselves from competitors in the general labelling space such as Mechanical Turk. Cheers!


I wish you great success; curated datasets and training/markup APIs for AI applications are a great idea indeed.


I saw someone on Twitter post that this was a real-life “Not Hotdog” from HBO’s Silicon Valley, though ironically this company doesn’t actually use AI or ML at all; it’s just scaled human contract workers.

There’s some social commentary in there somewhere.


We do use AI and ML to help make the labeling process more efficient, but you are correct that we do have scaled human insight that ensures very high quality.

One difference from "Not Hotdog" is that our data is used to power the algorithms of other AI/ML companies like OpenAI, Waymo, Lyft, etc., so it's imperative that we have impeccable quality. That necessitates humans to ensure accuracy, particularly in safety-critical applications like self-driving cars.


I consider this a good thing. I hope the next few years play out well for your company. Good luck!


It is notable that one of Scale's customers has a real yellow Corvette with "#NotHotdog" as a bumper sticker.


I genuinely am not sure who you're talking about, but good to know!


DKan's yellow Corvette, circa 2017


I don't understand startup valuations well, so I would appreciate someone more knowledgeable shedding some light on how these valuations are made. Would I be in the ballpark in assuming that they have a sales ARR of $125M? At a sales multiple of 8x (for SaaS cos), that makes them worth $1B.

The $125M is around 12 large customers with contracts of $10M each, each of which buys the services of 2500 labeling contractors for 2000 hrs/year at $2/hr ($4K/yr).

At some point they will stop being a services company, which carries a low multiple, and switch to automated labeling without contractors (à la self-driving cars) or develop some unique IP that they sell as a service?
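
The back-of-envelope numbers in the comment above can be checked with a few lines of Python. Every input here is the commenter's guess, not a reported figure, and the variable names are made up for illustration.

```python
# All inputs are the commenter's guesses, not reported financials.
assumed_arr = 125_000_000        # guessed annual recurring revenue, USD
assumed_multiple = 8             # guessed SaaS revenue multiple

implied_valuation = assumed_arr * assumed_multiple

# "Around 12 large customers with contracts of $10M each"
customers = 12
contract_value = 10_000_000
revenue_from_contracts = customers * contract_value

# "2500 labeling contractors for 2000 hrs/year at $2/hr"
contractors = 2500
hours_per_year = 2000
hourly_wage = 2
cost_per_contractor = hours_per_year * hourly_wage
labor_cost_per_contract = contractors * cost_per_contractor

print(f"Implied valuation:       ${implied_valuation:,}")
print(f"Revenue from contracts:  ${revenue_from_contracts:,}")
print(f"Cost per contractor:     ${cost_per_contractor:,}/yr")
print(f"Labor cost per contract: ${labor_cost_per_contract:,}")
```

Two things fall out of the arithmetic: 12 contracts at $10M come to $120M, slightly short of the guessed $125M, and under these guesses the contractor wages alone equal the full value of each contract.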


One major risk is that their customers simply build teams in-house. Microsoft has had a very large team for years (and MSR / Ofer Dekel has actually published a lot of useful research on how to handle “crowdsourced” labels). Companies have been building productive off-shore labeling / moderation teams since the early days of Crowdflower. At some point, it’s not just the cost that makes sense, but rather the Product team wants a reliable workforce that they can control.

Another risk is that the well-funded self-driving customers go belly-up. However, one important facet is that dead players don’t release much data. MobilEye has a vast dataset (including images from not just Tesla but other automakers) but that data isn’t going anywhere. Neither is Nvidia’s 180PB of HD recordings. (Release or transfer in part requires dealing with PII of the people in the recordings. Now if only the offshore labelers weren’t handed PII for free...).

The valuation is likely a forward-looking bet on AI as a whole versus the current suite of contracts. Anybody using an off-the-shelf model will want some labels after their first proof of concept. I wouldn’t argue that the math makes sense but rather that demand does look underserved.


> It’s built a set of software tools that take a first pass at marking up pictures before handing them off to a network of some 30,000 contract workers, who then perform the finishing touches.

Machine learning indeed.


To be clear, there is real machine learning that makes the labeling more efficient.

You can see some videos of what this looks like in this Twitter thread: https://twitter.com/BW/status/1158407524216909826


Very similar to the Magic Wand tool in Photoshop which gives a good starting point and can be improved on manually in the problem areas where colors are ambiguous.


Hey @ayw, your story is really inspiring! I am currently studying full-stack web development via online courses, as I have a few startup ideas that I want to work on and believe to be promising. Is there any machine learning course you specifically recommend for getting up to speed with AI technology? Also, it would be really great if you could share some tips on becoming a first-time founder, especially on getting funding from a Silicon Valley VC: what do they really look for when investing in a startup started by someone with no prior startup experience? Thanks ahead of time!


More like machine teaching!


Congrats on the recent purchase of scale.com. Can you share how much the domain cost, or some words about this acquisition?


Bit of an AI novice here, I did Norvig's course a few years ago and never worked in the field, but how can a machine take a "first pass" at labelling without being trained? What information is it using to apply labels to the first set of data? How does this approach differ from a conventional classifier? Would the initial guesses essentially be random?


The business model is, initially, selling human labeling services to owners of data (like Uber or Google), using third world cheap labor to keep costs low. This is very much like a call center service.

Once a sufficiently large corpus of human labeled data is available (across clients and datasets probably), that labeled data is used to train a 'first pass' labeling system.

It then becomes a virtuous cycle. Now the labeling is done in two phases. The first pass system makes its best guess, which is then reviewed by the existing human work force. Over time the first pass labeler gets better and better, till only very tricky/borderline cases need human intervention.

The end game is anyone's guess. Pretty clever biz model hack.
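
The two-phase cycle described above can be sketched as a toy loop. This is an illustration only, not Scale's actual system: the `human_label` rule, the `run_rounds` helper, and the lookup table standing in for a trained first-pass model are all hypothetical.

```python
def human_label(item: str) -> str:
    """Stand-in for a human annotator (hypothetical rule, illustration only)."""
    return "vehicle" if item in {"car", "truck", "bus"} else "other"


def run_rounds(batches):
    """Return, per batch, the fraction of items that needed a human label."""
    known = {}  # toy "first-pass model": labels learned from earlier human work
    human_fraction = []
    for batch in batches:
        # Items the first pass cannot label yet are sent to humans.
        needs_human = [item for item in batch if item not in known]
        for item in needs_human:
            known[item] = human_label(item)  # human labels feed back into the model
        human_fraction.append(len(needs_human) / len(batch))
    return human_fraction


batches = [
    ["car", "tree", "bus", "sign"],
    ["car", "truck", "tree", "sign"],
    ["bus", "car", "sign", "tree"],
]
fractions = run_rounds(batches)
print(fractions)  # [1.0, 0.25, 0.0]
```

A real first-pass labeler would generalize to unseen inputs rather than memorize them, and humans would also review its guesses, but the shape of the loop is the same: the share of work that needs a human shrinks as labeled data accumulates.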


like Google Search using human raters to verify their updated algorithms improve results


So they have a set of pretrained models that they run as a first pass, and they either verify the results with humans or filter out the low-confidence results and give those to humans.

Note that this creates a positive feedback loop. I.e. as they get more results they can improve the initial stage.
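
A minimal sketch of the low-confidence filtering step, assuming the first-pass model emits a confidence score with each label. The `Prediction` type, the `route` function, the 0.9 threshold, and the sample data are all made up for illustration; nothing here comes from Scale.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    item_id: str
    label: str
    confidence: float  # model's estimated probability for `label`, in [0, 1]


def route(predictions, threshold=0.9):
    """Split predictions into an auto-accepted list and a human-review queue."""
    auto, review = [], []
    for p in predictions:
        (auto if p.confidence >= threshold else review).append(p)
    return auto, review


# Hypothetical first-pass output for four images.
preds = [
    Prediction("img-1", "car", 0.98),
    Prediction("img-2", "pedestrian", 0.62),
    Prediction("img-3", "cyclist", 0.91),
    Prediction("img-4", "car", 0.40),
]
auto, review = route(preds)
print([p.item_id for p in auto])    # ['img-1', 'img-3']
print([p.item_id for p in review])  # ['img-2', 'img-4']
```

The threshold is the knob that trades labeling cost against error rate: raising it sends more items to humans, and a safety-critical customer would presumably want it high or want every item reviewed.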


I'm a little confused about what the business model here is. It sounds like they are selling labeled data to companies, and doing this by "label[ing] most of the objects automatically" and then having humans review these labels. So does this mean they are using some unsupervised method to label data, and then selling that to people who want to train supervised models? Why aren't they instead just beating out the people they sell to by solving the same problems without labeled data?


Presumably they have a supervised model that they've trained on all their labeled data so far (possibly pooled across clients). They'd use this to estimate labels for their data, and then have humans correct it. They're basically doing the standard supervised data training loop.

If I had to guess, the long term plan probably is to move up the stack and sell the models to their clients.


I used to train AI to help researchers find more relevant papers at http://iris.ai. This was nothing more than classifying. Would this kind of opportunity be available for data remotaskers at Scale? Best regards for groundbreaking work.


Hey @ajw, congrats on the growth. Quick question for you: How does Scale's labeling speed and quality compare to Hive?


We have many clients who have switched from Hive. There’s usually a step change improvement in quality and scalability—up to 10x improvement in error rates.



So what is your competitive advantage? I.e. what cannot be replicated?

From a technical perspective, can someone just post a labelling task to Mechanical Turk? What is the difference here?


Today: The $100M war chest that lets them undercut the prices of Mechanical Turk

Tomorrow (Maybe): Huge corpus of previously solved cases by humans, that can be used to train a custom model to replace the humans


Congrats Alex! It's been amazing watching your journey from the first pivot to Scale to now.


What was the first pivot?


Human powered AI for the future! Because artificial intelligence is not.


What's made you more successful than other players in this space?


I really dislike this sort of journalism. Theranos was founded by a 19 year old too. That one didn't work out so well. Was it because the founder was so young? The board so oblivious? (a bit of both if you read the book)

What does it really matter how "old" the founder is? Does the business have a workable business plan? Can it be profitable? Do people pay enough money for its goods and services to return a net income? Those are interesting questions. That it was started by a teenager is not, to my way of thinking, particularly relevant.

I'd much prefer that the article focus on these things, which help us understand the value that they bring to the market and what makes them unique.


Incidentally this frame is what got the most upvotes, compared to more neutral framing of the new valuation.

[3 points] Scale AI (YC S16) raises $100M at $1B+ valuation to go beyond AI data labeling: https://news.ycombinator.com/item?id=20615657

[11 points] Scale (YC S16) Raises $100M from Accel and Founders Fund at $1B Valuation: https://news.ycombinator.com/item?id=20614672


We merged this thread into the latter, which was posted earlier and has the less baity title.


Agreed. I don't object to company or founder exposés such as these, but I do find the article's focus on the founder's age to be misplaced and distasteful. I also think that these kinds of breathless adorations of the latest "wunderkind" contribute to the widespread perception/reality of ageism within the industry.

I understand it makes for a more click-attracting title, but imagine if the title was "Silicon Valley's Latest Unicorn Is Run By an Asian" - it sounds offensive, at least to me. Preferably the articles would focus on something other than a protected class.

Note that age is not technically a protected class, only "advanced age", defined as 40+, but I personally think that making unbounded age a protected class would be beneficial for everyone. Refusing to hire someone because they're "too young" is equally as offensive as refusing because they're "too old".


Startup business is show business, hence obsession with youthfulness and cheap drama.

Customers flock to fame, press and investors do as well - they “know” where the customers will be so they go there, increasing the chances of success in a positive feedback loop. That’s how show business works.

San Francisco is the new Hollywood, basically.


I think it depends. If the age is being used to promote ageism, then this is not good. If however, the age is used to encourage young people to start businesses, that's positive. So... context and tone?


The age is actually a testament to how easy it is to get to a $1B valuation with no need for deep software experience.

This is in general not a good sign for the company or for the VC.


I think we have to just come to terms with the reality that ad-dependent journalism is inherently click-baity. Anyone who says otherwise is delusional or lying.


> I'd much prefer that the article focus on these things which helps us understand the value that they bring to the market and what makes them unique.

Me too. The era of lightweight clickbait headlines can't end soon enough.


I've come up with much dumber business plans at 30 than many originated by 22-yr-olds.

Harshness is in order---regarding the business plan. But the age is a total red herring.


Welcome to the bubble.

Don't fret, it is not likely to last much longer.


In essence, the company pays third worlders a pittance to transfer humanity's skills to the machine. The skill transfer is limited to what can be done with a mouse and screen, but since that's where most human ability is currently manifested, it's hardly a limitation. What happens to the serfs once the transfer is complete? Do they realize they are exchanging temporary wages for eternal futility?

I like how the investors rationalized this devil's deal and the usurpation of the poor: "If you could be pulling a rickshaw or labeling data in an air-conditioned internet café, the latter is a better job."


I'm an American who began doing online gig work (for a different company, not Scale) while homeless. It allowed me to pay down debt and get back into housing under circumstances where a normal job was out of the question.

This worked in part because my income was portable, so I was able to take a train to a more affordable area to get a place within my limited budget. This ability to move at will and take my income with me was historically largely limited to the Jet Set and comfortably well-off retirees.

For many people, doing gig work is a tremendous opportunity with a very big upside. It can be a huge improvement in both their standard of living and quality of life.

Most people decrying such labor arrangements aren't doing anything whatsoever to offer a better alternative. Color me unimpressed.


I work at Scale - I've met a bunch of people that work on our platform, and seeing the impact that it's had on their lives is actually a huge source of inspiration to me. There's a writeup highlighting some of their stories at https://scale.com/blog/positive-externalities - based on my personal experience, I can say that it's not bullshit.


"Do they realize they are exchanging temporary wages for eternal futility?" Yes they do.

and this isn't wrong either

"If you could be pulling a rickshaw or labeling data in an air-conditioned internet café, the latter is a better job."



