
Easy to fix technically, but first the issue must be recognized and demonstrated; then comes the delicate process of negotiating the social and economic realities in which the technology operates.

And that's the problem with ML in general: a failure to recognize the implicit biases in the choice of dataset and training, and the resulting problems, of which Microsoft's racist chatbot Tay [1] is merely the most blatantly ludicrous.

1 https://spectrum.ieee.org/in-2016-microsofts-racist-chatbot-...



And the first cars didn't have seatbelts.

It's fine, these are not complicated problems, and they are much easier to spot and fix than most problems in software engineering at scale. Don't be fooled by the negative PR campaigns and clickbait, there's no reason to be skeptical about ML in general because of this.

Also, Tay attempted to solve a much harder problem than image classification. It's hard to build a safe hyperloop. It's no longer hard to build a safe microwave oven.


Forgive me, because I'm not an expert in ML. If this is an easy problem to solve, why is it still a problem years later, once it's so widespread that mainstream media not only knows about it but has written continual investigative journalism about it? It's clearly not cutting edge anymore once it gets to that point, and yet it's still a problem. Why?


It's some work, but not hard to solve technically; I've been at companies that deal with very similar problems. The main difficulty is less technical and more about the investment needed versus the value, and part of that investment falls outside the modeling engineers building the system. Part of the improvement can be done with classical computer vision techniques, but mixing classical techniques with modern ones both feels somewhat like a hack and complicates the system.

The other big area is dataset improvement. The engineers building ML systems and the people collecting and organizing the needed datasets are normally different people with only mild connections to each other. For companies that rely mostly on existing datasets and fine-tune from them, having to add a data curation process is a big pain point, and most companies have immature data curation processes. Many of the popular open-source ML datasets have poor racial diversity: the most popular face generation dataset is CelebA, full of celebrities (mostly white ones).

Another issue is that, for many of these systems, a racial bias in the error rate has mild business impact, which makes fixing it harder to prioritize. The last issue is that the work needed to fix this tends to be less interesting than most of the other work of building the system.

So overall, the main issues are a lack of good open-source fair datasets with loose licensing, the cross-organizational effort needed to solve it (engineers cannot code up a fair dataset), and business prioritization.
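To make the dataset-improvement point concrete, here is a minimal sketch of one common curation step: downsampling an imbalanced dataset so every group is equally represented. The group labels and sample data here are invented for illustration; real curation pipelines are far more involved.

```python
import random
from collections import defaultdict

def balance_by_group(samples, group_of, seed=0):
    """Downsample each group to the size of the smallest group."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for s in samples:
        buckets[group_of(s)].append(s)
    n = min(len(b) for b in buckets.values())  # smallest group's size
    balanced = []
    for bucket in buckets.values():
        balanced.extend(rng.sample(bucket, n))  # take n from each group
    return balanced

# Toy dataset: 90 samples from group "a", 10 from group "b".
data = ([("img_a%d" % i, "a") for i in range(90)]
        + [("img_b%d" % i, "b") for i in range(10)])
balanced = balance_by_group(data, group_of=lambda s: s[1])
# Result: 10 samples from each group, 20 total.
```

Downsampling throws data away; in practice teams often prefer oversampling or collecting more data for underrepresented groups, but the structure of the fix is the same.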

edit: Also, "solve" here means getting accuracy across races to be close, not getting errors to zero. ML models will always have an error rate, and if your goal is zero errors related to racial factors, that is extremely hard. Modeling is about making estimates from data, not knowing the truth of that data.


Overfitting is also a technically easy problem to solve, but high-profile cases in which it goes unsolved, with obvious negative consequences, could also lead to investigative journalism.
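The standard "easy" fix for overfitting is early stopping on a held-out validation set: stop training when the validation loss stops improving. A minimal sketch with a synthetic loss curve:

```python
def early_stop_epoch(val_losses, patience=2):
    """Return the best epoch, stopping once validation loss has not
    improved for `patience` consecutive epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop
    return best_epoch

# Validation loss falls, then rises as the model starts to overfit.
val = [0.9, 0.7, 0.6, 0.65, 0.7, 0.8]
# early_stop_epoch(val) == 2, the epoch with the lowest validation loss
```

As with bias, knowing the fix is the easy part; someone has to be measuring the right thing in the first place.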


The short answer to your question is the same as the one to a lot of programming questions: it's not a technical problem, it's a people problem. Just getting the industry to recognize and acknowledge bias took investigative reporting. The prime example really is the situation with social media and targeted advertising algorithms. We still have people, influential people like Mark Zuckerberg, going around saying that ML isn't really a problem, everything's fine, social media isn't playing any role in destabilizing democracy, targeted ads aren't a threat to anyone's safety, and none of it has anything to do with the breathtaking levels of economic inequality we see.

No doubt there are still plenty of other issues with ML that haven't (yet) made it to popular attention, and the people employing it aren't making decisions based on social value or common good, but simply invoking free markets and capitalism as their guiding philosophies.


I'm afraid the problems with ML are less like "whoops. we don't have seatbelts" and more "surely internal combustion engines optimized for power and mass production couldn't cause problems. It's not like there are going to be millions of them crammed together in lines 3 or 4 across crawling around at 10mph every day. Plus, fossil fuels are cheap, plentiful, and really have no downside we know of. Way better than coal at least - much less awful black smoke!"



