Hacker News

There's a lot of what I call "model fetishism" in machine learning.

Instead of focusing our energies on the infrastructure and data quality around machine learning, there's an eagerness to throw bad data at very high-end models. I've seen it again and again at different companies, usually with disastrous consequences.

A lot of these companies would do better to invest in engineering and domain expertise around the problem than to worry about the type of model they're using to solve it (which usually comes later, once the other supporting maturity pieces are in place).



This is why my interview question focuses on applying linear regression to a complex domain. It weeds out an enormous number of candidates.

There are 5 ML models that we maintain where I work, and none of them is more complicated than linear regression or random forests. Convincing me to use something more complex would take an enormous amount of evidence. Domain knowledge is king.
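To make the point concrete, here is a minimal sketch (entirely synthetic data, not the commenter's actual models) of how a single domain-informed feature lets plain least squares fit a relationship that would look nonlinear in the raw inputs. The "gravity"-style demand relationship and all variable names are invented for illustration.

```python
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(0)

# Hypothetical raw measurements: demand roughly proportional to
# population / distance (an assumed domain relationship).
population = rng.uniform(1e4, 1e6, size=200)
distance = rng.uniform(1, 50, size=200)
demand = 3.0 * population / distance + rng.normal(0, 500, size=200)

# Domain knowledge: engineer the ratio as a single feature,
# then fit ordinary least squares with an intercept column.
X = np.column_stack([population / distance, np.ones_like(distance)])
coef, *_ = lstsq(X, demand, rcond=None)

# R^2 of the one-feature linear model on the training data.
pred = X @ coef
r2 = 1 - np.sum((demand - pred) ** 2) / np.sum((demand - demand.mean()) ** 2)
```

Neither feature alone is linearly related to demand, so a model fed the raw columns would need extra capacity to recover the ratio; encoding the domain relationship directly keeps the model trivial and interpretable.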


Yes! I feel this quite a lot; I've just finished my degree. I remember reading quite a few papers for my thesis with little discussion of the actual data used, or of what might be learned from it with basic DS techniques such as PCA or clustering. Instead, they go right to the model and default evaluation methods, just a table of numbers.
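The kind of basic exploration being described can be very cheap. A hedged sketch (the dataset here is synthetic, constructed so that 10 observed features hide 2 latent dimensions): a quick PCA via SVD shows how much of the variance simple structure already explains before any model is chosen.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic dataset: 300 samples, 10 features, but the real signal
# lives in only 2 latent dimensions plus small noise.
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + rng.normal(scale=0.1, size=(300, 10))

# PCA via SVD on the centered data matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)   # variance ratio per component
```

If the first two components capture nearly all the variance, that is a strong hint that a simple low-dimensional model may suffice, which is exactly the kind of data discussion those papers skip.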

We did have courses explaining everything "around" the whole process, but that's not as hyped.




