
I think the point is we are left with uncertainty. Your prior should be that we don't know which competitor is best, and after the competition we are still unsure.


Isn't that the same thing as "not producing useful models"? Like, sure, some of the models may work, but unless you know which ones, you can't make use of them.


Yes, very true, but if we're still unsure it may be worth testing them more, while if we've proven they don't work we can abandon them.


How do you test them more? With which unbiased dataset - the one that doesn't exist?

What you could do is actually describe the kinds of errors the network makes. In the CT example: false positives, false negatives, wrong diagnoses. We can try to analyze what the network is actually detecting, rather than accept a result on some test set at face value.
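
A minimal sketch of what that breakdown could look like for the CT case, assuming you have per-case ground truth and model predictions (the label names and the "cases" data below are made up for illustration):

    from collections import Counter

    # Hypothetical per-case records: (ground_truth, model_prediction);
    # "normal" means no finding, anything else is a specific diagnosis.
    cases = [("normal", "normal"), ("tumor", "normal"),
             ("normal", "tumor"), ("tumor", "hemorrhage")]

    def error_type(truth, pred):
        if truth == pred:
            return "correct"
        if truth == "normal":
            return "false positive"   # flagged a healthy scan
        if pred == "normal":
            return "false negative"   # missed a real finding
        return "wrong diagnosis"      # found something, but the wrong thing

    print(Counter(error_type(t, p) for t, p in cases))

The point isn't the tally itself but that the three error types have very different clinical costs, which a single aggregate score hides.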

The millions of trials is an overstatement, but a few hundred thousand are indeed needed to actually discern a winner, presuming the network did not cheat by leaning on population statistics - say, certain cranium sizes being more likely to present with problems. Relying on population statistics derived from a small sample (even if it were representative, which it isn't) is very risky...
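
Roughly where a number like that comes from: a standard two-proportion power calculation for telling apart two models whose accuracies differ by a fraction of a percent (the 90.0% vs 90.2% figures below are purely illustrative, not from the competition):

    from math import sqrt

    # How many labelled test cases per model to distinguish 90.0% from
    # 90.2% accuracy at the usual alpha = 0.05 with 80% power?
    p1, p2 = 0.900, 0.902           # hypothetical accuracies of two models
    z_alpha, z_beta = 1.960, 0.842  # normal quantiles for alpha/2 and power

    p_bar = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p1 - p2) ** 2

    print(f"~{n:,.0f} labelled cases per model")  # ~350,000

Since n scales as 1/(difference)^2, halving the gap roughly quadruples the required test set, which is why a leaderboard with a few thousand cases can't separate models this close.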


It's also possible that, if you have a lot of models that all score very close to each other, they just ALL work.



