The headline, apart from being clickbaity, accepts the null hypothesis, which is a STAT101 no-no.

The article makes several good points. But just because the testing isn't sufficient to prove that the winner didn't get lucky doesn't mean the winner did get lucky.



There's nothing problematic about accepting the null hypothesis; it's just that instead of controlling the Type I error rate, you need to control the Type II error rate, i.e. ensure sufficient power.
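
To put rough numbers on that, here's a back-of-the-envelope power calculation using the standard normal-approximation formula for comparing two proportions (a sketch; the 0.900 vs 0.905 accuracies, 5% alpha, and 80% power are made-up illustration values, not from the article):

    import math
    from scipy.stats import norm

    def n_per_model(p1, p2, alpha=0.05, power=0.8):
        """Test-set size needed per model to detect accuracy p1 vs p2
        with a two-sided two-proportion z-test (normal approximation)."""
        z_a = norm.ppf(1 - alpha / 2)
        z_b = norm.ppf(power)
        p_bar = (p1 + p2) / 2
        num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
               + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
        return math.ceil(num / (p1 - p2) ** 2)

    # Two models whose true accuracies differ by half a percentage point:
    print(n_per_model(0.900, 0.905))  # ~55,000 labelled examples per model

Half a point of accuracy, which is often the whole gap between first place and the pack, already demands a test set far larger than most leaderboards use.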


Yeah, we can't disprove anything, yadda yadda.

If almost every competition on Kaggle has a winner that is not significantly better than the bulk of the field, then that is proof in itself. In any single competition, a failure to reject the null might be chance; across nearly all of them, chance can only take you so far.
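
A quick simulation makes the point (a sketch; the 1,000 models, 10,000-row test set, and 0.90 true accuracy are all invented for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    n_models, n_test, true_acc = 1000, 10_000, 0.90

    # Every model is literally identical: true accuracy 0.90 for all.
    # Observed leaderboard scores are just binomial noise around that.
    scores = rng.binomial(n_test, true_acc, size=n_models) / n_test

    print(scores.max() - scores.min())  # ~2 points of spread from chance alone
    print(int(scores.argmax()))         # the "winner" is an arbitrary index

Rerun it with a different seed and a different model "wins", even though all of them are exactly as good.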


I think the point is we are left with uncertainty. Your prior should be that we don't know which competitor is best, and after the competition we are still unsure.


Isn't that the same thing as "not producing useful models"? Like, sure, some of the models may work, but unless you know which ones, you can't make use of them.


Yes, very true, but if we're still unsure it may be worth testing them more, while if we've proven they don't work we can abandon them.


How do you test them more, and with which unbiased dataset? No such dataset exists.

What you could do is actually describe the kinds of errors the network makes. In the CT example: false positives, false negatives, wrong diagnoses. We can try to analyze what the network is actually detecting, rather than accept a score on some test set at face value.
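
For instance, a tiny sketch of that breakdown with sklearn's confusion_matrix (the labels below are made up; 1 = abnormality present):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # Hypothetical binary CT reads: 1 = abnormality flagged, 0 = clear.
    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    print(f"false positives: {fp}, false negatives: {fn}")
    # A false negative (missed finding) usually costs far more than a
    # false positive here, which a single leaderboard number hides.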

The "millions of trials" is an overstatement, but a few hundred thousand are indeed needed to actually discern a winner, presuming the network did not cheat by exploiting population statistics, e.g. certain cranium sizes being more likely to present with problems. Relying on population statistics derived from a small sample (even if representative, which it's not) is very risky...
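
On discerning a winner: scores on a shared test set are paired, so a paired test extracts more from each example than comparing raw accuracies. A minimal sketch of an exact McNemar test, assuming hypothetical boolean per-example correctness arrays for two models:

    import numpy as np
    from scipy.stats import binomtest

    def mcnemar_exact(correct_a, correct_b):
        """Exact McNemar test: looks only at discordant pairs, i.e.
        test cases where exactly one of the two models is right."""
        correct_a = np.asarray(correct_a, dtype=bool)
        correct_b = np.asarray(correct_b, dtype=bool)
        a_only = int(np.sum(correct_a & ~correct_b))
        b_only = int(np.sum(~correct_a & correct_b))
        # Under H0 (models equally good), discordant wins split 50/50.
        return binomtest(a_only, a_only + b_only, 0.5).pvalue

    # e.g. p = mcnemar_exact(preds_a == y_test, preds_b == y_test)

Even with pairing, two models a fraction of a point apart need test sets in the hundreds of thousands before the discordant pairs settle the question.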


It's also possible that, if you have a lot of models all scoring very close to each other, they just ALL work.



