Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

From 10,000 feet up, a two step solution:

(1) Get a lot of data.

(2) Do finely grained cross tabulation.

Why? Because cross tabulation is a discrete version of the most powerful foundation but for challenging questions can need a lot of data. Curve fitting is what are pushed into when don't have so much data.

Or suppose build a model of the probability of an auto accident. Okay, want to evaluate the model for 5' 2", 105 pounds, 17, blond, speaks only Swedish and Russian, and is in the US in LA driving an 18 wheel truck for the first time and talking on her cell phone with her sister in Berlin.

So, for that query, instead of a model, just have a lot of data and cross tabulate, and the cell with that person delivers the answer immediately, directly. Moreover the answer is unbiased and minimum variance (least squares). But did I mention, need a lot of data?



This is fundamentally why we have deep neural networks. Fitting a simplified curve in a huge space.

The combination you named may never have been observed before. The curse of dimensionality strikes.


Yup. Did I mention that with cross tabulation we'd need a lot of data!!!! :-)!!!


But isn't that the whole problem? You quickly run into the case where the required sample size is larger than the entire population size.


So, you expect us to boil it all down to just F = ma, and f'get about the apple, the apple tree, the time of day, the phase of the moon, the temperature of the day, what Newton was wearing?????

That was a really good shot at causality. Let me just say, quite generally, causality is super tough to find.


You did!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: