Hacker News

Humans don't act based on visual patterns alone though. We act based on our understanding of the world as a whole, including the intentions of other humans.

For instance, when we see a ball rolling onto the street, we know that there is probably a young person nearby who wants that ball back. We don't have to be trained on the visual patterns of what might happen next.

Of course AI can be trained on the visuals of high probability events like this. But the number of things that can potentially happen is far greater than the number of training examples we could ever produce.



> the number of things that can potentially happen is far greater than the number of training examples we could ever produce

Models don't need to have been trained on every single possibility - it's possible for them to generalize and interpolate/extrapolate.

But even knowing that it's theoretically possible to drive at human level with only the senses humans have, limiting the vehicle to just those senses seems unnecessarily difficult. It forces solving hard tasks at or near 100% of human level, as opposed to reaching 70% and making up the shortfall with extra information humans don't have.


>Models don't need to have been trained on every single possibility - it's possible for them to generalize and interpolate/extrapolate.

They do have some in-distribution generalisation capabilities, but human intentions are not a generalisation of visual information.
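A toy sketch of the in-distribution point (my own hypothetical example, not from the thread): a model fit only on a narrow training range can interpolate well inside that range yet fail badly outside it. Here a straight line is fit to y = x² on x ∈ [0, 1], where a line happens to be a decent approximation:

```python
def fit_line(xs, ys):
    """Least-squares line y = a*x + b through the training points."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

def target(x):
    return x * x  # the "world" the model never fully observes

# Train only on x in [0, 1]: the in-distribution regime.
train_x = [i / 10 for i in range(11)]
train_y = [target(x) for x in train_x]
a, b = fit_line(train_x, train_y)

def predict(x):
    return a * x + b

in_dist_err = abs(predict(0.55) - target(0.55))   # interpolation: small error
out_dist_err = abs(predict(5.0) - target(5.0))    # extrapolation: large error
```

Inside the training range the error is under 0.1; at x = 5 it exceeds 20. The model "generalises" only where the training data constrained it.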


"Human intentions are not a generalisation of visual information" mixes categories a bit. The real question is to what extent you can predict someone's next action, like running out to retrieve a ball, from just what a human driver can sense.

Clearly that's possible to some extent, and in theory it should be possible for some system receiving the same inputs to reach human-level performance on the task, but it seems very challenging given the imposed constraints.

Also, for clarity, note that the constraint doesn't require that the model be trained only on driver-view data. Reasoning capability may be better learned through text pretraining, for instance.




