Hacker News

Humans don't act based on visual patterns alone though. We act based on our understanding of the world as a whole, including the intentions of other humans.

For instance, when we see a ball rolling onto the street, we know that there is probably a young person nearby who wants that ball back. We don't have to be trained on the visual patterns of what might happen next.

Of course AI can be trained on the visuals of high probability events like this. But the number of things that can potentially happen is far greater than the number of training examples we could ever produce.



> the number of things that can potentially happen is far greater than the number of training examples we could ever produce

Models don't need to have been trained on every single possibility - it's possible for them to generalize and interpolate/extrapolate.

But even knowing that it's theoretically possible to drive at human level with only the senses humans have, limiting the vehicle to just those senses seems unnecessarily difficult. It forces solving hard tasks at or near 100% of human level, as opposed to reaching 70% and making up the shortfall with extra information humans don't have.


>Models don't need to have been trained on every single possibility - it's possible for them to generalize and interpolate/extrapolate.

They do have some in-distribution generalisation capabilities, but human intentions are not a generalisation of visual information.
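A toy sketch of the in-distribution point (my own hypothetical example, not from the thread): a model fit only on a narrow training range can interpolate well inside that range yet fail badly outside it. Here a straight line is fit to y = x² on x ∈ [0, 1], where a line happens to be a decent approximation:

```python
def fit_line(xs, ys):
    """Least-squares line y = a*x + b through the training points."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

def target(x):
    return x * x  # the "world" the model never fully observes

# Train only on x in [0, 1]: the in-distribution regime.
train_x = [i / 10 for i in range(11)]
train_y = [target(x) for x in train_x]
a, b = fit_line(train_x, train_y)

def predict(x):
    return a * x + b

in_dist_err = abs(predict(0.55) - target(0.55))   # interpolation: small error
out_dist_err = abs(predict(5.0) - target(5.0))    # extrapolation: large error
```

Inside the training range the error is under 0.1; at x = 5 it exceeds 20. The model "generalises" only where the training data constrained it.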


"Human intentions are not a generalisation of visual information" mixes categories a bit. The real question is to what extent you can predict someone's next action, like running out to retrieve a ball, from just what a human driver can sense.

Clearly that's possible to some extent, and in theory it should be possible for some system receiving the same inputs to reach human-level performance on the task, but it seems very challenging given the imposed constraints.

Also, for clarity, note that the constraint doesn't require that the model be trained only on driver-view data. Reasoning capability may be better learned through text pretraining, for instance.




