Google/Alphabet is stuck: it will keep making advances in specific territories, but it will never get there without a fundamental change in approach. Its approach relies on very detailed mapping/modeling of specific terrain, which yields a usable product sooner, but outside the mapped/modeled territory the cars are effectively lost. And maps/models change constantly and rapidly.
Tesla is taking a fundamentally broader and deeper approach, working from the fact that a pair of visual sensors and a compute engine (eyes and a brain) can successfully figure out driving in unfamiliar areas in real time; ergo, it should be possible without a map/model or lidar. Once they solve it, it is solved once and for all. Bigger gamble, bigger payoff. Equipping the car with dozens of eyes is the easy part. The question is whether enough compute power can be brought to bear on the recognition problems and the edge cases. They have obvious issues with failing to recognize large objects like trucks in unexpected orientations, left turns, etc. Using millions of miles of live human driver data as a training set is great, except that the average driver is really bad, so the data is polluted with bad examples, ESPECIALLY around the edge cases that get people killed. There, you want to train on examples from professionally trained drivers who really understand the physics and limits of the car: adhesion, traffic dynamics, and so on. That isn't what they have. It is also possible that even if the training data were sufficient, the big question will kill them: perhaps the solution requires orders of magnitude more compute power to approach human performance, and they just don't have the hardware to match the human brain. So, have they simply hit the limits of what their compute power can do?
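To make the data-pollution point concrete, here is a minimal sketch of how one might gate an imitation-learning dataset so that bad human driving doesn't dominate training. Every name, field, and threshold below is invented for illustration; a real pipeline would need far richer signals (map context, traffic rules, outcome labels), but the principle is the same: filter before you imitate.

```python
# Hypothetical sketch: curating human driving logs before using them
# as imitation-learning training data. All names/thresholds are assumptions.
from dataclasses import dataclass

@dataclass
class DriveSample:
    speed_mph: float   # vehicle speed when the frame was logged
    lateral_g: float   # lateral acceleration (cornering load)
    brake_g: float     # longitudinal deceleration while braking
    collision: bool    # did this trip end in a collision?

def is_trustworthy(s: DriveSample) -> bool:
    """Crude quality gate: keep only smooth, incident-free driving."""
    if s.collision:
        return False          # never imitate a trip that ended in a crash
    if s.brake_g > 0.8:
        return False          # panic braking suggests a mistake upstream
    if s.lateral_g > 0.6 and s.speed_mph > 45:
        return False          # aggressive cornering at speed
    return True

def curate(samples):
    return [s for s in samples if is_trustworthy(s)]

raw = [
    DriveSample(30, 0.2, 0.1, False),  # calm driving: keep
    DriveSample(60, 0.7, 0.2, False),  # aggressive cornering: drop
    DriveSample(25, 0.1, 0.9, False),  # panic stop: drop
    DriveSample(40, 0.2, 0.3, True),   # crash trip: drop
]
clean = curate(raw)
print(len(clean))  # → 1
```

The catch, as noted above, is that a filter like this throws away exactly the rare edge-case data you most need, which is why examples from expert drivers would be so valuable.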
I think Tesla's approach is fundamentally the way to go, as it is a general solution, compared to everyone else's limited map/model approach.
But both may require more specifically programmed higher-level behaviors, something much closer to AGI than currently exists, or both: a system with actual understanding of the machine-learned objects and relationships, which does not yet exist. If one is known, please correct me; I'd love to know about it.