I am not aware of any autoregressive transformer model that doesn't follow scaling laws. Tesla only needs to tokenize actions and images and voilà, self-driving capabilities. The problem, of course, is how you deploy such a big model so every car can run it locally.
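Roughly what I mean by tokenizing actions and images, as a toy PyTorch sketch. Every number and name here (the VQ codebook size, the action bins) is an assumption for illustration, not anything Tesla has published:

```python
import torch

IMAGE_VOCAB = 8192   # assumed codebook size from an image tokenizer (e.g. a VQ-VAE)
ACTION_BINS = 256    # assumed discretization of each continuous control

def tokenize_frame(vq_codes: torch.Tensor) -> torch.Tensor:
    """vq_codes: (H*W,) ints in [0, IMAGE_VOCAB) from an image tokenizer."""
    return vq_codes  # image tokens occupy ids [0, IMAGE_VOCAB)

def tokenize_action(steer: float, accel: float) -> torch.Tensor:
    """Discretize continuous controls into bins, offset past the image vocab."""
    def to_bin(x: float) -> int:  # map [-1, 1] -> [0, ACTION_BINS)
        return int((x + 1) / 2 * (ACTION_BINS - 1))
    base = IMAGE_VOCAB
    return torch.tensor([base + to_bin(steer), base + ACTION_BINS + to_bin(accel)])

# One training sequence: [frame tokens..., action tokens...] repeated per timestep,
# then fed to a decoder-only transformer with a plain next-token prediction loss.
frame = torch.randint(0, IMAGE_VOCAB, (16 * 16,))  # toy 16x16 grid of image tokens
step = torch.cat([tokenize_frame(frame), tokenize_action(steer=0.1, accel=-0.3)])
```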
Scaling parameter count requires a similar increase in the amount of (accurate, labeled) data. This can be mitigated by “bootstrapping” techniques that make labeling new data easier, but data is still likely the bottleneck for training such a model effectively (on the compute side, they can presumably spin up a supercomputer to scale the model itself).
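To put a rough number on “a similar increase in data”: under the Chinchilla rule of thumb of roughly 20 training tokens per parameter, the data requirement grows linearly with parameter count. Toy arithmetic, not Tesla's numbers:

```python
# Back-of-the-envelope data needs under the ~20 tokens/parameter rule of thumb
# (Hoffmann et al., 2022). Purely illustrative; says nothing about Tesla's setup.

def tokens_needed(params: float, tokens_per_param: float = 20.0) -> float:
    return params * tokens_per_param

for params in (1e9, 1e10, 1e11):
    print(f"{params:.0e} params -> ~{tokens_needed(params):.1e} training tokens")

# 1e+09 params -> ~2.0e+10 training tokens
# 1e+10 params -> ~2.0e+11 training tokens
# 1e+11 params -> ~2.0e+12 training tokens
```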
I don’t think I would do very well under Elon’s management style ha - the whole butts-in-seats from 8 to 6 thing is basically a dead end for me (personally).
I’m not sure what architecture they use, but they do indeed already have a pretrained “auto-labeler” that their annotators use. My understanding is that, due to hallucinations from the model and the risks involved with driving, the labels still need to be vetted manually before being added to the dataset.
Makes sense. Fortunately, scale is less of a problem for the auto-labeler, so they can throw a lot of processing power at this step. But yeah, they will need human labelers for the edge cases where the model is unsure. I wish Tesla would publish their results so we could understand what they are doing and how much it is improving.
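Something like a confidence gate seems like the natural way to split the work: auto-labels go straight into the dataset only when the model is very confident, and everything else is queued for human review. A minimal sketch, with the threshold and the Label shape being my own assumptions rather than anything Tesla has described:

```python
from dataclasses import dataclass

@dataclass
class AutoLabel:
    object_class: str
    confidence: float  # model's own probability estimate, in [0, 1]

ACCEPT_THRESHOLD = 0.98  # assumed cutoff: only very confident labels skip review

def route(label: AutoLabel) -> str:
    """Accept confident auto-labels; send uncertain ones to a human annotator."""
    if label.confidence >= ACCEPT_THRESHOLD:
        return "add_to_dataset"
    return "send_to_human_review"

print(route(AutoLabel("pedestrian", 0.99)))  # add_to_dataset
print(route(AutoLabel("pedestrian", 0.71)))  # send_to_human_review
```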