Hacker News

I wouldn't be surprised if the video models were vastly undertrained compared to our text models. There's probably millions of hours of video we haven't used to train the video models yet.

Still seems like early days on this tech. We're nowhere near the limits.

Just a year ago we could only create distorted video of Will Smith eating spaghetti. A year from now these results are going to look even better.



But what does flawless mean? How is this not flawless? I see very few “flaws” in it. That said, the coverage of the video training space is probably minuscule compared to photo and text.



