Hacker News

What do the tokens for an image even look like? I understand that tokens for text are just fragments of text... but that obviously doesn't make sense for images.


The image is divided into a grid of fixed-size patches. Each patch is flattened into a vector of pixel values and passed through a linear layer, and the output of that layer is the patch's token embedding.
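
To make that concrete, here is a rough NumPy sketch of the grid-plus-linear-layer idea. The image size (224x224 RGB), patch size (16), embedding width (64), and the function names `patchify` and `embed_patches` are all assumptions picked for illustration, and the weights are random rather than learned. Real models typically also add position information to each token, which is left out here.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    gh, gw = h // patch_size, w // patch_size
    # View the image as a (gh, gw) grid of (patch_size, patch_size, C) tiles.
    tiles = image.reshape(gh, patch_size, gw, patch_size, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)
    # One row per patch, in row-major grid order.
    return tiles.reshape(gh * gw, patch_size * patch_size * c)

def embed_patches(patches, weight, bias):
    """Linear layer: each flattened patch becomes one token embedding."""
    return patches @ weight + bias

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))   # assumed input size
patch_size = 16                     # assumed patch size
embed_dim = 64                      # assumed embedding width

patches = patchify(image, patch_size)
# Random stand-ins for weights that a real model would learn.
weight = rng.normal(scale=0.02, size=(patches.shape[1], embed_dim))
bias = np.zeros(embed_dim)
tokens = embed_patches(patches, weight, bias)

print(patches.shape, tokens.shape)
```

With these assumed sizes the 14x14 grid gives 196 patches of 16*16*3 = 768 values each, so the model sees a sequence of 196 tokens. Unlike text tokens, nothing here is looked up in a vocabulary; each token is just a continuous vector computed from pixels.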



