
It appears to be slightly worse than Qwen2VL 7B, a model almost half its size, if you look at Qwen's official benchmarks instead of Mistral's.

https://xcancel.com/_philschmid/status/1833954941624615151



But Qwen is not multimodal, or is it?


https://qwen2.org/vl/

>Qwen2-VL is the latest addition to the vision-language models in the Qwen series, building upon the capabilities of Qwen-VL. Compared to its predecessor, Qwen2-VL offers:

>State-of-the-Art Image Understanding

>Extended Video Comprehension

Besides, it'd have been pretty silly for them to mention it on their slides if it wasn't.



