It appears to be slightly worse than Qwen2VL 7B, a model almost half its size, ...

kaoD · on Sept 11, 2024

But Qwen is not multimodal, or is it?

Jackson__ · on Sept 11, 2024

>Qwen2-VL is the latest addition to the vision-language models in the Qwen series, building upon the capabilities of Qwen-VL. Compared to its predecessor, Qwen2-VL offers:

>State-of-the-Art Image Understanding

>Extended Video Comprehension

Besides, it'd have been pretty silly for them to mention it on their slides if it weren't multimodal.