The MoE architecture choice here is particularly interesting: keeping only 2B parameters active while matching the performance of an 8B model is a game-changer for edge deployment. I've been deploying vision models in production environments where latency is critical, and this sparse activation approach could solve the inference cost problem that has been limiting adoption of larger VLMs. The chart understanding capabilities mentioned also look promising for automated document analysis workflows.

Has anyone tested the model's consistency across different image qualities or lighting conditions? That's often where smaller models struggle compared to frontier ones.
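For anyone unfamiliar with why sparse activation cuts inference cost, here's a minimal top-k MoE routing sketch in PyTorch. It's purely illustrative and not this model's actual implementation; the class name, expert count, and k value are all assumptions I picked to show the idea that the router scores every expert but only runs a few per token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy sparse MoE layer (illustrative, not the released model's code):
    a router picks k experts per token, so only a fraction of the layer's
    parameters do work on any given forward pass."""

    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Score all experts, but only execute the top-k.
        scores = self.router(x)                           # (tokens, num_experts)
        weights, idx = torch.topk(scores, self.k, dim=-1)  # (tokens, k)
        weights = F.softmax(weights, dim=-1)              # renormalise over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e      # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

# With 8 experts and k=2, roughly a quarter of the expert parameters are
# active per token, which is the kind of active/total ratio being discussed.
layer = TopKMoE(dim=64)
tokens = torch.randn(16, 64)
print(layer(tokens).shape)  # torch.Size([16, 64])
```

The key point for latency is that the per-token compute scales with the active parameters (2B in this case), while the full parameter count mostly costs memory, which is a much easier constraint to manage on edge hardware than FLOPs.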