Your comment is directly contradicted by the article.
> To make multimedia processing fast. It’s very common to get a 10x or more speed improvement from writing assembly code, which is especially important when wanting to play videos in real time without stuttering.
They said they prefer intrinsics, which the article says are only about 10% slower (citation needed). You misunderstood and compared against scalar code.
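To make the scalar vs. intrinsics distinction concrete, here is a rough sketch of the same operation written both ways. The function names and the choice of a saturating byte add are my own illustration, not something from the article, and the SIMD path assumes an x86 target with SSE2 (it falls back to the scalar loop elsewhere). Hand-written assembly would be a third version of the same loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Scalar baseline: saturating add of two byte arrays, one element at a time. */
static void add_sat_scalar(uint8_t *dst, const uint8_t *a,
                           const uint8_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}

/* Intrinsics version (hypothetical name): 16 bytes per iteration with SSE2,
 * then the scalar loop handles whatever is left over. */
static void add_sat_simd(uint8_t *dst, const uint8_t *a,
                         const uint8_t *b, size_t n) {
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* _mm_adds_epu8: unsigned 8-bit add with saturation */
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    add_sat_scalar(dst + i, a + i, b + i, n - i);
}
```

The 10x figure in the article is the gap between something like the first function and a vectorized version. The disputed ~10% is the gap between the second function and hand-written assembly doing the same thing.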
Personally, I'd say the only good reason to use assembly over intrinsics is having control over the calling convention. For example, the Windows calling convention is absolute trash and wastes many SIMD registers.