AVX2 is slower than SSE2-4.x under Windows ARM emulation
If you compile your app for AVX2 and it runs on Windows ARM under Prism emulation, is it faster or slower than compiling for SSE2-4.x?

I assumed it would be roughly the same: maybe slightly slower due to emulation overhead, but AVX2’s wider operations would compensate. The headline gives it away: I was wrong.

💡 TLDR: AVX2 code runs at 2/3 the speed of equivalent SSE2-SSE4.x optimised code under emulation on Windows 11 ARM.

‘Should I compile for AVX2 if my app might run on Windows ARM?’ has a clear answer: No. At least if performance matters.

This post explains how I found out, what I measured and how, the benchmark results, and why.

Curiosity

A few weeks ago, in a Hacker News thread on WoW (the game) emulated performance on Windows ARM, I wondered:

I’ve been testing some math…