Why is calling my asm function from Rust slower than calling it from C?
December 27, 2025
rust, c, performance

This is a follow-up to making the rav1d video decoder 1% faster, where we compared profiler snapshots of rav1d (the Rust implementation) and dav1d (the C baseline) to find specific functions that were slower in the Rust implementation [1].

Today, we are going to pay off a small debt from that post: since dav1d and rav1d share the same hand-written assembly functions, we used them as anchors to navigate the different implementations – they, at least, should match exactly! And they did. Well, almost all of them did.

This, dear reader, is the story of the one function that didn’t.

An Overview

We’ll need to ask – and answer! – three ‘Whys’ today:

Using the same techniques from last time,…