I often use objasm (from Agner Fog) to check whether the procs I write and the data are well aligned.
In a previous post I saw misalignment generated by PoAsm; with this new version the problem
is still present. I know I would gain only 0.000000000000000000000001 ms,
but it would be the proper thing to do.