Yesterday I stumbled over this comment in an older project of mine:
G200: // edi is aligned now. Move 4 bytes at a time
mov edx, ecx
shr ecx, 2
// Do not use rep movsd in negative direction, errors on AMD's
G210: sub edi, 4
mov eax, [edi + esi]
mov [edi], eax
dec ecx
jnz G210
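For readers who do not read x86 assembly fluently, here is a rough C sketch of what the loop above appears to do. It is not Agner Fog's code. The function name copy_dwords_backward and its parameters are made up for illustration, and it assumes that esi holds the distance from destination to source (which is how I read [edi + esi]) and that edx keeps the original count for the leftover bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the original library.
   Copies nbytes / 4 dwords, walking downward from the END of both
   buffers, so it is safe when the destination overlaps the source
   at a higher address. Returns the number of leftover bytes
   (nbytes % 4) that the caller still has to copy. */
static size_t copy_dwords_backward(unsigned char *dst_end,
                                   const unsigned char *src_end,
                                   size_t nbytes)
{
    size_t dwords = nbytes >> 2;          /* shr ecx, 2 */

    assert(dst_end != NULL && src_end != NULL);

    /* The assembly uses dec/jnz and so relies on ecx being nonzero
       on entry; this sketch checks the count up front instead. */
    while (dwords != 0) {
        uint32_t tmp;
        dst_end -= 4;                     /* sub edi, 4 */
        src_end -= 4;                     /* source follows via esi offset */
        memcpy(&tmp, src_end, 4);         /* mov eax, [edi + esi] */
        memcpy(dst_end, &tmp, 4);         /* mov [edi], eax */
        --dwords;                         /* dec ecx / jnz G210 */
    }
    return nbytes & 3;                    /* assumed use of the saved edx */
}
```

The two memcpy calls stand in for the plain 4-byte load and store; a compiler will normally turn each into a single mov, and they avoid alignment and aliasing trouble in portable C.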
The rep movsd is not the source of the problem here, since only two bytes were moved. However, other processor bugs have been reported for rep movs* on AMD as well as Intel processors (search for 'bug "rep movs"'). Whether these bugs occur usually depends on a number of side conditions, so they are hard to reproduce.
I could not reproduce the problem on Windows XP with an Athlon, using Pelles 4.50.113.
The original code I used in the example above is from Agner Fog,
www.agner.org/optimize (GPL).