This is mainly due to the limited number of registers available on a 32-bit machine.
If you step through the code in the debugger, you will see that the inlined functions use a different register arrangement in different parts of the code. Which registers are used depends on which of them are free or more convenient at that point. Choosing the registers is called 'register allocation' in compiler theory; 'spilling' is what happens when the compiler runs out of registers and has to move a value out to memory.
Different registers also have different timings for some operations, and sometimes push and pop operations are required to 'free' a register for the routine.
As proof, look at the standard call to StrLen: because it always runs the same code (same registers, same memory accesses, etc.), its timing is perfectly constant.
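
To see the effect for yourself, here is a minimal sketch in C (not the original code, and not the real StrLen). All the names (len_inlined, len_called, site_simple, site_busy) are made up for illustration, and the noinline attribute assumes GCC or Clang. The same loop is inlined into two call sites with different numbers of live values, while the out-of-line version is compiled only once.

```c
#include <stddef.h>

/* Hypothetical helper: a plain string-length loop that the compiler
   is free to inline into each caller. */
static inline size_t len_inlined(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Assumption: GCC/Clang attribute syntax; other compilers fall back
   to an ordinary function, which they may still choose to inline. */
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Same loop, but kept out of line: one body, one register assignment. */
NOINLINE size_t len_called(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Call site with almost nothing else live around the loop. */
size_t site_simple(const char *s)
{
    return len_inlined(s);
}

/* Call site where a, b and c must stay alive across the loop, so the
   allocator has fewer free registers to give to the inlined code. */
size_t site_busy(const char *s, size_t a, size_t b, size_t c)
{
    size_t n = len_inlined(s);
    return n + a + b + c;
}
```

Compile it for a 32-bit target with optimization on and compare the assembly listing of site_simple and site_busy. The loop may end up in different registers in each, possibly with extra push/pop instructions, while len_called has a single body no matter who calls it. What you actually get depends on the compiler and its options.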