I recently stumbled over gcc's global register variables. I tried them, and my interpreter program ran as if on steroids. :-)
The speed-up was a factor of 2 to 3!
For example, I simply changed a variable declaration (there are others) from
int64_t mftos; // top-of-stack
to
register int64_t mftos asm ("r12"); // register-cached top-of-stack
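For anyone who wants to try it, here is a minimal sketch of the idea, not my actual interpreter. The names stack, sp, push, add and demo_sum are made up for illustration, and the #if guard is my assumption: it pins mftos to r12 only under GCC on x86-64 and falls back to an ordinary global elsewhere. I wrote __asm__ instead of asm so that it also compiles with strict -std=c99.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: real GCC on x86-64. Other compilers get a plain global. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
register int64_t mftos __asm__("r12");  /* register-cached top-of-stack */
#else
int64_t mftos;                          /* portable fallback, lives in memory */
#endif

/* Hypothetical data stack holding the cells below the top-of-stack. */
static int64_t stack[64];
static int sp;                          /* number of cells stored in stack[] */

/* Push: spill the old top-of-stack to memory, keep the new one in mftos. */
static void push(int64_t v)
{
    assert(sp < 64);
    stack[sp++] = mftos;
    mftos = v;
}

/* Add: pop the second cell and add it to the cached top-of-stack. */
static void add(void)
{
    assert(sp > 0);
    mftos += stack[--sp];
}

/* Tiny "program": 2 3 + */
int64_t demo_sum(void)
{
    sp = 0;
    mftos = 0;
    push(2);
    push(3);
    add();
    return mftos;
}
```

Calling demo_sum() from main returns the sum of the two pushed values. The point is that add never loads or stores the top-of-stack itself, because it stays in the register for the whole translation unit. The global register declaration has to appear before any function definition in the file.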
The registers available for this are the ones saved in the setjmp buffer, i.e. the callee-saved registers:
// extract from MinGW-w64 (64-bit GCC for Windows) setjmp.h:
typedef struct _JUMP_BUFFER {
    __MINGW_EXTENSION unsigned __int64 Frame;
    __MINGW_EXTENSION unsigned __int64 Rbx;
    __MINGW_EXTENSION unsigned __int64 Rsp;
    __MINGW_EXTENSION unsigned __int64 Rbp;
    __MINGW_EXTENSION unsigned __int64 Rsi;
    __MINGW_EXTENSION unsigned __int64 Rdi;
    __MINGW_EXTENSION unsigned __int64 R12;
    __MINGW_EXTENSION unsigned __int64 R13;
    __MINGW_EXTENSION unsigned __int64 R14;
    __MINGW_EXTENSION unsigned __int64 R15;
    __MINGW_EXTENSION unsigned __int64 Rip;
    __MINGW_EXTENSION unsigned __int64 Spare;
    SETJMP_FLOAT128 Xmm6;
    SETJMP_FLOAT128 Xmm7;
    SETJMP_FLOAT128 Xmm8;
    SETJMP_FLOAT128 Xmm9;
    SETJMP_FLOAT128 Xmm10;
    SETJMP_FLOAT128 Xmm11;
    SETJMP_FLOAT128 Xmm12;
    SETJMP_FLOAT128 Xmm13;
    SETJMP_FLOAT128 Xmm14;
    SETJMP_FLOAT128 Xmm15;
} _JUMP_BUFFER;
It would be nice to have this feature in Pelles C as well. Unfortunately, its setjmp.h doesn't reveal much.
I don't know how hard it would be to build a wrapper around the Windows library's jmp_buf structure
and to reserve some CPU registers for it, as gcc seems to do.
Best regards