Okay, I have it working. Having the caller reserve stack space where
RCX, RDX, R8 and R9 can be saved makes it easier to use passed
arguments, but it wastes stack space when those arguments don't exist. I always
used stack-only argument passing in the past because the stack
space is allocated and deallocated by the function that actually
uses the space and only when needed. This makes for smaller code
in the long run and faster execution. Also, if a function is called
many times, the MS convention repeats that waste at every call.
When the called function does the allocation as needed, the
register saving is done in only one place, so it is much more
efficient in the long run. But we have
grown used to a lot of waste in the Windows environment. That
makes it a good candidate for replacement IMHO.
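
To put rough numbers on the stack usage, here is a small sketch in C. It assumes every argument is an 8-byte integer or pointer and ignores return addresses and 16-byte alignment padding. The function names are mine, and the "stack-only" convention is just the scheme described above, not any official ABI.

```c
#include <stddef.h>

/* Bytes of argument area reserved per call under the MS x64 convention:
   32 bytes of home space for RCX, RDX, R8 and R9 are always reserved,
   plus 8 bytes for each argument beyond the fourth.
   Alignment padding is deliberately ignored here. */
size_t ms_x64_arg_bytes(size_t nargs)
{
    size_t slots = (nargs < 4) ? 4 : nargs;
    return slots * 8;
}

/* Hypothetical stack-only convention: 8 bytes per argument actually
   passed and nothing for arguments that don't exist. */
size_t stack_only_arg_bytes(size_t nargs)
{
    return nargs * 8;
}
```

Under these assumptions a call with no arguments reserves 32 bytes in the MS convention and 0 bytes stack-only, while from four arguments up the two come out the same.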