When I first read your post, I thought you could avoid your problem by misusing the profiling functions. So I wanted to check how the profiling functions behave, and was surprised that there was only a call at the function header.
So I looked in the debugger at how the profiling works and saw how Pelle uses his __pexit function. I think that is the easiest way for him, because every function has only one entry but can have many exits (which is not good coding style either).
If Pelle decides to add an extra call to __pexit at each return, you will have an easy solution to your problem. If he decides that it is too much work, you can look at how Pelle keeps his stack clean with __penter and __pexit and solve your problem in the suggested way.
I know this is not an elegant way, but you only need to do it once, as a wrapper around __penter and __pexit, and from there you can call any function you want.
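To make the wrapper idea a bit more concrete, here is a minimal sketch in plain C. It does not touch the real __penter and __pexit (their calling convention and stack handling are not shown here, so I make no assumption about them). Instead it uses stand-ins: my_penter, my_pexit, set_hooks, classify and run_demo are all made-up names for illustration. The point is only that the body with many returns is wrapped once, so there is a single entry and a single exit where any user function can be called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type for illustration; not part of Pelles C. */
typedef void (*hook_fn)(const char *func_name);

static hook_fn hook_enter = NULL;
static hook_fn hook_leave = NULL;

/* Register the user functions that the wrapper should call. */
void set_hooks(hook_fn enter, hook_fn leave)
{
    hook_enter = enter;
    hook_leave = leave;
}

/* Stand-ins for the profiling hooks (assumed names, NOT the real
   __penter/__pexit). They only forward to the registered callbacks. */
void my_penter(const char *name) { if (hook_enter) hook_enter(name); }
void my_pexit(const char *name)  { if (hook_leave) hook_leave(name); }

/* A body with one entry but three exits. */
static int classify_impl(int x)
{
    if (x < 0)  return -1;
    if (x == 0) return 0;
    return 1;
}

/* The wrapper: every return of the body comes back here,
   so the exit hook is written exactly once. */
int classify(int x)
{
    int r;
    my_penter("classify");
    r = classify_impl(x);
    my_pexit("classify");
    return r;
}

/* Example user callbacks: just count the calls. */
static int entered = 0;
static int exited  = 0;
static void count_enter(const char *name) { (void)name; entered++; }
static void count_leave(const char *name) { (void)name; exited++; }

/* Runs three calls through the wrapper and returns the number of
   entries seen; entries and exits must match. */
int run_demo(void)
{
    entered = 0;
    exited  = 0;
    set_hooks(count_enter, count_leave);
    (void)classify(-5);
    (void)classify(0);
    (void)classify(7);
    assert(entered == exited);
    return entered;
}
```

With the real hooks the wrapper would have to be done at the assembler level, because the compiler inserts the calls itself, but the structure (one place for entry, one place for exit, user function called from there) would be the same.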
PS: Perhaps Vortex, our assembler guru, will write such a wrapper and take some work away from Pelle.