Hi all,
The SDCC team, and especially Philipp Klaus Krause (a Z80 guru), has brought developers a new function calling convention that can significantly improve the performance of C programs.
This new calling convention, the default since SDCC 4.1.12, allows some function parameters to be passed in Z80 registers rather than on the stack, which is much faster.
In short, the first parameters are passed in registers and the rest on the stack. Details are at the end of this message.
If you program 100% in C, it's very easy to take advantage of it: just update SDCC, recompile and voilà! :)
If you mix C and inline assembler (like me), the transition is less straightforward, but nothing very complex either. And... it's worth it!
In this case, I advise you to first disable the new calling convention by hand on all functions where you read the input parameters and/or set the return value in assembler. To do this, just add the __sdcccall(0) directive to these functions. That's it: you can compile and enjoy the optimizations of the new convention on all your pure C functions.
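For example, an assembly helper that reads its arguments from the stack could be pinned to the old convention like this. `add_u8` is a made-up function for illustration, and the stack offsets assume that under __sdcccall(0) the caller pushes each 8-bit argument as a single byte, with the first argument closest to the return address. Consider it an untested sketch. The #else branch is just a plain C fallback so the same file also builds with a host compiler.

```c
#include <stdint.h>

#ifdef __SDCC
/* Hypothetical helper: adds two 8-bit values in hand-written Z80 assembly.
   It reads its arguments from the stack, so it is pinned to the old
   convention with __sdcccall(0). */
uint8_t add_u8(uint8_t a, uint8_t b) __naked __sdcccall(0)
{
    (void)a; (void)b;      /* silence unused-parameter warnings, no code generated */
    __asm
        ld   hl, #2        ; skip the 2-byte return address
        add  hl, sp
        ld   a, (hl)       ; first argument (8-bit args pushed as single bytes)
        inc  hl
        add  a, (hl)       ; add the second argument
        ld   l, a          ; __sdcccall(0) returns 8-bit values in L
        ret
    __endasm;
}
#else
/* Plain C fallback so the file also builds with a host compiler. */
uint8_t add_u8(uint8_t a, uint8_t b) { return (uint8_t)(a + b); }
#endif
```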
You can then gradually remove the __sdcccall(0) directive after adapting your assembly code to the new way parameters are accessed.
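Here is a sketch of the same made-up helper once adapted. As far as I understand the SDCC manual, under __sdcccall(1) the first 8-bit argument arrives in A, a second 8-bit argument in L, and an 8-bit result is returned in A, so the stack access disappears entirely. I wrote __sdcccall(1) explicitly so the function doesn't depend on the compiler default; again, this is an untested sketch.

```c
#include <stdint.h>

#ifdef __SDCC
/* Same hypothetical helper, adapted to __sdcccall(1):
   a arrives in A, b arrives in L, the 8-bit result goes back in A. */
uint8_t add_u8(uint8_t a, uint8_t b) __naked __sdcccall(1)
{
    (void)a; (void)b;      /* silence unused-parameter warnings, no code generated */
    __asm
        add  a, l          ; a + b, result already in A
        ret
    __endasm;
}
#else
/* Plain C fallback so the file also builds with a host compiler. */
uint8_t add_u8(uint8_t a, uint8_t b) { return (uint8_t)(a + b); }
#endif
```

Note that the return register changes too (L under __sdcccall(0), A under __sdcccall(1)), which is easy to miss when converting a function that only sets its return value in assembler.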
For information, functions using the __z88dk_fastcall directive are not affected by the change and keep their own calling convention.
Here is the list of parameter combinations passed natively in Z80 registers under each calling convention:
__sdcccall(0) is the old calling convention (default until SDCC 4.1.11).
__sdcccall(1) is the new calling convention (default since SDCC 4.1.12).
__z88dk_fastcall is an alternative calling convention that works only with 1-parameter functions.
Not only does the new calling convention natively handle more cases than __z88dk_fastcall, but it also works when the function has more parameters. In this case, the parameters that don't fit in registers are passed on the stack. For example, if a function takes three 8-bit parameters, the first two are passed in registers (A and L) and the third on the stack.
I haven't done a real benchmark yet, but after converting my game library (https://github.com/aoineko-fr/CMSX) I saw a gain of about 20% on my sprite test program (32 sprites moving at once).
The gain will vary a lot from program to program, especially depending on the number of functions called each frame, but in any case it should be significant.
I hope that one day soon SDCC will let us choose which register each parameter of a C function goes in, to make interfacing with assembly libraries and the BIOS easier. In the meantime, this is already a big step toward making C programming more efficient on our beloved 8-bit computers.
Thanks to the SDCC team!