Calling BIOS routines from C without any overhead

By aoineko

Paragon (1140)

08-07-2022, 23:49

In my last MSXgl update, I said I had found a technique to remove 100% of the C language overhead when calling some BIOS functions.
I don't know if anyone is interested but just in case, here is the explanation. ^^

The idea is to "cast" an address to a C function signature so that the calling code sets up the registers properly before calling the BIOS function.
For example, the GTSTCK function (00D5h), which returns the status of the joystick, takes its input parameter in register A and returns its result in register A. Among the possible function signatures, (u8(*)(u8)) uses these same registers (u8 is unsigned char).
Thus, with this function definition...
inline u8 Bios_GetJoystickDirection(u8 port) { return ((u8(*)(u8))R_GTSTCK)(port); }
...the calling code will put the port number into A, call the GTSTCK function, then read the result from A.
Since the function is inline, we gain C type checking while having zero C overhead.
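
The same idea can be spelled out with a typedef, which keeps the cast readable. This is only a minimal sketch: the name BiosFn_A_A and the host-only stub FakeGtstck are hypothetical and exist just so the snippet can also be compiled on a PC. It assumes an SDCC version where __sdcccall(1) is the default convention for the Z80 target, so that the single u8 argument and the u8 result both travel in A.

```c
typedef unsigned char u8;

/* Signature whose register usage matches GTSTCK under __sdcccall(1):
   one 8-bit argument in A, 8-bit result in A.
   BiosFn_A_A is a hypothetical name used for this sketch only. */
typedef u8 (*BiosFn_A_A)(u8);

#if defined(__SDCC)
/* Real target: GTSTCK lives at 00D5h in the MSX BIOS. */
#define R_GTSTCK 0x00D5
#else
/* Host-only stand-in (assumption) so the sketch can be tried off-target. */
static u8 FakeGtstck(u8 port) { return (u8)(port + 1); }
#define R_GTSTCK FakeGtstck
#endif

/* Typed wrapper: the compiler checks the argument and return types,
   and the call itself is just "load A, call address". */
static inline u8 Bios_GetJoystickDirection(u8 port)
{
    return ((BiosFn_A_A)R_GTSTCK)(port);
}
```

On the real target, a call such as Bios_GetJoystickDirection(0) should boil down to loading A and calling 00D5h; checking the generated .asm file is the safest way to confirm that for a given compiler version.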

Obviously, this only works with BIOS routines whose register usage matches an available C function signature. There are few of them, but by playing with the __sdcccall(1) and __z88dk_fastcall calling conventions, we still have some possibilities.
In the case of a __z88dk_fastcall signature, you have to use a typedef to define the signature before you can use it.
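
As a rough illustration of the typedef requirement, here is a sketch using SETWRT (0053h), a BIOS routine that takes a VRAM address in HL. The names BiosFn_HL, R_SETWRT, Bios_SetVramWriteAddress and the host-only stub are assumptions made for this example, not necessarily what MSXgl uses.

```c
typedef unsigned short u16;

#if defined(__SDCC)
/* __z88dk_fastcall: the single 16-bit argument travels in HL.
   The attribute is attached to the typedef, then the typedef is
   used in the cast. BiosFn_HL is a hypothetical name. */
typedef void (*BiosFn_HL)(u16) __z88dk_fastcall;
#define R_SETWRT 0x0053   /* SETWRT: set VRAM write address, input in HL */
#else
/* Host-only stand-in (assumption): records the address it was given. */
typedef void (*BiosFn_HL)(u16);
static u16 g_LastVramAddr;
static void FakeSetwrt(u16 addr) { g_LastVramAddr = addr; }
#define R_SETWRT FakeSetwrt
#endif

/* Typed wrapper around the BIOS entry point. */
static inline void Bios_SetVramWriteAddress(u16 addr)
{
    ((BiosFn_HL)R_SETWRT)(addr);
}
```

The sketch deliberately uses a routine with no return value, so it only relies on how __z88dk_fastcall passes its argument.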
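Here is a summary of all the possible combinations:

[Image: table of register combinations; green cells use __z88dk_fastcall, blue cells use __sdcccall(1)]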

By jepmsx

Champion (281)

09-07-2022, 06:41

Thanks for the explanation of your strategy. Congratulations! It doesn't seem easy to find.

A question from my ignorance, does this technique work in all versions of sdcc or only in the new one that had a different technique for calling functions?

By ToriHino

Paladin (927)

09-07-2022, 09:03

This uses the new calling convention __sdcccall(1) (and, in some cases, __z88dk_fastcall), which is heavily optimized in how it passes arguments to functions, i.e. it passes parameters directly in registers.

Btw, that's indeed a nice technique aoineko, I might have some use for it in my projects too.

By aoineko

Paragon (1140)

09-07-2022, 09:55

jepmsx wrote:

A question from my ignorance, does this technique work in all versions of sdcc or only in the new one that had a different technique for calling functions?

The __z88dk_fastcall calling convention is quite old, so all green cells (in my image) are usable with "old" versions of SDCC.
However, the __sdcccall(1) calling convention was introduced in SDCC 4.1.12, so blue cells don't work with older versions.
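
If a project has to build with both old and new compilers, the version dependency can be made explicit at compile time. The following is a sketch only: SDCC_VERSION_NUM and HAS_SDCCCALL1 are hypothetical macro names, and the 4.1.12 threshold is simply the version mentioned above. The __SDCC_VERSION_MAJOR, __SDCC_VERSION_MINOR and __SDCC_VERSION_PATCH macros are predefined by SDCC.

```c
#if defined(__SDCC)
  /* Fold major.minor.patch into one comparable number, e.g. 4.1.12 -> 40112.
     Assumes minor and patch stay below 100. */
  #define SDCC_VERSION_NUM \
      (__SDCC_VERSION_MAJOR * 10000 + __SDCC_VERSION_MINOR * 100 + __SDCC_VERSION_PATCH)
  #if SDCC_VERSION_NUM >= 40112
    #define HAS_SDCCCALL1 1   /* __sdcccall(1) signatures are available */
  #else
    #define HAS_SDCCCALL1 0   /* only the __z88dk_fastcall signatures apply */
  #endif
#else
  /* Not SDCC at all: treat the convention as unavailable. */
  #define HAS_SDCCCALL1 0
#endif
```

Wrappers that depend on __sdcccall(1) could then be placed under #if HAS_SDCCCALL1, with an __asm-based fallback for older compilers.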

By Timmy

Master (200)

09-07-2022, 14:20

Thanks for reminding me about fastcall and overhead caused by SDCC.

z88dk never had this problem with overheads using fastcalls.

That is, until a while back when, "for source compatibility with zsdcc", they added an extra push and pop just for sdcc. These stack operations have to be removed using "__naked".

Note to self: in future, add "__naked" to my fastcall functions.

By nuc1e0n

Supporter (6)

22-08-2022, 23:09

z88dk has konamiman's asmlib integrated into it. There's a BiosCall function that's part of that lib. The wiki documentation is here: https://github.com/z88dk/z88dk/wiki/ASMLIB

By ToriHino

Paladin (927)

22-08-2022, 23:23

nuc1e0n wrote:

z88dk has konamiman's asmlib integrated into it. There's a BiosCall function that's part of that lib. The wiki documentation is here: https://github.com/z88dk/z88dk/wiki/ASMLIB

I also use these same calls in RoboPlay with SDCC (e.g. bioscall.c). The routines work great, but they are certainly not free of C-related overhead, which was the point of this thread. Just look at the asmcall.c behind it.

By nuc1e0n

Supporter (6)

23-08-2022, 07:09

Calling a function from C will always have some kind of overhead, in that variables on the stack have to be placed in the relevant registers and then retrieved again to do something with the values afterward. If you're looking for something more low-level, z88dk has the __asm intrinsic to place asm code directly within the body of a C function. Here's an example that detects whether the code is running on CP/M, MSX-DOS 1 or MSX-DOS 2:

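int isMSX2(void) {
    if (bdos(CPM_VERS, 0) == 0x22) {
        __asm
        ld a, 1
        ld c, 0x6f      ; BDOS function 6Fh (_DOSVER)
        call 0x0005
        or a            ; nonzero A means error, so not MSX-DOS 2
        jr nz, NOTMSX2
        ld a, b         ; B holds the major kernel version
        cp 2
        jr c, NOTMSX2
        ld hl, 1        ; return 1
        ret
    .NOTMSX2
        __endasm;
    }

    return 0;
}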

By aoineko

Paragon (1140)

23-08-2022, 09:02

In SDCC, you have 2 function directives that remove the overhead of function calls in C:
__z88dk_fastcall: passes a single input parameter via registers.
__sdcccall(1): passes 1 or 2 parameters through registers (additional parameters are passed through the stack).
In both cases, the return value is passed through registers.

With these directives, you have no C overhead.
For example, if you have a "Double" function: u8 Double(u8 val) __sdcccall(1) { __asm__("add a"); }
A call to u8 ret = Double(10); will be translated into the following assembly:

ld A, #10
call _Double ; return value in register A
...
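
For reference, here is a slightly more explicit variant of the same function, offered as a sketch rather than as the form used in MSXgl. The one-liner above has no C return statement and relies on A still holding the result when the function returns, which the compiler may warn about. Marking the function __naked and writing the ret by hand makes that contract explicit. The plain C branch is a host-only stand-in added for this example.

```c
typedef unsigned char u8;

#if defined(__SDCC)
/* __sdcccall(1): val arrives in A and the u8 result is returned in A.
   __naked: the compiler emits no prologue/epilogue, so the body must
   end with its own ret. */
u8 Double(u8 val) __naked __sdcccall(1)
{
    __asm
        add a, a
        ret
    __endasm;
}
#else
/* Host-only stand-in (assumption) with the same observable behavior,
   including 8-bit wrap-around. */
u8 Double(u8 val) { return (u8)(val + val); }
#endif
```

The call site stays the same as above: load A, call _Double, read the result from A.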

By nuc1e0n

Supporter (6)

23-08-2022, 10:51

I wasn't aware of the __sdcccall(1) calling convention until now. That's pretty neat. However, as you say this approach wouldn't work for every combination of registers. Some __asm block might still be needed in cases that don't fit this approach right?

Also, is there a particular need for the inline wrapper in the first post of this thread? C compilers don't have to guarantee that functions marked as inline actually will be inlined. Perhaps you could add the type annotations to R_GTSTCK directly with something like this?

typedef u8 (*funccast)(u8);
const funccast Bios_GetJoystickDirection = (funccast)(R_GTSTCK);

By aoineko

Paragon (1140)

23-08-2022, 14:22

nuc1e0n wrote:

Some __asm block might still be needed in cases that don't fit this approach right?

When it comes to our own functions, it is easy to use only the register combinations allowed by __z88dk_fastcall and __sdcccall(1), and so to avoid going through the stack in most cases.
The problem comes mainly from the BIOS (or from assembler libraries that we may want to use), which were not designed with this constraint in mind.

nuc1e0n wrote:

Also, is there a particular need for the inline wrapper in the first post of this thread? C compilers don't have to guarantee that functions marked as inline actually will be inlined. Perhaps you could add the type annotations to R_GTSTCK directly with something like this?

Without the inline directive, you will have a call to a call (and therefore a loss of performance). To my knowledge, SDCC has no automatic inlining mechanism. In other words, functions without the directive will never be inlined.
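
If relying on the compiler to honor inline is a concern, one possible alternative (not something proposed in this thread, just a sketch) is a function-like macro. The cast and the call are then pasted at every call site by the preprocessor, so there is no wrapper function at all, while the function pointer type still gives argument and return type checking. The macro name and the host-only stub are hypothetical.

```c
typedef unsigned char u8;
typedef u8 (*funccast)(u8);

#if defined(__SDCC)
#define R_GTSTCK 0x00D5   /* GTSTCK entry point in the MSX BIOS */
#else
/* Host-only stand-in (assumption) so the sketch can be tried off-target. */
static u8 FakeGtstck(u8 port) { return (u8)(port * 2); }
#define R_GTSTCK FakeGtstck
#endif

/* Expands to a direct call through the typed address at each call site. */
#define BIOS_GET_JOYSTICK_DIRECTION(port) (((funccast)R_GTSTCK)(port))
```

The trade-off is the usual one for macros: no symbol for the debugger and no way to take the wrapper's address, in exchange for a guaranteed single call.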