# Fastest possible multiplication routine?

Page 3/7
1 | 2 | | 4 | 5 | 6 | 7

Very interesting multiplication routine can be found also here:
http://www.mccw.hetlab.tk/92/Multiplication/en.html

Ah, now I got it, what you mean with using lookup tables.
In this case you can also have a look at this:
http://www.cpcwiki.com/index.php/Programming:Integer_Multiplication#Fast_8bit_.2A_8bit_Unsigned_.28using_log_.2F_antilog_tables.29
Works even with bigger numbers, but maybe not always exact.

I think the multiplication algorithm to choose depends on the program using it. The table version NYYRIKKI shown is very nice for small numbers -64 to +63, but for bigger numbers it slows down a bit because the lookup math exceeds 8 bits and of course the table grows. But for many cases especially on an MSX with a low resolution, the range is enough. Prodatrons algorithm has a much bigger range one 8 bit value -128 to 127 multiplied with a 16 bit value and its only about 50% slower than the table version.

Btw, one of my IOCCC entries uses the same hyperbolic functions as the table based multiplication algorithm, but in this case to generate prime numbers.

About R800 multiplications, how to get signed multiplications in a better way that this?

```;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;	Signed int 16 bit x 16 bit -> 32 bit multiplication for r800
;	Called with 1st arg in DE, 2nd arg in BC. Returns with
;	HLDE = DE*BC

global _imul

_imul:
ex	de,hl				; HL*BC

bit	7,b
jp	nz,bc_is_neg
bit	7,h
jp	nz,hl_is_neg
; HL>=0 and BC>=0
muluw_hl_bc				; result in DEHL
ex		de,hl			; result in HLDE
ret

hl_is_neg:					; HL<0 and BC>=0
muluw_hl_bc				; result in DEHL
ex		de,hl			; result in HLDE
and		a
sbc		hl,bc			; hl*bc + -1*bc<<16
ret

bc_is_neg:

bit	7,h
jp	nz,bc_hl_are_neg
; HL>=0 and BC<0
push	hl
muluw_hl_bc				; result in DEHL
ex		de,hl			; result in HLDE
pop		bc
and		a
sbc		hl,bc			; hl*bc + -1*hl<<16
ret

bc_hl_are_neg:
; HL<0 and BC<0
push	hl
push	bc
muluw_hl_bc				; result in DEHL
ex		de,hl			; result in HLDE
pop		bc
and		a
sbc		hl,bc			; hl*bc + -1*hl<<16
pop		bc
and		a
sbc		hl,bc			; hl*bc + -1*hl<<16 + -1*bc<<16
ret
```

I thinks the sign verification can be done faster

rules
+ + = +
- + = -
+ - = -
+ + = +
that is just a XOR true table

so,

ld a, b
xor h
and \$80
jr z, sign_positive

sign negative

saving all those BIT and separate JR of two stages.

also it means, short code.

anyway, I can't to understand why it executes a sbc instruction for converting the sign when you needs to get a minus

can you to expain why?

and anyway, atleast in the + + = +, you aren't checking looking for overflows. It can return with the sign bit activated.

You already tested the code in all convinations?

I concluded that the routine can't to work, because.....

for multiplication you needs first two absolute positive values, then do UNsigned mult, then to resolve the sign..... and to checks looking for overflows.

now, after having an absolute result, yo needs to do two's complement, that has nothing to do with substracting one of the inputed values....

and less, if it only affects the HIGH WORD of the result, when two's complements needs to convert the 32bits result, not just the 16bits high word.

so, from where you copy that routine?

1) the code works perfectly
2) the -1<<16 is a trivial sign extension to 32 bit of a negative value on 16 bits
3) do one line of pencil+paper math an you will see why each branch does exactly what it should
4) the sole trick is that the 4th term in the - - branch goes outside the 32bits and disappears
5) there is no code on the net for signed multiplications on r800, but you can copy my routine if you want

In case one line of algebre is too much:

if X16<0

X32 = X16 + (-1)<<16

thus

X32*Y32 = X16*Y32+ ( (-1)<<16 ) * Y32

do the rest on Y32 as exercise ;-)

About R800 multiplications, how to get signed multiplications in a better way that this?

First of all, I can confirm the math is correct.
@flyguille: your suggestion of doing 'abs(x) * abs(y)' + some sign correction on the result is also correct. But it's not the only way, and most likely not the fastest way. Remark that if you do a 16-bit-signed * 16-bit-signed multiplication and are only interested in the lower 16-bit of the (32-bit) result, you can simply perform an UNSIGNED multiplication and ignore the upper 16-bits. The lower 16-bits will be indentical as if you had performed a true signed multiplication. Just do the math and you'll see.

@ARTAG: I can only see a few minor tweaks to speedup your routine:

1) In the 'bc_hl_are_neg' case it is not needed to PUSH/POP register BC.

2) In the 'bc_is_neg' case, you have to PUSH/POP a register. In the 'hl_is_neg' case this is not required. PUSH and POP are relatively expensive instructions on R800 (respectively 6 and 5 cycles). So i think it's faster to instead swap HL and BC and avoid the PUSH/POP. Note that swapping HL, BC can be done in 5 cycles (DE <- BC / EX DE,HL / BC<-DE) (the 'obvious' way requires 6 cycles).

3) On R800 a JR instruction is always faster than a 'JP' instruction. See http://openmsx.svn.sourceforge.net/viewvc/openmsx/openmsx/trunk/doc/r800test.txt?revision=11645&view=markup for some R800 timing measurements I did for openMSX.

4) You may want to switch the 'NZ' condition to 'Z' in all your jump instructions, and of course also swap the location of the different cases accordingly. At the moment the 'hl_pos_bc_pos' case is the fastest, both because that routine itself is the fastest but also because on the path leading to that routine, the jumps are all 'not-taken' and thus execute slightly faster. By swapping the condtions, you won't increase the average execution time of your routine, but it will bring the best and worst execution time (a little) closer together.

5) (detail) Your routine starts with a 'EX DE,HL' instruction. Personally I prefer to drop that instruction and change the calling convention of the routine.

Thanks for the hints: extra push/pop removed and jp's changed to jr's !

Do you know how is the C flag after muluw_hl_bc ?
maybe I could also skip the and a 's if I were sure C us reset....

PS
I need the DE thing at point 5) owing to compiler conventions in passing parameters
Same thing for the result that has to be in HLDE

pps current version

```;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;	Signed int 16 bit x 16 bit -> 32 bit multiplication for r800
;	Called with 1st arg in DE, 2nd arg in BC. Returns with
;	HLDE = DE*BC

global _imul

_imul:
ex	de,hl				; HL*BC

bit	7,b
jr	nz,bc_is_neg
bit	7,h
jr	nz,hl_is_neg

bc_hl_are_pos:
; HL>=0 and BC>=0
muluw_hl_bc				; result in DEHL
ex		de,hl			; result in HLDE
ret

hl_is_neg:					; HL<0 and BC>=0
muluw_hl_bc				; result in DEHL
ex		de,hl			; result in HLDE
and		a
sbc		hl,bc			; hl*bc + -1*bc<<16
ret

bc_is_neg:
bit	7,h
jr	nz,bc_hl_are_neg
; HL>=0 and BC<0
push	hl
muluw_hl_bc				; result in DEHL
ex		de,hl			; result in HLDE
pop		bc
and		a
sbc		hl,bc			; hl*bc + -1*hl<<16
ret

bc_hl_are_neg:
; HL<0 and BC<0
push	hl
muluw_hl_bc				; result in DEHL
ex		de,hl			; result in HLDE
and		a
sbc		hl,bc			; hl*bc + -1*hl<<16
pop		bc
and		a
sbc		hl,bc			; hl*bc + -1*hl<<16 + -1*bc<<16
ret
```
Page 3/7
1 | 2 | | 4 | 5 | 6 | 7