Fastest possible multiplication routine? (Development Foros MSX)MSX Resource Center            
                       
English Nederlands Espa�ol Portugu�s Russian                  
 Noticias
   Página principal
  Almacén de noticias
  Temas de noticias

 Recursos
   Foros MSX
  Artículos
  Analisis
  Informe de ferias/RUs
  Álbum de fotos
  Ferias y encuentros
  Encuestas
  Enlaces
  Buscar

 Software
   Descargas
  Tienda Online

 MRC
   Quiénes somos
  Únete a nuestro equipo
  Donar
  Políticas
  Contacta con nosotros
  Enlázanos
  Estadísticas

 Buscar
 
  

  

 Login
 

Login

Contraseña




¿Aún no tienes una cuenta? ¡Conviértete en miembro del MSX Resource Center! ¡Únete a nosotros!.


 Estadísticas
 

Hay 42 invitados y 3 miembros en línea

Eres un usuario anónimo.
 

Foros MSX


Foros MSX

Development - Fastest possible multiplication routine?

Ir a la página ( 1 | 2 Siguiente página )
Autor

Fastest possible multiplication routine?

Prodatron
msx master
Mensajes: 1113
Publicado: Mayo 05 2007, 01:34   
Hi,

today I stumbled over a routine I developed for my old 3D demos and which I uploaded to the CPCWiki some time ago:

http://www.cpcwiki.com/index.php/Programming:Ultrafast_Multiplication

As this is 100% Z80 and not platform related it maybe interesting for you.
AFAIK it's the fastest way of multiplication which is possible on Z80 for this precision (16bit x 8/7bit).
I wonder, if it could be even done faster, and how much faster the R800 does with its inbuilt MUL-commands?

CU,
Prodatron
dvik
msx master
Mensajes: 1376
Publicado: Mayo 05 2007, 02:08   
The code is a bit hard to read. How ultra fast is it?
Prodatron
msx master
Mensajes: 1113
Publicado: Mayo 05 2007, 02:18   
Sometimes I like to use ":" and put multiple commands in one line
The whole thing uses (more or less) ONE routine for EACH rigth-side value, that's it. The speed depends on the rigth-side value (8/7 bit) as different routines are selected then.
dvik
msx master
Mensajes: 1376
Publicado: Mayo 05 2007, 02:20   
Do you have some cpu cycle numbers? worst case/best case/ average...

Actually I guess I can do that myself. Its not that hard...
dvik
msx master
Mensajes: 1376
Publicado: Mayo 05 2007, 02:33   
By doing shifts on DE (or BC) and accumulate on HL, you'll be faster in some cases

MUL073 for example can be implemented like this (input args may need to be modified to fit your code):

MUL073:
        ld      h,d
        ld      l,e
        sla     e
        rl      d
        sla     e
        rl      d
        add     hl,de
        sla     e
        rl      d
        sla     e
        rl      d
        add     hl,de 
        ld      a,h
        ret


Saves 26 cycles if I did the math correct.
dvik
msx master
Mensajes: 1376
Publicado: Mayo 05 2007, 02:46   
Also, why do you use:
        PUSH BC
        RET

isntead of
        JP (HL)

dvik
msx master
Mensajes: 1376
Publicado: Mayo 05 2007, 02:52   
I did a similar routine not too long ago. Don't have it on this computer but the I think I did the table lookup like this:

Mul:
        ld      h,MulTab
        add     hl,hl
        ld      b,(hl)
        inc     l
        ld      h,(hl)
        ld      h,b
        jp      (hl)

Saves another 20 cycles
nikodr
msx addict
Mensajes: 491
Publicado: Mayo 05 2007, 04:35   
Anyone with a real turbo-r to tell us about the r800 mul routines?Actually i am interested to buy a turbo-r by the way.How much would a good price be? 200-250 euros?
nikodr
msx addict
Mensajes: 491
Publicado: Mayo 05 2007, 04:35   
Sorry for double post
dvik
msx master
Mensajes: 1376
Publicado: Mayo 05 2007, 07:00   
I think some of the cases can be optimized quite a bit. here are some examples.
These particular cases are maybe 10-30% faster. But I think most of the cases in your routine are close to optimal. But if say 20% of the cases can be made significantly faster, I think it will be noticeable on the overall performance.

MUL007:
        ld      d,h
        ld      e,l
        add     hl,hl
        add     hl,de
        add     hl,hl
        add     hl,de
        ld      a,h
        ret

MUL011:
        ld      d,h
        ld      e,l
        add     hl,hl
        add     hl,hl
        add     hl,de
        add     hl,hl
        add     hl,de
        ld      a,h
        ret

MUL063:
        ld      d,h
        ld      e,l
        rr      h
        rr      l
        rr      h
        rr      l
        xor     a
        ld      h,l
        ld      l,a
        sbc     hl,de
        ld      a,h
        ret

MUL064:
        rr      h
        rr      l
        rr      h
        ld      a,l
        rrca
        ret

MUL065:
        ld      d,h
        ld      e,l
        rr      h
        rr      l
        rr      h
        rr      l
        ld      h,l
        ld      l,0
        add     hl,de
        ld      a,h
        ret

MUL073:
        ld      d,h
        ld      e,l
        sla     e
        rl      d
        sla     e
        rl      d
        add     hl,de
        sla     e
        rl      d
        sla     e
        rl      d
        add     hl,de 
        ld      a,h
        ret

MUL127:
        ld      d,h
        ld      e,l
        rr      h
        rr      l
        xor     a
        ld      h,l
        ld      l,a
        sbc     hl,de
        ld      a,h
        ret

dvik
msx master
Mensajes: 1376
Publicado: Mayo 05 2007, 08:08   
Some of the versions I put here may be wrong. I'll try to find the real implementation and post it here
arnold_m
online
msx lover
Mensajes: 86
Publicado: Mayo 05 2007, 09:13   
Quote:

By doing shifts on DE (or BC) and accumulate on HL, you'll be faster in some cases

MUL073 for example can be implemented like this (input args may need to be modified to fit your code):

MUL073:
        ld      h,d
        ld      l,e
        sla     e
        rl      d
        sla     e
        rl      d
        add     hl,de
        sla     e
        rl      d
        sla     e
        rl      d
        add     hl,de 
        ld      a,h
        ret


Saves 26 cycles if I did the math correct.



When I do the maths i find that this is a MUL021 routine.
My suggestion:
MUL073:
      ld e,l
      ld d,h
      add hl,hl
      add hl,hl
      add hl,hl
      ex de,hl
      add hl,de
      ex de,hl
      add hl,hl
      add hl,hl
      add hl,hl
      add hl,de
      ld a,h
      ret

The instructions
sla e
and
rl d
are relatively expensive two byte instructions.
I count 118+14 cycles for my code above. Prodatron's routine has the same cycle count.
The code of dvik costs 138+32 cycles after correction.

Using
   jp (hl)

instead of
   push bc
   ret[

is possible, but then you need to put the
   ex de,hl

in each of the individual multiplication routines, making the fall-through pattern unusable, so you get severe code-bloat.
GhostwriterP
msx addict
Mensajes: 320
Publicado: Mayo 05 2007, 09:14   
Well, this seems like a good example for a library.
dvik
msx master
Mensajes: 1376
Publicado: Mayo 05 2007, 09:32   
Yeah the MUL73 is not correct. I think I remembered wrong. The shifts are quite nice in some cases even though this was a bad example.

The jp (hl) does indeed make the fallthroughs not work but it saves like 10-15% cpu time. With other changes I think the average execution time could be decreased by at least 20%.

The cost with one optimized for speed is of course bigger code size. This implementation is rather big as well but in apps with a lot of multiplications the extra code size may be worth it.
Prodatron
msx master
Mensajes: 1113
Publicado: Mayo 05 2007, 11:05   
Hey cool, thanx a lot for this feedback! I will update the article with your optimized routines.
Regarding jp (hl) / push bc:ret - yes, it's like both of you say. So I shouldn't call the "routine" in the article "fastest possible" anymore, but there should be the existing one, which saves some memory, and the really fastest one, which has one completely own sub-routine for every value.
 
Ir a la página ( 1 | 2 Siguiente página )
 







(c) 1994 - 2009 Fundación MSX Resource Center. MSX es una marca registrada de MSX Licensing Corporation