Thanks to Arjan and the aPLib routines, I now have a working VRAM depacker routine for BitBuster that uses no RAM buffer (its footprint is 186 bytes). It is almost ready for release; I just need to fix one last bug and make sure it is MSX1 & MSX2 compatible.
I ran a speed test on a 256x64 screen 5 image, comparing:
- a RAM depack followed by a BIOS LDIRVM RAM to VRAM transfer
- and a direct VRAM depack
The direct VRAM depack is 2.5 times slower.
only?
Yes ... The speed test was done using the $FCA2 interval counter (INTCNT).
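For reference, the RAM-based path of that test could look roughly like the sketch below. `PackedData`, `RamBuffer` and `Depack` are hypothetical labels, and the HL = source / DE = destination calling convention for the depacker is an assumption; only the BIOS `LDIRVM` entry at $005C is standard. The VRAM destination of $0000 is also just an assumed value.

```asm
LDIRVM:     equ $005C       ; MSX BIOS: copy BC bytes from RAM (HL) to VRAM (DE)
IMAGE_SIZE: equ 128*64      ; 256x64 pixels at 4 bits per pixel = 8192 bytes

RamPath:
        ld   hl,PackedData  ; hypothetical: compressed image
        ld   de,RamBuffer   ; hypothetical: 8 KB scratch buffer in RAM
        call Depack         ; hypothetical RAM depacker (assumed HL = source, DE = destination)
        ld   hl,RamBuffer
        ld   de,$0000       ; assumed VRAM destination address
        ld   bc,IMAGE_SIZE
        call LDIRVM
        ret
```

The direct VRAM depack replaces all of this with a single call and needs no 8 KB buffer, which is what the extra depacking time buys.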
After looking at the Pletter0.5b source, I can confirm that I will also write a VRAM depacker for it. There is no major difference in the core algorithm between Pletter and BitBuster. This also means that the speed difference factor between RAM and VRAM depack will probably be the same for Pletter.
No idea of the real numbers, but the unpack routine is bigger than the others, so it is not very useful if you have a 1 KB ROM in mind.
Those SMS aPLib routines are quite big and slow. I optimized the normal RAM version and got it down to 163 bytes (when optimized for size) / 186 bytes (when optimized for speed). Both are much faster than the original and use only registers, with no extra RAM for variables.
http://www.worldofspectrum.org/forums/showpost.php?p=224886&postcount=23
One more byte can be cut from the small one with this smaller ap_getbit:
ap_getbit:
        ld   a,ixl
        add  a,a
        jr   nz,ap_getbitexit
        ld   a,(hl)
        inc  hl
        rla
ap_getbitexit:
        ld   ixl,a
        ret
And CNGSoft did an independent implementation of the aPLib routine. It is probably slower than mine but more complete (I removed the compatibility with LZ distances > 32K since they are very rare on 8-bit machines, but his implementation keeps it), and it can be configured to uncompress forwards or backwards (the backwards version takes only 160 bytes):
The direct VRAM depack is 2.5 times slower.
Hey, those are really good numbers (only 186 bytes? really?). Excellent work, Metalion!
Now, as Metalbrain has an optimized aPLib version, maybe the VRAM unpacking code could be added to the speed-optimized version and benchmarked against the great version done by Metalion. I think that it must be really hard to get more speed than x2.5...
Well ... I optimized the code for speed and got it down to 1.8 times slower and 166 bytes.
The bug is solved; now I need to make it MSX2 compatible. The major drawback is that the depacker has to run between DI and EI, because the BIOS accesses the VDP during interrupts and messes up the data transfer.
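A minimal sketch of such a call, assuming a hypothetical `DepackToVram` entry point (the real label and register interface may differ):

```asm
        di                  ; keep the BIOS interrupt handler away from the VDP
        call DepackToVram   ; hypothetical entry point of the VRAM depacker
        ei                  ; interrupts resume only after the whole image is depacked
```

While interrupts are off, the BIOS interrupt routine does not run at all, so anything driven by it (timers, music players hooked on the interrupt) stalls for the duration of the depack.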
I plan to release a speed optimized version and size optimized version for both BitBuster and Pletter.
Any idea when you think you'll release them? Btw, does it work on MSX1?
Not bad, not bad at all! Looking forward to the final result
BTW, the best size-optimized BitBuster decompressor I've seen was only 90 bytes or so.
Any idea when you think you'll release them? Btw, does it work on MSX1?
Yes it works on MSX1.
So far, only the [BitBuster 1.2 / MSX1 / Optimized for speed] version is ready.
I'd like to release all versions at the same time, but as I won't be able to work on them this coming weekend, it will probably be early next week.
BTW, I think it would perhaps be better to release separate MSX1 & MSX2 versions instead of speed/size optimizations, because the MSX2 mods needed will probably slow down the depacker on an MSX1.
I do not understand why you need two different versions for MSX1 and MSX2.
Maybe you are using VDP commands for MSX2... Are you?
No ... The problem lies in the 16 KB VRAM limit inherited from the MSX1 VDP.
Setting up a VRAM address for data access works differently depending on whether the address is within the first 16 KB (up to $3FFF) or above it.
For addresses above $3FFF, you also need to set the r#14 register, which holds the upper address bits.
So it means extra code ... I will do a speed comparison to see if there really is a slowdown.
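To show where the extra code comes from, here is a rough sketch of the two set-ups for a write address. It is not taken from the depacker itself: the labels are hypothetical, the control port is assumed to be the standard $99, and address bit 16 is assumed to be 0.

```asm
VDP_CMD: equ $99            ; assumed standard VDP control port

; MSX1 style: HL = VRAM address, 0..$3FFF
SetWrite16K:
        ld   a,l
        out  (VDP_CMD),a    ; address bits 7-0
        ld   a,h
        and  $3F            ; address bits 13-8
        or   $40            ; bit 6 set selects write mode
        out  (VDP_CMD),a
        ret

; MSX2 style: HL = VRAM address, 0..$FFFF (address bit 16 assumed 0)
SetWrite64K:
        ld   a,h
        rlca
        rlca
        and  $03            ; address bits 15-14
        out  (VDP_CMD),a    ; value for r#14
        ld   a,$80+14       ; store the previous byte in register 14
        out  (VDP_CMD),a
        jr   SetWrite16K    ; then send the low 14 bits as on MSX1
```

The MSX2 path costs two more OUTs plus the bit shuffling on every address change. A VRAM depacker changes address often, since each match copy switches between reading and writing, so the overhead adds up.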
BTW, just finished the Pletter VRAM depacker and I can say two things :
1 - Pletter is 25% quicker than BitBuster on the RAM depack / VRAM transfer test
2 - The Pletter VRAM depacker is 2.3 times slower than the Pletter RAM depacker and 4% quicker than the BitBuster VRAM depacker
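As a quick consistency check, these figures fit together under three assumptions that are not stated in the posts: "25% quicker" means 25% less time, "2.3 times slower" is relative to the Pletter RAM depack plus VRAM transfer, and the BitBuster VRAM figure is the speed-optimized 1.8 rather than the earlier 2.5. Times are normalized to the BitBuster RAM depack plus transfer:

```latex
% All values are relative times; T_{BB,RAM} = 1 is the normalization (assumption).
\begin{align*}
T_{\text{BB,RAM}}  &= 1 \\
T_{\text{BB,VRAM}} &= 1.8 \\
T_{\text{PL,RAM}}  &= 0.75 \\
T_{\text{PL,VRAM}} &= 2.3 \times 0.75 = 1.725 \\
1 - \frac{T_{\text{PL,VRAM}}}{T_{\text{BB,VRAM}}} &= 1 - \frac{1.725}{1.8} \approx 0.042
\end{align*}
```

That is about 4% less time than the BitBuster VRAM depacker, which matches the second point. Since the inputs are rounded, this is only a rough check.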