With the RLE method, drawing transparent sprites goes like this:
If the color is 0, do nothing: do not draw this chunk. No blitter setup, no blitter run.
Maybe some counting in CPU registers.
For things that are round or have lots of holes, this too sounds like a winner.
The blitter takes roughly 1/3 of the time per pixel that the CPU zoom takes.
Did you test this on real HW?
openMSX has a VDP engine that is slower than the one in blueMSX.
IMHO the timings of both emulators are inaccurate (especially blueMSX, which also has some nasty bugs in the VDP command engine).
The distribution of column sizes in the game is much like a triangle, running from the 256-pixel column case (best case for the RLE engine) down to the 0-pixel column case (worst case). Since the integral of that triangle is 0.5, I feel this means the "allowed overhead" figure is to be divided by 2.
This also suggests that the best solution is probably to use both:
for columns whose scaled size >= a given threshold, use RLE + HMMV
for columns whose scaled size < a given threshold, use CPU zoom + HMMC
The most interesting thing is that the CPU zoom can probably also be effective with RLE-encoded data,
e.g. something like

    ....
    ld b,<Run Length*scale>
    ld a,<color>
core:
    out (0x9B),a
    djnz core
    ....

This takes 26 cycles per pixel.
This means that the very same RLE data could feed BOTH routines, the HMMV zoom and the modified CPU zoom + HMMC.
OK,
I cannot really understand this latter proposal.
If you work on horizontal lines you have to take care of the ceiling and floor...
but those parts of the screen are usually (at least in my code) not updated, except in the areas where a column has reduced its height between frames.
There is no update when a column increases its size.
First of all, it was not really meant as a serious solution. Secondly, an improvement would be swapping b and c; this way we could have 1 texture of 256x64 and keep it straight *no longer turned 90 degrees* in RAM. The idea behind this is that you do not have four textures of 64x64 but 256 of 1x64 (w x h)... you can make a simple brick pattern with just four 1x64 textures, for instance.
And this idea can be implemented either way.
Moreover your inner loop seems slower than the one proposed by hit9918 and by NYYRIKKI
Is it?
Thirdly, switching back to the vertical approach with HMMC, the inner loop can be rewritten to:
    ld e,texture
    ld h,line        ; high <offset_scale_by_line_table>
    ld l,depth
.lus
    ld d,(hl)        ; load texture offset
    inc h            ; advance to next line
    ld a,(de)        ; read texture byte
    out (9Bh),a      ; out texture byte
    djnz .lus        ;
Well, how about that? It scales both ways (though limited to 256 scales in total) or performs any kind of transformation you store in the <offset_scale_by_line_table>.
And compared to the hit9918 version of the core, which counts 58 cycles, the above core counts 48 cycles.
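To make the role of the table concrete, here is a rough Python model of that loop. The indexing (one 256-entry page per screen line, selected by depth) follows the listing; the function names and the linear rule used to fill the table are assumptions for illustration only, since any mapping could be stored instead.

```python
TEX_H = 64  # texture column height used in this thread


def build_offset_table(lines, depths=256):
    """table[line][depth] = texture offset shown on that screen line.

    Assumption: depth d means the column is drawn d pixels tall and is
    scaled linearly. Depth 0 is left as all zeros.
    """
    table = []
    for line in range(lines):
        page = [0] * depths
        for d in range(1, depths):
            page[d] = min(TEX_H - 1, line * TEX_H // d)
        table.append(page)
    return table


def draw_column(texture, table, depth):
    """Model of the inner loop: fetch offset, step to next line, output texel."""
    return [texture[table[line][depth]] for line in range(depth)]
```

With a 64-entry texture, a depth of 64 reproduces the column unchanged, 32 keeps every second texel, and 128 doubles each texel.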
The blitter takes roughly 1/3 time per pixel as the cpu zoom.
did you test this on real HW?
No, I remember the blitter being in the same ballpark as CPU outi, but I think that was copy speed. I just found a figure on the "MSX Assembly Page": "HMMV 60Hz 212 lines sprites off 5888". I assume that's bytes per frame.
That would be 10.1 cycles per byte, about 5.7x faster than the 58-cycle CPU loop. Except that the blitter has an overhead for every line (I don't know how much), so drawing vertical columns would be slower.
Still this is just asking to be tried out.
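For anyone who wants to redo the arithmetic, a small Python sketch. It assumes the standard 3.579545 MHz MSX Z80 clock and reads the 5888 figure as bytes per frame, as guessed above. The per-line blitter overhead is not modelled.

```python
Z80_HZ = 3_579_545            # MSX Z80 clock in Hz
FRAME_HZ = 60
HMMV_BYTES_PER_FRAME = 5888   # quoted figure, assumed to be bytes per frame
CPU_LOOP_CYCLES = 58          # cost of one pass of the CPU zoom loop

cycles_per_frame = Z80_HZ / FRAME_HZ
hmmv_cycles_per_byte = cycles_per_frame / HMMV_BYTES_PER_FRAME
speedup = CPU_LOOP_CYCLES / hmmv_cycles_per_byte

print(f"{hmmv_cycles_per_byte:.1f} cycles per byte, {speedup:.1f}x vs the CPU loop")
```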
This also suggests that the best solution is probably to use both:
for columns whose scaled size >= a given threshold, use RLE + HMMV
for columns whose scaled size < a given threshold, use CPU zoom + HMMC
An extreme version would be for every column to get its own threshold value, estimated by the compressor, because the threshold depends on how many color changes the column has.
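A rough sketch of how a compressor might estimate such a per-column threshold, in Python. Everything here is an assumption: the cost model is a guess, the per-command setup cost is an unknown you would have to measure (the 500 in the usage note is a placeholder), and the run count is treated as constant even though short runs can vanish at small zoom values.

```python
def rle_hmmv_cost(pixels, runs, setup_cycles, fill_cycles_per_pixel=10.1):
    """Hypothetical cost: one blitter setup per run plus the fill itself."""
    return runs * setup_cycles + pixels * fill_cycles_per_pixel


def cpu_zoom_cost(pixels, setup_cycles, loop_cycles_per_pixel=58):
    """Hypothetical cost: one HMMC setup plus the CPU loop per pixel."""
    return setup_cycles + pixels * loop_cycles_per_pixel


def column_threshold(runs, setup_cycles, max_size=255):
    """Smallest scaled size where RLE + HMMV is no slower than CPU zoom + HMMC.

    Returns None if the CPU zoom wins for every size up to max_size.
    """
    for size in range(1, max_size + 1):
        if rle_hmmv_cost(size, runs, setup_cycles) <= cpu_zoom_cost(size, setup_cycles):
            return size
    return None
```

With a made-up setup cost of 500 cycles, `column_threshold(4, 500)` gives 32, while a column with 30 runs never reaches a break-even size and returns `None`.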
@ARTRAG, I think the result of the zoom multiplication needs to be a fractional length, or else the result will be a bad mess.
I wrote some code that would be the CPU RLE version:
;sp -> RLE stream: one byte with the number of runs, then length, color, length, color
;hl' (exx) -> multable of current zoom stage (integer parts at hl'+256)
;ix: return address
drawcolumn:
        pop bc
        dec sp
        ld b,c          ;number of runs loaded into B
        ld a,0x80       ;reset fraction adder. starting with "0.5", just an idea.
                        ;maybe later a fraction from the column draw might be fed here.
        ex af,af'       ;fraction adder is in af'
nextRLE:
        exx
        pop de          ;d = color, e = length
        ld l,e
        ld c,(hl)       ;hl -> multable fraction part. table must be 256 aligned.
        inc h           ;hl+256 -> multable integer part
        ld b,(hl)
        dec h           ;bc = length 8:8 fixpoint
        ex af,af'       ;switch to fraction adder
        add a,c         ;add length fraction part to adder
        jp nc,adderskip
        inc b           ;fraction adder overflowed, draw one dot more
adderskip:
        ex af,af'
        ld a,b
        and a
        jr z,zeropixels ;may happen with small zoom values
        ld a,d
fill:
        out (0x9b),a
        djnz fill
zeropixels:
        exx
        djnz nextRLE
        jp (ix)         ;return after SP abuse
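To check what the fraction adder does to the run lengths, here is a small Python model of the routine above. The table contents (length times zoom in 8.8 fixed point) are an assumption about how the multable would be filled, and the function names are made up. The 8-bit wraparound of the run counter is not modelled.

```python
def build_multable(zoom_8_8):
    """multable[length] = length * zoom in 8.8 fixed point (zoom_8_8 = zoom * 256)."""
    return [(length * zoom_8_8) & 0xFFFF for length in range(256)]


def draw_rle_column(runs, multable):
    """runs is a list of (length, color); returns the bytes sent to the VDP."""
    out = []
    frac = 0x80                          # fraction adder starts at "0.5"
    for length, color in runs:
        scaled = multable[length]
        count = scaled >> 8              # integer part of the scaled length
        frac += scaled & 0xFF            # accumulate the fraction part
        if frac > 0xFF:                  # carry: draw one dot more
            frac &= 0xFF
            count += 1
        out.extend([color] * count)      # a count of 0 draws nothing
    return out
```

For example, three runs of 10, 20 and 34 pixels at zoom 0.5 (`zoom_8_8 = 128`) come out as 5, 10 and 17 pixels, and at `zoom_8_8 = 77` (about 0.3) the same column comes out 19 pixels tall.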
I'm working on a script to produce RLE-encoded textures.
I think this solution could lead to a real game; it all becomes a matter of trade-off between the vertical detail of textures and framerate.
Off topic
http://spectrum.ieee.org/consumer-electronics/gaming/the-wizardry-of-id/0
Tool released (MATLAB needed)
https://sites.google.com/site/devmsx/rle-textures
It just encodes all the png images it finds in the directory where you put the files
I think the RLE approach is very very promising.
The following 8 textures (actually 4, seen with two levels of light)
https://sites.google.com/site/devmsx/rle-textures/rletextures.bmp?attredirects=0
are stored in 6644 bytes
The plain data for the old (current) code, which is (or should be) slower, needs 16384 bytes!
Same link, 8 new textures compressed into 5712 bytes.
The tool now also generates pointers to columns.
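For readers without MATLAB, a minimal Python sketch of this kind of encoding. It is not the released tool and its output format is an assumption: each column is stored as a run count followed by (length, color) pairs, as described in the drawcolumn comments earlier in the thread, plus a list of offsets pointing at the start of each column.

```python
def rle_encode_column(column, max_run=255):
    """Encode one column as: number of runs, then length, color, length, color..."""
    runs = []
    for color in column:
        if runs and runs[-1][1] == color and runs[-1][0] < max_run:
            runs[-1][0] += 1             # extend the current run
        else:
            runs.append([1, color])      # start a new run
    data = [len(runs)]
    for length, color in runs:
        data += [length, color]
    return data


def encode_texture(columns):
    """Return (stream, pointers); pointers[i] is the offset of column i in stream."""
    stream, pointers = [], []
    for column in columns:
        pointers.append(len(stream))
        stream += rle_encode_column(column)
    return stream, pointers
```

A column `[5, 5, 5, 0, 0, 7]` encodes to `[3, 3, 5, 2, 0, 1, 7]`: three runs, then the three (length, color) pairs.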
The following 8 textures (actually 4, seen with two levels of light)
Make the palette so that colors with the top bit set are twice as bright.
Then turning on brightness is just another OR 0x88 on the COLOR value.
A more extreme version would send the COLOR values through a lookup table.
The original DOOM looked like it was doing this.
This allows using many more colors; in our case the images could use all 16 colors.
The palette just needs enough variety for the darker stages to look OK.
E.g. an image has a bright brown pixel and a dark brown pixel.
In the darkened stage it would be two dark brown pixels.
In an even darker LUT, it would appear as a dark brown pixel next to a black one.
The LUT lookup increases the setup overhead in the RLE version, while in the classic zoom it affects the central per-pixel draw loop. Another case where the RLE version is fast.
P.S. If one has e.g. bright green and bright red for lights, one could force their LUT values to stay bright; this makes lights in the dark.
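A small Python sketch of both ideas. It assumes the COLOR byte packs two 4-bit pixels (which is what the 0x88 mask implies); the function names and the example darkening map are made up for illustration.

```python
def brighten(color_byte):
    """Set the top bit of both 4-bit pixels packed in one byte."""
    return color_byte | 0x88


def make_dark_lut(darker_of, keep_bright=()):
    """Build a 256-entry LUT over packed bytes.

    darker_of[i] is the palette index that replaces index i in the darker
    stage; indexes listed in keep_bright map to themselves (lights).
    """
    nibble = [i if i in keep_bright else darker_of[i] for i in range(16)]
    return [(nibble[b >> 4] << 4) | nibble[b & 0x0F] for b in range(256)]
```

With `darker_of = [0] * 8 + list(range(8))` (bright colors drop to their dark twin, dark colors go to black) and `keep_bright={10}`, the byte 0x9A maps to 0x1A: the bright pixel 9 darkens to 1 while the light 10 stays lit.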
When I started to optimize the routine I wrote, I actually ended up with exactly the same thing as hit9918.
I don't have a compiler or an MSX emulator on this machine, so I don't know if this works or not...
        ORG #9000

        DB 0, low (16384),low (16384/2),low (16384/3) ... ,low (16384/254),low (16384/255)
        DB 0, high(16384),high(16384/2),high(16384/3) ... ,high(16384/254),high(16384/255)

; input HL -> texture column: 64 bytes
; input A = final size in [1...255]
; input DE = (X,Y) coords for the starting point where to plot the scaled column

scaler: ;#9200
        LD IXL,A
        LD B,A
        LD C,#9B
        LD A,#24
        OUT (#99),A
        LD A,#91
        OUT (#99),A     ; R#17 = 36: indirect writes start at R#36
        OUT (C),E       ; R#36 DX low
        OUT (C),0       ; R#37 DX high
        OUT (C),D       ; R#38 DY low
        OUT (C),0       ; R#39 DY high
        OUT (C),0       ; R#40 NX low
        OUT (C),0       ; R#41 NX high
        OUT (C),B       ; R#42 NY low = final size
        OUT (C),0       ; R#43 NY high
        LD A,(HL)
        OUT (#9B),A     ; R#44 CLR = first texture byte
        OUT (C),0       ; R#45 ARG
        LD A,#F0
        OUT (#9B),A     ; R#46 CMD = HMMC
        LD A,#AC
        OUT (#99),A
        LD A,#91
        OUT (#99),A     ; write #AC to R#17
        LD L,B
        LD B,H          ; (Texture starts also from 256 byte boundary)
        LD H,#90
        LD E,(HL)       ; DE = 16384/size, the 8.8 step through the texture
        INC H
        LD D,(HL)
        LD HL,0
CORE:
        ADD HL,DE
        LD C,H          ; integer part of the position = texture offset
        LD A,(BC)
        OUT (#9B),A
        DEC IXL
        JP NZ,CORE
        ret
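For checking the table and the stepping logic without an MSX at hand, here is a rough Python model of the scaler. It only mirrors the arithmetic (a 16384/size step in 8.8 fixed point, using the high byte as texture offset); the VDP side is not modelled, the function names are made up, and the model emits exactly `size` texels, counting the one sent during command setup.

```python
def step_table():
    """Entry n = 16384 // n, i.e. 64 / n in 8.8 fixed point; entry 0 is unused."""
    return [0] + [16384 // n for n in range(1, 256)]


def scale_column(texture, size, steps):
    """Return `size` texels sampled from a 64-byte texture column."""
    out = [texture[0]]                   # first texel goes out with the command
    acc = 0
    for _ in range(size - 1):
        acc = (acc + steps[size]) & 0xFFFF   # ADD HL,DE
        out.append(texture[acc >> 8])        # LD C,H / LD A,(BC)
    return out
```

A size of 64 reproduces the column, 128 doubles every texel, and 2 picks texels 0 and 32.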
@NYYRIKKI
I was studying your code to adapt it to RLE sequences.
What is this section for?
Why do you set R#17 = 172?
        LD A,#AC
        OUT (#99),A
        LD A,#91
        OUT (#99),A