Author
| GFX9000 Faster but not fast...
| GhostwriterP msx addict Posts: 305 | Posted: November 04 2005, 16:45 | Whoooo, here comes a lot of text your way!
Well, let's start by saying that compared with a v9958 the v9990 is without doubt
faster. But how much faster? To figure that out, I ran a few small tests concerning
the copy command LMMM. Of course only on a few different display settings, since
the test is more or less targeted at my "The Revenge of The Last Dragon" project.
Second, for those who compare the v9990 with the SNES or Genesis (Megadrive): I have
to say that those video processors have all kinds of handy features, which makes
it very hard to compare it to either one of them. I do believe it is possible to make
something like Sonic, but I will get back to that later.
Third, fullscreen multilayer scrolling is not possible at a 60 Hz framerate, other than
(obviously) by using P1. For those who believe differently, do not hesitate to prove me
wrong! 3-layer scrolling in P1 is definitely not possible. Why? Just read on...
(PS: also not at 50 Hz, and I mean no parallax techniques, just simple
multi-directional independent scrolling).
Fourth, well, let's just say that people have the idea that GFX9000 has to look better
than MSX2, so let's go for 8-bit color depth at a display resolution of 512x424. The result is that
you have to copy 8 times as much as for a 256x212 4-bit screen (4x for the surface and 2x for the color depth).
Question: can the VDP handle that much data? If the original game runs in screen 8 with
a lot of tpsets in it, it might very well be possible (I am actually sure of it). But please
keep in mind that copying an entire screen in one interrupt is, as far as I know, a fable.
I know, I know... I am a bit negative, but now follow a few tables with the test results.
They might be useful for all of you who are thinking about making a GFX9000 application.
The test: how many 16x16 tiles can be copied in one interrupt? The tables show the color depth,
the number of 16x16 tiles and the number of bytes moved by the LMMM copy command.
B1 60 Hz
color | 16x16 tiles | bytes
----------------------------
8-bit | 77          | 19712
4-bit | 106         | 13568
B1 50 Hz
color | 16x16 tiles | bytes
----------------------------
8-bit | 94          | 24064
4-bit | 127         | 16256
B1 60 Hz Overscan (192x212)
color | 16x16 tiles | bytes
----------------------------
8-bit | 58          | 14848
4-bit | 85          | 10880
Now a few things that I would like to point out. Obviously overscan eats away some data
transfer time. More interesting: the number of bytes copied in 8-bit mode is larger than
in 4-bit mode; however, in terms of copied surface, 4-bit comes out as the winner.
There is a good possibility that those numbers would be higher if it were done with one
single command (I waste some time on filling the regs). But I think it provides a good
and practical indication.
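As a quick cross-check of the tables (arithmetic only, not a new measurement), assuming a 16x16 tile takes 256 bytes at 8 bits per dot and 128 bytes at 4 bits per dot, the B1 60 Hz figures work out as:

```latex
\begin{align*}
\text{bytes} &= \text{tiles} \times 16 \times 16 \times \frac{\text{bits per dot}}{8} \\
\text{8-bit:}\quad 77 \times 256 &= 19712 \\
\text{4-bit:}\quad 106 \times 128 &= 13568
\end{align*}
```

So the 8-bit mode moves 6144 more bytes per interrupt, while the 4-bit mode covers 29 more tiles.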
So 20 kbyte in a screen similar to sc8 is a lot. It is about 5 times faster. I was happy
and excited until I tested it in a 4-bit mode... the darn thing seems to lose some time
on that nibble (dots) thing, and copies 7 kB less. But it is still a bigger area and suited
my needs more than enough... at least so I thought.
Let's come back to the P1 mode. 2 layers, 125 sprites, 256 sprite patterns, 4x16 colors.
Perfect for games, and yes, almost like the Genesis. But now the reason why I am so
sad/negative/disappointed: it appeared that 256 patterns weren't going to be enough.
All the stuff needed to be animated and there was no room for all those frames in SGEN.
So I thought copying frames would be a perfect solution, and I am not talking about every
small independent thing, but just the main characters, since they have the most frames.
All easily within the 13 kbyte, but in P1 mode the memory is interleaved, and this causes
the number of bytes that can be transferred to drop once again, and in combination
with the sprites it drops to... just see for yourselves.
P1 60 Hz
command | 16x16 tiles | bytes
------------------------------
LMMM    | 34          | 4352
BMXL    | 42          | 5376
Almost like you are in screen 5 (at 50 Hz, 192 lines). I am not happy... But at least
it is still faster than the v9958, even if not by far.
And do not forget about the 2 layers and 125 sprites; after all, that can't be copied either.
Oh... right... BMXL seems to be a bit faster. Not much, especially in the B1 modes, namely
varying between 1 and 4 tiles more, but here an entire kilobyte ^_^ So just ignore all the
crazy talk and use those tables (or not) before starting a project that's not doable.
But something like Sonic is still possible if you just go easy on the 'special' effects which
are taken for granted on other (game) systems.
I feel a lot better now this is off my chest, and I do hope part of this post is useful to
some. And of course if I am wrong I would like to be corrected. You know, getting some
feedback on the matter never hurts. | | msd msx professional Posts: 608 | Posted: November 04 2005, 18:49 | Can I see the test code?
| | ARTRAG msx master Posts: 1592 | Posted: November 04 2005, 19:04 | I do not know the v9990 very well; could you present similar
figures for the v9938/58 in order to have a direct comparison?
What about "The Revenge of The Last Dragon" for standard msx2?
In TotalParody I started with an omnidirectional scroll with 2 layers in scr5 @50Hz;
it isn't undoable, even if very tricky.
| | msd msx professional Posts: 608 | Posted: November 04 2005, 19:20 | Did you test it on a turbo R in r800 mode?
| | GhostwriterP msx addict Posts: 305 | Posted: November 05 2005, 12:09 | Besides the in-game test I used the following code.
And I tested it at 3.5 MHz and 7 MHz, and sure enough 7 MHz is a bit faster. So r800 will be a little
faster too, but I wanted to know what a Z80 can do. Which answers ARTRAG's question: I
did intend to make the game for standard msx2 with moonsound and gfx9000. Now I have
my doubts, so it is probably going to be a turbo R game (yes, I am not giving up).
Anyway, the code:
        org 100h
        module main
        xor a
;       inc a                   ;overscan
        out (67h),a
        ld a,6
        out (64h),a             ;select register 6
        ld a,10000010b          ;6 8-bit cl
;       ld a,10000001b          ;6 4-bit cl
;       ld a,00000101b          ;6 P1
        out (63h),a
        ld a,00000000b          ;7 pal 1000b
        out (63h),a
        ld a,10000000b          ;8
        out (63h),a
        ld a,0
        out (63h),a             ;9
        out (63h),a             ;10
        out (63h),a             ;11
        out (63h),a             ;12
        ld a,00000000b          ;13
        out (63h),a
        xor a                   ;14
        out (63h),a
        inc a
        out (63h),a             ;15 backdrop
        dec a
        out (63h),a             ;adjust
        out (63h),a             ;scroll stuff
        out (63h),a
        out (63h),a
        out (63h),a
        xor a
        out (64h),a             ;register select
        out (63h),a
        out (63h),a
        out (63h),a
        halt
        halt
        ei
/*
        ld hl,gfxnaam           ;load gfx
        call df.BuildFCB
        call df.OpenFile
        call readOut16k
        call readOut16k
        call readOut16k
        call readOut16k
        call readOutColor
        call df.CloseFile
*/
        ei
        ld b,200
wachtff
        halt
        djnz wachtff
        di
1       in a,(65h)              ;gfx9000 sync: wait until vertical blank is over
        and 64
        jr nz,1b
1       in a,(65h)              ;then wait for the start of the next vertical blank
        and 64
        jr z,1b
        call CopyBlankPeriod
        call CopyDisplayPeriod
        ei
        ld b,250
wachtff2
        halt
        djnz wachtff2
exit
        ld hl,(COPIES)
        ld (loWord),hl
        call printaantal
        xor a
        ld ix,0d1h
        ld iy,(0faf7h)
        call 1ch
        ld ix,141h
        ld iy,(0faf7h)
        call 1ch
        ld ix,156h
        ld iy,(0fcc0h)
        call 1ch
        ld de,Txthoi
        ld c,9
        call 5
        ret
printaantal                     ;write loWord as 4 hex digits into Txthoi
        ld ix,Txthoi
        ld hl,hextab
        ld d,0
        ld a,(loWord+1)
        srl a
        srl a
        srl a
        srl a
        ld e,a
        add hl,de
        ld a,(hl)
        ld (ix+0),a
        ld hl,hextab
        ld a,(loWord+1)
        and 15
        ld e,a
        add hl,de
        ld a,(hl)
        ld (ix+1),a
        ld hl,hextab
        ld a,(loWord)
        srl a
        srl a
        srl a
        srl a
        ld e,a
        add hl,de
        ld a,(hl)
        ld (ix+2),a
        ld hl,hextab
        ld a,(loWord)
        and 15
        ld e,a
        add hl,de
        ld a,(hl)
        ld (ix+3),a
        ret
CopyBlankPeriod
        in a,(65h)
        and 1                   ;wait while a command is still executing
        jr nz,CopyBlankPeriod
        ld a,32
        ld bc,16*256+63h
        out (64h),a
        xor a
        out (63h),a             ;32 source x
        out (63h),a             ;33
        out (63h),a             ;34 source y
        out (63h),a             ;35
        out (c),b               ;36 destination x
        out (63h),a             ;37
        out (c),b               ;38 destination y
        out (63h),a             ;39
        out (c),b               ;40 number of dots/bytes x
        out (63h),a             ;41
        out (c),b               ;42 number of dots/bytes y
        out (63h),a             ;43
        out (63h),a             ;44
        ld a,11100b             ;tpset pset 01100b
        out (63h),a             ;45 ;log op
        ld a,255
        out (63h),a             ;46 ;write mask
        out (63h),a             ;47
        ld a,52
        out (64h),a
        ld a,01000000b          ;LMMM
;       ld a,10000000b          ;BMXL
        out (63h),a
        ld hl,(COPIES)
        inc hl
        ld (COPIES),hl
        in a,(65h)
        and 64
        ret z                   ;vertical blank is over
        jp CopyBlankPeriod
CopyDisplayPeriod
        in a,(65h)
        and 1                   ;wait while a command is still executing
        jr nz,CopyDisplayPeriod
        ld a,32
        ld bc,16*256+63h
        out (64h),a
        xor a
        out (63h),a             ;32 source x
        out (63h),a             ;33
        out (63h),a             ;34 source y
        out (63h),a             ;35
        out (c),b               ;36 destination x
        out (63h),a             ;37
        out (c),b               ;38 destination y
        out (63h),a             ;39
        out (c),b               ;40 N x
        out (63h),a             ;41
        out (c),b               ;42 N y
        out (63h),a             ;43
        out (63h),a             ;44
        ld a,11100b
        out (63h),a             ;45 ;log op
        ld a,255
        out (63h),a             ;46 ;write mask
        out (63h),a             ;47
        ld a,52
        out (64h),a
        ld a,01000000b
;       ld a,10000000b          ;BMXL
        out (63h),a
        ld hl,(COPIES)
        inc hl
        ld (COPIES),hl
        in a,(65h)
        and 64
        ret nz                  ;next vertical blank has started
        jp CopyDisplayPeriod
/*
readOut16k
        ld hl,16384
        ld de,8000h
        call df.ReadFile
        ld a,64
        ld bc,60h
        ld hl,8000h
1       otir
        dec a
        jr nz,1b
        ret
readOutColor
        ld hl,48*3
        ld de,kleurtabel
        call df.ReadFile
        ld a,2
        ld c,61h
        ld hl,kleurtabel
2       ld b,128                ;send colors
1       outi
        outi
        outi
;       out (c),0
        djnz 1b
        dec a
        jr nz,2b
        ret
*/
kleurtabel
        block 192
gfxnaam
        byte "TESTPLT1C64"
hextab
        byte 48,49,50,51,52,53,54,55,56,57,97,98,99,100,101,102
Txthoi
        byte "0000$"
loWord
        word 0
COPIES
        word 0,0,0,0,0,0,0,0
        endmodule
;       include dskio.i
        end
| | ro msx guru Posts: 2320 | Posted: November 05 2005, 12:29 | you're italian?
so uhrm, let's dump that whole gfx module and stop wasting time. we've gotta get with the program and do some standard msx(2) stuff again. enough with the extensions. psg forever. screen5's pretty fast if you do some good coding. ooh c'mon!
(nice article btw. thanx)
| | msd msx professional Posts: 608 | Posted: November 05 2005, 12:31 | You only need to write this once:
        out (63h),a             ;44
        ld a,11100b             ;tpset pset 01100b
        out (63h),a             ;45 ;log op
        ld a,255
        out (63h),a             ;46 ;write mask
        out (63h),a             ;47
Not again for every command.
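A minimal sketch of that suggestion, assuming (as stated above) that registers 44 to 47 keep their values between commands. The labels InitCopyRegs and CopyTile are made up for this illustration; the ports, register numbers and values are the ones from the test code.

```asm
; Hypothetical helper: write the registers that stay the same for
; every copy only once, before the copy loop starts.
InitCopyRegs
        ld a,44
        out (64h),a             ;select register 44
        xor a
        out (63h),a             ;44
        ld a,11100b             ;tpset
        out (63h),a             ;45 log op
        ld a,255
        out (63h),a             ;46 write mask
        out (63h),a             ;47
        ret

; Hypothetical per-copy routine: only coordinates, size and opcode.
CopyTile
        in a,(65h)
        and 1                   ;wait while a command is still executing
        jr nz,CopyTile
        ld a,32
        ld bc,16*256+63h        ;b=16, c=register data port
        out (64h),a             ;select register 32
        xor a
        out (63h),a             ;32 source x
        out (63h),a             ;33
        out (63h),a             ;34 source y
        out (63h),a             ;35
        out (c),b               ;36 destination x
        out (63h),a             ;37
        out (c),b               ;38 destination y
        out (63h),a             ;39
        out (c),b               ;40 number of dots x
        out (63h),a             ;41
        out (c),b               ;42 number of dots y
        out (63h),a             ;43
        ld a,52
        out (64h),a             ;select register 52
        ld a,01000000b          ;LMMM
        out (63h),a
        ret
```

Compared with the test loop, this drops four port writes and two loads from every copy.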
| | GhostwriterP msx addict Posts: 305 | Posted: November 05 2005, 12:51 | I thought it might not be needed, but it is a test and I just prefer to make a test
a bit slower so it is a better indication for real-life programs. But it is definitely
a bit faster (a lot if you have a lot of copies).
@ro: Everything has already been done for msx2. I prefer something that's not been done
before, or that I have not done before. So I stick to either msx or gfx9000 for now. | | ARTRAG msx master Posts: 1592 | Posted: November 05 2005, 13:03 | @GhostwriterP
I am not so sure that everything has been done...
Look at the scr4 command topic for example, or at my scr5 platform with fine 2 layer scroll...
I could say that those examples are only ideas waiting to be developed
into a true project...
| | GhostwriterP msx addict Posts: 305 | Posted: November 05 2005, 13:27 | True, but I am not going to be the one to develop that.
I am busy enough, I would say (more than 3 projects), and then there is vscreen
for those platform games, based on an already proven concept. I am afraid it is not
that appealing to me. But I like your effort though.
| | Maggoo msx professional Posts: 576 | Posted: November 05 2005, 14:55 | @GhostwriterP: I haven't used a V9990 in a looong time but I remember experimenting on it back in the good old days. That VDP is pretty fast (for copy commands and the like). What kind of speed issues do you have, or what effect would you like to do that you think the V9990 can't do? When it comes to scrolling and sprites, I think it's pretty close to a Megadrive or a NEC PC Engine.
On a side note, I did my testing using a Turbo R, which was a big plus. With the V9938/58, the VDP commands are kinda slow, which results in the Z80 waiting a lot. With the V9990 it was the opposite: the Z80 was slow compared to the VDP, and the R800 improved things a lot. Anyway, if you plan on using the moonsound, you'll definitely need an R800, as the replayers consume quite some CPU time.
| | msd msx professional Posts: 608 | Posted: November 05 2005, 14:58 | well 7.16 MHz should work too..
| | GhostwriterP msx addict Posts: 305 | Posted: November 05 2005, 16:13 | Maggoo, I want several things, but basically just a few copies. Not even logical ones. The basic
things like updating an energy bar, a few numbers, player animation, village people, speech
balloons, etc. It all adds up and might surpass the 4-5 kbyte boundary.
Now I have to manage sprite patterns a bit smarter, by keeping as many patterns as possible already
in the Sprite Generator table. I still think it can be done, just with a little more effort, and
maybe fewer details/animation frames.
I really want the game to run at 60 Hz, because scrolling at 30 Hz just looks terrible and 50 Hz
flickers too much to my eyes.
I have already made a (test) part of the game engine, with fake 3-layer scrolling using line splits
on the 2nd layer, all running nice and smooth. Even on a z80. But indeed, since I intend to use
moonsound (and nothing else) for the music, I have serious doubts it will run fast enough on
a z80 when the game is finished. But there is no harm in trying (is there?) for all those who
don't have a turbo R (myself included, but that will be fixed if I win msxdev'05).
And now to answer that megadrive comparison. A megadrive can build a list of scroll
offsets per line, making it rather easy to implement 'bar' scrolls (like the ground in Sonic 2).
Plus different sprite sizes and mirror features that reduce the need for frames by half.
I don't know much about the PC Engine. But comparing to the SNES is even more stupid... you know,
3 layers, scaling/rotation, transparencies (not easy to mimic with OR or AND) and sprite mirroring
(same for tiles too). I might even have left some things out.
But those game consoles do not have such fine bitmap modes, which are pretty fast, but not fast
enough for a smooth multilayer scrolling game at 60 Hz. Something like coredump (tile scrolls)
could be done, or that stuff ARTRAG is working on for that matter.
I started the test because I disliked the fixed 16x16 sprite size and throwing away about 12 colors
due to multiple colors in different sets (not to mention transparent color 0). Furthermore I liked
the ability to mirror-copy things (64-color mode). Which is a must by the way, 'cos otherwise
it takes twice as much memory to store the frames. And putting an object as a whole on the screen,
in one copy, seems a lot easier than building objects out of sprites (I know I am lazy). Anyway, I
concluded it was not going to be possible to scroll several layers in a bitmap mode, and now I am
back to the original plan, with a few adjustments that is. Or at least with the 'limitations' in mind.
| | POISONIC msx professional Posts: 883 | Posted: November 05 2005, 17:34 | the V9990 is a powerful GFX card... to compare it to older msx vdp's is a bit lame... if the v9990 was in line with the rest of the msx gpus, the v9990 would be 100% backward compatible... btw the v9990 is so damn fast it always has to wait...
btw ever tried a 4 screen screen split in just a few lines of gbasic code?
the copy command is 30 times faster than on a normal msx vdp, so calling the v9990 a bit faster is a bit lame...
| | Maggoo msx professional Posts: 576 | Posted: November 05 2005, 18:27 | @GhostwriterP: I'm getting your point here. But as long as you develop on the MSX, it will always be all about constraints and finding ways around them. Even if making things with a V9990 is a LOT easier than it would be on a "regular" MSX VDP. And if you really want to get all the power and features, get a PC and Blitz Basic. You'll get all the speed, sprites, and layers you need! But of course it's not much of a challenge...