Hardware differences in Dragon Slayer IV speedruns

Page 1/2

By NLeseul

Supporter (13)

30-11-2021, 08:23

I've been speedrunning a few games on MSX, most notably Dragon Slayer IV.

Something I noticed when reviewing my most recent run of the MSX1 version is that the transitions between screens seemed to take a bit longer than in the Japanese record for that version. When I counted the black frames in a particular transition in both runs, they were measurably different: my run had a transition lasting around 64 frames, while the record had 48 frames in the same transition. I'm running on a physical FS-A1WX, and I have no idea what the Japanese runner is using.
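One possible way to automate this kind of count is to run ffmpeg's `blackdetect` filter over a recording and parse its log. The sketch below is not necessarily how the counts in this post were obtained. It assumes a 60 fps recording, and the log excerpt is a hypothetical sample rather than output from a real run. Note that `blackdetect` only reports intervals longer than its minimum duration option `d` (2 seconds by default), so that would need lowering for roughly one-second transitions.

```python
import re

# Matches the interval lines that ffmpeg's blackdetect filter writes to its log, e.g.
#   [blackdetect @ 0x1] black_start:12.4 black_end:13.4667 black_duration:1.06667
BLACK_RE = re.compile(
    r"black_start:(?P<start>[\d.]+)\s+"
    r"black_end:(?P<end>[\d.]+)\s+"
    r"black_duration:(?P<dur>[\d.]+)"
)


def black_intervals(log_text, fps=60.0):
    """Return (start_seconds, frame_count) for each black interval in the log.

    fps is an assumption about the recording, not something read from the log.
    """
    intervals = []
    for match in BLACK_RE.finditer(log_text):
        start = float(match.group("start"))
        frames = round(float(match.group("dur")) * fps)
        intervals.append((start, frames))
    return intervals


# Hypothetical log excerpt for illustration only.
sample_log = """\
[blackdetect @ 0x1] black_start:12.4 black_end:13.4667 black_duration:1.06667
[blackdetect @ 0x1] black_start:30 black_end:30.8 black_duration:0.8
"""

for start, frames in black_intervals(sample_log):
    print(f"transition at {start:.2f}s: {frames} black frames")
```

Summing the frame counts over a whole run would give the total time spent in black transition frames.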

There are a lot of screen transitions in DS4, so a difference in the load time through these transitions will definitely have a significant impact on the time of a run. As far as I can tell from some fiddling with ffmpeg and some fudgey math, about 10 minutes out of my ~50 minute run was black frames during these loads. If I'd been running on a machine like the record's that did the loads in about 3/4 of the time (48 frames instead of 64), the overall time would have been roughly 2.5 minutes faster, as far as I can tell.
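The estimate can be written out as a quick sketch. It takes the 64-frame and 48-frame counts and the roughly 10 minutes of load time from this post, and assumes (without evidence) that every transition in the run scales by the same ratio. The function name is made up for illustration.

```python
def estimated_saving_minutes(load_minutes, my_frames, reference_frames):
    """Time saved if every load shrank by the ratio reference_frames / my_frames."""
    ratio = reference_frames / my_frames
    return load_minutes * (1.0 - ratio)


# 10 minutes of black frames, 64 frames per transition versus 48 in the record.
saving = estimated_saving_minutes(load_minutes=10.0, my_frames=64, reference_frames=48)
print(f"load ratio: {48 / 64:.2f}")
print(f"estimated saving: {saving:.1f} minutes")
```

With these inputs the ratio is 0.75 and the saving comes out at about 2.5 minutes, which is in the same ballpark as the rough figure given above.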

I cross-checked against a few random machines in openMSX, and, as far as I can tell:

  • openMSX's FS-A1WX implementation seems to match my real FS-A1WX, at around 64 frames.
  • Two other MSX2+ machines (the Sony HB-F1XDJ and the Sanyo PHC-35J) pretty much matched that, although the Sony was a tiny bit faster.
  • The Panasonic FS-A1, as a MSX2 test case, took about 48 frames, which matches the record run.
  • Several random MSX1 machines (Sanyo MPC-100, Yamaha YIS-503F, Yashica YC-64) all had roughly consistent time at ~36 frames.
  • Another MSX1 machine, the Casio MX-10, took a little longer at ~44 frames.
  • Using openMSX's included C-BIOS settings, the MSX1 JP C-BIOS was pretty consistent with the real MSX1 machines at 37 frames, but the MSX2+ JP C-BIOS was significantly faster than the real MSX2+ machines at 40 frames.

So, assuming that openMSX's results are accurate, I'm just trying to figure out what conclusions to draw from this. It looks like an MSX1 game like this would probably run significantly better on native MSX1 hardware, which probably stands to reason. But some MSX1 machines like the Casio MX-10 might still be outliers? We could certainly pick an MSX1 configuration known to have good performance and recommend that setting for emulator users, but it's annoying having to tell physical MSX owners who don't own the right machine that their runs will be at a disadvantage. And are there other timing differences that are harder to measure? I know some rooms with lots of sprites feel distinctly laggy when I play, and I'm not sure if I see the same amount of lag in other recorded runs, but that might just be the difference between playing and watching. And is every MSX game that people speedrun going to need the same amount of analysis to determine the optimal machine eventually?

(Not that any of this matters practically in the near term; I don't think there are going to be enough people running the MSX1 version of DS4 for hardware variation to affect the ranking of runs any time soon.)

I'm mainly just info-dumping right now, I guess, and wondering if anyone who knows more about MSX hardware can explain what it is about each of these configurations that makes screen transitions perform differently, and how other MSX games might be affected for speedrunning purposes, and what advice I might give to future MSX speedrunners who want to play on real hardware.

For what it's worth, the only speedrunning community for an MSX game I've found that's at all active and that has made official rules about valid hardware is the one for the MSX version of Metal Gear. They seem to have chosen the FS-A1WSX as their standard platform, but I don't know what reasoning led them to that choice.

By Manuel

Ascended (18844)

30-11-2021, 18:23

Wow, how very cool that you're speedrunning MSX game(s) and even on real hardware! :) And welcome to the msx.org forum!

You bring up some interesting points... my first guess: the game uses certain BIOS routines, and some BIOSes have a more optimized implementation of these than others.

I've been involved (mostly in the past) in tasvideos.org and there we did settle on the FS-A1WSX just to get a standard machine (that has many features). It may not have been the best choice in hindsight, but at least it made runs comparable.

For real hardware runs it is of course a bit more demanding to require certain specific MSX models...

By NLeseul

Supporter (13)

30-11-2021, 20:34

Oh, neat! I've been using openMSX's TAS features a fair bit to investigate RNG manipulation. I'm interested in working on full TASs one day, but they're really time-consuming, and openMSX doesn't have the greatest workflow for it. (Maybe one day we'll get an openMSX core for BizHawk...)
It would definitely be interesting to consider if any of the existing MSX TASs on the A1WSX could be improved significantly by running on a native MSX1 machine.
I guess I should probably see if I can get some logging from openMSX to see what's actually going on during those screen transitions. If it could be tracked to differences in specific BIOS calls, then that might be useful information when looking for other possible performance differences.
I suspect that all it's doing during the transitions is loading room data from the ROM, processing it into a pattern table, and sending the pattern table data to the VDP. Would that have to go through a BIOS call? I don't know much about the MSX VDP yet, but I'd think that interfacing with it would just require writing to special-purpose addresses with normal load/store instructions.
My thinking is that the main difference comes from the newer VDP chips having less efficient implementations of the older MSX1 screen modes, since the timing is mostly fairly consistent within VDP generations. BIOS-related differences might be responsible for outliers like that Casio machine, though.

By Grauw

Ascended (10602)

30-11-2021, 20:58

NLeseul wrote:

I suspect that all it's doing during the transitions is loading room data from the ROM, processing it into a pattern table, and sending the pattern table data to the VDP. Would that have to go through a BIOS call? I don't know much about the MSX VDP yet, but I'd think that interfacing with it would just require writing to special-purpose addresses with normal load/store instructions.

The MSX VDP uses indirect access via I/O ports rather than memory addresses, which is different from e.g. the ZX Spectrum and from computers based on the MOS 6502 CPU (which doesn’t have a concept of I/O ports), like the C64.
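To make the indirect access concrete, here is a deliberately simplified Python model of a TMS9918-style VRAM write: the address is set with two writes to the control port (99H on MSX), then bytes go to the data port (98H) while the address auto-increments. The class and function names are invented for this sketch, and it ignores register writes, reads, and all timing constraints.

```python
class ToyVDP:
    """Simplified model of TMS9918-style VRAM writes through two I/O ports."""

    DATA_PORT = 0x98     # VRAM data port on MSX
    CONTROL_PORT = 0x99  # address setup port on MSX

    def __init__(self, vram_size=0x4000):
        self.vram = bytearray(vram_size)
        self.address = 0
        self.pending_low = None  # first byte of the two-byte address setup

    def out(self, port, value):
        value &= 0xFF
        if port == self.CONTROL_PORT:
            if self.pending_low is None:
                # First write: address bits 0-7.
                self.pending_low = value
            else:
                # Second write: bits 0-5 hold address bits 8-13.
                # Bit 6 (set = write mode) is ignored by this toy model.
                self.address = ((value & 0x3F) << 8) | self.pending_low
                self.pending_low = None
        elif port == self.DATA_PORT:
            self.vram[self.address] = value
            # The VRAM address auto-increments after each data access.
            self.address = (self.address + 1) % len(self.vram)


def write_vram(vdp, address, data):
    """Copy bytes to VRAM the way a port-based routine would."""
    vdp.out(ToyVDP.CONTROL_PORT, address & 0xFF)
    vdp.out(ToyVDP.CONTROL_PORT, ((address >> 8) & 0x3F) | 0x40)
    for byte in data:
        vdp.out(ToyVDP.DATA_PORT, byte)


vdp = ToyVDP()
write_vram(vdp, 0x1800, b"\x01\x02\x03")
print(vdp.vram[0x1800:0x1803].hex())
```

The point of the model is that every VRAM transfer pays for the two-byte address setup first, which is why many small transfers cost more than one large one.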

But most commercial MSX games, especially those from Japan, do everything through the BIOS. For newer MSX computers they needed to make room so some functions were moved to a place with slower access, though I’m not sure this applies to the VRAM access functions. Additionally the interrupt handler (which takes a % of time) also gains complexity with new generations and in general it can vary somewhat in time based on region, presence of disk drives, etc.

NLeseul wrote:

My thinking is that the main difference comes from the newer VDP chips having less efficient implementations of the older MSX1 screen modes, since the timing is mostly fairly consistent within VDP generations.

The newer V9938 and V9958 VDPs allow for faster access than the MSX1’s TMS9918. But software always accesses at the same speed no matter the VDP.

There are some complicating factors like some MSX2(+/tR) computers adding an extra wait cycle or two if they use the T9769 MSX-ENGINE. But such a tiny delay doesn’t account for a 33% difference.

Honestly I find a 33% up to almost 80% difference very extreme and difficult to explain in the first place; I would expect differences to be in the range of just (fractions of) percents. Maybe the screen change routine contains a couple of HALT instructions, and if there are small speed differences it misses an interrupt and ends up just spending a lot of time idling.
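The HALT idea can be illustrated with a toy model. It assumes the load is split into equal chunks of work, each followed by a HALT that waits for the next vertical interrupt. The cycle counts are hypothetical placeholders, not measurements from the game; only the 3.579545 MHz Z80 clock is a standard MSX figure.

```python
import math

CPU_HZ = 3_579_545  # standard MSX Z80 clock


def frames_for_work(work_cycles, halts, refresh_hz=60):
    """Frames consumed when the work is split into `halts` equal chunks,
    each followed by a HALT that waits for the next vertical interrupt."""
    cycles_per_frame = CPU_HZ / refresh_hz
    chunk = work_cycles / halts
    # A chunk that runs slightly past a frame boundary idles until the next one.
    return halts * math.ceil(chunk / cycles_per_frame)


# Hypothetical workloads: the second is only 5% slower per chunk.
fast = frames_for_work(8 * 58_000, halts=8)
slow = frames_for_work(8 * 60_900, halts=8)
print(fast, slow)
```

In this model a chunk of 58,000 cycles fits inside one 60 Hz frame (about 59,659 cycles), while a chunk of 60,900 cycles spills into a second frame, so a 5% slowdown doubles the frame count.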

By PingPong

Prophet (3898)

30-11-2021, 21:11

Maybe the difference is in the BIOS?
For example, I remember that the circle drawing routine is a lot slower on screen 3 on MSX2 machines than on MSX1, despite the fact that the MSX2 VDP is almost twice as fast as the TMS one. The reason is that the routine was moved to another slot, requiring a slot switch when plotting the circle.

By Grauw

Ascended (10602)

30-11-2021, 21:35

I debugged the game code a little bit. It has a large interrupt handler hooked to H.KEYI, which intercepts the standard ISR and does VRAM transfers, but it enables interrupts in the process, so the interrupt is reentrant. I can observe it actually re-entering by setting breakpoints on 4321H and 438EH, the start and end of the ISR. These should alternate, but instead the breakpoint on 4321H is hit twice in a row, and so on.

This is probably a bug. It doesn’t render the game unplayable, but it does cause certain parts to execute repeatedly depending on the precise timing of the interrupts.

EDIT: Actually now I can’t get those breakpoints to retrigger anymore. Maybe I was clicking through it too quickly and I misread what I saw.

By gdx

Enlighted (5572)

01-12-2021, 01:53

Programmers who were used to programming for other machines often considered interrupt timings to be fixed, but on MSX this is not the case. They can vary from machine to machine, which can cause oddities.

By NLeseul

Supporter (13)

01-12-2021, 04:08

Thanks for the guidance. Things are beginning to make some sense to me.

It looks like if I randomly break in the debugger during the load, I tend to be in the BIOS code that handles BIOS call 0x5c (copy from memory to VRAM). So it seems like the bulk of what the load is doing is indeed copying into VRAM via the BIOS.

There are definite differences in the BIOS code that implements that call, and it generally grew in complexity across generations. The Yashica MSX1's version is 10 instructions across 14 bytes, whereas the Panasonic MSX2+'s version is 24 instructions across 42 bytes. The MSX2+ C-BIOS is 19 instructions across 31 bytes, and has simpler branching behavior internally. It looks like the load in DS4 is making a lot of small writes to VRAM, so the impact of those differences in complexity will definitely accumulate.
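A rough cost model shows why routine length matters more for many small writes than for a few large ones. Every cycle count below is a made-up placeholder (I have not measured either routine), and the function name is only for illustration. The one thing the model captures is that each call pays a fixed setup cost on top of the per-byte cost.

```python
def transfer_cycles(total_bytes, bytes_per_call, setup_cycles, cycles_per_byte):
    """Total cycles to move total_bytes when each call pays a fixed setup cost."""
    calls = -(-total_bytes // bytes_per_call)  # ceiling division
    return calls * setup_cycles + total_bytes * cycles_per_byte


# Hypothetical figures: 6144 bytes moved, 30 cycles per byte,
# and a setup cost of 100 cycles (short routine) or 200 cycles (long routine).
TOTAL_BYTES = 6144
for label, setup in (("short routine", 100), ("long routine", 200)):
    small = transfer_cycles(TOTAL_BYTES, 8, setup, 30)
    large = transfer_cycles(TOTAL_BYTES, 2048, setup, 30)
    print(f"{label}: 8-byte calls {small} cycles, 2048-byte calls {large} cycles")
```

With these placeholder numbers, doubling the setup cost adds about 29% when the data goes out in 8-byte calls, but well under 1% when it goes out in 2048-byte calls.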

I don't think that specific BIOS call is enough to account for the differences completely; the Casio machine's implementation of that call is identical to the Yashica's, and the Casio is definitely slower regardless, and the FS-A1's version is identical to the FS-A1WX's as well. But I wouldn't be surprised if there are differences in other BIOS calls.

I'm not confident enough in my Z80 assembly to read too much into the code, but it looks like the fancier MSX2/2+ versions have an extra branch that takes advantage of the otir instruction to do a block write to an I/O port in some situations? But that condition never triggers during DS4's screen transitions, for whatever reason, so it eats the cost of the extra conditional branches without benefiting from the optimized write.

Edit: Found the system variables documentation on here. The conditions the Panasonic BIOS is checking are the values of 0xfcaf and 0xfafc, which are the current screen mode and a set of flags pertaining to it. I think it's using the optimized code only in MSX2 screen modes?

By NLeseul

Supporter (13)

01-12-2021, 05:55

Well, I just realized that all three of the random MSX1 machines I initially checked in openMSX were actually European machines, so they were running at 50 fps. Oops.

The Casio machine I tested later, which had a longer load time of 44-ish frames, would have been running at 60 fps as expected. I get a similar result cross-checking on Sony HB-10. So that's probably a more accurate expectation for that load time on region-accurate hardware—and it means that C-BIOS is actually a little too fast. And differences in the complexity of that VRAM write routine described above probably account for the difference between a 44-frame MSX1 load and a 48-frame MSX2 load.
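Converting the frame counts from this thread into wall-clock time makes the 50/60 Hz mix-up visible. This sketch assumes the machines run at exactly their nominal refresh rate and that the FS-A1 and FS-A1WX figures are at 60 Hz, which seems safe for Japanese models but is still an assumption.

```python
def load_seconds(frames, refresh_hz):
    """Wall-clock duration of a transition measured in video frames."""
    return frames / refresh_hz


# (label, frames counted, assumed refresh rate in Hz)
measurements = [
    ("European MSX1 (50 Hz)", 36, 50),
    ("Casio MX-10 (60 Hz)", 44, 60),
    ("FS-A1 MSX2 (60 Hz)", 48, 60),
    ("FS-A1WX MSX2+ (60 Hz)", 64, 60),
]
for name, frames, hz in measurements:
    print(f"{name}: {frames} frames = {load_seconds(frames, hz):.3f} s")
```

36 frames at 50 Hz is 0.72 s and 44 frames at 60 Hz is about 0.73 s, so the European MSX1 machines and the Casio take nearly the same real time for the load.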

Still no idea why the MSX2+ machines seem dramatically slower, though.

By gdx

Enlighted (5572)

01-12-2021, 08:54

Yes, BIOS calls take more or less time depending on the machine and the generation. The same goes for the interrupt routine. All of this must be taken into account when developing for MSX, especially inside an interrupt handler. For example, if an interrupt occurs while interrupts are disabled, it causes a timing shift. The region of the machine (50 or 60 Hz) must also be taken into account. Differing hardware components can also affect timing.

ASCII should have done specific routines for each generation using a Mapper for the Main ROM instead of adding a Sub-ROM which quickly filled up and takes a slot.

By ducasp

Hero (560)

01-12-2021, 12:09

NLeseul wrote:

Thanks for the guidance. Things are beginning to make some sense to me.

It looks like if I randomly break in the debugger during the load, I tend to be in the BIOS code that handles BIOS call 0x5c (copy from memory to VRAM). So it seems like the bulk of what the load is doing is indeed copying into VRAM via the BIOS.

There are definite differences in the BIOS code that implements that call, and it generally grew in complexity across generations. The Yashica MSX1's version is 10 instructions across 14 bytes, whereas the Panasonic MSX2+'s version is 24 instructions across 42 bytes. The MSX2+ C-BIOS is 19 instructions across 31 bytes, and has simpler branching behavior internally. It looks like the load in DS4 is making a lot of small writes to VRAM, so the impact of those differences in complexity will definitely accumulate.

I don't think that specific BIOS call is enough to account for the differences completely; the Casio machine's implementation of that call is identical to the Yashica's, and the Casio is definitely slower regardless, and the FS-A1's version is identical to the FS-A1WX's as well. But I wouldn't be surprised if there are differences in other BIOS calls.

I'm not confident enough in my Z80 assembly to read too much into the code, but it looks like the fancier MSX2/2+ versions have an extra branch that takes advantage of the otir instruction to do a block write to an I/O port in some situations? But that condition never triggers during DS4's screen transitions, for whatever reason, so it eats the cost of the extra conditional branches without benefiting from the optimized write.

Edit: Found the system variables documentation on here. The conditions the Panasonic BIOS is checking are the values of 0xfcaf and 0xfafc, which are the current screen mode and a set of flags pertaining to it. I think it's using the optimized code only in MSX2 screen modes?

If you check Grauw's post, the game's interrupt code allows the ISR to be re-entered while it is still running. PAL MSX machines have more time between interrupts because they get 50 per second, and MSX1 machines have faster ISR routines. With a Japanese MSX2+ you get the worst of both: ISR re-entry becomes possible both because interrupts are more frequent (60 per second) and because the MSX2+ ISR is slower (one of the reasons is that it runs a slow keyboard routine more often). So it is more likely that the game will have to redo the execution of its interrupt hook because it was interrupted before finishing.
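As a toy illustration of that argument, the check below compares a handler's cost against the number of CPU cycles between interrupts at 50 and 60 Hz. The handler costs are hypothetical, chosen only to show the effect, and the model assumes the handler re-enables interrupts right away.

```python
CPU_HZ = 3_579_545  # standard MSX Z80 clock


def isr_overruns(handler_cycles, refresh_hz):
    """True if the handler is still running when the next interrupt arrives."""
    period_cycles = CPU_HZ / refresh_hz
    return handler_cycles > period_cycles


# Hypothetical handler costs (BIOS ISR plus the game's hook), not measured values.
cases = {"MSX1-style ISR": 55_000, "MSX2+-style ISR": 62_000}
for label, cycles in cases.items():
    for hz in (50, 60):
        print(f"{label} at {hz} Hz: overrun={isr_overruns(cycles, hz)}")
```

At 50 Hz there are about 71,591 cycles between interrupts and at 60 Hz about 59,659, so in this sketch only the slower handler on a 60 Hz machine gets interrupted before it finishes.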
