Playing back saved replays leading to different results

By mi-chi

Scribe (37)

23-07-2022, 17:18

Just curious: I was under the impression that savereplays, when played back, should always lead to the same result, especially if played with the "-viewonly" switch, which prevents accidental interference of user input and "taking over" of the playback from that point on.

Now I see a strange result: when jumping to a specific position in time *after* a specific event that I want to diagnose, the event did happen (because a specific score has been reached). But if I jump backwards in time (or forward to a point before the event) and let the playback continue, it does not happen: the playback continues to the end, but the scoring in the game leads to different results, and in fact even the input (which is observable with a cursor on the screen) no longer seems to match what is displayed.
When forwarding (via the timeline), the playback remains in the wrong state (score, progress in the game), but when jumping back, it suddenly recovers to the correct state. However, immediately from there things seem to be played back wrong again.

The replays have been saved with "reverse savereplay filename" and loaded with "reverse loadreplay -viewonly filename", and it happens almost reliably with my test application on different PCs, and is independent of whether the recording was done on the same or on another PC, and not only in the latest release (0.18), but even in the previous openMSX version, 0.17.

The test scenario is a MSX-1 game (ROM), and other than playing back PSG music, there are no exotic things or hardware used in the emulation, just a plain MSX 1 with no extensions other than the debugdevice. The effect is also reproducible with that game on different MSX types (e.g. MSX-2).
It's a megarom, ~ 300 KB, and other than the notification "The size of this ROM image is larger than 256kB, which is not supported on real Konami mapper chips!", there are no messages in the status info of openMSX.

However, it is not easy to create a reproducible isolated environment in terms of writing a simple program (simple in terms of lines-of-code) to demonstrate the effect - I just couldn't make it happen in a simple scenario.
I know that this is impossible to diagnose without a working example, but before going to that length, my question is whether this is an already known issue under certain circumstances, or even expected behavior.

By Manuel

Ascended (19059)

24-07-2022, 00:25

What you experience is a 'desync'. Can you still reproduce it without the debug device? (I guess so, as it's a write-only device that should not influence the state of the MSX.)
If yes, can you please send us the ROM and the replay so we can investigate?

By mi-chi

Scribe (37)

24-07-2022, 08:31

Thanks Manuel for the quick reply!

Indeed, it makes no difference with or without debugdevice.

I prepared a ROM and a recording that exposes the issue and sent it to you via email.

By wouter_

Champion (492)

24-07-2022, 09:37

Hi mi-chi,

Manuel already replied, but I'll try to give some more background information.

Emulation in openMSX should be deterministic. That is: if openMSX starts from the same initial state and receives the same input, then it should always produce the same output. There are two main reasons for why this may not be the case:
* There can be a bug in openMSX that causes it to be not 100% deterministic.
* If the replay is created with one openMSX version (e.g. an older version), but played back with a different version (e.g. a newer version), it might be the case that some emulation aspect has changed between the two versions.

Let me elaborate on this 2nd point. We try very hard to be backwards compatible between openMSX releases. Though we cannot be backwards compatible with old emulation bugs. I'll give a fictitious example: suppose there was a bug in the timing of the NOP instruction (e.g. it took 1 cycle too long) and this bug is corrected in a later openMSX version. Loading an old replay in the new openMSX version will succeed, but the behavior is not 100% the same anymore as when the replay was created (because now the NOP instruction does have the correct timing). A 1 cycle difference is small, but over time such small errors can accumulate and cause wildly diverging behavior.

Another more realistic example: in the last months we tweaked the timing of some VDP commands (in the scenario when at the same time the CPU is accessing VRAM). This can cause the command to finish slightly sooner/later. And even a small difference may cause the end-of-VDP-command CPU-polling-loop to execute 1 iteration more/less. And over time these errors accumulate. For example an IRQ may be accepted at a different moment.
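To make the polling-loop effect concrete, here is a toy Python sketch. It is not openMSX code: the 29-cycle loop length and the command durations are made-up numbers, and `polling_iterations` is a hypothetical helper. It only illustrates how a 1-cycle change in command duration can shift the CPU by a whole loop iteration.

```python
def polling_iterations(command_cycles: int, loop_cycles: int = 29) -> int:
    """Count how often a busy-wait loop runs before the command has finished.

    command_cycles: how long the (toy) VDP command takes, in CPU cycles.
    loop_cycles:    cost of one iteration of the polling loop (assumed value).
    """
    iterations = 0
    elapsed = 0
    while elapsed < command_cycles:
        elapsed += loop_cycles
        iterations += 1
    return iterations


def cpu_cycles_spent_waiting(command_cycles: int, loop_cycles: int = 29) -> int:
    """The CPU only leaves the loop on an iteration boundary."""
    return polling_iterations(command_cycles, loop_cycles) * loop_cycles


if __name__ == "__main__":
    old_timing = 1015  # hypothetical command duration in the old emulator version
    new_timing = 1016  # same command after a hypothetical 1-cycle timing fix
    shift = cpu_cycles_spent_waiting(new_timing) - cpu_cycles_spent_waiting(old_timing)
    print("CPU leaves the loop", shift, "cycles later")
```

In this toy setup the command only got 1 cycle slower, but the CPU continues a full loop iteration later, so everything after it (including the moment an IRQ is accepted) moves as well.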

I mentioned that with the same start-state and the same input, emulation should be deterministic. And this is exactly how replay files are implemented: they contain a snapshot of the full MSX machine and a list of all user input. With this it's possible to re-create all future states of the MSX machine. However going to such a future state requires fully emulating the machine and for large jumps this can be slow. To mitigate this, a replay file does not contain a single, but multiple snapshots of the MSX machine, taken at different moments in time. (If you look at the 'reverse bar' in openMSX you see some gray vertical stripes, each such stripe corresponds to a snapshot.)
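As a rough illustration of that structure, here is a small Python sketch. It does not reflect the actual openMSX replay file format: the `step` function is a stand-in for a full MSX machine, and `Snapshot`, `Replay`, `record` and `emulate_from` are hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    time: int   # emulated time at which the snapshot was taken
    state: int  # stand-in for the full machine state


@dataclass
class Replay:
    snapshots: list[Snapshot]  # several snapshots, ordered by time
    inputs: dict[int, int]     # time -> user input event


def step(state: int, user_input: int) -> int:
    """Toy deterministic machine: next state depends only on state and input."""
    return (state * 31 + user_input + 7) % 65521


def record(initial_state: int, inputs: dict[int, int],
           total_steps: int, snapshot_interval: int) -> tuple[Replay, int]:
    """Run the toy machine, logging inputs and taking periodic snapshots."""
    snapshots = []
    state = initial_state
    for t in range(total_steps):
        if t % snapshot_interval == 0:
            snapshots.append(Snapshot(t, state))
        state = step(state, inputs.get(t, 0))
    return Replay(snapshots, dict(inputs)), state


def emulate_from(snapshot: Snapshot, inputs: dict[int, int], target: int) -> int:
    """Re-create the state at `target` by emulating forward from a snapshot."""
    state = snapshot.state
    for t in range(snapshot.time, target):
        state = step(state, inputs.get(t, 0))
    return state


if __name__ == "__main__":
    replay, final_state = record(1, {3: 5, 40: 2}, total_steps=100, snapshot_interval=25)
    # Any snapshot should lead to the same final state.
    print(all(emulate_from(s, replay.inputs, 100) == final_state for s in replay.snapshots))
```

The first snapshot plus the input list is already enough to reach any later state; the extra snapshots only shorten the distance that has to be emulated.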

If you jump backward/forward in time, openMSX will start from the closest snapshot before the requested time, and then emulate (forwards) till this time is reached. Normally it shouldn't matter which snapshot is used as the starting point, all should lead to the same result. However this may break when:
* There is non-deterministic behavior (a bug).
* There was a timing bug fix between the openMSX version used for recording and the one used for playback. (The snapshots in the replay file were created with the old openMSX version.)
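The jump logic and the way it can break can be sketched in Python as follows. Again this is a toy model under stated assumptions, not how openMSX is implemented: `step` stands in for the emulated machine, snapshots are plain `(time, state)` pairs, and `seek` and `find_inconsistent_snapshots` are hypothetical helpers.

```python
import bisect


def step(state, user_input):
    """Toy deterministic machine step (stand-in for real emulation)."""
    return (state * 31 + user_input + 7) % 65521


def run(state, start, end, inputs):
    """Emulate forward from time `start` to time `end`."""
    for t in range(start, end):
        state = step(state, inputs.get(t, 0))
    return state


def record_snapshots(initial_state, inputs, total_steps, interval):
    """Return a time-ordered list of (time, state) snapshots."""
    snapshots, state = [], initial_state
    for t in range(total_steps):
        if t % interval == 0:
            snapshots.append((t, state))
        state = step(state, inputs.get(t, 0))
    return snapshots


def seek(snapshots, inputs, target):
    """Start from the closest snapshot at or before `target`, then emulate forward."""
    times = [t for t, _ in snapshots]
    i = bisect.bisect_right(times, target) - 1
    if i < 0:
        raise ValueError("target lies before the first snapshot")
    t0, s0 = snapshots[i]
    return run(s0, t0, target, inputs)


def find_inconsistent_snapshots(snapshots, inputs):
    """Emulate from the first snapshot and report stored snapshots that disagree."""
    t_prev, state = snapshots[0]
    mismatches = []
    for t, stored in snapshots[1:]:
        state = run(state, t_prev, t, inputs)
        if state != stored:
            mismatches.append(t)
        t_prev = t
    return mismatches


if __name__ == "__main__":
    inputs = {3: 5, 40: 2}
    snaps = record_snapshots(1, inputs, total_steps=100, interval=25)
    print(find_inconsistent_snapshots(snaps, inputs))
    # Simulate one snapshot that no longer matches what emulation produces.
    snaps[2] = (50, (snaps[2][1] + 1) % 65521)
    print(find_inconsistent_snapshots(snaps, inputs))
```

When every stored snapshot agrees with what forward emulation produces, the result of `seek` does not depend on the starting snapshot. As soon as one snapshot disagrees, jumps that start from it land in a different state than jumps that start from an earlier or later one, which matches the kind of symptom described in this thread.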

@mi-chi: So my main question is: can you reproduce this bug when the replay file was created and replayed with the exact same openMSX executable? If so it's a bug in openMSX and we'd like to fix it.

By mi-chi

Scribe (37)

24-07-2022, 18:22

Hi wouter_, thank you very much for that extensive answer and the insight. I was always thinking that those brighter lines are just indicators for "seconds" or something.

What you write makes absolute sense, and there is no point in maintaining compatibility between versions at all costs, especially when it comes to such a sophisticated feature (in the end, you are time-warping, and with cycle accuracy at that).

And yes, the replay also goes out-of-sync within the same version of the emulator - even on the same (host) system where the recording was done. My point with the older version was that this does not seem to be a new bug in 0.18, but that it already existed in 0.17 before (i.e. with the recording and the replay done in 0.17).

After what you said about the deterministic nature of the feature, I'm pretty sure that this is a bug. I did some simpler tests and could not isolate the problem to a minimal example, so I prepared a stripped-down ROM of my game where the problem happens reliably and a recording that shows the problem in a reproducible manner, and sent it to Manuel earlier today.

Let me know if I should send it to you as well (and if so, to your public email address from your profile?)

By Manuel

Ascended (19059)

24-07-2022, 22:11

Thanks, I already forwarded it to Wouter.