Best way to test this: get BRMSX and run it as an MSX1 with -vdptiming. If the screen goes crazy, then you need more NOPs (and sometimes just two NOPs aren't enough... I faced that problem several times doing game conversions).
On MSX2 you can do several consecutive OUTIs without problems, so this would be the best solution for MSX2 IMHO.
(But I don't know if it's possible, I didn't look at the source yet.)
I can confirm the same:
when I moved from my old HB75P (MSX1) to my old Philips MSX2, all the VDP timing problems magically disappeared.
Was I lucky with my Philips MSX2, or did the timing specifications change between MSX1 and MSX2?
I don't have much experience with MSX2, but I need to put 2 NOPs between OUTs with NoFun using screen 0, width 80.
Look again at the calculation ...
In the worst case (MSX1 - Screen 2), we need to wait 29 T-states before accessing the VRAM again.
A standard OUT (n),A instruction takes 11 T-states, so we need to wait another 18 T-states after it.
So five NOPs (yes, five) are needed (@ 4 T-states each).
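The arithmetic above can be checked with a small Python sketch. The function name `nops_needed` and the model behind it (spacing counted from the start of one OUT to the start of the next, raw Z80 timings with no M1 wait state) are assumptions for illustration only; nothing here comes from the depacker source.

```python
import math

def nops_needed(required_gap, out_cost, nop_cost):
    """NOPs needed so that consecutive OUTs start at least
    required_gap T-states apart (hypothetical helper)."""
    remaining = max(0, required_gap - out_cost)
    return math.ceil(remaining / nop_cost)

# Worst case quoted above: MSX1, screen 2, 29 T-states between VRAM accesses.
# OUT (n),A = 11 T-states, NOP = 4 T-states (no M1 wait counted here).
print(nops_needed(29, 11, 4))  # 18 T-states left to fill, rounded up to whole NOPs
```

Passing different costs for the OUT and the NOP lets you redo the same count under other timing assumptions.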
In cases like the above, I'm in favour of leaving things as they are and adding warnings in the comments and/or an email address for assistance.
If someone runs into the problem on his HW, he will let you know that the problem exists.
I think that the amount of real HW still in use is decreasing quickly, and that the most basic and ancient machines are the least used nowadays.
This could be a nice way to find out whether real MSX machines with this timing flaw are still actively used.
@ Metallion, yes, you really need 29 T-states between OUTs. Many times 28 is enough, but that requires some knowledge about how the application is used (sprites etc.). Since your depacker is general purpose, you need to use the worst case.
You need to count the extra M1 wait cycle for each instruction. In general you add 1 T-state to each instruction, and one extra for instructions that start with DD, ED, FD or CB. So a NOP takes 4+1 T-states to execute, an OUT (n),A takes 11+1, an OUT (C),r takes 12+1+1 and an OUTI takes 16+1+1.
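To make the rule concrete, here is a small Python sketch that applies it to the four instructions mentioned. The table layout and the helper name `msx_cycles` are made up for illustration; the base timings are the ones quoted in the post above.

```python
# Base Z80 T-states, and whether the opcode starts with a DD/ED/FD/CB prefix.
BASE = {
    "nop":       (4,  False),
    "out (n),a": (11, False),
    "out (c),r": (12, True),   # ED-prefixed
    "outi":      (16, True),   # ED-prefixed
}

def msx_cycles(mnemonic):
    """T-states on MSX: +1 for the M1 wait, +1 more for a prefixed opcode
    (hypothetical helper following the rule described above)."""
    base, prefixed = BASE[mnemonic]
    return base + 1 + (1 if prefixed else 0)

for name in BASE:
    print(f"{name:10} {msx_cycles(name)}")
```

By this count, back-to-back OUTIs start 18 T-states apart, which is well short of the 29 needed in the MSX1 worst case.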
If you want, I can look at the code and add NOPs or other instructions where needed. I'm pretty good at this after writing several demos that optimize bandwidth to the VDP. Let me know if you want some help.
Actually, about the Pletter code: I do not see any case where two
successive OUTs to 0x98 are closer than 28 T-states.
The rule applies only between successive data OUTs;
between address OUTs at 0x99, and between setting the address and writing the data,
there are no T-state constraints (or are there?)
I'm quite sure you need to space 0x99 OUTs and INs too. The VDP only reads the bus in certain time slots, so you need to keep the data on the bus long enough for the VDP to get a chance to read it.
*EDIT* Actually I'm not sure if it's only VRAM access that is restricted or if it's all VDP access.
But are addresses and data on different buses?
Is reading the address governed by the same timing as reading the data?
I should look closely at the TMS specs.
I believe that there's no need to space the OUTs on $99; there's an address register that holds the data while it's being initialized. You only need to take care to avoid any other access to the VDP while this data is being sent (usually a pair of bytes). I'm pretty sure I've used things like out (c),e;out (c),d (with c=$99) without flaws...