New MSX archive format (proposal / idea)


By ren

Paragon (1932)

08-01-2021, 09:43

Forked from /forum/msx-talk/openmsx/openmsx-wish-list?page=2#comment-397035. I'll repeat the post:

I agree, a .dsk database would be pretty nice actually (keeping the clutter down) :)

Although (I don't know if such an idea has been discussed before?) personally I'm more an advocate of a self-contained (g)zipped package that would combine a description with dumps of anything belonging to that release (a ROM, a disk, multiple disks, rom+disk, rom+tape, whatever). It could have its own file extension.

It would be (very) nice if such a package could be read by a PC emu/tool, as well as e.g. SofaRun.

For descriptors I prefer Yaml. It could look something like this:

vspec: 1

title: Night Knight
type: game
abstract: You're playing Sir Bernard ...
date: 2019-05-..

  code: Juan J. Martínez
  gfx: Juan J. Martínez
  sound: Juan J. Martínez

    size: 32k
    mapper: normal
    filename: nknight.rom
    sha1: 42f79da674a8f5533abf6d85be116dc11cdd4114
    crc32: 3a7965de

  gen: 1
  ram: 16K

  support: psg

  mode: [480i, 576i] 
  preferred: 480i 
  notes: (576i should be fine too ;-))


  ver: 1.0.3
  date: 2019-05-18
  notes: fixed minor bug

  cart: rom



For PC use, it would be nice to have screenshot, box covers etc. as well, those could go in an 'extra' package. (Too big for MSX use obviously, unless brought down in size..)


Grauw wrote:

That’s a pretty nice idea I think. Also for ROM files. A decentralised manifest (descriptor).

That way, if I share a build of a game I could just release it in this format and I wouldn’t need to rely on either wonky autodetection, manual selection by the user, or a 3rd party database that is updated infrequently. People could just drop the zip on the emulator and it would read the mapper type from the manifest.

If the information would be used by and presented in emulators, I could imagine some people starting to make collections in such a format. Not only selecting mapper type, but also for example auto-selecting a suitable machine and extensions.
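To make the "drop the zip on the emulator" idea concrete, here is a minimal Python sketch. It assumes a JSON manifest stored under the hypothetical name `manifest.json` and a hypothetical `rom` section holding the mapper; neither name has been agreed on in this thread, and the ROM contents are a zero-filled placeholder.

```python
import io
import json
import zipfile

MANIFEST_NAME = "manifest.json"  # hypothetical name, not defined in this thread


def build_package(manifest: dict, files: dict) -> bytes:
    """Zip a manifest plus the dumped files into one in-memory package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(MANIFEST_NAME, json.dumps(manifest))
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_mapper(package: bytes) -> str:
    """Return the mapper type declared in the package manifest."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        manifest = json.loads(zf.read(MANIFEST_NAME))
    return manifest["rom"]["mapper"]  # "rom" section name is an assumption


package = build_package(
    {"title": "Night Knight", "rom": {"filename": "nknight.rom", "mapper": "normal"}},
    {"nknight.rom": bytes(32 * 1024)},  # placeholder contents, not the real ROM
)
print(read_mapper(package))  # normal
```

A loader built this way only has to extract the one small manifest member before deciding how to treat the rest of the archive.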


By ren

Paragon (1932)

08-01-2021, 10:10

What is more efficient on MSX, zip or gzip? The latter compresses (just) a little better I think. Probably speed/RAM usage should be the decisive factor. ('Reliability'?)

The manifest could simply be zipped with the other files into the archive, or, e.g., be binary concatenated onto the archive.

Advantage former: file can be easily opened/inspected with any archive tool.
Disadvantage: manifest has to be extracted before it can be read.

For the latter: vice versa.
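A rough Python sketch of the "concatenated" variant, purely as an illustration: the manifest is appended after the archive bytes, followed by its length and a marker, so a reader can fetch it from the tail of the file without unpacking anything. The `MSXM` marker and the trailer layout are made up for this sketch, and whether a given archive tool still opens a file with extra trailing bytes would need testing.

```python
import struct

MAGIC = b"MSXM"  # hypothetical 4-byte marker, invented for this sketch


def attach_manifest(archive: bytes, manifest: bytes) -> bytes:
    """Append the manifest, its length (4 bytes, little-endian) and the marker."""
    return archive + manifest + struct.pack("<I", len(manifest)) + MAGIC


def read_manifest(package: bytes) -> bytes:
    """Read the manifest back using only the tail of the package."""
    if len(package) < 8 or package[-4:] != MAGIC:
        raise ValueError("no manifest trailer found")
    (length,) = struct.unpack("<I", package[-8:-4])
    end = len(package) - 8
    if length > end:
        raise ValueError("corrupt manifest length")
    return package[end - length:end]


# Stand-in bytes; a real package would start with actual zip data.
demo = attach_manifest(b"archive-bytes", b"title: Night Knight\n")
print(read_manifest(demo))  # b'title: Night Knight\n'
```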

File hashes aren't actually required / that relevant inside the manifest I think, but don't hurt either I figure: it allows for easy identifying, and e.g. cross-linking with other (current) databases. (Integrity is / should be safeguarded by the archive format itself?)

Crc32 seems appropriate for MSX use, sha1 for PC.
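For reference, both hashes are one-liners on the PC side. A small Python sketch using only the standard library (the helper names are mine, and the inputs are well-known test strings rather than a real dump):

```python
import hashlib
import zlib


def crc32_hex(data: bytes) -> str:
    """CRC-32 as 8 lowercase hex digits, as used in the sample manifest."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def sha1_hex(data: bytes) -> str:
    """SHA-1 as 40 lowercase hex digits."""
    return hashlib.sha1(data).hexdigest()


print(crc32_hex(b"123456789"))  # cbf43926 (standard CRC-32 check value)
print(sha1_hex(b"abc"))         # a9993e364706816aba3e25717850c26c9cd0d89d
```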

These manifests could be generated from an online database as well (and this database will contain the file hashes anyway).

I suppose opening a repo / project page for this would be neat (I'll drop a message here when in place).

(Of course, conceivable efforts are intrinsically linked to interest from emu/tool developers :))

By FiXato

Scribe (1742)

08-01-2021, 13:00

ren wrote:

For descriptors I prefer Yaml. It could look something like this:

Since I doubt you need the complexity of YAML, perhaps TOML might be more appropriate, and easier to write an MSX parser library for. :)

And if you remove the nesting from the package section, perhaps plain old INI files would even suffice. :)

By Grauw

Ascended (10711)

08-01-2021, 13:50

Zip and gzip are identical in terms of compression. But gzip can only compress single files, zip archives multiple. If you want to do the same with gzip you additionally need to use “tar” (.tar.gz might seem a familiar extension).

The only disadvantage from my point of view is actually the compression. Although an open source gzip/zip implementation exists for MSX, it’s not trivial code… Tool authors must be willing to go the extra mile. I could imagine tar as a simpler alternative, however I don’t think that would be a very popular choice. You want users to easily create and extract the archive.

Also whatever format is used for the manifest, care should be taken to implement really strict parsers, to ensure invalid markup doesn’t become widespread and we have to write all kinds of wonky parsers. Additionally they should ideally also validate the contents, such as path names and hashes.
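As an illustration of what "strict" could mean in practice, here is a Python sketch that validates one file entry of a manifest. The allowed keys are taken from the sample earlier in the thread; the 8.3 filename rule and the lowercase-hex rule are assumptions for this sketch, not an agreed spec.

```python
import re

ALLOWED_KEYS = {"filename", "size", "mapper", "sha1", "crc32"}
# Assumed rule: a plain 8.3 name, so no directories and no "..".
SAFE_NAME = re.compile(r"[A-Za-z0-9_\-]{1,8}(\.[A-Za-z0-9]{1,3})?")
HEX = {"sha1": re.compile(r"[0-9a-f]{40}"), "crc32": re.compile(r"[0-9a-f]{8}")}


def validate_file_entry(entry: dict) -> list:
    """Return a list of problems; an empty list means the entry is accepted."""
    problems = []
    for key in sorted(set(entry) - ALLOWED_KEYS):
        problems.append(f"unknown key: {key}")
    name = entry.get("filename")
    if not isinstance(name, str) or not SAFE_NAME.fullmatch(name):
        problems.append("filename must be a plain 8.3 name without a path")
    for key, pattern in HEX.items():
        value = entry.get(key)
        if value is not None and not (isinstance(value, str) and pattern.fullmatch(value)):
            problems.append(f"{key} must be lowercase hex of the right length")
    return problems


good = {"size": "32k", "mapper": "normal", "filename": "nknight.rom",
        "sha1": "42f79da674a8f5533abf6d85be116dc11cdd4114", "crc32": "3a7965de"}
bad = {"filename": "../evil.rom", "sha1": "XYZ", "maper": "normal"}
print(validate_file_entry(good))  # []
print(len(validate_file_entry(bad)))  # 3
```

Rejecting unknown keys outright (rather than ignoring them) is what keeps typos like `maper` from spreading into published packages.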

By reidrac

Expert (98)

08-01-2021, 13:58

I guess you could use a ZIP with some expected files; like .jar does: jar file format

Change the extension and define the manifest, and you're done.

By pgimeno

Champion (328)

08-01-2021, 14:50

FiXato wrote:

And if you remove the nesting from the package section, perhaps plain old INI files would even suffice. :)

Nested INI files are not necessarily an issue.

key = value
key = value
key = value
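One possible way to get nesting out of plain INI, sketched in Python with the standard `configparser` module: treat dots in section names as nesting levels. The section names (`package`, `package.rom`) are hypothetical and only serve the illustration.

```python
import configparser

SAMPLE = """\
[package]
title = Night Knight
type = game

[package.rom]
filename = nknight.rom
mapper = normal
size = 32k
"""


def load_manifest(text: str) -> dict:
    """Parse INI text; dotted section names become nested dictionaries."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    root = {}
    for section in parser.sections():
        node = root
        for part in section.split("."):
            node = node.setdefault(part, {})
        node.update(parser.items(section))
    return root


manifest = load_manifest(SAMPLE)
print(manifest["package"]["rom"]["mapper"])  # normal
```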

By reidrac

Expert (98)

08-01-2021, 15:10

Those INI files; you're basically describing something that looks like TOML, with the benefit that TOML is a well defined format with a formal standard:

YAML is likely to be too much (and significant whitespace is not my cup of tea); TOML would be perfect, IMHO.

By sdsnatcher73

Prophet (3851)

08-01-2021, 16:07

Instead of zipping things up, which will lead to increased load times (on real MSXs), we could just have the descriptive file sit next to the rom/dsk (and tools could show just 1 entry in their list). For example you could have:


For PC you could combine these in a zip, but on MSX SofaRun could read the directory and all .dsc files. One could consider creating a consolidated .dsc with a tool, runnable on PC or MSX, that combines the info from all .dsc files into a single file. That could speed up reading by SofaRun.

By ren

Paragon (1932)

08-01-2021, 17:47

Grauw wrote:

You want users to easily create and extract the archive.

I figure end-users wouldn't need to bother with this, whilst 'pro'-users / developers should have no problem creating / dealing with it.

.gz: ah yes, of course, single file :) (that's why we have .tar.gz indeed ;))
There's no .tar utility for MSX I believe?

Something like ID3 came to mind as well.
Advantage: fixed 3 to 4 byte 'frame' identifiers (no chance writing an unsupported frame (key) / making a typo);
Disadvantage: proprietary, specific reader/editor/writer required.

Another option could be JSON.
I think an MSX JSON parser wouldn't be that hard to realize? (Upd:

Otherwise a subset of YAML could be implemented. Probably the following would suffice:

* indentation level (hash tables);
* (inline) arrays;
* multi-line text perhaps (e.g. for the abstract).
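To gauge how small such a subset could be, here is a Python sketch covering the first two points only (nested maps by indentation, and inline arrays); multi-line text is left out. The section names `video` and `release` in the demo are hypothetical, and the parser is not fully strict: for example, it does not reject inconsistent indentation depths.

```python
def parse_scalar(text: str):
    """Turn '[a, b]' into a list; leave everything else as a string."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        inner = text[1:-1].strip()
        return [item.strip() for item in inner.split(",")] if inner else []
    return text


def parse_subset(source: str) -> dict:
    """Parse 'key: value' lines where indentation (spaces) defines nesting."""
    root = {}
    stack = [(-1, root)]  # (indent of the key that opened the map, the map)
    for number, raw in enumerate(source.splitlines(), start=1):
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if "\t" in raw[: len(raw) - len(raw.lstrip())]:
            raise ValueError(f"line {number}: tabs are not allowed for indentation")
        indent = len(raw) - len(raw.lstrip(" "))
        key, sep, value = raw.strip().partition(":")
        if not sep or not key.strip():
            raise ValueError(f"line {number}: expected 'key: value'")
        while stack[-1][0] >= indent:
            stack.pop()  # leave maps that were opened at this depth or deeper
        parent = stack[-1][1]
        if value.strip():
            parent[key.strip()] = parse_scalar(value)
        else:
            child = {}
            parent[key.strip()] = child
            stack.append((indent, child))
    return root


DEMO = """\
title: Night Knight
video:
  mode: [480i, 576i]
  preferred: 480i
release:
  ver: 1.0.3
"""
parsed = parse_subset(DEMO)
print(parsed["video"]["mode"])  # ['480i', '576i']
```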

I could live with TOML as well I suppose.

Anyway, an auditing tool (verifying the manifest etc.) would be appropriate.

So .jar == zip w/ manifest inside. I also proposed the idea of not archiving the manifest with the other file(s), 'bolting' it somehow onto the archive. Obvious drawbacks here of course, but it could be beneficial as well (not needing to extract to read it)?

Are there any minimum MSX requirements dealing with .zip? SofaRun supports 'em right? When HDD is available that can be used for temp files, otherwise sufficient RAM is needed. I figure there's a difference in extracting a single file vs. the whole lot (in case of e.g. multi-disk)?

@sdsnatcher: yeah, non-zipped support is cool I suppose, just like e.g. a MAME romset can be used in both ways.
One advantage of the proposal would be that e.g. a 7-disk game would be inside a single archive as well. I know there's SofaRunIt with concatenated disk support as well (openMSX issue: 1241 ;)

To be clear: my own expertise lies within web dev (JS, PHP & MySQL/MariaDB). I could make some (not very optimized) non-web (CLI) stuff with that as well, but if this were to take off, it would have to be a multi-(wo)man effort (esp. the MSX-related stuff) :)

By FiXato

Scribe (1742)

08-01-2021, 17:58

ren wrote:

multi-(wo)man effort

multi-person, so it's also non-binary inclusive. ;)

Now to play the devil's advocate for a bit:
while this idea is nice for new releases, especially when the archive/manifest is created by the developers themselves, how do we prevent multiple package versions being released for the same dumps, potentially with contradicting info, for the large catalogue of existing dumps out there?

By ren

Paragon (1932)

08-01-2021, 18:37

@FiXato: these times... ;)

I figure that's where a/the central online database would come in.

So you would have 'certified' releases/manifests. I figure we could hash all of the archive contents, the manifest itself as well. The hashes will be stored in, and can be matched with online db. (Although the archive itself could be hashed as well, I think that's of no use as verification, as I think it would be nice if people can create the archive themselves, using the right files + manifest.)
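A Python sketch of hashing the contents rather than the archive file: the digest is built from member names and their uncompressed data, so two people zipping the same files with different tools or settings should end up with the same value. The digest layout and the `manifest.yaml` name are assumptions for this sketch, and the ROM bytes are a placeholder.

```python
import hashlib
import io
import zipfile


def content_digest(package: bytes) -> str:
    """Hash member names and uncompressed contents, ignoring zip metadata."""
    outer = hashlib.sha1()
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in sorted(zf.namelist()):
            inner = hashlib.sha1(zf.read(name)).hexdigest()
            outer.update(f"{name}\0{inner}\n".encode("utf-8"))
    return outer.hexdigest()


def make_zip(files: dict, method: int) -> bytes:
    """Build an in-memory zip with the given compression method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", method) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


files = {"manifest.yaml": b"title: Night Knight\n", "nknight.rom": bytes(1024)}
stored = make_zip(files, zipfile.ZIP_STORED)
deflated = make_zip(files, zipfile.ZIP_DEFLATED)
# The archive bytes differ, the content digest does not.
print(stored != deflated, content_digest(stored) == content_digest(deflated))  # True True
```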

Giving releases a unique alphanumeric ID is something to consider as well. In this case e.g.: nknight or nknig103 (not sure whether or how to use the version number)

A dev could register their release online, or one of the admins creates the manifest ASAP after release.
