Assembler Optimizer

Page 28/57
21 | 22 | 23 | 24 | 25 | 26 | 27 | | 29 | 30 | 31 | 32 | 33

By santiontanon

Paragon (1639)

01-02-2021, 16:13

The extended instructions of sjasm are already supported, like "ld a,(hl++)", etc. But I have not looked at sjasmplus yet! Is there any open source assembler project developed using sjasmplus that I can use to test? (I basically supported all the assemblers for which I could find open source projects for testing, hehe :) ).

By nihirash

Rookie (21)

01-02-2021, 16:22

Well, you can use any of my projects.

They all use sjasmplus.

For example, a small and badly written (very, very bad code) game for the Jupiter Ace: https://github.com/nihirash/rocksnghosts/

By santiontanon

Paragon (1639)

01-02-2021, 16:58

Cool!!! Thanks for the link, with that I can start initial support for sjasmplus! Added to the to-do list :)

By Bengalack

Hero (590)

01-02-2021, 19:21

santiontanon wrote:

- the good news is that all of those macros are now supported, and that the Windows path issue was a small issue that was trivial to fix

Awesome! Just tried it on game.c ==> game.asm ==> game.mdl.asm. 500+ cycles shaved off (even after using opt-flags when compiling). Used "-poapply -po", and the adjustments were very easy to track and learn from. It seemed like the correct "weird" SDCC notation was used too, like always prepending "#" to any number, and indexing like "ld -8 (ix), #0". What I'll try next is to bake this into the build script and have the linker use the mdl'ed file as input instead :)

santiontanon wrote:

- the bad news is that optimizing files in the SDCC dialect that contain macros will be problematic.

OK, it would be cool if this could work, but it is not as important as the rest that is already implemented. I assume my own asm has less to optimize than the C-generated code. Also, I will learn from the output of the tool, and try to apply it in my own asm.

What I can do, though, is see if I can split my main asm file into "using macros" and "not using macros" parts, and run it on the latter :-)

Thank you for making this available for the rest of us :)

By santiontanon

Paragon (1639)

01-02-2021, 21:42

Wohoo!! Great to hear it worked, and that it was able to find a decent number of optimizations!! If any other issues arise, by all means, let me know, and I'd be happy to address them.

But yes, definitely, I have been thinking about the macro issue, and I have added a few ideas to my to-do list. So, probably in the next release or the one after that, those changes will be in! I'm focusing my time on the "reorganizer optimizer" now (started coding it yesterday), but as soon as a first version of that is done, I'll come back to look at the SDCC+macros and the sjasmplus scenarios!

By Grauw

Ascended (10603)

02-02-2021, 01:05

You saw my earlier message santiontanon?

By santiontanon

Paragon (1639)

02-02-2021, 05:18

Oh! Yes, apologies, forgot to respond! Thanks a lot for the info! I had figured out a couple of the things you mentioned (start with a label and end with an unconditional jump/ret), but I was still considering what to do with the conditional jumps, so your post is definitely very helpful!

What I am working on right now to get started is just block detection. I am starting by identifying the "top-level blocks" (for example, code that is to be assembled in one page of a MegaROM cannot be moved to another). I can move blocks within these top-level blocks, but not across them. These "top-level blocks" are assembler-dialect dependent, so I'm starting with that, which should take a few days to set up before I move on to actually analyzing the code within the top-level blocks to identify the code blocks you described in your message :)

By santiontanon

Paragon (1639)

06-02-2021, 06:10

muhahaha! (evil laugh) A first version of the reorganizer is working already!!! (not in the release yet, as it is not very well tested, but hopefully later this weekend)

I am actually impressed with the number of "reorganizations" that it finds! I have only been trying it on XSpelunker (a 32KB ROM), and I was expecting it to find only a handful of code-block movements to optimize, but it actually found 60, which translates to 166 bytes saved (and 680 t-states). Not bad! (or maybe I was just a sloppy coder in XSpelunker hahaha).

Also, this is with a VERY simple version of the division of the code into blocks (only breaking at rets and unconditional jumps). I still need to see if I can get any further gains with conditional jumps, treating several conditional jumps in a row (with the last one maybe being an unconditional one) as a "SELECT", as you mentioned above, Grauw.

But for now, I'm going to just start adding unit tests on this version to make sure it all works as expected, and start testing it on different assembler code bases to see what it can do! :)
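
As an illustration of the "very simple" division described above (breaking only at rets and unconditional jumps), here is a minimal Python sketch. It is not MDL's code: the function names `is_block_end` and `split_blocks` are hypothetical, and it assumes each line holds one label or one instruction with no trailing comment.

```python
def is_block_end(line: str) -> bool:
    """True if the line is an unconditional ret, jp or jr (assumed syntax)."""
    parts = line.strip().lower().split(None, 1)
    if not parts:
        return False
    op = parts[0]
    args = parts[1] if len(parts) > 1 else ""
    if op in ("ret", "reti", "retn"):
        return args == ""          # "ret z" is conditional, so not a block end
    if op in ("jp", "jr"):
        return "," not in args     # "jp nz,label" is conditional
    return False


def split_blocks(lines):
    """Group lines into blocks, closing a block after each unconditional exit."""
    blocks, current = [], []
    for line in lines:
        current.append(line)
        if is_block_end(line):
            blocks.append(current)
            current = []
    if current:                    # trailing code with no unconditional exit
        blocks.append(current)
    return blocks


code = ["start:", "ld a,1", "jp nz,skip", "ret", "skip:", "inc a", "jp start"]
blocks = split_blocks(code)
```

Here `blocks` holds two blocks: the first ends at `ret` (the conditional `jp nz,skip` does not close it), and the second ends at `jp start`.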

By pgimeno

Champion (318)

06-02-2021, 12:41

Does the reorganizer properly handle cases where a relative jump goes out of reach after moving the block?

You said that the reorganizer works like this:

C1  ; this is a block of code
jp C2
C3  ; some other code
C2  ; target block

; That is converted to:
C1
C2
C3

But what if e.g. C2 contains a relative jump to somewhere within C3, and the target of that jump is out of reach after the reorganization? For example:

; pre-reorganization:
C1
jp C2

; begin of C3
more than 127 bytes
Label:
less than 126 bytes
; end of C3

; begin of C2
jr nz, Label
rest of C2
; end of C2


; post-reorganization:
C1
; begin of C2
jr nz,Label  ; ERROR: Label is out of reach
rest of C2
; end of C2

; begin of C3
more than 127 bytes
Label:
less than 126 bytes
; end of C3

I can imagine more scenarios similar to this one, for example when the relative jump is in C3 and jumps to C2.
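
To make the arithmetic in the example above concrete, here is a small Python sketch of the reach check. It relies on the Z80 rule that `jr` and `djnz` are 2-byte instructions whose signed 8-bit displacement is counted from the address following the instruction. The function name `jr_in_range` and the byte sizes used (Label 130 bytes into C3, 100 bytes after it, C2 being 10 bytes long) are assumptions chosen only to match the "more than 127" and "less than 126" figures.

```python
def jr_in_range(jr_address: int, target_address: int) -> bool:
    """Check whether a jr/djnz at jr_address can reach target_address."""
    # The displacement is relative to the byte after the 2-byte instruction.
    displacement = target_address - (jr_address + 2)
    return -128 <= displacement <= 127


# Pre-reorganization: C3 starts at 0, Label at 130, C2 (and its jr) at 230.
before = jr_in_range(jr_address=230, target_address=130)

# Post-reorganization: C2 (10 bytes, jr first) at 0, C3 at 10, Label at 140.
after = jr_in_range(jr_address=0, target_address=140)
```

With these numbers the backward jump before the move has a displacement of -102 and is reachable, while the forward jump after the move would need +138 and is not.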

By santiontanon

Paragon (1639)

06-02-2021, 15:28

Thanks for the reminder of this case!!! But yes, it is currently handled (although only "half-handled" :) ). For now, the code "tries the reorganization", then tests whether any relative jump (jr/djnz) got out of reach. If any did, it undoes the reorganization. In future versions, I'd like the code to check whether the user specified "size" or "speed" optimizations; when the goal is to optimize for speed, we can just turn the jrs that went out of range into jps (only when the out-of-range jump is a "djnz" will it still undo the reorganization).
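
The "try, test, undo" approach described above could be sketched roughly as follows in Python. This is only an illustration under simplifying assumptions, not MDL's implementation: the `Block` class, `all_rel_jumps_in_range` and `try_move` are hypothetical names, block sizes are fixed, and the removal of the now-redundant `jp` (which would shrink a block) is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    size: int                                       # size in bytes
    labels: dict = field(default_factory=dict)      # label -> offset in block
    rel_jumps: list = field(default_factory=list)   # (offset of jr/djnz, label)


def all_rel_jumps_in_range(blocks) -> bool:
    """Lay the blocks out from address 0 and check every jr/djnz."""
    label_addr, starts, addr = {}, [], 0
    for block in blocks:
        starts.append(addr)
        for label, offset in block.labels.items():
            label_addr[label] = addr + offset
        addr += block.size
    for start, block in zip(starts, blocks):
        for offset, target in block.rel_jumps:
            displacement = label_addr[target] - (start + offset + 2)
            if not -128 <= displacement <= 127:
                return False
    return True


def try_move(blocks, src: int, dst: int):
    """Move one block; return the new order, or the old one if a jump breaks."""
    candidate = list(blocks)
    candidate.insert(dst, candidate.pop(src))
    return candidate if all_rel_jumps_in_range(candidate) else blocks


c1 = Block("C1", 10)
c3 = Block("C3", 230, labels={"Label": 130})
c2 = Block("C2", 10, rel_jumps=[(0, "Label")])
result = try_move([c1, c3, c2], src=2, dst=1)
order = [b.name for b in result]
```

With these assumed sizes, moving C2 in front of C3 pushes `Label` out of reach of the `jr`, so `try_move` returns the original order C1, C3, C2.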

But there are a few other edge cases that I have in my to-do list that are not yet handled. For example, I have seen in some projects parts of the code expressed directly as "db byte, byte, byte" (for example, when the assembler used does not support some of the unofficial instructions). Currently, MDL will not detect these as assembler instructions and will probably propose wrong optimizations. Another case that is not handled is self-modifying code. And finally, the last tricky case is when we have a "jp (hl)" or "jp (ix)", since it is hard to know where the instruction is going to jump, and MDL will not build the correct code-flow graph.

I have all of these noted in my to-do list (to at least detect and prevent any harmful optimizations), but they are not handled yet :) (and I still need to think about whether there are other cases).
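
As a rough idea of what "at least detect" could look like for two of these cases, here is a conservative Python heuristic that flags indirect jumps and raw byte directives so that an optimizer could skip the surrounding code. Everything here is an assumption for illustration: the function name `find_risky_lines`, the directive spellings matched (`db`, `defb`, `.db`, `.byte`), and the choice to flag every byte directive even when it is plain data. Self-modifying code cannot be found by a line scan like this one.

```python
import re

# Optional "label:" prefix, then an indirect jump through hl, ix or iy.
INDIRECT_JUMP = re.compile(
    r"^\s*(\w+:)?\s*jp\s*\(\s*(hl|ix|iy)\s*\)", re.IGNORECASE)
# Optional "label:" prefix, then a byte-defining directive (assumed spellings).
RAW_BYTES = re.compile(
    r"^\s*(\w+:)?\s*\.?(db|defb|byte)\b", re.IGNORECASE)


def find_risky_lines(lines):
    """Return (line number, reason) pairs for constructs worth skipping."""
    findings = []
    for number, line in enumerate(lines, start=1):
        code = line.split(";", 1)[0]        # drop any trailing comment
        if INDIRECT_JUMP.search(code):
            findings.append((number, "indirect jump"))
        elif RAW_BYTES.search(code):
            findings.append((number, "raw bytes"))
    return findings


source = [
    "    ld hl,table",
    "    jp (hl)",
    "patch: db 0xDD, 0x7C  ; hand-encoded instruction",
    "    ret",
]
findings = find_risky_lines(source)
```

Here `findings` reports line 2 as an indirect jump and line 3 as raw bytes; a real tool would still need to decide whether those bytes are code or data.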
