How is basic stored in the memory?

Page 1/3
| 2 | 3

By Vampier

Prophet (2415)

14-10-2007, 11:23

Hi, I'm trying to write a Tcl script for openMSX that can extract a BASIC listing directly from memory. To do this I need to find out how BASIC listings are stored in memory.

I already got some info from Bifi: programs mostly start at 0x8000 (unless the two bytes at 0xF676 and 0xF677 specify another address).

I also found this:

ABS      06   2E82                DATA      84     485B
AND      F6                       DEF       97     5010
ASC      15   680B                DEFDBL    AE     4721
ATN      0E   2A14                DEFINT    AC     471B
ATTR$    E9   7C43                DEFSNG    AD     471E
AUTO     A9   49B5                DEFSTR    AB     4718
                                  DELETE    A8     53E2
BASE     C9   7B5A/7BCB           DIM       86     5E9F
BEEP     C0   00C0                DRAW      BE     5D6E
BIN$     1D   65FF                DSKF      26     7C39
BLOAD    CF   6EC6                DSKI$     EA     7C3E
BSAVE    D0   6E92                DSKO$     D1     7C16

CALL     CA   55A8                ELSE      A1     485D
CDBL     20   303A                END       81     63EA
CINT     1E   2F8A                ERASE     A5     6477
CIRCLE   BC   5B11                ERL       E1     4E0B
CHR$     16   681B                ERR       E2     4DFD
CLEAR    92   64AF                ERROR     A6     49AA
CLOAD    9B   703F                EOF       2B     6D25
CLS      9F   00C3                EQV       F9
CMD      D7   7C34                EXP       0B     2B4A
COLOR    BD   7980
CONT     99   6424                FIELD     B1     7C52
COPY     D6   7C2F                FILES     B7     6C2F
COS      0C   2993                FIX       21     30BE
CRSLIN   E8   790A                FN        DE     5040
CSAVE    9A   6F87                FOR       82     4524
CSNG     1F   2FB2                FPOS      27     6D39
CVD      2A   7C70                FRE       0F     69F2
CVI      28   7C66
CVS      29   7C6B

-------------------------------------------------------                                                        

GET      B2   7758                MAX       CD     7E4B
GOSUB    8D   47B2                MERGE     B6     6B5E
GOTO     89   47E8                MID$      03     689A
                                  MKD$      30     7C61
HEX$     1B   65FA                MKI$      2E     7C57
                                  MKS$      2F     7C5C
INKEY$   EC   7347                MOD       FB
INP      10   4001                MOTOR     CE     73B7
INPUT    85   4B6C
INPUT$        6C87                NAME      D3     7C20
INSTR    E5   68EB                NEW       94     6286
INT      05   30CF                NEXT      83     6527
IMP      FA                       NOT       E0     4F63
IPL      D5   7C2A
IF       8B   49E5                OCT$      1A     65F5
                                  OFF       EB
KEY      CC   786C                ON        95     48E4
KILL     D4   7C25                OPEN      B0     6AB7
                                  OR        F7
LEN      12   67FE                OUT       9C     4016
LEFT$    01   6861
LET      88   4880                PAD       25     7969
LINE     AF   4B0E                PAINT     BF     59C5
LIST     93   522E                PDL       24     795A
LFILES   BB   6C2A                PEEK      17     541C
LLIST    9E   5229                PLAY      C1     73E5/791B
LOC      2C   6D03                POINT     ED     5803
LOG      0A   2A72                POKE      98     5423
LOF      2D   6D14                POS       11     4FCC
LOAD     B5   6B5D                PRESET    C3     57E5
LOCATE   D8   7766                PRINT     91     4A24
LPOS     1C   4FC7                PSET      C2     57EA
LPRINT   9D   4A1D
LSET     B8   7C48

-------------------------------------------------------

READ     87   4B9F                TAB       DB
RENUM    AA   5468                TAN       0D     29FB
REM      8F   485D                THEN      DA
RESTORE  8C   633C                TIME      CB     7911/7900
RESUME   A7   4950                TO        D9
RIGHT$   02   6891                TRON      A2     6438
RETURN   8E   4821                TROFF     A3     6439
RND      08   2BDF
RSET     B9   7C4D
RUN      8A   479E                USING     E4
                                  USR       DD     4FD5
SCREEN   C5   79CC
SET      D2   7C1B
SGN      04   2E97                VAL       14     68BB
SIN      09   29AC                VARPTR    E7     4E41
SAVE     BA   6BA3                VDP       C8     7B37/7B47
SPC      DF                       VPOKE     C6     7BE2
SPACE$   19   6848                VPEEK     18     7EF5
SPRITE   C7   7A48/7A84
SOUND    C4   73CA
SQR      07   2AFF                WAIT      96     401C
STEP     DC                       WIDTH     A0     51C9
STICK    22   7940
STOP     90   63E3                XOR       F8
STRIG    23   794C
STRING$  E3   6829
STR$     13   6604
SWAP     A4   643E

Then I wrote this

100 S=0
110 FOR I=0 TO 250
120 A$=HEX$(PEEK(&H8000+I))
130 'PRINT A$+"|"+CHR$(VAL("&h"+A$)),S
140 IF A$="EF" THEN PRINT"=";:NEXT
150 IF A$="20" AND S=0 THEN S=1:NEXT
160 IF A$="20" AND S=1 THEN S=0:NEXT
170 IF S=1 THEN PRINT CHR$(VAL("&h"+A$));:NEXT
180 IF A$="80" THEN PRINT CHR$(13)+CHR$(13)+"<line num>"+HEX$(PEEK(&H8000+I+1)*PEEK(&H8000+I+2))+"</line num>";:I=I+2
190 IF A$="E0" THEN PRINT "NOT";
200 IF A$="A8" THEN PRINT "DELETE";
210 IF A$="30" THEN PRINT "MKD$";
220 IF A$="2F" THEN PRINT "MKS$";
230 IF A$="EB" THEN PRINT "OFF";
240 IF A$="F7" THEN PRINT "OR";
250 IF A$="97" THEN PRINT "DEF";
260 IF A$="86" THEN PRINT "DIM";
270 IF A$="6" THEN PRINT "ABS";
280 IF A$="F6" THEN PRINT "AND";
290 IF A$="15" THEN PRINT "ASC";
300 IF A$="0E" THEN PRINT "ATN";
310 IF A$="E9" THEN PRINT "ATTR$";
320 IF A$="A9" THEN PRINT "AUTO";
330 IF A$="C9" THEN PRINT "BASE";
340 IF A$="C0" THEN PRINT "BEEP";
350 IF A$="1D" THEN PRINT "BIN$";
360 IF A$="CF" THEN PRINT "BLOAD";
370 IF A$="D0" THEN PRINT "BSAVE";
380 IF A$="CA" THEN PRINT "CALL";
390 IF A$="20" THEN PRINT "CDBL";
400 IF A$="16" THEN PRINT "CHR$";
410 IF A$="1E" THEN PRINT "CINT";
420 IF A$="BC" THEN PRINT "CIRCLE";
430 IF A$="92" THEN PRINT "CLEAR";
440 IF A$="9B" THEN PRINT "CLOAD";
450 IF A$="9F" THEN PRINT "CLS";
460 IF A$="D7" THEN PRINT "CMD";
470 IF A$="BD" THEN PRINT "COLOR";
480 IF A$="99" THEN PRINT "CONT";
490 IF A$="D6" THEN PRINT "COPY";
500 IF A$="0C" THEN PRINT "COS";
510 IF A$="E8" THEN PRINT "CRSLIN";
520 IF A$="9A" THEN PRINT "CSAVE";
530 IF A$="1F" THEN PRINT "CSNG";
540 IF A$="2A" THEN PRINT "CVD";
550 IF A$="28" THEN PRINT "CVI";
560 IF A$="29" THEN PRINT "CVS";
570 IF A$="84" THEN PRINT "DATA";
580 IF A$="AE" THEN PRINT "DEFDBL";
590 IF A$="AC" THEN PRINT "DEFINT";
600 IF A$="AD" THEN PRINT "DEFSNG";
610 IF A$="AB" THEN PRINT "DEFSTR";
620 IF A$="BE" THEN PRINT "DRAW";
630 IF A$="26" THEN PRINT "DSKF";
640 IF A$="EA" THEN PRINT "DSKI$";
650 IF A$="D1" THEN PRINT "DSKO$";
660 IF A$="A1" THEN PRINT "ELSE";
670 IF A$="81" THEN PRINT "END";
680 IF A$="2B" THEN PRINT "EOF";
690 IF A$="F9" THEN PRINT "EQV";
700 IF A$="A5" THEN PRINT "ERASE";
710 IF A$="E1" THEN PRINT "ERL";
720 IF A$="E2" THEN PRINT "ERR";
730 IF A$="A6" THEN PRINT "ERROR";
740 IF A$="0B" THEN PRINT "EXP";
750 IF A$="B1" THEN PRINT "FIELD";
760 IF A$="B7" THEN PRINT "FILES";
770 IF A$="21" THEN PRINT "FIX";
780 IF A$="DE" THEN PRINT "FN";
790 IF A$="82" THEN PRINT "FOR";
800 IF A$="27" THEN PRINT "FPOS";
810 IF A$="0F" THEN PRINT "FRE";
820 IF A$="B2" THEN PRINT "GET";
830 IF A$="8D" THEN PRINT "GOSUB";
840 IF A$="89" THEN PRINT "GOTO";
850 IF A$="1B" THEN PRINT "HEX$";
860 IF A$="8B" THEN PRINT "IF";
870 IF A$="FA" THEN PRINT "IMP";
880 IF A$="EC" THEN PRINT "INKEY$";
890 IF A$="10" THEN PRINT "INP";
900 IF A$="85" THEN PRINT "INPUT";
910 'IF A$="6C87" THEN PRINT "INPUT$";  -- not valid
920 IF A$="E5" THEN PRINT "INSTR";
930 IF A$="5" THEN PRINT "INT";
940 IF A$="D5" THEN PRINT "IPL";
950 IF A$="CC" THEN PRINT "KEY";
960 IF A$="D4" THEN PRINT "KILL";
970 IF A$="2D" THEN PRINT "LOF";
980 IF A$="1" THEN PRINT "LEFT$";
990 IF A$="12" THEN PRINT "LEN";
1000 IF A$="88" THEN PRINT "LET";
1010 IF A$="BB" THEN PRINT "LFILES";
1020 IF A$="AF" THEN PRINT "LINE";
1030 IF A$="93" THEN PRINT "LIST";
1040 IF A$="9E" THEN PRINT "LLIST";
1050 IF A$="B5" THEN PRINT "LOAD";
1060 IF A$="2C" THEN PRINT "LOC";
1070 IF A$="D8" THEN PRINT "LOCATE";
1080 IF A$="0A" THEN PRINT "LOG";
1090 IF A$="1C" THEN PRINT "LPOS";
1100 IF A$="9D" THEN PRINT "LPRINT";
1110 IF A$="B8" THEN PRINT "LSET";
1120 IF A$="CD" THEN PRINT "MAX";
1130 IF A$="B6" THEN PRINT "MERGE";
1140 IF A$="3" THEN PRINT "MID$";
1150 IF A$="2E" THEN PRINT "MKI$";
1160 IF A$="FB" THEN PRINT "MOD";
1170 IF A$="CE" THEN PRINT "MOTOR";
1180 IF A$="D3" THEN PRINT "NAME";
1190 IF A$="94" THEN PRINT "NEW";
1200 IF A$="83" THEN PRINT "NEXT";
1210 IF A$="1A" THEN PRINT "OCT$";
1220 IF A$="95" THEN PRINT "ON";
1230 IF A$="B0" THEN PRINT "OPEN";
1240 IF A$="9C" THEN PRINT "OUT";
1250 IF A$="25" THEN PRINT "PAD";
1260 IF A$="BF" THEN PRINT "PAINT";
1270 IF A$="24" THEN PRINT "PDL";
1280 IF A$="17" THEN PRINT "PEEK";
1290 IF A$="C1" THEN PRINT "PLAY";
1300 IF A$="ED" THEN PRINT "POINT";
1310 IF A$="98" THEN PRINT "POKE";
1320 IF A$="11" THEN PRINT "POS";
1330 IF A$="C3" THEN PRINT "PRESET";
1340 IF A$="91" THEN PRINT "PRINT";
1350 IF A$="C2" THEN PRINT "PSET";
1360 IF A$="87" THEN PRINT "READ";
1370 IF A$="8F" THEN PRINT "REM";
1380 IF A$="AA" THEN PRINT "RENUM";
1390 IF A$="8C" THEN PRINT "RESTORE";
1400 IF A$="A7" THEN PRINT "RESUME";
1410 IF A$="8E" THEN PRINT "RETURN";
1420 IF A$="2" THEN PRINT "RIGHT$";
1430 IF A$="8" THEN PRINT "RND";
1440 IF A$="B9" THEN PRINT "RSET";
1450 IF A$="8A" THEN PRINT "RUN";
1460 IF A$="BA" THEN PRINT "SAVE";
1470 IF A$="C5" THEN PRINT "SCREEN";
1480 IF A$="D2" THEN PRINT "SET";
1490 IF A$="4" THEN PRINT "SGN";
1500 IF A$="9" THEN PRINT "SIN";
1510 IF A$="C4" THEN PRINT "SOUND";
1520 IF A$="19" THEN PRINT "SPACE$";
1530 IF A$="DF" THEN PRINT "SPC";
1540 IF A$="C7" THEN PRINT "SPRITE";
1550 IF A$="7" THEN PRINT "SQR";
1560 IF A$="DC" THEN PRINT "STEP";
1570 IF A$="22" THEN PRINT "STICK";
1580 IF A$="90" THEN PRINT "STOP";
1590 IF A$="13" THEN PRINT "STR$";
1600 IF A$="23" THEN PRINT "STRIG";
1610 IF A$="E3" THEN PRINT "STRING$";
1620 IF A$="A4" THEN PRINT "SWAP";
1630 IF A$="DB" THEN PRINT "TAB";
1640 IF A$="0D" THEN PRINT "TAN";
1650 IF A$="DA" THEN PRINT "THEN";
1660 IF A$="CB" THEN PRINT "TIME";
1670 IF A$="D9" THEN PRINT "TO";
1680 IF A$="A3" THEN PRINT "TROFF";
1690 IF A$="A2" THEN PRINT "TRON";
1700 IF A$="E4" THEN PRINT "USING";
1710 IF A$="DD" THEN PRINT "USR";
1720 IF A$="14" THEN PRINT "VAL";
1730 IF A$="E7" THEN PRINT "VARPTR";
1740 IF A$="C8" THEN PRINT "VDP";
1750 IF A$="18" THEN PRINT "VPEEK";
1760 IF A$="C6" THEN PRINT "VPOKE";
1770 IF A$="96" THEN PRINT "WAIT";
1780 IF A$="A0" THEN PRINT "WIDTH";
1790 IF A$="F8" THEN PRINT "XOR";
1800 NEXT

It almost works, but I seem to be missing some vital information. Can someone help me?

By Sonic_aka_T

Enlighted (4130)

14-10-2007, 11:33

I'm clueless as to how all this works, but maybe you could have a look at the SAVE "",A routine in BASIC itself?

By AuroraMSX

Paragon (1902)

14-10-2007, 11:38

Cool idea Hannibal

I'm missing some code that evaluates constants (strings, numbers etc.). I guess that's what's messing up your output...
And do use the system variable at &HF676 that tells you where BASIC begins; it's much more reliable than just starting at &H8000!
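
To illustrate the point about &HF676, here is a minimal sketch of reading that two-byte, little-endian pointer from a memory image. It is written in Python rather than Tcl purely for illustration. `mem` is a hypothetical 64 KB dump indexed by address, and the value poked into it is made up for the example.

```python
TXTTAB = 0xF676  # system variable holding the start address of the BASIC text


def read_word(mem, addr):
    """Return the 16-bit little-endian value stored at addr and addr + 1."""
    return mem[addr] | (mem[addr + 1] << 8)


def basic_start(mem):
    """Address of the first program line, taken from &HF676/&HF677."""
    return read_word(mem, TXTTAB)


# Hypothetical memory image: pretend the program text starts at &H8001.
mem = bytearray(0x10000)
mem[TXTTAB] = 0x01       # low byte
mem[TXTTAB + 1] = 0x80   # high byte

start = basic_start(mem)
print(hex(start))
```

The same `read_word` helper applies to every other two-byte field in a stored program (next-line pointers, line numbers), since they are all stored low byte first.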

By jltursan

Prophet (2619)

14-10-2007, 12:08

The description of the BASIC interpreter's tokeniser routine might help you (quoted from the MSX Red Book):


This routine is used by the Interpreter Mainloop to tokenize
a line of text. On entry register pair HL points to the first
text character in BUF. On exit the tokenized line is in KBUF,
register pair BC holds its length and register pair HL points
to its start.
Except after opening quotes or after the "REM", "CALL" or
"DATA" keywords any string of characters matching a keyword is
replaced by that keyword's token. Lower case alphabetics are
changed to upper case for keyword comparison. The character "?"
is replaced by the "PRINT" token (91H) and the character "'" by
":" (3AH), "REM" token (8FH), "'" token (E6H). The "ELSE" token
(A1H) is preceded by a statement separator (3AH). Any other
miscellaneous characters in the text are copied without
alteration except that lower case alphabetics are converted to
upper case. Those tokens smaller than 80H, the function tokens,
cannot be stored directly in KBUF as they will conflict with
ordinary text. Instead the sequence FFH, token+80H is used.
Numeric constants are first converted into one of the
standard types in DAC (3299H). They are then stored in one of
several ways depending upon their type and magnitude, the
general idea being to minimize memory usage:

0BH LSB MSB ................... Octal number
0CH LSB MSB ................... Hex number
11H to 1AH .................... Integer 0 to 9
0FH LSB ....................... Integer 10 to 255
1CH LSB MSB ................... Integer 256 to 32767
1DH EE DD DD DD ............... Single Precision
1FH EE DD DD DD DD DD DD DD ... Double Precision

There is no specific token for binary numbers, these are left
as character strings. This would appear to be a legacy from
earlier versions of Microsoft BASIC. Any sign prefixing a
number is regarded as an operator and is stored as a separate
token, negative numbers are not produced during tokenization.
As double precision numbers occupy so much space a line
containing too many, for example PRINT 1#,1#,1# etc. may cause
KBUF to fill up. If this happens a "Line buffer overflow" error
is generated.
Any number following one of the keyword tokens in the table
at 43B5H is considered to be a line number operand and is
stored with a different token:
0DH LSB MSB ................... Pointer
0EH LSB MSB ................... Line number
During tokenization only the normal type (0EH) is generated,
when a program actually runs these line number operands are
converted to the address pointer type (0DH).
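
The rules quoted above could be sketched roughly as follows. This is only an illustration in Python, not the interpreter's own code. The function names are hypothetical, the `FUNCTIONS` table is a three-entry excerpt of the keyword table earlier in the thread, and single precision, double precision and pointer (0DH) operands are deliberately left out.

```python
# Excerpt of the function tokens (those below 80H); stored as FFH, token+80H.
FUNCTIONS = {0x01: "LEFT$", 0x05: "INT", 0x08: "RND"}


def decode_function(buf, i):
    """Decode an FFH-prefixed function token at buf[i], or return None."""
    if buf[i] != 0xFF:
        return None
    return FUNCTIONS.get(buf[i + 1] - 0x80, "?"), i + 2


def decode_constant(buf, i):
    """Decode an integer-style constant whose prefix byte is buf[i].

    Returns (text, next_index), or None if the prefix is not handled here.
    """
    t = buf[i]
    if 0x11 <= t <= 0x1A:                    # integer 0 to 9, no operand
        return str(t - 0x11), i + 1
    if t == 0x0F:                            # integer 10 to 255, one byte
        return str(buf[i + 1]), i + 2
    if t in (0x0B, 0x0C, 0x0E, 0x1C):        # two-byte operand, LSB first
        value = buf[i + 1] | (buf[i + 2] << 8)
        if t == 0x0B:
            return "&O" + format(value, "o"), i + 3
        if t == 0x0C:
            return "&H" + format(value, "X"), i + 3
        return str(value), i + 3             # line number or integer
    return None


print(decode_constant(bytes.fromhex("0c0018"), 0))
print(decode_function(bytes.fromhex("ff85"), 0))
```

Returning the next index along with the text lets a caller step through a line body without knowing how long each operand is.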

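The operator values are, if I'm not wrong, the ones below. Note that these look like the precedence levels used by the expression evaluator rather than the stored tokens: in the dump further down, = is stored as EFH, - as F2H and * as F3H.

79H ... +        46H ... OR
79H ... -        3CH ... XOR
7CH ... *        32H ... EQV
7CH ... /        28H ... IMP
7FH ... ^        7AH ... MOD
50H ... AND      7BH ... \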

And this is an example of a little BASIC program, followed by its tokenised form in memory:

10 KEYOFF:SCREEN 1:WIDTH 32:
20 FOR I=&H1800 TO 6911
30 A=INT(RND(-TIME)*256)
40 VPOKE I,A
50 NEXT I
60 GOTO 60

Memory contents at &H8000:

(00)
(12 80)(0a 00)(cc)(eb)(3a)(c5)(20)(12)(3a)(a0)(20)(0f 20)(3a)(00)
8012   10     KEY OFF :   SCREEN  1    :  WIDTH   32     :
(24 80)(14 00)(82)(20)(49)(ef)(0c)(00 18)(20)(d9)(20)(1c)(ff 1a)(00)
8024   20     FOR     I   =   &H  1800       TO          6911
(39 80)(1e 00)(41)(ef)(ff 85)(28)(ff 88)(28)(f2)(cb)(29)(f3)(1c 00 01)(29)(00)
8039   30     A   =   INT    (   RND    (   -   TIME)   *   256       )
(43 80)(28 00)(c6)(20)(49)(2c)(41)(00)
8043   40     VPOKE   I   ,   A
(4b 80)(32 00)(83)(20)(49)(00)
804B   50     NEXT    I
(55 80)(3c 00)(89)(20)(0d 4a 80)(00)
8055   60     GOTO    POINTER:804A
(00 00)
END OF LISTING
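
The layout in the dump above can be walked as a linked list: each line starts with a pointer to the next line, then the line number, then the tokenised body closed by a 00 byte. Below is a minimal Python sketch of that walk. The helper names are hypothetical and the bytes are copied from the dump. It assumes the next-line pointers are valid for the address the program sits at.

```python
BASE = 0x8000  # address of the first byte in DUMP

# Bytes copied from the memory dump above. The 00 that closes line 60 is
# assumed here: the next-line pointer 8055 only fits if that byte is present.
DUMP = bytes.fromhex(
    "00"
    "1280 0a00 cc eb 3a c5 20 12 3a a0 20 0f20 3a 00"
    "2480 1400 82 20 49 ef 0c 0018 20 d9 20 1c ff1a 00"
    "3980 1e00 41 ef ff85 28 ff88 28 f2 cb 29 f3 1c 0001 29 00"
    "4380 2800 c6 20 49 2c 41 00"
    "4b80 3200 83 20 49 00"
    "5580 3c00 89 20 0d 4a80 00"
    "0000"
)


def word(mem, off):
    """16-bit little-endian value at offsets off and off + 1."""
    return mem[off] | (mem[off + 1] << 8)


def walk_lines(mem, base, start):
    """Yield (address, line_number, body) for each stored program line."""
    addr = start
    while True:
        off = addr - base
        next_addr = word(mem, off)
        if next_addr == 0:                 # (00 00) marks the end of the listing
            return
        line_number = word(mem, off + 2)
        body = mem[off + 4:next_addr - base - 1]   # without the closing 00
        yield addr, line_number, body
        addr = next_addr


lines = list(walk_lines(DUMP, BASE, 0x8001))
for addr, number, body in lines:
    print(f"{addr:04X} {number:5d} {body.hex()}")
```

Each `body` can then be handed to a detokeniser that maps keyword tokens and numeric constants back to text.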

By PingPong

Enlighted (4155)

14-10-2007, 16:10

How is BASIC stored in memory? In a very inefficient way (slooooooooooooooooooooooooooooooooooooow)

MSX BASIC is one of the slowest BASICs for 8-bit machines (thx, m$)

By Sonic_aka_T

Enlighted (4130)

14-10-2007, 17:19

"How basic is stored in memory? In a very inefficient way (slooooooooooooooooooooooooooooooooooooow)

MSXBASIC is one of the slowest basic for 8 bit machines (thx, m$)"

It's also the most powerful BASIC... (as in complete)

By AuroraMSX

Paragon (1902)

14-10-2007, 18:38

"How basic is stored in memory? In a very inefficient way (slooooooooooooooooooooooooooooooooooooow)"

Actually, the tokenized version is not that inefficient, and it is very similar to how other BASICs store their programs in RAM.

"It's also the most powerful BASIC... (as in complete)"

No, it's not. Have a look at BBC BASIC: that's like MSX BASIC plus a couple of nice features such as procedures and loop constructs like WHILE/DO and DO/UNTIL...

By Vampier

Prophet (2415)

14-10-2007, 18:53

Thanks for the long reply there :)

Tcl-ing as we speak :)

By dvik

Prophet (2200)

14-10-2007, 20:06

A much easier way to get the BASIC listing out of an emulator is to run:

LLIST

which sends the listing to the printer port. Then in blueMSX or openMSX you save the printer output to a file.

By Sonic_aka_T

Enlighted (4130)

14-10-2007, 20:19

I always use SAVE "",A with dirasdisk...

By cax

Prophet (3741)

14-10-2007, 20:28

AFAIK there already exist some tools that untokenize basic programs, and they even made their appearance on the main news page in the past...
