Pascal Implementation by Steven Pemberton and Martin Daniels
Although the assembler and the interpreter are presented here as one program they are two distinct phases as far as the operation is concerned.
The assembler puts the P-code program in the array code and the associated constants in the store array. The interpreter executes the P-code instructions using the case statement starting on [687] with the help of some additional procedures between lines [438-542]. The first procedures pmd and errori are concerned with errors that may arise at run time. The function base is used for manipulating the linkage between stack frames. Procedure compare is used for string comparisons. Finally procedure callsp contains the routines to deal with the embedded standard P-code procedures and the special requirements of input and output. A detailed analysis of the interpreter now follows.
[438-49] procedure pmd (post-mortem dump).
This procedure is called when an error during run time has occurred.
[439] The main objective is to print out the contents of the stack and heap. The variable s holds the address of a location to be written out.
[441-9] procedure pt (print)
[442] prints the address and contents of one location. [443] tests for a peculiarity of the CDC machine which is able to store an integer value that is larger than is possible to print out. S and i are adjusted ready for the next value to be printed.
[452] This statement outputs some of the register values and the offending opcode value.
[456-7] These statements print the contents of the stack. The next two lines [458-9] print the contents of the heap.
The result of this procedure is a straightforward dump. It would be much more useful if all the values of the registers were printed, and the output formatted to show the structure of the stack frames.
[462-5] errori
This procedure has two functions, to print out the error message, (which will be an indication of the type of error that has occurred) which was passed as a parameter, and to call the post-mortem dump procedure. Control is transferred to label 1 at line [1018] to halt execution. (In the assembler in a similar situation [170] the non-standard halt procedure was used).
[467-74] Function base
The parameter, ld, which is passed to this function is the depth of nesting and is used to calculate an address to be returned as the value for the function. This function is used to follow the static links back to a textually enclosing block and to return a base address. The effect of this while statement is
for i := 1 to ld do ad := store[ad + 1].vm
[476-85] Procedure compare
The purpose of this procedure is to compare two strings. The strings will be of equal length and the procedure returns true if the strings are the same. The addresses of strings are found on the top of the stack and assigned to i1 and i2, and their length is defined by the q operand of the P-code instruction being executed. The main body of the procedure [482-4] works as follows. While each successive pair of characters are equal, the test proceeds, as soon as a difference is detected b is set false and control returns to the calling procedure.
[483] It is not clear why the strings are referenced with the tag field vi when it would be more appropriate to use vc.
The procedure compare could be much improved as it only performs
half the task of comparison e.g. for GEQ
[829-30] the comparison is
continued to test for greater than. The procedure could be
reprogrammed as follows:
type same = (lessthan, equal, greaterthan); function compare: same; . . . end; (* of compare *)
the code for GEQ
could then become:
store[sp].vb := compare in [greaterthan, equal]
There are a number of standard procedures that are available to P-code programs. They appear in the source, for example, as follows:
CSP WRC
where the operand of the instruction specifies the required standard procedure.
The majority of these standard procedures deal with file handling, and to aid their interpretation there is a set of procedures between lines [491-542] which will be discussed before the detailed description of each standard procedure.
Each of these procedures performs, generally, a similar set of operations:
The file addresses refer to one of four locations in the stack frame that is generated for the main block. The addresses are as follows:
5: input from the input file
6: output to the output file
7: input from the prd file
8: output to the prr file
The contents of each of the these addresses are used to hold the buffer variables.
The variables adptr and adelnt although declared are not used.
[491-7] readi (read integer)
An address is taken from the penultimate position on the stack and an integer is read into this address. The buffer variable location is updated, f^ contains the next character to be read and this is placed in the buffer variable location [495]. Line [496] sees the two addresses removed from the stack.
[499-505] readr (read real number)
This procedure is effectively the same as readi except that in [502] a real number is read in.
[507-15] readc (read a character)
A character is read and placed in the address found on the stack. [513] appears to be unneccessary as it overwrites the effect of line [512].
[517-27] writestr (write string)
For this procedure the stack contains two extra numbers which give an indication of the field width required (k) and the number of characters in the string (j). If k is greater than j then the difference of these two numbers gives the number of spaces that must be printed out before the string. [525] writes the characters of the string. The stack pointer is decremented by four.
[529-5] getfile
This procedure closely follows the general outline. Its function is to do a get on a file, that is, to advance the file window to the next component and update the buffer variable [533].
[536-41] putfile
This procedure performs the reverse function to getfile. i.e. it appends the value of the buffer variable to the file.
The next group of instructions deals with the execution of the standard procedures proper. [543] The case statement is driven by the value of the operand, q. In each case which involves file transfers there is a further case statement to separate the individual files - which are denoted by the numbers 5,6,7 and 8.
[545-50] GET
Store[sp].va defines the file on which the get is to be performed. This is only permissible on input files and therefore in cases 6 and 8 there is a call to procedure errori to print out the error message and halt the program.
[551-6] PUT
This is very similar to GET
.
[557-60] RST
This instruction corresponds to release. The heap pointer gets set to the value on the stack. The comment is meaningless. The instruction name is not very helpful (suggesting reset perhaps?).
[561-72] RLN
(readln)
This instruction performs a readln on the specified file and
updates the buffer variable accordingly. RLN
is applicable to input
files only, hence the error conditions for the two output
files.
Unfortunately there is an error in the code for the prd file, which refers to input instead of prd. It should read:
7: begin readln (prd); store[prdadr].vc : = prd ^ end;
[573-80] NEW
The new procedure is used to allocate space on the heap. The stack
contains an address and a value giving the space requirements. A
test is made to ensure that the stack and heap will not overlap
when the space has been allocated. Provided this check succeeds, NP
is updated and this new value is returned in the address found on
the stack, which will be used to access the new locations on the
heap.
[581-8] WLN
(write line)
The purpose of this procedure is to perform a writeln on the specified file.
[589-93] WRS
(write string)
A string of characters is output to the selected file by calling the procedure writestr.
[595-602] ELN
(end-of-line)
This procedure performs a test to determine if the end-of-line, during input, has occurred. A temporary boolean variable, line, is used. This value is left on the stack.
[603-32] WRI
, WRR
and WRC
(write integer, write real and write
character).
These three routines are identical excluding the tag fields. In each case there are three values on the stack, the top value is the file identifier, followed by the field width and the value to be written. Taking [605] as an example this statement writes an integer (as shown by the tag field vi) store[sp-2].vi, with a field width of store[sp-1].vi, onto the output file.
[633-50] RDI
, RDR
and RDC
(read integer, read real and read
character)
Similarly, these procedures only differ in the actual procedures that they call which have already been discussed [491-515].
[651-6] This section covers the six mathematical functions available, sin, cos, exp, ln, sqrt, and atan. In each case the top of the stack contains a real number. This is replaced by a real number which is the mathematical function of the original
[657-60] SAV
(save)
This instruction is called as a result of the mark procedure. It is used to remember the heap pointer for later use in relinquishing space on the heap via the release procedure.
[661-2] This marks the end of the CSP
case statement and the end of
the callsp procedure.
There is no real need to separate the standard procedures from the ordinary P-code instructions; the standard procedures could simply be parameterless instructions. This would save this second case statement [544] (and incidently the second linear search in the assembler [321].)
There is a lack of consistency in the choice of which are standard
procedures and which are instructions. For example, why has EOF
been selected as an instruction?
All the file handling operations have a case statement and the individual files are represented by the four numbers. In fact names have already been defined for these files and their associated addresses [34-7], that is inputadr, outputdr, prdadr and prradr. These should be used instead of numbers because in other implementations different numbers may be involved.
This is the start of the controlling segment for both the assembler and interpreter. The assembler is self-contained within one procedure, load, and this is called in line [666]. The interpreter is not separated in the same way, the bulk of the code appears in the following statements.
[665] rewrite(prr)
This statement is not included for the assembler/interpreter as neither use the prr file - it is included in case the user program that is to be intepreted uses it. Normally rewrite and reset would be included at the P-code level but neither are provided in this P-code implementation. These functions have therefore to be provided by the interpreter, hence [665] and line [670] which partially fulfill reset(prd). However these additions do not fully substitute the need for true reset/rewrite functions - for example, multiple passes of the prd file are not possible. It is a trivial job to add these functions to the compiler and the assembler/interpreter.
[668-70] Initialisation takes place in these lines. The registers
are set up with the PC
pointing at the first executable instruction
and the stack frame registers ready to generate the first stack
frames. SP
is pointing to an empty stack and NP
is pointing to an
empty heap. EP
is assigned the value five - but this value is
arbitrary as it is changed by the first ENT
instructions. [669-70]
initialises the buffer variables for both input and prd files.
[671]
The variable interpreting is set true and is only set false by the
STP
(stop) instruction [1003].
[673-1017] The remainder of the program is the body of the
interpreter where the P-code machine is simulated. This machine
follows traditional lines of a fetch-execute cycle. The fetch phase
is between [676-682]: the primary objective is to determine the
opcode of the next instruction together with its parameters p and
q. The only peculiarity here is again the fact that the
instructions have been packed two to a word, as in [419-24]. The PC
is incremented in order to point at the next instruction in
sequence.
The
interpretation, or execute phase, of the individual instructions
now begins. The assembler has appointed unique opcodes to each
instructions and also each instruction with a collection of
different possible types e.g. SRO
has the opcodes 3, 75, 76, 77,
78, 79 - one for each type of possible operand. The interpreter,
however, in most cases is able to lump them together again as it
assumes that all primitive types occupy one word. In other
implementations it is likely that each case would have to be
treated separately.
The case statement [685] acts like the instruction register of a processor and directs the execution according to the opcode of the instruction.
[687-91] LOD
(load contents of address at level p)
p gives the depth of nesting and this is used by the function base to find a base address. q is used as an offset to this base to locate the contents required. This value is put onto the stack.
[693-7] LDO
(load contents of base-level address)
This instruction represents a special case of LOD
where the
address required is known to be at the outermost level of nesting,
that is in the main program, and therefore it is not necessary to
search down the static links to find the location. q in this case
is a simple offset to the first stack frame and it is from this
location that a value is found and put on the stack.
[699-702] STR
(store contents at address at level p)
This is the complementary instruction to LOD
and operates in a
similar manner.
[704-7] SRO
(store at base-level address)
This is the complementary instruction to LDO
.
[709-11] LDA
(load level p address)
This instruction is similar to LOD
except that the address is
loaded onto the stack, rather than its contents.
[713-5] LAO
(load base-level address)
The operand q is a base-level address and this is left on the
stack. This is the base-level version of LDA
.
[717-21] STO
(store indirect)
In this instance the stack contains two entries, the uppermost is a value to be stored in the location whose address is in the next location.
[723-31] LDC
(load constant)
Four different types of constant may be loaded with this instruction. The first is an integer (p=l) (the begin and end are not required here [725-6]), the second (p=6) is a character, the third (p=3) is a boolean value, otherwise (p=0) the nil value will be loaded on the stack. The nil value created is maxstr which is an address that cannot be generated by the user program.
[733-5] LCI
(load constant indirect)
The value of q is a pointer to the table of constants, this address is left on the stack.
[737-41] IND
(indexed fetch)
The top of the stack contains an address, and the value of q is an offset to the address. (The comment means that the compiler will compensate for a particular primitive type having more than one word per variable.) The offset and address produce a new address and the contents of this location are placed on the stack.
[743-4] INC
(increment)
The top of the stack is incremented by the value of the operand, q.
[746-57] MST
(mark stack)
The MST
instruction is used to build a mark stack for a new stack
frame and is part of calling a procedure. [750] sets up the static
link, and the dynamic link is assigned in the next statement. [754]
stores the value of EP
i.e. the extremity of the current stack
frame, so that the current procedure may be reinstated. The
stack pointer is moved by five units i.e. the size of the mark
stack. After the MST
instruction, the parameters (if any) for the
procedure will be loaded into the stack frame.
[759-63] CUP
(call user procedure)
The first statement for this instruction is to set the MP
to the
beginning of the new stack frame that has been partially
constructed by the previous MST
instruction. p denotes the amount
of space required for parameters and the number four relates to the
size of the mark stack (for this implementation only). As p has
been declared as 0..15 it means that the number of parameters is
limited to 15 (though this is never checked). [761] the PC
value is
stored for the return address back to the calling procedure. [762]
q is the address of the instruction of the procedure being
called.
[765-73] ENT
(enter procedure)
Two ENT
instructions will be found at the start of every user
procedure. The operands of these instructions define the length of
the remaining segments in the stack frame. If p=l then the value of
q denotes the extra space required for locally declared variables
and SP
is updated accordingly. The statement should be as
follows:
sp := mp + q - 1;
This is because SP
always points at the top item on the stack; that
is, to push a value involves incrementing the stack pointer first.
Without the additional -1 one location is always wasted. A check is
made [767] to ensure that the stack frame does not encroach into
the heap. If p=2 then q denotes the size of the local stack and EP
is updated and again a check is made to ensure that the stack frame
does not overlap the heap. Only this second check is necessary.
[775-82] RET
(return from procedure or function)
The RET
instruction provides the mechanism to return to the calling
procedure and release the stack frame of the just-completed
procedure. If returning from a function then the returned function
value must be left on the stack [777], the five values of p
allowing for the different function types. If p equals zero then a
procedure return is being performed. The remaining three statements
of this instruction revive the calling procedure's stack frame i.e.
PC
takes up the return address value, EP
the old EP
value and
finally MP
the old MP
value or dynamic link.
[784]
CSP
(call standard procedure)
This is dealt with by a separate procedure which has already been described [487-662].
[786-90] IXA
(indexed address)
This instruction is for array subscripts. Two values are on the stack, the uppermost is an integer and the next an address. [789] The address is modified by the product of the integer and q and this new address is left on the stack.
[792-877] Condition instructions.
The next six instructions are each treated in similar ways.
[792-804] EQU
(test for equal)
The two values to be compared are on the top of the stack, except
for the strings case where pointers to the strings are present.
[794] Two integers are compared and the boolean result is pushed
onto the stack. The other types are treated in a similar way,
except for strings [800-2]. Here, the procedure compare returns in
variable b a boolean giving the result of the equality test which
in the case of EQU
is in the correct form. With all the other
conditional instructions this boolean value must be modified.
[806-l8] NEQ
(test for not equals)
Similar to EQU
.
[820-33] GEQ
(test for greater than or equal)
An additional test is required for the string comparison. Procedure
compare only returns a result indicating whether the strings are
equal or not. The pointers i1, i2 and i are left by compare
pointing to the place in the strings where a difference has
occurred and these are used to test the two characters for GEQ
.
Note that if b = true, i1 and i2 point to garbage.
[835-77] GRT
, LEQ
, LES
(tests for greater than, less than or equal,
and less than) Similar to previous tests.
This concludes the conditional instructions.
The case labels for the conditional instructions are in a curious order.
[879]
UJP
(unconditional jump)
q specifies the address to where control is to be transferred.
[881-3] FJP
(jump on false)
This instruction is used by if, while, repeat and for statements.
If the boolean value on the top of the stack is false then a jump
is made to the address held in q, otherwise the next instruction
following the FJP
is executed.
[885-888] XJP
(indexed jump)
The address in q is modified by the integer on the top of the stack to index a table of jumps. This instruction is used for the implementation of the case statement, the integer being the selector.
[890-2] CHKA
(check address)
This is a run time error checking instruction. It is used to ensure that pointers to the heap are correct. q will only take the values zero or one, in the first case the address can either be nil or a real address, while in the second case only a real address is allowed i.e. a nil pointer cannot be dereferenced.
[894-7] CHK
(check value between bounds)
q is a pointer to the constants table where the upper and lower bounds are stored. The value on the stack is checked to ensure it lies between these bounds - if not an error message is printed and execution halted.
[899-3] EOF
(end-of-file)
If the address on the stack is inputadr then a boolean is returned denoting whether end-of-file on the input file has occurred or not, otherwise an error condition is flagged. (An unnecessary begin end here). This is an unnecessary restriction which standard Pascal does not have, i.e. eof(prd) should be allowed.
[905-7] ADI
(add integers)
Two integers on the top of the stack are added together and the result is left in their place.
[909-19] ADR
, SBI
, SBR
(add reals, subtract integers, subtract
reals)
Similar structure to ADI
.
[921]
SGS
(create singleton set)
A set is created with a single element defined by the integer on the stack.
[923]
FLT
(float integer to real)
Converts the integer in the top stack location to a real number.
[925]
FLO
(float integer to real - lower stack position)
Same as FLT
except it is performed on the element next to the top
of the stack. Having two 'float' instructions allows operators with
a real and an integer operand to be evaluated without putting any
restriction on the order. Wherever FLO
is used, the topmost stack
element will already be a real number.
[927]
TRC
(truncate)
Truncates a real number and converts to an integer.
[929-41] The instructions in this group all involve a single statement and a simple transformation of the stack element. They are briefly:
NGI
negate integer
NGR
negate real
SQI
square of an integer
SQR
square of a real number
ABI
absolute value of an integer
ABR
absolute value of a real number
NOT
logical inversion
[943-9] AND
, IOR
(logical and, logical inclusive or)
These two instructions perform a logical operation on the top two stack locations and leave the result on the top.
[951-61] DIF
, INT
, UNI
(set operations - difference, intersection
and union). Each of the set instructions is performed on the top
two stack locations and the result is left on the top.
[963-6] INN
(test set inclusion)
The uppermost element of the stack contains a set. The next element is an integer and this instruction tests to see if this integer is a member of the set; a boolean value is left on the stack.
[968-70] MOD
(modulo)
The top value of the stack takes the modulo of the next integer on the stack.
[972]
ODD
(test for an integer)
A boolean true value is left on the stack if the integer tested is odd otherwise false will be left.
[974-88] MPI
, MPR
, DVI
, DVR
Each of these instructions involves an arithmetic operation on the top two elements of the stack. They are as follows:
MPI
multiply integers
MPR
multiply real numbers
DVI
divide integers
DVR
divide real numbers
[990-4] MOV
(move)
The stack contains two addresses, the top address points to the first of a number of storage locations that are to be moved. The second address on the stack is the start of the reception area for the moved locations. q defines the number of locations to be moved.
[996-998] LCA
(load address of constant)
q is an address of a string constant, this is pushed onto the
stack. This is the address version of LCI
.
[1000-1] DEC
(decrement)
The complementary instruction to INC
. The integer on the top of the
stack is decremented by the value of q.
[1003] STP
(stop)
The terminating instruction. Apart from a run time error, this is the only way to exit from the interpreter.
[1005-10] ORD
, CHR
These instructions are not propagated by the assembler.
[1012] UJC
(padding instruction)
This instruction is used only to fill the table of jumps
constructed for XJP
instructions for non-existent selections. If an
attempt is made to execute a UJC
instruction then a run time error
will result.
This completes the code for the interpreter.
Pascal Implementation by Steven Pemberton and Martin Daniels
Copyright © 1982, 2002 Steven Pemberton and Martin Daniels, all rights reserved.