Backlog

by Ter13
Welcome to my brain's backwash
ID:2340092
 
The purpose of this document is to collect my notes on virtual machine design. I find that it is often helpful for me to explain things as though I'm talking to someone. Sometimes I find myself bothering my friends on discord with shit they don't care about just so I can wrap my head around a concept. This is rude, and I feel bad about doing it, so I'm just posting it here to one of my private forums as though it were a tutorial. If you've stumbled across this, feel free to use anything you find useful. As though anything in this document is useful to anyone.

What is a Virtual Machine?

A virtual machine is an emulation of a computer system. This basically means that we're creating a facsimile of physical hardware operations in software. DM's language interpreter is fundamentally a Virtual Machine.

Basic Interpreter Structure

A simplified interpreter will have the following properties:

  • block_ptr: A pointer to a memory region that contains a series of instructions.
  • offset: The interpreter's current offset, or which instruction in the block_ptr is currently being read.
  • memory_ptr: A block of memory allocated for the interpreter's current longterm storage state.
  • stack_ptr A block of memory representing the interpreter's current shortterm storage state.
  • stack_len A value indicating how large the stack currently was at the depth increase.
  • stored_jump A value indicating a pending offset to jump to.


A few points of interest here, the block pointer references a linear sequence of commands. Commands are operations. Operations may have operands. Operands are like values that an operation uses to mutate data.

The memory pointer stores pointers to values sequentially. The compiler may be given the names of the memory locations, but the interpreter does not need them. Instead, they are boiled down to numeric offsets starting at 0.

The stack pointer stores pointers to values sequentially. Stack structure can be generalized as:

<return_value> <break_offset> <this> <local memory> <storage>

Standard Operations

key:

<label> - remove value from stack and use it

<(label)> - use value from stack without removing it

[label] - peek into memory

(+X) - the next instruction from block_ptr following this one

X:Y - Y of X.

Stack Management:
  • RTN return: pop the last value from stack, set <return_value>,
    and decrease stack depth.
  • PEK peek: add the last value in the stack to the stack again
  • POP pop: remove the last value in the stack
  • PSH (+1) push: add a constant value to the stack using the next instruction in the code block as the operand
  • IVK invoke: (operands: POP POP POP) increase stack depth, PUSH null (return_value), PUSH offset+1 (break_point), PUSH operand1 (this), PUSH operand2 (local memory), set offset to operand3


Memory region 0 = global
Memory region 1 = local

General Memory Management:
  • ACC (+1) (+2) access: PUSH memory region operand1's:operand2.
  • ASG (+1) (+2) assign: Set memory region operand1:operand2 to PEEK.


Local memory is flexible and can expand and shrink as needed.

Local Memory Management:
  • DCL (+1)declare: Add a number of values to local memory
  • UDC (-1) undeclare: Remove a number of values from local memory


Valuation:
  • GET (+1)get: Get property POP:operand2
  • SET (+1) set: Set property operand1:POP to PEEK.
  • IDX index: PUSH POP[POP]


Abstract patterns like goto, if/else, while, for, switch, break, and continue are all compilation patterns that utilize jumping. Jumping moves the current offset by a positive or negative number.

Jumps:
  • JMP (+1) jump: add operand1 to offset.
  • CND (+1)conditional jump: POP operand2. If operand2 false, add operand 1 to offset.
  • CSJ (+1) set jump point: set interpreter's stored jump point to offset + operand1
  • CSE (+1) case equality jump: PEEK operand2. If operand2 is equal to operand 1, POP, then go to jump point stored by last CSJ
  • CSR (+1) (+1) case range jump: PEEK operand3. If operand3 is greater or equal to operand 1,
    and less or equal to operand 2, POP, then go to jump point stored by last CSJ.


All single-operand operators POP, perform their operation, then PUSH.

Single Operand:
  • OP_BNT operator binary not: ~operand1
  • OP_LNT operator logical not: !operand1
  • OP_NEG operator negation: -operand1
  • OP_INC operator increment: operand1+1
  • OP_DEC operator decrement: operand1+-1


All dual-operand operators POP, POP, then perform their operation, then PUSH

Dual Operand:
  • OP_EXP operator exponent: operand2 ** operand1
  • OP_MUL operator multiply: operand2 * operand1
  • OP_DIV operator divide: operand2 / operand1
  • OP_MOD operator modulo: operand2 % operand1
  • OP_ADD operator add: operand2 + operand1
  • OP_SUB operator subtract: operand2 - operand1
  • OP_EQ operator equality: operand2 == operand1
  • OP_NEQ operator inequality: operand2 != operand1
  • OP_LT operator less than: operand2 < operand1
  • OP_GT operator greater than: operand2 > operand1
  • OP_LEQ operator less or equal: operand2 <= operand1
  • OP_GEQ operator greater or equal: operand2 >= operand1
  • OP_RSH operator right shift: operand2 >> operand1
  • OP_LSH operator left shift: operand2 << operand1
  • OP_BA operator binary and: operand2&operand1
  • OP_BO operator binary or: operand2|operand1
  • OP_BX operator binary xor: operand2^operand1
  • OP_LA operator logical and: operand2&&operand1
  • OP_LO operator logical or: operand2||operand1



Abstract Pattern Examples

key:

[SOMETHING] - Whatever bytecode has compiled from SOMETHING region in the code snippet.
<position> - a reference to a jump point. Compiles down as whatever offset is required to jump the instruction offset to where the matching <position> is in the pattern example.</<
ternary operator pattern:

CONDITION ? CLAUSE1 : CLAUSE2


[CONDITION]
CND <else>
[CLAUSE1]
JMP <end>
<else>
[CLAUSE2]
<end>


if patterns:

if(CONDITION)
BODY


[CONDITION]
CND <clause2>
[BODY]
<clause2>


if(CONDITION)
BODY1
else
BODY2


[CONDITION]
CND <clause2>
[BODY1]
JMP <clause3>
<clause2>
[BODY2]
<clause3>


if(CONDITION1)
BODY1
else if(CONDITION2)
BODY2


[CONDITION1]
CND <clause2>
[BODY1]
JMP <clause3>
<clause2>
[CONDITION2]
CND <clause3>
[BODY2]
<clause3>


if(CONDITION1)
BODY1
else if(CONDITION2)
BODY2
else
BODY3


[CONDITION1]
CND <clause2>
[BODY1]
JMP <clause4>
<clause2>
[CONDITION2]
CND <clause4>
[BODY2]
JMP <clause4>
<clause3>
[BODY3]
<clause4>
while() loop patterns:

while(CONDITION)
BODY


<begin>
[CONDITION]
CND <end>
[BODY]
JMP <begin>
<end>


do
BODY
while(CONDITION)


<begin>
[BODY]
[CONDITION]
CND <end>
JMP <begin>
<end>
for() loop patterns:

for(PRE,CONDITION,POST)
BODY


[PRE]
<begin>
[CONDITION]
CND <end>
[BODY]
[POST]
JMP <begin>
<end>


for condition pattern:

for(CONDITION)
BODY

//syntactic sugar for:

while(CONDITION)
BODY


for range pattern:

for(w in x to y)
BODY

//syntactic sugar for:

for(w in x to y step z=1)
BODY

//syntactic sugar for

for(w = x, w>=x&&w<=y , w += z)
BODY

//see standard for pattern.


for in pattern:

for(x in y)
BODY

//syntactic sugar for:

for(var/z = y.Copy(), var/w=1, var/v = length(z); w<=v; w++)
x = z[w]
if(!istype(x)) continue
BODY
Switch patterns:

switch(EVALUATE)
if(1)
CASE1
if(2-10)
CASE2
else
CASE3


[EVALUATE]
CSJ <case1>
CSE 1
CSJ <case2>
CSR 2 10
[CASE3]
JMP <end>
<case1>
[CASE1]
JMP <end>
<case2>
[CASE2]
<end>


switch(EVALUATE)
if(1,5,11-20)
CASE1
if(2-4,6-10)
CASE2


[EVALUATE]
CSJ <case1>
CSE 1
CSE 5
CSR 11 20
CSJ <case2>
CSR 2 4
CSR 6 10
JMP <end>
<case1>
[CASE1]
JMP <end>
<case2>
[CASE2]
<end>
continue and break pattern:

while(CONDITION1)
BODY1
if(CONDITION2)
BODY2
continue
else if(CONDITION3)
BODY3
break
BODY4


<beginwhile>
[CONDITION1]
CND <endwhile>
[BODY1]
[CONDITION2]
CND <clause2>
[BODY2]
JMP <beginwhile>
<clause2>
[CONDITION3]
CND <endif>
[BODY3]
JMP <endwhile>
<endif>
[BODY4]
JMP <beginwhile>
<endwhile>
functions:

functions are identified by a specific address, and must always end with a return.

Local memory is determined by counting the number of arguments in the function header, plus the greatest total number of simultaneous variables declared during the function.

Variable declaration is always done at the top of a function, and undecleration is always handled at the bottom of a function.

function(a,b,c) //3 arguments (local variables 1,2,3)
var/d //local variable 4
var/e //local variable 5
var/f //local variable 6
if(CONDITION)
var/g //local variable 7
var/h //local variable 8
var/i //local variable 9
BODY
var/j //local variable 7, because 7,8,9 are assumed undeclared
var/k //local variable 8
var/l //local variable 9


What this winds up looking like:

DCL 9
[CONDITION]
CND <endif>
[BODY]
<endif>
RET (UDC 9)


Undeclaration is implied. It does not need to be specified, as the return operation sets the return value in the stack and empties the stack down to the return value.
resolving local jumps:

The compiler stores labels in a hashmap. The temporary name of the label is associated with a line number to jump to.

If a jump references an unresolved label, it will store it in a second hashmap which associates a code offset with a string.

When a function has completed compilation, the unresolved jump hashmap is iterated through. The label is looked up in the resolved label hashmap. If the label does not resolve, a compilation error will occur. If the label does resolve, the code offset stored as unresolved is set to the code offset of the identified label. Once all labels have been resolved successfully, function compilation is considered to be finished.