How to create "an interpreter of assembler"

credit: ShinuToki @ codewars
We want to create a simple interpreter of assembler which will support the following instructions:

  • mov x y - copies y (either a constant value or the content of a register) into register x
  • inc x - increases the content of register x by one
  • dec x - decreases the content of register x by one
  • jnz x y - jumps to an instruction y steps away (positive means forward, negative means backward), but only if x (a constant or a register) is not zero

Register names are alphabetical (letters only). Constants are always integers (positive or negative).

Note: the jnz instruction moves relative to itself. For example, an offset of -1 would continue at the previous instruction, while an offset of 2 would skip over the next instruction.

The function will take an input list with the sequence of the program instructions and will return a dictionary with the contents of the registers.

Also, every inc / dec / jnz on a register will always be followed by a mov on the register first, so you don’t need to worry about uninitialized registers.

Example

simple_assembler(['mov a 5','inc a','dec a','dec a','jnz a -1','inc a'])

''' visualized:
mov a 5
inc a
dec a
dec a
jnz a -1
inc a
''''

The above code will:

  • set register a to 5 ,
  • increase its value by 1 ,
  • decrease its value by 2 ,
  • then decrease its value until it is zero ( jnz a -1 jumps to the previous instruction if a is not zero)
  • and then increase its value by 1 , leaving register a at 1

So, the function should return

{'a': 1}

I have an answer already for this, but my code was too long for system. I’m trying to make it more efficient. Thank you

Here is my idea (insane):

def simple_assembler(program):
    mov = 'mov'
    inc = 'inc'
    dec = 'dec'
    jnz = 'jnz'
    a = 'a'
    b = 'b'
    c = 'c'
    d = 'd'
    dash = '-'
    ab = 'a b'
    ac = 'a c'
    ad = 'a d'
    ba = 'b a'
    bc = 'b c'
    bd = 'b d'
    ca = 'c a'
    cb = 'c b'
    cd = 'c d'
    da = 'd a'
    db = 'd b'
    dc = 'd c'
    a1 = 0
    b1 = 0
    c1 = 0
    d1 = 0
    count = 0
    list_1 = []
    for string in program:
        if mov in string:
            count += 1
            if a in string.split():
                list1.append(a)
                for el in string.split():
                    for le in el:
                        if le is dash:
                            cob = el.replace('-','')
                            if cob.isdigit():
                                    a1 = -int(cob)
                    if el.isdigit():
                        a1 = int(el)
            if b in string.split():
                list1.append(b)
                for el in string.split():
                    for le in el:
                        if le is dash:
                            cob = el.replace('-','')
                            if cob.isdigit():
                                    b1 = -int(cob)
                    if el.isdigit():
                        b1 = int(el)
            if c in string.split():
                list.append(c)
                for el in string.split():
                    for le in el:
                        if le is dash:
                            cob = el.replace('-','')
                            if cob.isdigit():
                                    c1 = -int(cob)
                    if el.isdigit():
                        c1 = int(el)
            if d in string.split():
                list1.append(d)
                for el in string.split():
                    for le in el:
                        if le is dash:
                            cob = el.replace('-','')
                            if cob.isdigit():
                                    d1 = -int(cob)
                    if el.isdigit():
                        d1 = int(el)
            if ab in string:
                a = b
            if ac in string:
                a = c
            if ad in string:
                a = d
            if ba in string:
                b = a
            if bc in string:
                b = c
            if bd in string:
                b = d
            if ca in string:
                c = a
            if cb in string:
                c = b
            if cd in string:
                c = d
            if da in string:
                d = a
            if db in string:
                d = b
            if dc in string:
                d = c
            
        if inc in string:
            count += 1
            if a in string.split():
                a1 += 1
            if b in string.split():
                b1 += 1
            if c in string.split():
                c1 += 1
            if d in string.split():
                d1 += 1
        if dec in string:
            count +=1
            if a in string.split():
                a1 -= 1
            if b in string.split():
                b1 -= 1
            if c in string.split():
                c1 -= 1
            if d in string.split():
                d1 -= 1
        
        if jnz in string:
            count += 1
            for el in string:
                for el in string.split():
                    for le in el:
                        if le is dash:
                            cob = el.replace('-','')
                            if cob.isdigit():
                                    j1 = -int(cob)
                    
                    if el.isdigit():
                        j1 = int(el)
            if j1 < 0:           
                jlist = program[((count -1) + j1): count]
            if j1 > 0:
                jlist = program[count : (count + j1)]
            if a in string.split():
                while a1 > 0:
                    simple_assembler(jlist)
            if b in string.split():
                while b1 > 0:
                    simple_assembler(jlist)
            if c in string.split():
                while c1 > 0:
                    simple_assembler(jlist)
    list2 = set(list_1)
    dict = {}
    if a in list2:
        dict['a'] = a1
    if b in list2:
        dict['b'] = b1
    if c in list2:
        dict['c'] = c1
    if d in list2:
        dict['d'] = d1
    return dict

Break this into smaller parts with object oriented programming
simple_assembler(['mov a 5','inc a','dec a','dec a','jnz a -1','inc a'])

mov a 5
What size register? 32bit? 64 bit?
Your object should contain data structures to represent a register
an int will do

Your object should parse this input - recognize mov, register a, and the value 5 and call an appropriate function in this case mov(register: a, value: 5)
That function should do what you expect to the appropriate register.

As should every other function.
The basic logic here isn’t hard for the functions but the parsing and low level logic can be a real nightmare.

should this object be a separate function? And then use this function within the main function?

Given the above is a loop, we only need one dec a instruction.

dec a
jnz a -1

The loop will terminate when a is zero.

If you make one function for each instruction

def mov(...)
    ...

and associate them with their names a dict
{'mov': mov, ...}
Then you can iterate through the instructions in a single loop, calling each corresponding function and passing them the arguments.

state = {'a': 0, 'b': 0, ...}
for instruction, args in program:
    ops[instruction](state, args)

If your loop won’t be able to trivially read the instructions directly from the input, then before the program does anything else you’d break the input into neat data:

[ ('mov', 'a', 1)
, ('inc', 'a')
, ('dec', 'a')
, ...
]

and don’t test whether data is one thing or another, instead just use the data directly e.g. not:

if x == a then do a
if x == b then do b
if x == c then do c
instead:
do x

1 Like

Since the suggested approach is to use a dict, why not just use a class?

class Assembler(object):
    def __init__(self, asm_list):
        self.asm = asm_list
        self.reg = {
            'a': [],
            'b': [],
            'c': []
        }
    def __repr__(self):
        return "%r" % self.asm
    def mov(self, r, x):
        try:
            self.reg[r] = [int(x)]
        except ValueError:
            raise ValueError
    def get_reg(self, r):
        try:
            return self.reg[r]
        except KeyError:
            raise KeyError

app = Assembler(['mov a 5','inc a','dec a','jnz a -1','inc a'])

print (app)

#app.mov('a', '5')
#print (app.get_reg('a'))

ins = app.asm[0].split()
if hasattr(app, ins[0]):
    if ins[0] == 'mov':
        try:
            app.mov(ins[1], ins[2])
            print (app.get_reg(ins[1]))
        except ValueError:
            raise ValueError
['mov a 5', 'inc a', 'dec a', 'jnz a -1', 'inc a']
[5]
1 Like

I really like this class design you put up here. I tried to do a minor spin off of it.

def inc(self, r):
           self.reg[r] += 1

For some reason, I get a : TypeError: ‘int’ object is not iterable.

1 Like

Yeah I basically tried doing this, but I’m still trying to work some kinks out. If I use a class like @mtf, I set up a function for each command, but that jnz one is something else. I tried creating a new list based soley on the jnz data, for example: If jnz has -1 as the value, I’ll create a list one index backwards from the jnz index. The problem is that the list is created in the middle of the function, while typically you assign a list at the beginning and call the object. Is it possible for an object to iterate over two different lists without resetting its data from iterating through the first list?

The registers are lists, not values (by design) so to mutate a value (rather than overwrite the existing list) we need to include the index.

    def inc(self, r):
        try:
            self.reg[r][0] += 1
        except ValueError:
            raise ValueError

It’s still in the sketching stages, so the first thing I’m attempting to do is get each step to work. The following is the call to inc

ins = app.asm[1].split()
if hasattr(app, ins[0]):
    if ins[0] == 'inc':
        try:
            app.inc(ins[1])
            print (app.get_reg(ins[1]))
        except ValueError:
            raise ValueError

Here is the working sketch…

class Assembler(object):
    def __init__(self, asm_list):
        self.asm = asm_list
        self.reg = {
            'a': [],
            'b': [],
            'c': []
        }
    def __repr__(self):
        return "%r" % self.asm
    def mov(self, r, x):
        try:
            self.reg[r] = [int(x)]
        except ValueError:
            raise ValueError
    def get_reg(self, r):
        try:
            return self.reg[r]
        except KeyError:
            raise KeyError
    def inc(self, r):
        try:
            self.reg[r][0] += 1
        except ValueError:
            raise ValueError
    def dec(self, r):
        try:
            self.reg[r][0] -= 1
        except ValueError:
            raise ValueError
    def jnz(self, r, s):
        pass
        

app = Assembler(['mov a 5','inc a','dec a','jnz a -1','inc a'])

print (app)

#app.mov('a', '5')
#print (app.get_reg('a'))

pc = 0

ins = app.asm[pc].split()
if hasattr(app, ins[0]):
    if ins[0] == 'mov':
        try:
            app.mov(ins[1], ins[2])
            print (app.get_reg(ins[1]))
        except ValueError:
            raise ValueError

pc += 1

ins = app.asm[pc].split()
if hasattr(app, ins[0]):
    if ins[0] == 'inc':
        try:
            app.inc(ins[1])
            print (app.get_reg(ins[1]))
        except ValueError:
            raise ValueError

pc += 1

ins = app.asm[pc].split()
if hasattr(app, ins[0]):
    if ins[0] == 'dec':
        try:
            app.dec(ins[1])
            print (app.get_reg(ins[1]))
        except ValueError:
            raise ValueError

pc += 1

ins = app.asm[pc].split()
if hasattr(app, ins[0]):
    if ins[0] == 'jnz':
        rep = app.asm[pc + int(ins[2])].split()
        while app.reg[rep[1]][0] > 0:
            if hasattr(app, rep[0]):
                if rep[0] == 'dec':
                    try:
                        app.dec(rep[1])
                        print (app.get_reg(rep[1]))
                    except ValueError:
                        raise ValueError
            
pc += 1

ins = app.asm[pc].split()
if hasattr(app, ins[0]):
    if ins[0] == 'inc':
        try:
            app.inc(ins[1])
            print (app.get_reg(ins[1]))
        except ValueError:
            raise ValueError
>>> 
['mov a 5', 'inc a', 'dec a', 'jnz a -1', 'inc a']
[5]
[6]
[5]
[4]
[3]
[2]
[1]
[0]
[1]
>>> 

This is an interesting kata. Following a procedure similar to that laid out by @ionatan , it took two steps: (1) come up with something that works, and (2) refactor to get it under Codewars’ 12 second limit on all tests.
Outline:

def mov():    
    # code for mov
def inc(i):
    # code for inc
... etc

def simple_assembler(inst):
    # set up empty dict for registers, empty list for instruction_list
    # Part 1: one time around instruction set
    for item in inst:        
        # identify the instruction (ins), register (reg) and value (val), e.g., from 'mov d 100'
        instruction_lst.append((ins, reg, val))
    # now instruction_lst is a list of tuples, (ins, reg, val), like this:   [(mov, 'd,' 100), (inc, 'd', 1 ), (jnz 'c' -5)]
    # Part 2: now begin to work through instruction list       
    i = 0 
    while not i >= len(inst):        
        # unpack instruction, register, value from instruction_lst[i]        
        # call instruction
        ins(registers, reg, val)  # this calls functions defined at the beginning
        i += 1        
    return registers

As is usually the case, my posted code can’t be used as is and is only there to convey the idea.

My code does iteration (for-loop) but that’s not at all what a cpu does. Did you read that program counter wiki page? For one thing it mentions overwriting the program counter (therefore it’s not iteration) and for another it makes an excellent suggestion on where to store it (just another register)

You can have as many iterators as you want (they are values), the only way you’ll run into trouble is if you modify the underlying data (you should typically consider this an action that invalidates an iterator, but lists do not enforce this)… But yeah, you don’t wanna iterate anyway.

classes

For a lot of cases I will argue against using classes. I don’t see how they are actually doing anything here.

What classes are especially good for, in python anyway, is to create objects with specific behaviour, like making them iterable/callable/addable/stringifyable/indexable/whateverable

If you have a dict containing all the state, then you can pass in that + instruction to the operation function, and that operation will be free to do anything. If the program counter is contained in that dict as part of the overall state then jnz is trivially implemented.

Don’t… do that. That’s a no-op

You can save some nesting by not doing these as well:

If the code has invalid instructions then the program should crash, no point testing.

These instructions should be carried out one after the other shouldn’t they? Carry one out, read next instruction, carry that one out, and so on, instead of carrying out one instruction, and as part of it, look at the next. The nesting seems weird to me. And, all their logic should be inside their respective functions anyway, right.

1 Like

Definitely glad to receive your input, Jonatan. Just making it interesting enough was my side goal.

This is a raw sketch, willing to break a few rules as long as the flow makes it to the last line. The ultimate goal will be to make it usable in a general way, but that would be a ways off. Even getting it to run on just the current instruction list will be sufficient. Especially if that makes sense.

The nested stuff was all part of the sketch. Still need to construct a jnz method, but have this to learn from.

We can’t really jump around in memory, and using Python logic it made the most sense to me to use the instruction list as a jump to point, given an offset from the current instruction. I’m kind of fixated on that concept in this sketch.

Sure appreciate your input. I’ll give all this some more thought over the next few days.

that’s what a function call does, isn’t it?
hardware, python, this assembler language - they all jump by setting the program counter to the next place to execute

1 Like

If you remove pretty much all the if-statements that aren’t part of an instructions own logic

Then, just execute the code itself.
The code starts out as text, but you can replace the instruction names with functions, so for example 'mov a 1' becomes

# mov is a function
   |
   v
(mov, 'a', 1)

… this is still code, but now you can execute it because you parsed the text 'mov' into its function equvalent: instruction[0](instruction) You can send the instruction to the instruction itself and let it deal with things like how many arguments there are and what they mean.

So pass the entire instruction element and parse it then? That makes sense.

No you’d parse all the other values too to keep the instruction code itself to a minimum. See for example that I converted the 1 to an integer.
But you would for example not care about how many arguments you need to send to mov, so your main loop would need no specialized knowledge about individual instructions, instead it’s just:

while 1:
    read instruction
    execute instruction

There’s no need to figure out which function to call. It has already been parsed at an earlier stage. Part of the instruction is a function.

1 Like

Not to make a point; just a preliminary follow-up on your earlier suggestion…


ins = app.asm[pc].split()
rep = app.asm[pc + int(ins[2])].split()
while app.get_reg(rep[1]) > 0:
    app.dec(rep[1])
    print (app.get_reg(rep[1]))

I’ve modified the get_reg method.

return self.reg[r][0]

The design is clunky, but for now it is a definitive container of the register value. I’d rather start with [] than 0 or "".