Play With Lua!

Making a toy programming language in Lua, part 4

without comments

This is part four of my series on writing a toy programming language using LPeg. You should start with part one here.

Last time, we used the new AST generation to make conditionals and loops. Now, we’re going to add the ability to define functions, complete with lexical scoping.

Parser changes

The first thing to do is to modify the parser to recognize a function definition. We’re going to add two more options to STMT, to represent a function call and a function definition:

STMT = 
    Ct( Cc("assign") * V("REF") * "=" * spc * V("VAL") ) +
    V("IF") +
    V("WHILE") +
    V("CALL") +
    V("FUNCTION") +
    V("EXPR"),

Notice that we’re also moving EXPR down to the bottom, because a CALL could look like a malformed EXPR, and LPeg, with no backtracking, could get confused. With a little more work we could factor EXPR out of this completely, but this is fine for now.

Functions and calls can also be values (calls can be used in expressions, definitions can be assigned to variables), so we’ll modify VAL as well:

VAL = V("CALL") + V("ARRAY") + V("FUNCTION") + V("EXPR")

So, what do CALLs and FUNCTIONs actually look like? We’re going to use a Javascript-like syntax:

x = function(a){ 2*a+3 }
x(5)

We don’t need much extra syntax for this since it already look just like an “if” or “while” statement from the last version:

CALL = Ct( Cc("call") * V("REF") * spc * lparen * V("VAL_LIST") * rparen ),
FUNCTION = Ct( C("function") * spc * lparen * V("NAME_LIST") * rparen * V("LIST") ),
NAME_LIST = Ct( (name * comma^0)^0 )

The only special nonterminal we need is NAME_LIST, to be a list of parameter names in a function declaration. The calls use VAL_LIST just like arrays.

Scopes

We want our functions to have their own scopes. And, since functions can return functions (functions can return any value), we want them to capture things defined in their containing scope in a closure. Basically, we want a Graham accumulator to work:

accum = function(){
  a = 0;
  function(n){
    a = a+n;
  }
}

acc = accum()
acc(1) // returns 1
acc(1) // returns 2

When you call accum, a scope is created and a new variable “a” is declared in it, then a new function is returned. When that function references “a”, it’s referencing the one from the scope in which it was declared.

In order to make this work, we’ll need to make some architectural changes to the way we do variables. We used to have a single table called VARS; now, we’re going to have a chain of tables going from the outer, global scope to the innermost scope that we’re evaluating in now. When we define or reference a variable, we’ll find the one in the narrowest scope possible.

So, replace the declaration of VARS with this:

Scopes = { {} }

“Scopes” is a list of tables with one, empty, table in it representing the global scope. When we create a new scope, we’ll make a table that has the last entry in Scopes as its metatable __index, so that anything we try to look up, Lua will automatically search the entire scope chain.

Everywhere we refer to VARS, replace it with Scopes[#Scopes] and lookups will work. There’s one more thing we have to do for assignments though: if you assign a value to a variable that exists in an outer scope, we want to alter that variable, not make a new one in the inner scope. So, in assign, we need to add this special case before we set the value:

while current[next_index] and not rawget(current, next_index) do
    current = getmetatable(current).__index
end

So, if the variable can be seen from this scope but isn’t defined in this scope, walk back to a scope where it is defined.

Defining functions

Now let’s alter eval some. To define a function, we need to save three things: the names of the arguments (so we know what to assign to what when we call it), the scope it was defined in (so we can find variables that were captured in its closure), and the code to execute (so we know what to actually do to call it). Add this case into eval:

elseif ast[1] == 'function' then
    return { args=ast[2],
             code=ast[3],
             scope=Scopes[#Scopes] }

We can store the code by just storing the parsed AST. That way to run the function we just evaluate the AST.

Calls are a little trickier.

Calling functions

Calling a function requires doing a few steps. First, we need to look up the actual function to call:

elseif ast[1] == 'call' then
    local fn = eval(ast[2])

Then, create a scope for the call. This scope needs to have the current innermost scope as a parent:

    local scope = setmetatable({}, {__index=fn.scope})

Now, we need to define some things in that scope. The function call has a bunch of argument values, and the declaration contains the names we should assign them to:

    for i, name in ipairs(fn.args) do
        scope[name] = eval(ast[3][i])
    end

Then, we add the scope we created to the end of Scopes, and eval the function’s code:

    table.insert(Scopes, scope)
    local result = eval(fn.code)

Finally we remove the new scope and return the result:

    table.remove(Scopes)
    return result
end

The reason we can safely remove the scope is that might not be the only reference to it. If any function was defined in this one and stored in an outer scope (or the return value), then that function will store a reference to scope it was defined in, and the environment of this particular function call will live on, stored in that declaration, until it’s needed again.

Accumulator

So, let’s write an accumulator to test this:

stmt:match("make_acc = function(){ a = 0; function(n) a = a+n; }")
stmt:match("acc = make_acc()")
assert(stmt:match("acc(1)") == 1)
assert(stmt:match("acc(1)") == 2)

If you run this step-by-step you can see the new scope being added for the call, saved in the return value, and then referenced again when you call acc.

Next steps

At this point we’re pretty much finished. You have an interpreter that does the basic elements of a Lua-like language: some control structures, variables, functions, and closures. There are several directions you can go from here:

  • Standard library functions like “print”, “read”, “cos”, and so on. If you do this you can stop accepting EXPRs as statements, and use “print” for output instead.
  • Strings, tables, any other data type besides numbers and arrays of numbers. With closures and tables you can make an OO system pretty easily.
  • Error reporting. This one is kind of tricky, but right now if a program has a syntax error LPeg won’t even tell you what line it’s on. If you modify spc to update a global variable whenever it sees a newline, then you can tell the number of lines you were able to parse before finding a problem.
  • More control structures. “If” is good but it’s weak without “else”. If you have tables you could add one to the scope stack with a “using” statement. And you may want to create a “try” / “catch” system to handle runtime errors (which there could be several of in this language, like accidentally CALL-ing a number)
  • Literally any other feature you can think of. The point of writing a new language is to try out new ideas in languages. If there’s a language construct you want to play with, try to implement it here, see how well it works. Trying to implement it is also a great way to find any problems you may not have thought of.

The final code is here. Have fun!

Written by randrews

June 21st, 2015 at 3:54 pm

Posted in Uncategorized