Nested variable scoping

Jan 18, 2017

Overview

How does a compiler translate a source language with nested lexical scopes to a destination language with only immediate scope (e.g. x86 assembly, or a hypothetically constrained javascript[1])?

To handle the case of a variable that is declared in an outer scope of a function, the compiler passes a static link when a function is called. The static link can be used by the compiler to unwind the call stack of the function to find the variable's position where it was originally declared.

Explanation

For example, if we have something like the following, with a variable x in the outermost scope, a function f that refers to x directly, and then a different scope, g, that calls f:

var x = 1;
function f() { 
  return x + 1; 
}

function g() { 
  var x = 10; 
  return f(); 
}
g(); //=> returns 2

At runtime, when g has called f, the stack looks like the following (growing downward):

*OUTERMOST FRAME*
X => 1
*G FRAME*
X => 10
*F FRAME*

Here, we need the call to f to be able to resolve x in the outermost scope (i.e. 1). At compile time, we know that when x is referred to by f, the nearest enclosing x is actually part of the outermost scope.

Keep in mind that in our target language, our functions cannot have references to other scopes, so in the body of f, we can’t simply refer to outermost_scope; however, we can pass in a reference to outermost_scope.

Let’s make the scopes explicit:

var outermost_scope = {};
outermost_scope.x = 1;

function f(hopefully_the_correct_scope) { 
  var f_scope = {};

  // How do we get outermost_scope here?
  return hopefully_the_correct_scope.x + 1;
}

function g(hopefully_another_correct_scope) { 
  var g_scope = {};
  g_scope.x = 10; 
  return f(/* some other scope? */); 
}

g(/* some scope? */);

The problem we’re trying to solve is how do we pass f and g the correct scopes, such that hopefully_the_correct_scope in f is actually outermost_scope?

Since there is only a single outermost scope in the whole program, we can trivially solve our problem by simply passing outermost_scope through every function call:

var outermost_scope = {};
outermost_scope.x = 1;

function f(outermost_scope) { 
  var f_scope = {};
  return outermost_scope.x + 1;
}

function g(outermost_scope) { 
  var g_scope = {};
  g_scope.x = 10; 
  return f(outermost_scope); 
}

g(outermost_scope); //=> returns 2

That’s not very interesting, but the idea of passing around scopes is useful.

Let’s make things more interesting by adding another layer of functions:

var x = 0;
function a() {
  var x = 10;
  
  function b() {
    var y = "a";
    return x + 5;
  }
  
  function c() {
    var x = 100;
    return b();
  }
  
  return c();
}

a(); //=> returns 15

Now let’s look at our stack at it’s deepest, when b is invoked:

*OUTERMOST FRAME*
x => 0
*A FRAME*
x => 10
*C FRAME*
x => 100
*B FRAME*
y => "a"

Now, b will need a reference to a’s scope, but because a is a function, it could be invoked at multiple places on the stack, so it isn’t a constant that we can know ahead of time. Let's see if passing outermost_scope through to every function solves our problem here:

var outermost_scope = {};
outermost_scope.x = 0;

function a(outermost_scope) {
  var a_scope = {};
  a_scope.x = 10;
  
  function b(outermost_scope) {
    var b_scope = {};
    b_scope.y = "a";
    return outermost_scope.x + 5; // Uh oh! We actually wanted a_scope.x
  }
  
  function c(outermost_scope) {
    var c_scope = {};
    c_scope.x = 100;
    return b(outermost_scope);
  }
  
  return c(outermost_scope);
}

a(outermost_scope); //=> actually 5, but we wanted 15

That didn't work! Inside the body of b, we actually want a reference to a_scope, not outermost_scope. Fortunately, every location where b can be called will be at a fixed offset from a’s scope. This means that we can look at every call of b and figure out how many layers we need to unwrap to get to a’s scope. In our example, the call to b will always have one extraneous scope in between b and a, so we need to go up two levels.

If we were to add a call to b(); just before return c();:

var x = 0;
function a() {
  var x = 10;
  
  function b() { ... }
  
  function c() { ... }

  b();  // Let's look at the call stack during this invocation.
  return c();
}

a();

that particular invocation would always have a spot on the stack directly below a’s scope:

*OUTERMOST FRAME*
x => 0
*A FRAME*
x => 10
*B FRAME*
y => "a"

In that case, to find the value of x inside the body of b() we’d only need to go up a single level.

One way the compiler can keep track of how many levels we need to go up the call stack is known as a static link. A static link is an invisible (to the programmer) argument that is passed to each function that points to the scope in which the called function was defined.

When the compiler encounters a function call, it will compute how many layers it needs to unwrap to get the scope in which the called function was defined, then emit code that always passes that scope as an argument (the static link) to that function. Then at runtime, the function can use that scope for variable and function lookups as needed.

We can make this visible and see how it looks after our computation:

var outermost_scope = {};
outermost_scope.x = 0;

function a(a_parent_scope) {
  var a_scope = {}
  a_scope.parent = a_parent_scope;
  a_scope.x = 10;

  function b(b_parent_scope) {
    var b_scope = {};
    b_scope.parent = b_parent_scope;
    return b_scope.parent.x + 5;
  }
  
  function c(c_parent_scope) {
    var c_scope = {};
    c_scope.parent = c_parent_scope;
    
    function d(d_parent_scope) {
      var d_scope = {};

      // By adding the static link machinery, the compiler is ensuring
      // that d_parent_scope is always c_scope, so while we can't refer
      // to c_scope directly, whenever we need it, we'll use
      // d_scope.parent.
      d_scope.parent = d_parent_scope;

      // We know that b is defined in a_scope, which we can't refer to
      // directly, but we know is always the scope that is the parent
      // of c_scope, which we're referring to as d_scope.parent, so
      // a_scope can be referred to as d_scope.parent.parent.
      return b(d_scope.parent.parent); 
    }
    d(c_scope); //=> 15

    // We use c_scope.parent because we know that b and c are defined in
    // a_scope, which in the body of c, we can only refer to as
    // c_scope.parent.
    return b(c_scope.parent); 
  }

  // We use a_scope because that's the scope in which b is defined.
  b(a_scope); //=> 15

  // We use a_scope because that's the scope in which c is defined.
  return c(a_scope);
}

a(outermost_scope); //=> 15

This example has parent as an explicit property on each scope to make it clear that our references are legal (i.e. not jumping scopes), and to illustrate that we can keep refering upwards to ancestor scopes with this mechanism (parent.parent...).

Look specifically at our call to b(d_scope.parent.parent). We know at compile time how far away d_scope is from b’s enclosing scope (a_scope), so we know how many .parents to unwrap so that we pass our equivalent reference to a_scope to b as b_parent_scope.

Now we have a straightforward, if repetitive, way to handle nested scopes in a target language that doesn’t support enclosing references. Adding this argument to every single function call is suboptimal if the child function never needs access to the parent scope, but we’ll leave that optimization for later.

If you found this interesting or if I could have stated something more clearly, let me know on twitter or shoot me an email.

“these examples use javascript both as the compiled and target language simply for brevity and illustration; to approximate the issues of a target language, we’ll hand-wave and assume that our target javascript doesn’t have the features that we’re trying to implement, but has all the other features that our implementation depends on.” ↩