compiler-construction


How to track the index of variables in an interpreter


I'm creating an interpreter (a bytecode interpreter, so also a compiler) and I've run into a problem I just can't solve. I need to store variables somewhere. Storing them in a dictionary and looking them up at runtime would be way too slow, so I'd like to store them in registers and refer to them by index instead of by name.
So at compile time I give every variable an index and create an array of registers. That's fine for languages with a single, flat scope. But the language I'm creating the interpreter for has nested scopes (and function calls). So another approach could be to have a set of global registers and a stack of register lists for function calls. My virtual machine would then have something like:
    Register globalRegisters[NUMBER_OF_GLOBALS];
    Stack<Register[]> callstack;
But there's another thing. My language allows functions inside functions. Example:
    var x = 1;
    function foo() {
        y = 2;
        function bar() {
            z = 3;
            y = y - 1;
        }
    }
Function bar() refers to a variable that belongs to foo(). That means the virtual machine would have to look at the register list below the top one on the stack. But what if bar() is recursive? What if the recursion depth is determined by user input? Then the virtual machine simply wouldn't know how many stack entries it has to go down to find the register list containing the value of y.
What could be an effective solution to this problem? This is the first time I'm dealing with registers; calculations themselves happen on a value stack.
The usual way to represent closures is to create a struct that contains both the function pointer itself as well as its environment. The simplest way to represent the environment would be as a pointer to the stack frame of the outer function. The inner function could then simply dereference that pointer (with the offset of the given variable of course) when accessing variables of the outer function.
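To make the environment-pointer idea concrete, here is a minimal Python sketch. It assumes a toy VM in which every frame keeps a reference to the frame of its lexically enclosing function (commonly called a static link or access link). `Frame`, `Closure`, `load`, `store` and `call` are hypothetical names, and the parameter `k` on bar is an added assumption standing in for the user-controlled recursion depth from the question. The compiler emits each variable access as a (depth, slot) pair, where depth counts function nesting levels, so `y` inside bar is always one link away no matter how many recursive bar frames sit on the call stack. Because Python objects live on the heap, this sketch does not reproduce the dangling-frame issue; a VM that allocates frames on a real stack would.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Frame:
    """One activation record: a fixed-size slot array plus a static link."""
    slots: list
    static_link: Optional[Frame]  # frame of the lexically enclosing function


@dataclass
class Closure:
    """Function code paired with its environment (the defining function's frame)."""
    code: Callable[[Frame], object]
    env: Optional[Frame]


def resolve_frame(frame: Frame, depth: int) -> Frame:
    # Follow `depth` static links; the compiler knows `depth` statically.
    for _ in range(depth):
        frame = frame.static_link
    return frame


def load(frame: Frame, depth: int, slot: int):
    return resolve_frame(frame, depth).slots[slot]


def store(frame: Frame, depth: int, slot: int, value) -> None:
    resolve_frame(frame, depth).slots[slot] = value


def call(closure: Closure, nslots: int, *args):
    # A fresh frame per call. Its static link is the closure's environment,
    # not the caller's frame, so recursion never changes the walk distance.
    frame = Frame(list(args) + [None] * (nslots - len(args)), closure.env)
    return closure.code(frame)


# Hand-compiled versions of the question's foo/bar (hypothetical layout):
#   foo: slot 0 = n (parameter), slot 1 = y, slot 2 = bar
#   bar: slot 0 = k (parameter); y is (depth 1, slot 1), bar is (depth 1, slot 2)
def bar_code(frame: Frame):
    store(frame, 1, 1, load(frame, 1, 1) - 1)      # y = y - 1
    k = load(frame, 0, 0)
    if k > 0:
        call(load(frame, 1, 2), 1, k - 1)          # recursive bar(k - 1)
    return None


def foo_code(frame: Frame):
    store(frame, 0, 1, 2)                          # y = 2
    store(frame, 0, 2, Closure(bar_code, frame))   # function bar() { ... }
    call(load(frame, 0, 2), 1, load(frame, 0, 0))  # bar(n)
    return load(frame, 0, 1)                       # return y


foo = Closure(foo_code, None)
result = call(foo, 3, 4)  # bar recurses 5 levels; each level finds y one link up
```

The design choice here is that the link being followed is the static (lexical) one stored in the closure, not the dynamic chain of callers, and that is what makes the number of hops a compile-time constant.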
However, there's another problem you have to consider: what if foo returns bar, and bar is then called after foo has already returned? In that case a pointer to foo's stack frame would be invalid, because that stack frame no longer exists at that point. So if you want to allow that scenario (instead of simply making it illegal to return functions), you need another solution.
One common solution is to represent all local variables as pointers to heap-allocated values and to store copies of all those pointers in the function's struct.
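A minimal Python sketch of that approach, assuming each captured local lives in a heap-allocated box (`Cell`, a hypothetical name) and the closure struct stores copies of the pointers to those boxes. The second closure, `get_y`, is added only to show that both closures share one cell, so writes through one are visible through the other even after foo has returned.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Cell:
    """A heap-allocated box holding one local variable's current value."""
    value: object


@dataclass
class Closure:
    """Function code plus copies of the pointers to the cells it captures."""
    code: Callable[..., object]
    captured: list  # Cell objects shared with the defining function


def call(closure: Closure, *args):
    return closure.code(closure.captured, *args)


def bar_code(captured: list):
    y = captured[0]          # the very Cell object that foo allocated
    y.value = y.value - 1    # y = y - 1 writes through the shared cell
    return y.value


def get_y_code(captured: list):
    return captured[0].value


def foo_code():
    y = Cell(2)                         # var y = 2, allocated on the heap
    bar = Closure(bar_code, [y])        # copy of the pointer, not of the value
    get_y = Closure(get_y_code, [y])    # a second closure sharing the same cell
    return bar, get_y                   # foo's frame is gone; the cell survives


bar, get_y = foo_code()
first = call(bar)    # 1
second = call(bar)   # 0
```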
Another solution would be to restrict closures so that variables of the outer function can only be accessed by the inner function if they're never reassigned. That's what Java 8 lambdas do: captured locals must be effectively final. Then the struct can simply contain copies of the variables instead of pointers.
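The restriction can be sketched as a compile-time check followed by a plain copy. In this Python sketch the lists of assigned names are a hypothetical stand-in for walking the AST of the outer and inner function bodies, with one entry per declaration or assignment. A captured variable is accepted only if it is assigned at most once in the outer function and never in the inner one, roughly mirroring Java's "effectively final" rule; all function names are illustrative.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable


class CaptureError(Exception):
    pass


def check_captures(outer_assigned: list[str], inner_assigned: list[str],
                   captured: set[str]) -> None:
    """Reject a capture if the variable is ever re-assigned."""
    outer_counts = Counter(outer_assigned)
    for name in sorted(captured):
        if outer_counts[name] > 1 or name in inner_assigned:
            raise CaptureError(f"cannot capture {name!r}: it is re-assigned")


def make_closure(code: Callable[[dict], object], env: dict, captured: set[str]):
    """Build the closure struct. Captured variables never change, so copies are safe."""
    return code, {name: env[name] for name in captured}


def call(closure):
    code, copies = closure
    return code(copies)


def is_allowed(outer: list[str], inner: list[str], captured: set[str]) -> bool:
    try:
        check_captures(outer, inner, captured)
    except CaptureError:
        return False
    return True


# The question's bar() does `y = y - 1`, so capturing y is rejected.
question_bar_ok = is_allowed(["y"], ["z", "y"], {"y"})   # False
# A bar() that only reads y is accepted, and y is copied into the closure.
reading_bar = make_closure(lambda env: env["y"] * 10, {"y": 2}, {"y"})
```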
I think the underlying question here is very different from the one you have written, so I've written an explanation of why I think the question is malformed, followed by an answer to what I think the underlying question is. If I'm mistaken, I apologize, but bear with me a bit :-)
The problem with the question
"I'm creating an interpreter (a bytecode interpreter, so also a compiler)"
An interpreter is not a compiler, even if it's for a low-level language, unless you meant that your program both compiles some language to bytecode and then interprets that bytecode. In any case, unless you're JIT-compiling code, the program that is actually running is the interpreter.
"Storing them in a dictionary and looking them up at runtime would be way too slow, so I'd like to store them in registers and use their indexes instead of their name."
In an interpreter, forcing target-language variables into registers doesn't smell right to me. For instance, say you have a method for interpreting a specific statement that uses variables. You can fetch the variables quickly because you forced them into registers, but then you have too few registers left to run the operations of your own method efficiently. Also, saying "yeah, I'll just store those in registers" makes me suspect you greatly overestimate the number of registers available to you.
I'm guessing "registers" here is a misnomer and you really just care about an efficient way of storing and accessing locals in the presence of nested scopes and recursion. So I think your question can really be phrased as "I want some data structure storing locals, how do I do that in the presence of nested scopes and recursive functions?" If I'm wrong, I'm sorry, but if not:
My Answer
To answer "I want some data structure storing locals, how do I do that in the presence of nested scopes and recursive functions?", I think it's best to first clarify the distinction between scopes and frames, in the context of locals.
A scope is a mapping from identifiers to local variables. Inside a scope, you know that all instances of the identifier x refer to the same thing (roughly). A scope is something you care about when you're parsing the input language: it's what you use to understand the semantics of the code ("oh, the x the coder is incrementing is the same x from 2 lines ago").
A frame is the memory allocated (typically on the stack) when calling a function. Each local usually gets a reserved place on the frame to store its value.
When you parse the code and deal with scopes, you don't care about recursion, since you're not running anything, just parsing. You do care about nested scopes, but their nesting is never unbounded, because the code itself (not its execution, just the code) is always finite. The standard way to handle locals in scopes during parsing is to keep a stack of dictionaries: create and push a new dictionary when a scope is opened, and pop it when the scope is closed. Whenever x is accessed, look for it in the topmost dictionary on the stack; if it's not there, continue to the next one, and so on.
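The stack-of-dictionaries approach might look like the following Python sketch (the class and method names are hypothetical). It assigns each declared variable the next free slot in the current function's frame, so the result of resolution is a slot index rather than a name. For simplicity it never reuses slots after a block closes, and it covers a single function; nested functions would additionally need to record how many function levels up a name was found.

```python
from __future__ import annotations


class ScopeStack:
    """Compile-time symbol table: a stack of dicts mapping names to frame slots."""

    def __init__(self) -> None:
        self.scopes: list[dict[str, int]] = []
        self.frame_size = 0  # slots the current function's frame will need

    def open_scope(self) -> None:
        self.scopes.append({})

    def close_scope(self) -> None:
        self.scopes.pop()

    def declare(self, name: str) -> int:
        # Give the new variable the next free slot in the frame.
        slot = self.frame_size
        self.scopes[-1][name] = slot
        self.frame_size += 1
        return slot

    def resolve(self, name: str) -> int:
        # Search from the innermost scope outwards; the first hit wins (shadowing).
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"undefined variable {name!r}")


s = ScopeStack()
s.open_scope()             # function body
s.declare("x")             # slot 0
s.open_scope()             # nested block
s.declare("x")             # slot 1, shadows the outer x
s.declare("y")             # slot 2
inner_x = s.resolve("x")   # 1
s.close_scope()
outer_x = s.resolve("x")   # 0
```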
The code you generate (or directly execute, in an interpreter) will then know exactly which location each instance of x refers to, and those memory locations are allocated when a frame is created. That way you also don't have to care about recursion: you have mapped variables to locations in the current frame, and that mapping is valid regardless of where the frame was called from.
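To illustrate why recursion stops being a problem once names are resolved to slots, here is a hedged Python sketch of a tiny stack-based interpreter; the instruction set and the `Function`/`run` names are assumptions, not part of the question. Every call allocates a fresh frame of the compile-time size `nslots`, and the hand-compiled recursive factorial reads its parameter `n` back from slot 0 after the recursive call returns. With a single shared register array, that read would see the inner call's argument instead.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Function:
    name: str
    nslots: int  # parameters + locals, fixed at compile time
    code: list   # instructions as tuples; variables appear only as slot indices


def run(functions: dict[str, Function], name: str, args: list):
    fn = functions[name]
    # A fresh frame for every call: recursion just means more frames.
    frame = list(args) + [None] * (fn.nslots - len(args))
    stack: list = []  # operand stack for this call
    pc = 0
    while True:
        op, *operands = fn.code[pc]
        pc += 1
        if op == "const":
            stack.append(operands[0])
        elif op == "load":
            stack.append(frame[operands[0]])
        elif op == "store":
            frame[operands[0]] = stack.pop()
        elif op in ("sub", "mul"):
            b, a = stack.pop(), stack.pop()
            stack.append(a - b if op == "sub" else a * b)
        elif op == "jump_if_zero":
            if stack.pop() == 0:
                pc = operands[0]
        elif op == "call":
            callee, nargs = operands
            call_args = stack[len(stack) - nargs:]
            del stack[len(stack) - nargs:]
            stack.append(run(functions, callee, call_args))
        elif op == "ret":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")


# fact(n): slot 0 = n (parameter), slot 1 = m (local)
FACT = Function("fact", nslots=2, code=[
    ("load", 0), ("jump_if_zero", 11),             # if n == 0, jump to the base case
    ("load", 0), ("const", 1), ("sub",), ("store", 1),  # m = n - 1
    ("load", 1), ("call", "fact", 1),              # fact(m)
    ("load", 0),                                   # n, read back after the recursive call
    ("mul",), ("ret",),                            # return fact(m) * n
    ("const", 1), ("ret",),                        # base case (instruction 11): return 1
])

functions = {"fact": FACT}
result = run(functions, "fact", [5])
```

For brevity the sketch reuses the host language's recursion for calls; a production VM would more likely push frames onto its own explicit stack.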
Finally, a word about closures
In all the languages I can recall right now, closures work by capturing enclosing variables at definition time. In Java, for example, every local of the enclosing method that is accessed inside an inner class will, in practice, be passed to the inner class at the moment of its creation; think of it as just another argument to the inner class's constructor. C++ is more explicit about which variables it captures, but otherwise it works the same way: the lambda object simply gets those variables (by value or by reference, depending on the capture list) passed to it on creation. In either case, the captured object is distinct from the original object once it has been captured (both may be pointers to the same place, but that does not make them the same object), so it shouldn't be hard to handle during parsing.
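The "just another constructor argument" view can be sketched in Python as closure conversion: the inner function becomes a class whose constructor receives the captured variables. `BarByValue` copies the value (analogous to a C++ `[y]` capture), while `BarByRef` copies a reference to a shared `Ref` box (analogous to `[&y]`); all names are hypothetical. In both cases the closure holds a separate object created at capture time, even when it points to the same place as the original.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Ref:
    """Explicit reference box, standing in for a by-reference capture."""
    value: int


class BarByValue:
    """Inner function after closure conversion, capturing y by value."""

    def __init__(self, y: int) -> None:
        self.y = y           # private copy taken when the closure is created

    def __call__(self) -> int:
        return self.y - 1    # reads only the captured copy


class BarByRef:
    """Inner function after closure conversion, capturing y by reference."""

    def __init__(self, y: Ref) -> None:
        self.y = y           # a copy of the reference, not of the value

    def __call__(self) -> int:
        return self.y.value - 1


def demo_by_value() -> int:
    y = 2
    bar = BarByValue(y)      # y's value is passed to the constructor
    y = 5                    # reassigning the original afterwards...
    return bar()             # ...does not affect the captured copy


def demo_by_ref() -> int:
    y = Ref(2)
    bar = BarByRef(y)        # a copy of the reference is passed to the constructor
    y.value = 5              # a write through the shared box...
    return bar()             # ...is visible to the closure
```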
