Doing Scheme – The Big Lie

This semester, I started and led a team of 10+ people to complete a basic Scheme interpreter with C++ — it’s up on GitHub as Shaka Scheme, and is still missing macros. It’s also pretty raw, so brace yourself.

People say Lisp is so easy to implement. So easy.” Sure, the idea that “it’s easy” is not up-front and everywhere, but it’s told in whispers around the interwebs:

It’s just SO easy to write a Lisp interpreter!

Actually, I agree. Lisp, in its original form, was extremely simple — it was designed to manipulate what are basically linked lists and do it dynamically, and do it well.

It’s elegant:

  • It has a simple grammar. It’s context-free, and thus, we can use the various context-free parsing algorithms — the particular ones of note are the LL(k) recursive descent (currently used by GCC and LLVM for parsing C++) and LR parsers, which things like yacc and bison generate.
  • Homoiconity is useful. Since everything is a list and all code is basically in the form of S-expressions (lists), everything can be manipulated super easily.
  • Macros are super powerful. Lisp macros work on expressions instead of tokens like C preprocessor macros. This means macros are powerful enough to generate ANY type of code.
  • The implementation is easy. The above examples are a pretty clear example that Lisp can be very minimal.

So yes. Lisp is easy, and fun.

Scheme is not. And this is “The Big Lie” — Scheme is a Lisp, but Scheme is definitely NOT JUST Lisp.

Why is Scheme harder than Lisp to make work?

Scheme is a modern Lisp dialect, first created at MIT, and now on its seventh iteration with the Revised (to the 7th power) Report on the Algorithmic Language Scheme specification (R7RS).

Even though Scheme has a reputation for being minimalist, it’s implementation is Lisp, but on steroids:

  • Macros are not so simple. R7RS Scheme has template macros, which are declarative rather than procedural — in other words, it has its own pattern language that it matches expressions to (similar to Haskell’s pattern matching) and then has its own template expression language as well with certain special semantics.
  • First-class continuations are time-machines. They require you to keep an explicit control stack and have a way to save and restore the state of the control stack. This means the evaluation system must have a way of expressing it as an explicit, reified item. No implicit recursion variables — it must ALL be saved.
  • Circular lists are valid in Scheme. We have datum recursive list notation to thank for that. It’s not covered as a section in R7RS, and, like any details on garbage collection or memory management algorithms, the document is almost silent on the matter itself.

Meta-circular evaluators should go away if you’re doing Scheme — essentially, you’ll have to start playing with an intermediate form like continuation passing style (CPS) in order to solve the continuation problem or have reified continuations through exposing it directly in your evaluation system, and you’ll have to start playing with statically-analyzed scopes to deal with macros.

At this point, your evaluator will probably start looking more like a compiler. Hence, this is why implementations like Chez and Guile compile to native or bytecode already.

Are you sure?

There’s been ways of getting around the two, but according to R. Kent Dybvig (who is a prolific writer on Scheme and has been heavily involved with Chez Scheme), he thinks that meta-circular evaluators have to go:

An iterative approach is necessary because the more straightforward recursive approach of the meta-circular interpreter presented in Chapter 2 cannot properly support continuations or tail calls. … Tail calls cannot be properly supported unless the implementation (at the meta level) supports them properly.

So the “circular” part is sort of problematic — the information required to support continuations should be explicit and not implicit as with recursion in order to make sure that it can be saved and restored as data.

So what?

When people say “Lisp,” what comes to mind? Is it Clojure? Scheme? Common Lisp? I think a lot more about what type of Lisp that people want to talk about because they look the same, but all come out differently in terms of semantics, and even syntax sometimes.

I was fooled by the misnomer that “Lisp interpreters are easy” meant that it still applied to modern Lisps (particularly, Scheme, since its reputation as “the simple modern Lisp” is kind of there). I’m pretty sure that when people say “Lisp,” it’s closer to the original McCarthy Lisp instead of modern Scheme or modern Common Lisp.

Food for thought.

But that Ruby one supports continuations…

Ruby already has it with its own callcc.