Prev Up

All the data types discussed here can be lumped together into a single all-encompassing data type called the s-expression (s for symbolic). Thus 42, #\c, (1 . 2), #(a b c), "Hello", (quote xyz), (string->number "16"), and (begin (display "Hello, World!") (newline)) are all s-expressions.

Prev Up

S-expressions are a really cool idea. They're sometimes known as S expressions or even Symbolic expressions. The first reference that I've seen for them is John McCarthy's paper from 1960 called Recursive functions of symbolic expressions. What can we conclude? Not much, so far, other than they're an old idea that, like all good old ideas, is still in use today.

But this doesn't help explain what an S expression is. So here's the definition. An S-expression is:

  1. an atom
  2. (S expression . S expression)

Ok, this is weird - so let's start with line 1 - what's an atom? Like atoms in the world of science, here an atom is an indivisible unit. Do all atoms have to have the same structure? No! Some atoms can be integers, other atoms could be floating point values, other atoms might be character strings. Yes, alright, I get it, they're all represented as sequences of characters, but they are conceptually different and may behave differently in the same way that oxygen atoms behave differently than potassium atoms.

Line 2 is odd unless you understand recursion. But all it is saying is that an S-expression, if it isn't an atom is made up as the combination of two S-expressions. This ends up building a binary tree where the nodes look like line 2 and the leaves of the tree look like line 1.

Why would all this matter? This very simple data structure can model any other data structure which is pretty neat. Peter Landin described something called the SECD machine which is a very simple model of a computer. It can compute everything that is computable and where every program is defined by 4 S-expressions. But, unlike journalism I saved the best for last. There are programming languages, notably lisp where the program itself is an S-expression.

This is revolutionary, if a program is an S-expression, and data is an S-expression, then a program is just data!. If the program is compiled into the SECD machine instruction set, then the object code is just data. With S-expressions as the representation, there's no distinction between program and the things the program operates on. It's all just data!

Log in or register to write something here or to contact authors.