XML, JSON, trees and LISP

2016-08-17

This post is a run-through of some of my thoughts around the formats we surround ourselves with, when developing networked systems - especially on the web but elsewhere as well.

As you may have realized, my web site is antisocial. I do not have a comments section and this is a deliberate choice. If, however, you wish to provide insights or factual corrections, I very much welcome them via e-mail.

Trees... it's all about trees

A web page is, behind the scenes, a DOM tree in the rendering engine of the browser. Usually when we want to serialize that tree, we serialize it to XML. This is not because XML is in some way superior to other representations; it is more an accident of history.

Back in the very early days of the web when HTML was created, it was created solely for markup and did not describe formal tree structures. For example, interlacing elements were allowed; <b>this <em>was</b> valid;</em>.

During the '90s browser wars, one of the things that came out of the standardization effort was that someone (W3C I suppose) realized that transforming HTML from very unstructured markup into a stricter structure, and giving it a new name with some resemblance to the original, would probably be necessary if anyone were ever to produce standards of what browsers should support and how they should do it. Thus XML was born (strictly speaking as a simplified subset of SGML, the standard HTML had borrowed its syntax from, with XHTML later recasting HTML itself in XML).

XML is a properly strict specification with clear rules about what is allowed and what isn't, very much unlike the days of ad-hoc HTML. Interlacing elements as in the previous example is not allowed in XML. Unlike early HTML, XML only allows actual tree structures - for example:

    <body>
      <title>Hello world</title>
      <p>Some text</p>
    </body>

or as a graphical representation:

    body
    +-- title
    |   +-- "Hello world"
    +-- p
        +-- "Some text"

If you actually look at the XML, you'll notice that it is very verbose. The closing tag repeats the name of the element that is being closed, even though this is completely unambiguous (as elements cannot be interlaced). As long as nobody has to type or read XML, and as long as the cost of networked transport or storage of XML documents is not important, this is all well and fine. But I think most people will agree that one could think of shorter ways of representing trees.

One example, of course, is S-expressions. These date back to the 1950s, but they are every bit as fine for representing trees as XML is - the same tree would look like this;

(body (title Hello world) (p Some text))

While this style of formatting may seem alien to some, it is not too difficult to get used to either. For one, it is a lot more compact than XML and therefore can be read and written a lot faster by someone skilled.
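To make the comparison concrete, here is a minimal Python sketch (Python and its standard `xml.etree.ElementTree` module are assumptions of this illustration, not tools used in this post). The helper `to_sexp` is a hypothetical name; it ignores attributes and any text that follows a child element. The sketch renders the XML tree above as S-expression text and shows that an XML parser refuses the interlaced markup from earlier.

```python
import xml.etree.ElementTree as ET


def to_sexp(element):
    """Render an ElementTree element as S-expression text (attributes ignored)."""
    parts = [element.tag]
    text = (element.text or "").strip()
    if text:
        parts.append(text)
    parts.extend(to_sexp(child) for child in element)
    return "(" + " ".join(parts) + ")"


xml_text = "<body><title>Hello world</title><p>Some text</p></body>"
sexp_text = to_sexp(ET.fromstring(xml_text))
print(sexp_text)                       # (body (title Hello world) (p Some text))
print(len(xml_text), len(sexp_text))   # character counts of the two forms

# Interlaced elements do not form a tree, so the XML parser rejects them.
try:
    ET.fromstring("<p><b>this <em>was</b> valid;</em></p>")
    interlaced_ok = True
except ET.ParseError:
    interlaced_ok = False
print(interlaced_ok)                   # False
```

Both strings describe the same three-node tree; only the punctuation differs.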

At the time of the inception of HTML, nobody could of course know where HTML would eventually go; I am not saying that Tim Berners-Lee and his associates did not do well, or that W3C did a poor job of formalizing HTML into XML - but it should be clear that if your goal is to serialize a tree structure into readable text, XML is a far cry from being compact or easy on the eyes. And it is indisputable that more compact representations predate XML by at least three decades.

So what's in a tree anyway?

Trees hold a lot more than just DOM structures. Any of the programming languages we use gets parsed into a tree structure when we pass our source code to the computer (for compilation, interpretation or otherwise). An AST (abstract syntax tree) is one such tree structure. Let's cook up an example; a conditional statement that performs one of two operations based on a test:

    if
    +-- ==
    |   +-- a
    |   +-- 0
    +-- zero
    +-- more

This could be written in C++ as:

if (a == 0) zero(); else more();

which is just another way of representing the exact same tree as above.
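The claim that source code is parsed into a tree can be checked in any language that exposes its own parser. The following is a rough sketch using Python's standard `ast` module; Python is an assumption of this illustration, and the conditional is written in Python's expression form rather than C++. The helper `shape` is a hypothetical name that reduces the parser's node objects to nested tuples.

```python
import ast

# The same conditional, in Python's expression syntax.
source = "zero() if a == 0 else more()"
node = ast.parse(source, mode="eval").body   # the conditional expression node


def shape(n):
    """Reduce an ast node to (type name, children...); names and constants are leaves."""
    if isinstance(n, ast.Name):
        return n.id
    if isinstance(n, ast.Constant):
        return repr(n.value)
    children = [shape(child) for child in ast.iter_child_nodes(n)
                if not isinstance(child, ast.expr_context)]
    return (type(n).__name__, *children)


print(shape(node))
# ('IfExp', ('Compare', 'a', ('Eq',), '0'), ('Call', 'zero'), ('Call', 'more'))
```

The node names are Python's own, but the structure is the one discussed here: a test, a positive branch and a negative branch hanging off a single conditional node.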

But wait... If that is just a tree... could we write the code in XML too? I'm glad you asked!

    <if>
      <==>a 0</==>
      <zero/>
      <more/>
    </if>

Now that clearly doesn't make the code more readable. I guess this explains why Bjarne didn't pick XML for his "C with Classes"... But how about if we chose a more compact representation, say, like S-expressions again?

(if (== a 0) (zero) (more))

How about that? Now both the C++ and the S-expression representations of the tree are a lot shorter than the XML representation. However, the C++ representation uses a rather intricate syntax; notice how surrounding parentheses are used to group the first argument to the if (the test), while a semicolon is used to separate the second argument (the positive) from the else and the third argument (the negative). Finally, another semicolon is used to separate this conditional from anything that may follow.

In comparison, the S-expression representation uses a surrounding parenthesis to group the expression - this replaces the ending semicolon from the C++ example. But every argument to if is simply separated by a space; this is why the S-expression is considerably shorter than the C++ example (and completely consistent in its syntax). Anyway this is not about measuring lengths of expressions (or anything else), it is merely about giving examples of the various ways in which we represent the tree structures that we surround ourselves with.
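One practical consequence of that consistency is that a reader for S-expressions fits in a handful of lines. Below is a minimal Python sketch (an illustration only; `tokenize`, `parse` and `evaluate` are hypothetical names, and the evaluator covers just the `if`, `==`, function calls, names and integers used in this post) that reads the expression above into nested lists and evaluates it.

```python
def tokenize(text):
    """Split S-expression text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front and return one nested-list expression."""
    token = tokens.pop(0)
    if token != "(":
        return token                      # an atom
    expr = []
    while tokens[0] != ")":
        expr.append(parse(tokens))
    tokens.pop(0)                         # drop the closing ")"
    return expr


def evaluate(expr, env):
    """Evaluate the tiny subset used here: if, ==, calls, names, integers."""
    if isinstance(expr, str):
        return int(expr) if expr.lstrip("-").isdigit() else env[expr]
    head, *args = expr
    if head == "if":
        test, positive, negative = args
        return evaluate(positive if evaluate(test, env) else negative, env)
    if head == "==":
        return evaluate(args[0], env) == evaluate(args[1], env)
    return env[head](*(evaluate(arg, env) for arg in args))


tree = parse(tokenize("(if (== a 0) (zero) (more))"))
env = {"a": 0, "zero": lambda: "zero called", "more": lambda: "more called"}
print(tree)                 # ['if', ['==', 'a', '0'], ['zero'], ['more']]
print(evaluate(tree, env))  # zero called
```

There is no grammar to speak of: a parenthesis opens a node, a parenthesis closes it, and whitespace separates the children.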

Anyone remember PHP?

So back in the days with "Personal Home Page", the HTML you wrote could be extended in clever ways with actual code that would be executed by the server on which your web site ran. This was almost really clever; since HTML lends itself to extension (eXtensible Markup Language - it got that name for a reason) simply by adding your own elements, PHP offered the HTML author an elegant way to escape from the HTML (that would become the DOM tree in the user's browser) into the world of in-server processing.

    <body>
      <h1>Hello world</h1>
      <?php if ($a == 0) zero(); else more(); ?>
    </body>

Top points for elegant extension here - the contents of the <?php> element in the document are simply evaluated on the server. Beautiful. But... what about consistency? Both the to-be DOM data (the static HTML) and the PHP code are tree structures, and one is a sub-tree of the other. Why do we use separate representations?

Looking at our previous attempt at representing the simple conditional in XML, I think it is quite clear why Rasmus didn't pick XML representation for his language either. Let's try it anyway:

    <body>
      <h1>Hello world</h1>
      <?php>
        <if>
          <==>a 0</==>
          <zero/>
          <more/>
        </if>
      </?php>
    </body>

Yuck. So given the choice, I think it is clear why they chose inconsistency over the alternative. But... What if the web wasn't HTML? What if... say... Tim had used S-expressions? Let's try:

(body (h1 Hello world) (?php (if (== a 0) (zero) (more))))

Well, be that as it may - it's probably a little late to change the web from XML to S-expressions. But anyway, this example is a useful primer for what comes next.

Oh... and in case you're thinking that the four ending parentheses look scary, don't worry. Any decent editor will do parenthesis matching and any decent developer should be using a decent editor. Trust me when I tell you that sequences of parentheses are not a practical problem when developing using S-expressions; it looks like trouble, I know, but in the real world it just isn't.

Escaping XML - JSON to the rescue?

The JavaScript community has long worked with XML by means of the tight DOM integration in the language. Sometimes they would work with textual XML, other times they would manipulate objects in the DOM directly. The XMLHttpRequest object performs an HTTP request and provides, as the name implies, immediate access to an internal representation of the XML response received.

No developer should ever have to write XML; developers will produce and manipulate tree structures, and if needed those can be serialized to XML or parsed from XML; but there should never be a situation in which an actual person sits down and types actual XML. XML is just too inconvenient, too verbose and too silly, for a human to type.

It appears that the JavaScript community discovered that rather than working with XML documents, it would be more convenient to write plain JavaScript and use the eval() function on the receiver side to parse this. Once again, I have to award top points for consistency - using only JavaScript rather than using XML as well (one language is usually better than two if you have any amount of code and people on your project). Of course, it wasn't long before the security implications of executing eval() on code received from foreign systems were realized. Today, the valid subset of JavaScript that can be sent between systems is called JSON, and it is no longer parsed using eval(). So far so good.
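The difference between evaluating a payload and parsing it can be sketched in a few lines. The example below uses Python as a stand-in for JavaScript (an assumption of this illustration; Python's built-in `eval()` and its `json` module play the roles of JavaScript's `eval()` and a JSON parser). The function `launch` is a hypothetical placeholder for anything a remote peer should not be able to trigger.

```python
import json

calls = []


def launch():
    """Stand-in for something a remote peer should never be able to trigger."""
    calls.append("launched")
    return 1


data_payload = '{"a": 0, "items": [1, 2, 3]}'
code_payload = '{"a": launch()}'

# A data-only parser accepts the data and refuses the code.
parsed = json.loads(data_payload)
try:
    json.loads(code_payload)
    rejected = False
except json.JSONDecodeError:
    rejected = True

# eval() accepts the code payload and runs the embedded call as a side effect.
evaluated = eval(code_payload)

print(parsed, rejected, evaluated, calls)
```

The data-only parser never looks up `launch`, let alone calls it, which is the whole point of restricting the exchange format to a subset.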

Inside a pure JavaScript system, JSON thus makes some sense as a textual representation of a tree structure. But in the effort to avoid the security problems of arbitrary code execution from remote systems, the JSON syntax became a far cry from full JavaScript. As with the XML version of the C++ code sample, the limited syntax becomes cumbersome and verbose when it has to represent arbitrary trees. An example is in order.

Let's take the original C++ example;

if (a == 0) zero(); else more();

Now let us try rewriting this into JSON:

{"if": [ {"==": ["a", "0"]}, {"zero": []}, {"more": []} ]}

While shorter than the XML representation, it's still a long way from the brevity of the original C++ or the even shorter S-expression. Again, beauty is in the eye of the beholder; I shall refrain from commenting on what the JSON looks like.
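The mapping from the tree to that JSON shape is mechanical, which a short Python sketch can show (Python and the helper names `to_json_tree` and `render_sexp` are assumptions of this illustration). Each node becomes a single-key object whose value is the list of its children, and leaves stay plain strings.

```python
import json


def to_json_tree(node):
    """Map ['op', child, ...] to {'op': [child, ...]}; leaves stay strings."""
    if isinstance(node, str):
        return node
    head, *children = node
    return {head: [to_json_tree(child) for child in children]}


def render_sexp(node):
    """Render the same nested list as S-expression text."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(render_sexp(child) for child in node) + ")"


tree = ["if", ["==", "a", "0"], ["zero"], ["more"]]
as_json = json.dumps(to_json_tree(tree))
as_sexp = render_sexp(tree)
print(as_json)   # {"if": [{"==": ["a", "0"]}, {"zero": []}, {"more": []}]}
print(as_sexp)   # (if (== a 0) (zero) (more))
print(len(as_json), len(as_sexp))
```

Every node costs a brace pair, a bracket pair, a colon and two quotes in JSON, against a single pair of parentheses in the S-expression.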

This example is not as contrived as it may seem on the surface. There are indeed real-world projects that serialize their languages into JSON - one such example is the Elasticsearch Query DSL.

In all brevity, JSON does not bring anything to the table that XML or S-expressions didn't do already. It is slightly shorter than XML but it's clearly not the shortest representation. You can't possibly argue that it is significantly prettier than the others (I'll go as far as to agree it's different). It's not more (or less) efficient; it requires a specialized parser like anything else. It was never structured the way it was for any of those reasons either - JSON today looks like it does, because it's a valid subset of JavaScript.

In other words; JSON was never intended to be brief. It was never intended to be efficient. It was never intended to be easily readable. It was never intended to have any other particular quality one might wish from a tree representation language; it is what it is, because it is a valid subset of JavaScript.

The story is the same with XML of course, with XML not being designed from the ground up but looking like it does because of its HTML predecessor. So don't take this as a knocking of JSON in particular - all I'm saying is, JSON is no better than XML for all intents and purposes - and therefore in my view it is useless. It simply doesn't bring anything to the table that wasn't there already.

So is everything just bad then?

I'm glad you asked! No, of course everything isn't just bad. And this post isn't meant to make it sound like I think all is bad. I am simply trying to make you think about trees and how we represent them in writing.

I have been working quite a bit with XML data exchanges lately, and I have been fortunate enough to be able to do much of that work in a language that is itself an S-expression; namely LISP. This has been an interesting experience that has brought back memories from the days of PHP, but with actual elegance and consistency. It's almost too good to be true - and it's definitely good enough to share.

As we established earlier, XML and S-expressions are two of a kind - they are simply representations of tree structures. XML is nasty to write though, so before doing anything else, I built an XML parser and an XML generator; two procedures that would convert a string of XML into an S-expression, and convert an S-expression into a string of XML. Like this:

    * (sexp->xml '("body" ("title" "Hello world") ("p" "Some text")))
    => "<?xml version=\"1.0\" encoding=\"utf-8\"?> <body> <title> Hello world </title> <p> Some text </p> </body>  "

    * (xml->sexp "<body><title>Hello world</title> <p>Some text</p></body>")
    => ("body" ("title" "Hello world") ("p" "Some text"))

Now hang on - how's that for consistency? LISP is homoiconic; the language itself is represented in a basic data type of the language. Or, to put it another way, the code looks like the data. While perhaps confusing to the novice, it's inarguably consistent. The ability to construct our tree structure and then later convert it to a string of XML opens the door to some very convenient XML processing.

Consider, for example, an API endpoint handler that must return an S-expression which the HTTP server then converts to XML:

    (defun api-get-status ()
      `("status"
        ("version" ,*version*)
        ("api-time" ,(universal->iso8601))
        ("db-time" ,(car (query "SELECT now()" :flatp t)))))

Is that sweet or what? Executing this code we get:

    * (api-get-status)
    => ("status" ("version" "unknown") ("api-time" "2016-08-17T15:24:41Z") ("db-time" "2016-08-17T15:24:41.049531Z"))

Ultimately, when the HTTPd executes the API handler code it will also invoke the S-expression to XML conversion. So what is executed is of course more like:

    * (sexp->xml (api-get-status))
    => "<?xml version=\"1.0\" encoding=\"utf-8\"?> <status> <version>unknown</version> <api-time>2016-08-17T15:28:51Z</api-time> <db-time>2016-08-17T15:28:51.774258Z</db-time> </status> "

Notice how my static document data (the outer status document and the structure of the version, api-time and db-time subdocuments) is written out literally, and how function calls and variables are trivially interwoven (by means of the comma operator). Building up large and complex tree structures from LISP is, in other words, a comparatively nice experience - and turning the S-expression to XML is trivial as already demonstrated.
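For readers who do not have a LISP at hand, the converters and the handler from this section can be approximated in Python. This is a rough sketch under stated assumptions, not the implementation behind the transcripts above: nested lists stand in for S-expressions, the standard `xml.etree.ElementTree` module does the XML work, `VERSION` is a hypothetical stand-in for `*version*`, the database query is left out, and text is assumed to appear only before child elements.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

VERSION = "unknown"  # hypothetical stand-in for the *version* variable


def build(sexp):
    """Turn [tag, child, ...] into an Element; string children become text."""
    tag, *children = sexp
    element = ET.Element(tag)
    for child in children:
        if isinstance(child, str):
            element.text = (element.text or "") + child
        else:
            element.append(build(child))
    return element


def sexp_to_xml(sexp):
    """Serialize a nested list to XML text (no XML declaration, no indentation)."""
    return ET.tostring(build(sexp), encoding="unicode")


def xml_to_sexp(xml_text):
    """Parse XML text back into the nested-list form (attributes are ignored)."""
    def convert(element):
        text = (element.text or "").strip()
        return [element.tag, *([text] if text else []), *map(convert, element)]
    return convert(ET.fromstring(xml_text))


def api_get_status(now):
    """Static structure written as literals, dynamic values dropped in place."""
    return ["status",
            ["version", VERSION],
            ["api-time", now.strftime("%Y-%m-%dT%H:%M:%SZ")]]


status = api_get_status(datetime(2016, 8, 17, 15, 24, 41, tzinfo=timezone.utc))
xml_text = sexp_to_xml(status)
print(xml_text)
print(xml_to_sexp(xml_text) == status)  # True
```

The Python version needs no quoting operator because a list literal evaluates all of its elements; the price is that the static parts have to be spelled as string literals, and the code no longer looks like the data it produces.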

Where do we go from here?

Where we can go with this exactly, I'm not sure. Doing XML API work in LISP is an absolute joy, that much I can say. The verbosity of S-expressions is minimal and I think the actual implementations are usually really elegant - having the language itself be an S-expression provides a level of consistency I am not used to from other languages.

I think it would be interesting to look into generating asm.js from LISP, as a way to deliver web applications efficiently while being able to write them in a more elegant language. Imagine that; having a server-side API server and the actual browser-side web application all developed in a language as elegant and mature as LISP. Certainly this would be a welcome contender to the current Node.js movement where we attempt to use the most inelegant language ever (hastily) conceived to run both the browser and the server.

Jakob Østergaard Hegelund