Programming Language Syntax

If you know a little bit about how compilers work, you know that the syntax of programming languages is context free. That is to say that each syntactic element of the language can be described as a list of sub-elements, regardless of what context it appears in. For example, a while-loop in C# is (roughly) the keyword ‘while’, followed by an expression, followed by a statement (or block of statements), and it doesn’t matter where the loop appears, that syntactic definition is always the same. This is basically the idea of a context free grammar (CFG).

Natural languages (i.e. human languages) are not context free. It’s impossible to come up with a (concise) list of CFG rules, such as “a sentence is a noun phrase, followed by a verb, followed by a noun phrase; and a noun phrase is an article (a ‘determiner’ in the biz) followed by a noun” to describe English, for instance. That will work for simple sentences like “a man walked a dog”, but not for sentences like “which dog do you think a man walked?”

Now, this raises the question of why we don’t program in languages that closely resemble natural languages, in terms of syntactic structure? Wouldn’t that make it easier to program? There’s a good reason why we don’t do that, actually: No one knows what the syntax of natural languages looks like. Try as we might, natural languages are still beyond our understanding.

The reason I’m writing this is that I just came back from a symposium in honor of one of my professors (I study the syntax of natural language, by the way) who invented Tree Adjoining Grammar. TAG is a type of syntactic formalism that can actually be used to describe English fairly well — in the way that CFGs don’t even come close. At a very high level, TAG adds to CFG the ability to splice together two units of structure. I was wondering whether a TAG-based programming language syntax would let us program with new types of syntactic sugar, although I think the answer is that nothing interesting would come out of it.