A parser's monolog

Summary

Richard Suchenwirth 2000-11-16: Whitespace between parens, brackets, or braces doesn't matter much in real life, nor in some popular programming languages. In Tcl, it matters a great deal (see also Is white space significant in Tcl). The parser has no rules for "keyword" syntax; it

groups according to braces and brackets;
breaks a script into commands by semicolons and newlines;
breaks a command into words by whitespace.
All commands are treated the same: the first word is the command, the others are its arguments.
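
The word-splitting rule can be watched from inside Tcl. The following is a minimal sketch: countwords is a hypothetical helper (not part of Tcl) that only reports how many argument words the parser handed over, and [list x y] stands in for any bracketed command whose result contains a space.

```tcl
# Hypothetical helper: report how many argument words the parser delivered.
proc countwords args {
    return [llength $args]
}

# [list x y] is an assumed stand-in for a command whose result contains a space.
set n1 [countwords a (b) [list x y]]   ;# three words: a, (b), and the result
set n2 [countwords a (b)[list x y]]    ;# two words: a, and (b) glued to the result
set n3 [countwords a(b) [list x y]]    ;# two words: a(b), and the result
puts "$n1 $n2 $n3"                     ;# 3 2 2
```

Only the position of the whitespace differs between the three calls, yet the command receives a different number of words each time.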

Ariel Burbaickij wrote in comp.lang.tcl:

set a (b) [glob a*b] ;#(1)

is not the same as

set a (b)[glob a*b] ;#(2)

Would you be so kind as to explain me how Tcl parses two expressions presented above (set a variations) ?

Certainly, and I will even add a third variation:

set a(b) [glob a*b] ;#(3)

A Parser's Monolog

(1) "Aha, I have four words delimited by whitespace: set, a, (b), and something in brackets that I will evaluate first, so recurse:

(1a) "Aha, I have two words, glob and a*b. The first is always the command, so I call 'glob' with the argument a*b. It returns, e.g., "afoob abarb", the names of two files that matched the specified pattern. Back from the recursion, I splice that result into the position where the bracketed command stood:

(..1) The first is always the command, so I call 'set' with three arguments: a, (b), and "afoob abarb" (this is one word that contains a space). 'set' raises an error with the message

 'wrong # args: should be "set varName ?newValue?"'
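
Case (1) can be reproduced without any matching files on disk. This is only a sketch: [list afoob abarb] is an assumed stand-in for [glob a*b], and catch is used so the error can be inspected instead of aborting the script.

```tcl
# [list afoob abarb] replaces [glob a*b] here (an assumption for the demo),
# so the result is the same single word containing a space.
set code [catch {set a (b) [list afoob abarb]} msg]
puts $code   ;# 1 means the script inside catch raised an error
puts $msg    ;# the "wrong # args" message quoted above
```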

(2) "Aha, I have three words: set, a, and '(b)[glob a*b]'. The third has something in brackets, so I first evaluate that and recurse (see 1a above)

(2..) So my third word is now "(b)afoob abarb". I call 'set' with these two arguments; it assigns the string value "(b)afoob abarb" to the variable a and returns "(b)afoob abarb" to me too, just in case I need it."
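
A short sketch of case (2), again with [list afoob abarb] as an assumed stand-in for [glob a*b]. The variable r is introduced here only to capture what set hands back to the parser.

```tcl
unset -nocomplain a
# No whitespace between (b) and the bracket: both end up in one word.
set r [set a (b)[list afoob abarb]]
puts $a                   ;# (b)afoob abarb
puts [string length $a]   ;# 14
puts [string equal $r $a] ;# 1: set returned the value it stored
```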

(3) "Aha, I have three words: set, a(b), and [glob a*b] (continued as in (2) above..)

(3..) Now I call set with the arguments "a(b)" and "afoob abarb". The set command (not me!) detects the array syntax in a(b) and assigns the string value "afoob abarb" to the element b in array a, creating the array if it does not exist, or raising an error if a is already a scalar variable."
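
Both outcomes of case (3) can be tried in a few lines. This is a sketch under the same assumption as before ([list afoob abarb] in place of [glob a*b]); the names code and msg are only illustration.

```tcl
unset -nocomplain a
# a does not exist yet, so set creates the array and its element b.
set a(b) [list afoob abarb]
puts [array exists a]   ;# 1
puts $a(b)              ;# afoob abarb

# Now make a a scalar and repeat the very same words.
unset a
set a scalar
set code [catch {set a(b) [list afoob abarb]} msg]
puts "code=$code msg=$msg"   ;# set refuses: the variable isn't an array
```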

"The set command and I would have shared the work if the command had been

set a(b) $a(c) ;#(4)

In this case, I see the dollar sign and know I must substitute a variable. The parens tell me that it's element c in array a. So I retrieve its value (say, grill) and take that as the third word, then call the set command with "a(b)" and "grill". Continue as in (3..) above."
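
The division of labour in case (4) becomes visible when the same words are handed to a command that merely echoes them. In this sketch, showargs is a hypothetical helper and the value grill is the one assumed in the text.

```tcl
# Hypothetical helper: return the argument words exactly as received.
proc showargs args {
    return $args
}

unset -nocomplain a
set a(c) grill

# The parser has already replaced $a(c) before the command is called...
set seen [showargs a(b) $a(c)]
puts [lindex $seen 0]   ;# a(b)   (left untouched by the parser)
puts [lindex $seen 1]   ;# grill  (substituted by the parser)

# ...and set then interprets a(b) as an array element on its own.
set a(b) $a(c)
puts $a(b)              ;# grill
```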

"And one more variation:

set $a(b) $a(c) ;#(5)

Perfectly valid Tcl again. For the second word, I retrieve the value of element b in array a (you may remember it is "grill" now), and for the third word, the value of element c in array a (also "grill"). So I call set with the two arguments grill and grill. set (not me!) takes the first as a variable name and the second as a string value, so it assigns the string "grill" to the variable grill, creating it if it does not exist."
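
Case (5) as a runnable sketch, assuming both array elements hold grill as in the text:

```tcl
unset -nocomplain a grill
set a(b) grill
set a(c) grill

# Both words are substituted first; set only ever sees "grill" and "grill".
set $a(b) $a(c)
puts [info exists grill]   ;# 1: a new variable named grill now exists
puts $grill                ;# grill
```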

Puzzled? But when you learn to think like the parser (the few rules on the Tcl manpage), Tcl really flies!

Another interesting observation:

A line can contain several words (and mostly does, obviously)
A word (in the Tcl parser's sense) can contain several lines (and also very often does, for instance proc bodies; a word can extend over several pages of code, for instance the body that follows namespace eval $name...)
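
Both observations can be checked with a word counter. This is a small sketch; countwords is a hypothetical helper, not a Tcl built-in.

```tcl
# Hypothetical helper: report how many argument words the parser delivered.
proc countwords args {
    return [llength $args]
}

# One line, several words (and, thanks to the semicolon, two commands).
set n1 [countwords one two three]; set n2 [countwords four]

# One word, several lines: the braces group everything, newlines included.
set n3 [countwords {
    first line
    second line
}]
puts "$n1 $n2 $n3"   ;# 3 1 1
```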

With these two "bi-recursive" rules, a world of software can be built from minimal concepts...

Anybody care to write down what a C(++) parser might think?
