AMG's language ideas

AMG: Many years of thought led to the development of Brush. What follows is historical, and I'm tempted to remove it.

AMG: As you might imagine, I like Tcl quite a lot. Most of all I appreciate how "stupid" it is.

Heh, let me explain what I mean by that. Unlike modern, enlightened languages, it doesn't have preconceived notions about mathematics, I/O, control structures, definition of data structures, object orientation, or graphical user interfaces. It's too stupid. But that's to its credit, because the reality is that all computer programming languages are too stupid--- too stupid to know what I need when it comes to OO, GUIs, or structs. And since Tcl recognizes this, it doesn't get in my way. Sure, something like C++ or Java may suit the needs of many, but they have the side effect of shaping and constraining the programmer's thought processes.

With Tcl, it's the other way around. :^) The language is stupid, and the programmer is smart. The programmer gets to shape the language. What a concept...

Now of course Tcl comes with a standard set of commands that implement most of the above, but (1) they're optional, (2) they can be replaced, (3) they're not part of the language proper, and (4) in most cases they can be redefined within the language. Want a Java without classes or exceptions? Tough.

But I have a few complaints about using Tcl. They're not really problems with Tcl itself, just things I wish I had. Heh, this is me reneging on my stated desire to have the language be as simple as possible. But not really; I'm most interested in maximizing programmer freedom.

NEM 26 Aug 2006: I have a few comments, which I've mixed into the flow of text below, marked by my name. I agree with quite a lot of what you've written here. A while ago I wrote up some notes on how I'd like Tcl 9 to look [1]. My thoughts have moved on since then, but the core ideas remain similar.

AMG: Have I influenced you? I had hoped some of my ideas were new, or at least new to the Tcl crew, but it's amazing how similar your paper is to my article. I wonder why we were thinking along the same lines.

I haven't finished reading it, though. It's midnight, after all, and tomorrow morning I'll be putting rafters on a new carport we're building. Need rest...

AMG: Note that as I refine my ideas (i.e. change my mind), I'll move stuff around on this page. Sorry if I have to delete your comment. Check the page history if you wish to see previous revisions.

Let's start by identifying things which I think every practical language needs.

State

Even an interpreter-less language like execline supports state, albeit with the aid of the filesystem. Lisp without defun or set would literally only support one-line programs; that is to say, each line may as well be a separate program.

Side effects

With state comes the need to modify state, and that means side effects, or a command doing more than merely returning a value. Of course, side effects should be documented, minimized, and isolated from functional behavior.

Named objects

The most important side effects are creation and modification of objects in the interpreter state. These objects must have names in order to be referenced from one line to the next.

Namespaces

Names must be unique, so they occupy slots in a namespace. But these days it would be unreasonable to require that even temporary variables be globally unique, especially when those temporary variables are used within recursive functions. Therefore it's important to have multiple namespaces, for instance one per frame on the call stack. And if there is to be a global namespace, it's necessary to support accessing objects in namespaces other than the local one.

Functions and procedures

(To me, a function has no side effects and a procedure does.) Behaviors need to be packaged and named in order to be used more than once. More than that, it's necessary to have a basis on which new behaviors can be implemented. Just try doing anything with an interp with no commands!

Extensibility

It's not possible to define everything using the language itself. A minimal set of functionality needs to be "bootstrapped" using a lower-level language and then exposed to the interpreter. Gotta have a base case for the recursion. Even machine code can be extended through new circuitry, which is a language expressed in copper and silicon.

If it's possible to do without any of the above and still be able to do more than just 2+2, I'd like to hear about it.

NEM: You don't need state or side-effects. The lambda calculus for instance is entirely stateless, yet is Turing complete. Side-effects are needed if you want to interact with the world, though. However, Haskell even manages to handle this in a clever way using monads: instead of directly performing side-effects you construct an imperative program. This separates things into two parts:

A pure (side-effect free) functional program that constructs the monad;
An imperative program that when run will perform the side-effects.

This way all the side-effects are confined in the constructed program (monad) and most of your code can remain (provably) side-effect free. It's a very interesting approach, but I'd still probably just put state directly into the language (in a manner similar to ML).

AMG: That reminds me of a cheat I sometimes do with Makefiles. The Makefile has a foo.mk target which depends on some kind of configuration file (for example). foo.mk is built by some inline shell script, external program, or combination of the two that reads the configuration file and outputs the data formatted for make (i.e. variable and target definitions). Then I say "include foo.mk" (surrounded by some mumbo jumbo that avoids rebuilding foo.mk only to immediately delete it when doing "make clean"). Before reading any more of the Makefile, make checks foo.mk's dependencies, possibly (re-)generates it, and parses the resulting file as if hard-coded into the original Makefile.

To summarize, this lets me embed un-makelike processing in a Makefile. It works great, but boy is it weird.

Now for features I'd like to have in a language...

Tcl syntax and semantics

Words are strings of non-metacharacters. Words are separated by whitespace. Metacharacters can be quoted to disable their special meaning. Command invocations are performed by giving the command name as the first word and any number of arguments as subsequent words. Scripts are sequences of commands delimited by semicolons and newlines. Comments extend from an unquoted # given as the first word to a newline not preceded by an odd number of backslashes. Metacharacters can be used to perform substitutions, including variable and command substitutions. Metacharacters appearing in substitution results have no special meaning. Individual words can be split into multiple words using a special directive. Variable expansion can never result in the variable being modified. Shining ideal: "everything is a string."
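For comparison, present-day Tcl (8.5/8.6) already behaves this way: the result of a substitution is inserted as literal text and is never re-scanned. A small check, with made-up values:

```tcl
# A value containing metacharacters: a $ and a bracketed command.
set x {$y [error boom]}

# Substituting $x inserts its text verbatim; the $ and [...] inside the
# result are not substituted again, so no error is raised.
set s "value: $x"
puts $s   ;# value: $y [error boom]
```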

Easy list expansion

I think {*} is acceptable syntax for Tcl, but a new Tcl-like language is free to use something shorter. I'll go with backtick since that seems to be a relatively popular option in DISCUSSION: Argument Expansion Syntax. Since it's easy to type, maybe it'll be used more freely, and no one will ask for implicit expansion (e.g., [lindex]). It is a bit hard to see, but backtick is used so rarely in Tcl code that stealing it as a metacharacter won't hurt anyone. Think of it as half of a backslash. :^)
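The proposed backtick has no implementation to show, but Tcl 8.5's {*} prefix does the same job today. This sketch contrasts passing a list as one word with expanding it into several:

```tcl
set extra {b c d}

# Without expansion, $extra is passed as a single word.
set nested [list a $extra]
# With the {*} prefix, each element of $extra becomes its own word.
set flat [list a {*}$extra]

puts [llength $nested]  ;# 2
puts [llength $flat]    ;# 4
```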

(I listed this as sugar before, but it's not sugar after all. It's integral to the language.)

Lambda type

In addition to Tcl's standard string, list, dict, and script types, I'd like a lambda type. Maybe call it a procedure type. Either way, it looks like a two-element list. The first element is a list of parameter names, and the second element is a script to be executed in a new call frame with the arguments appearing as locals with names drawn from the parameter list. This is much like the second argument to [proc], but the details can vary.

Larry Smith Simpler is to simply assign an executable list to a variable, and replace the need to carry around a separate arg list with something like init, or simply use $1, $2, etc., the way shells often do.

  set double { return [expr {$1 + $1}] }

  set double { init x; return [expr {$x + $x}] }

Assuming [2] for the above.

Perhaps I might want an optional list of static variables, transforming lambdas into closures. Or I could go even farther and have a procedure-in-execution type which is returned by [yield] or made available to an [interp] resource limit exception handler. It would contain the instruction pointer and all local variables and could be resumed at any time, possibly multiple times, or maybe even after being read in from disk. And maybe it could also include a value in a known index or key passed to [yield], allowing for the possibility of generators. Exciting!
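Tcl 8.6 eventually added [coroutine] and [yield], which cover the generator case, though a Tcl coroutine can't be rewound, resumed more than once from the same point, or saved to disk. A minimal generator sketch; the names counter and next_number are made up for illustration:

```tcl
# Generator body: yields successive integers starting at $start.
proc counter {start} {
    yield [info coroutine]   ;# first yield hands back the coroutine's name
    set n $start
    while 1 {
        yield $n
        incr n
    }
}

# Create the coroutine; it runs until its first [yield].
coroutine next_number counter 10

puts [next_number]   ;# 10
puts [next_number]   ;# 11
puts [next_number]   ;# 12
```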

Lambdas stored in variables would serve as named procedures. Give the procedure (i.e. variable) name as the first word of the line, and the interpreter looks it up, gets the lambda, and executes it in a new stack frame. The new stack frame would of course contain all the passed arguments using names from the parameter list. And the return value would be passed back to the caller.

(I previously had an [apply] command, but I realized it had no advantage over ordinary command lookup. [apply] is used for anonymous procedures, but in any practical situation, anonymous procedures are passed as an argument to another command, at which point they're no longer anonymous.)
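For reference, Tcl 8.5's [apply] accepts a lambda in nearly the form described above, a {params body} list, and runs it in a fresh call frame. A small sketch; double and map are illustrative names, not standard commands:

```tcl
# A lambda: a two-element list of {parameter-list body}.
set double {{x} {expr {$x * 2}}}

# [apply] runs the body in a new call frame with x bound to 21.
puts [apply $double 21]   ;# 42

# Lambdas are ordinary values, so they can be passed as arguments.
proc map {lambda items} {
    set result {}
    foreach item $items {
        lappend result [apply $lambda $item]
    }
    return $result
}
puts [map $double {1 2 3}]   ;# 2 4 6
```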

NEM: Seems very sensible. If I was writing a Tcl 9 now, I'd replace the namespace element with a dictionary. That gives you the ability to create (immutable) closures (all elements in the dictionary become variables in the new scope). You can't do mutable statics/closures if you want to maintain value semantics for lambdas. Not all uses of lambdas are through variables. You want to allow things like: [[get-lambda...] args...]

Unified namespaces

Variables, functions, procedures, file handles, GUI windows, interpreters, etc. all contend for names within a single namespace. In Tcl, it's legal to have a proc and a variable share a name. I don't like this, because it means a different set of methods must be used to manage objects in each different namespace. Hence [unset], [destroy], [rename], [close], and many more.

Larry Smith These are all different ways of specifying a list of names, something that is fundamental to Tcl in the first place. ::a::b::c, .a.b.c etc should all be replaced with { a b c } /Larry Smith

One precondition for placing multiple things in the same namespace is that they must at some level be the same thing. Use the previous sentence as an example: both variables and procedures are things, or else I wouldn't have been able to write the sentence! Variables and procedures are both named objects. The name goes in the namespace, and the object is the name's referent.

NEM: Agree 100%.

AMG: Things get fun when objects don't have string representations that map bijectively to the "real" value, such that a conversion to a string and back won't hurt. Floats have this problem sometimes, but the error introduced is usually negligible. But what about, say, a command implemented in C? What would [puts]'ing it do? If it's written to a file, put up on a Web site, downloaded a week later, and read into a variable by another computer, will that variable still be usable as a command?

This comes to mind because it's a much more visible problem now. Tcl doesn't actually live by its "everything is a string" motto, unless "everything" is only the non-array variables. [array]s can be converted to and from [dict]s with [array get] and [array set]. [proc]s aren't strings, but they can be converted through multiple [info] inquiries, and this fails for C commands. [image]s aren't strings, but [image data] and [image put] can be used to bridge the gap. And who knows what's inside a channel? File name, open mode, seek position, configuration, buffer, etc.

Moving all these non-string things into the same namespace as variables means they need to be made to play by the same rules as variables. And the first two rules are that everything is a string.

Lars H: Unification always has pros and cons, which one should weigh against each other, but somehow the latter are very often forgotten. How much do variables, procs, channels, windows, etc. actually have in common, apart from the fact that one can make them go away? Very little, IMHO.

NEM: So what do you see as the cons? Variables, procs, channels, windows etc have at least the following in common:

They are all named things
They all need cleanup at destruction
They all support events/traces of some kind

Unifying these things to have consistent interfaces and naming seems a big win to me. What are the downsides? That you can't give a proc the same name as a variable?

Lars H: The typical disadvantage is that you have to do a lot of stretching, bending, and other types of violence to make the concepts fit your unified paradigm. The more violence you need to use, the more contorted does the end results tend to be. Attempting a unification is generally good, because it may suggest a better implementation of things, but one must also be prepared to give it up when things simply don't fit.

In your list above, 1 and 2 are merely the bare minimum that might suggest considering a unification, so you only have point 3 as an argument for why this is a natural unification. All other aspects of variables, procs, channels, and windows (options, stacking order, stack frames, and whatnot) would essentially just pile up, quite probably getting in the way of each other. The latter is a typical con of this kind of change, but perhaps not one that is immediately visible.

Even point 3 is rather weak, because the differences are quite large. Events are processed when one enters the event loop, whereas traces are processed immediately. Unification of the two probably means you can only have one of these. Variable traces are somewhat similar to command traces, but quite different from execution traces, so which do you want? File events trigger simple commands, but window events are far more complicated (% substitution, bindtags, multiple bindings firing, etc.). Is that complexity good for file events? For the unification to be a good idea, it'd better be very good.

NEM: You could support both events and traces in the new system. The advantage is that each type of object wouldn't have to write its own machinery to do this. Simple bindings of the sort used by fileevent are all that needs to be supported directly. %-substitution IMHO would be better served by passing a dictionary of options. This is extensible, and you can also use [dict with] to get convenient access to the options. It also means that bindings can be byte-compiled procedures rather than strings. If you also allow events to be bound/generated on any named thing, then bindtags are trivial to implement.

AMG: Forbidding duplicate names is a pro. I don't like the current situation of having to mentally prefix every name with its type. For example, I have to keep track of "the procedure named list" versus "the argument variable named list", and this gets really fun: proc frob {list} {list $list $list}
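The frob example really does work in current Tcl, because the parameter and the command are looked up in separate namespaces:

```tcl
# The parameter "list" is a local variable; the word "list" in command
# position is looked up separately, in the command namespace.
proc frob {list} {list $list $list}

puts [frob {a b}]   ;# {a b} {a b}
```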

Several types of objects are currently placed in the command/procedure namespace: C commands and Tcl procedures, Snit objects, Tk windows, images, and maybe more I've forgotten. These can all be deleted with rename $x "". So the current situation isn't quite as bad as we make it out to be.

The big con is that now we have to truly adhere to "everything is a string", to take all pertinent data about each type of object we have and publish it in a stringy format, e.g. nested dicts. Maybe I'm just rigid in my thought, but I'm having a hard time accepting that a Tk window is a string. Sure, a data structure is a string, a string is a string, a number is a string, even a procedure is a string. I'm confident about those, but only because the conversions between string and internal representation are already well understood. But it must be possible, because there are C data structures backing each of these new types, so that gives us a template for the string representation.

What about objects whose very existence has some kind of continuous effect on program execution? Strings, numbers, dicts, and procedures are fine because they only do something when called upon. Interpreters, events, threads, and windows all operate in a more subliminal, behind-the-scenes fashion. I guess their data can be kept in the object store with the possibility of conversion from and to strings, but to actually do anything they need to be registered or activated. But it would be the name that is registered, not the object, since in general modification is done on copies. If a window is created then modified using [set $win(-color) blue], it should turn blue. [set $win(-color) taupe] might trigger a trace that complains about not knowing what taupe is. But then comes [set $win SHINHATSUBAI!]... *boggle*. How should that be handled?

Exposing all this at the script level opens a semantical can of worms. It's probably possible to resolve it all, but it will take some work and may require compromises.

NEM: This isn't really a problem. Unifying the namespaces doesn't imply that everything has to have a string representation. Arrays don't have a string representation and yet they live in the variable namespace. Just throw an error if someone attempts to read the value of the name. BTW, I'd much rather move variables into the command namespace than the other way around. Variables as commands would work quite nicely:

 var foo 12
 foo <- 15
 foo trace add write ...
 puts "foo = [foo]"

You could even then drop $-syntax too...
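To make NEM's sketch concrete, here is one way to approximate variables-as-commands in Tcl 8.6. It's only an illustration: [var] is a hypothetical command, and each value lives in a global array named ::varstore that exists only for this example. Traces aren't wired up; [trace add variable] on the matching ::varstore element would be one way to add them.

```tcl
# Hypothetical [var]: each variable is exposed as a command of the same
# name, with its value kept in the array ::varstore.
proc var {name value} {
    set ::varstore($name) $value
    # Build the accessor's body with the variable name baked in.
    proc ::$name {args} [string map [list @NAME@ [list $name]] {
        set key @NAME@
        switch -- [llength $args] {
            0 { return $::varstore($key) }
            2 {
                if {[lindex $args 0] eq "<-"} {
                    return [set ::varstore($key) [lindex $args 1]]
                }
            }
        }
        error "usage: $key ?<- value?"
    }]
}

var foo 12
foo <- 15
puts "foo = [foo]"   ;# foo = 15
```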

AMG: Getting a variable's value and invoking that value as a procedure are two distinct operations. In C, the expression foo evaluates to the address of the function foo, which can be said to be its value. foo(), on the other hand, evaluates to the return value of the function foo, which of course involves invocation. The Tcl analog would be $foo versus [foo]. Losing $foo would make it impossible to get the value (implementation) of a callable procedure without a separate [info]-type facility. It would also require an [apply] command for executing a procedure whose implementation is contained in a variable, but if we're trying to avoid having to give the variable name as the second argument to [set], why is it alright to give it as an argument to [apply]? We're just trading one thing for another. What is gained?

All that aside, I think it might be cool to have variables with custom accessors. Of course, we have that now, but as far as the language is concerned, they're not variables. Use traces instead, I guess.

NEM: There are many kinds of opaque resource which you definitely do not want to allow to be dereferenced (e.g., a remote database). This is even more critical when security is a concern (e.g., if you want to implement something like object-capabilities [3]). Given that $foo can be replaced entirely by [foo], but the reverse isn't true, then I think this is the correct way to do things. Also, how do you deal with lambdas if you don't have [apply]?

Transparent containers

Lists and dictionaries are cool, but they'd be cooler still if their contents could be accessed, manipulated, and deleted in-place without the need for special commands. To make this work, object names should include optional indexing suffixes. For instance, this would make it possible to read or change the second-to-last element of a list without [lindex] or [lset]. Ditto for the "foo" element of a dict, sans [dict get] and [dict set].

These suffixes can be mixed and matched to, e.g., access the third element of the "foo" element of the second-to-last element of a list. Also the index values should be subject to substitution rules, making it possible to do math or indirection without needing to fall back on the underlying commands. These things can get arbitrarily fancy.

The nth element of a list is named by appending {n} to the list variable name. n can be a nonnegative integer. To index relative to the end of the list rather than the beginning, use "end" or "end-n". Similarly, the key element of a dictionary is named by appending (key) to the dict variable name. Adding these suffixes results in new variable names that refer to elements within the list or dictionary.

It's valid to refer to a dict element that doesn't (yet) exist, since this is necessary for adding new elements. To do the same for lists, append "-new" or "+new" to the index to refer to a nonexistent element immediately before or immediately after the element with the given index. Reading from nonexistent elements is an error, same as for nonexistent variables. But writing to them creates them.

(I used to have list ranges and strided ranges, but I dropped them because they were quite complex--- I had to link to a man page just to explain the syntax! The same functionality can be had with [list range], and the semantics of assignment were arbitrary. About the only use for them was deleting multiple elements. Instead I added "-new" and "+new" to avoid needing [linsert] and [lappend]. Also I had called the syntax "sugar", but that's not really true.)

Examples:

 puts $argv{0}
 puts $argv{end-1}
 set matrix{1}{3} 99
 delete users(bob)
 set users($id)(avatar) $av
 set users($id)(ignore){end+new} bob
 set lines{0-new} $header

Lars H: This sounds like wish #72 on the Tcl 9.0 WishList.
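For comparison, here's roughly what the examples above look like in Tcl 8.6 with the existing commands. The sample data is invented so the snippet runs on its own; the comments show the proposed syntax each line replaces.

```tcl
# Invented sample data.
set argv {alpha beta gamma}
set matrix {{0 0 0 0} {0 0 0 0}}
set users [dict create \
    bob   {avatar none ignore {}} \
    alice {avatar cat.png ignore {}}]
set lines {first second}
set id alice
set av dog.png
set header HEADER

puts [lindex $argv 0]                  ;# puts $argv{0}
puts [lindex $argv end-1]              ;# puts $argv{end-1}
lset matrix 1 3 99                     ;# set matrix{1}{3} 99
dict unset users bob                   ;# delete users(bob)
dict set users $id avatar $av          ;# set users($id)(avatar) $av
# set users($id)(ignore){end+new} bob
dict update users $id user {
    dict lappend user ignore bob
}
set lines [linsert $lines 0 $header]   ;# set lines{0-new} $header
```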

Dictionary-like namespaces

Namespaces shouldn't be a separate type with names in a separate, ahem, namespace, as they are in Tcl. At heart, they're not much more than dicts. How about having a special variable named $local that expands to the local namespace? It would be formatted as a dict, a list with alternating key/value elements, where the key is the variable name and the value is the object. Going further, how about $global? Still more: $caller can be the namespace of the calling function, or equal to $global at the second level of the call stack. And another: $parent can be the parent namespace, like ::foo is the parent namespace of ::foo::bar in Tcl. (But I don't think $parent namespaces will necessarily have $caller elements because they may have been created at the top level.)

Namespaces wouldn't be equivalent to dictionaries, though. $local, $parent, $caller, and $global would all have to be in the local namespace (well, in every namespace), but they can't appear in the string representation to avoid infinite recursion. Any code that treats a dict as a namespace would have to pretend that local, parent, caller, and global are present.
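Tcl 8.6 has no $local, but a read-only approximation can be built from [info locals] and [uplevel]. This is a hypothetical helper (local_dict is not a standard command), and it skips arrays because they have no single string value:

```tcl
# Hypothetical helper: return the caller's local scalar variables as a dict,
# roughly what a read-only $local might look like.
proc local_dict {} {
    set result [dict create]
    foreach name [uplevel 1 {info locals}] {
        # Arrays have no single string value, so skip them.
        if {[uplevel 1 [list array exists $name]]} continue
        dict set result $name [uplevel 1 [list set $name]]
    }
    return $result
}

proc demo {a b} {
    set c [expr {$a + $b}]
    return [local_dict]
}

puts [dict get [demo 1 2] c]   ;# 3
```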

Oh, and since namespaces are not 100% the same type as dicts, they can have a different "transparent container" access syntax, like how Tcl separates namespace components with ::'s. The syntax would be {parent namespace list variable_name}, so ${global argv} would be an example. $global(argv) should work too. Why have both syntaxes? The namespace syntax can be constructed using list operations.

The list contained in the braces is subject to substitution and expansion. To get Tcl-style ${foo bar} behavior, where the variable name actually contains a space, use ${{foo bar}}. A common, practical use would be a procedure that needs to get a variable whose name is itself in a variable passed as an argument. That variable is named with respect to the caller's namespace, so "caller" needs to be prepended to the variable name. But the variable name might itself be a qualified list, so it needs expansion: ${caller `$name}

Each element in the path can have indexing suffixes. I don't see why it wouldn't be possible to have, say, a list of namespaces. The list itself cannot have indexing suffixes, or else it wouldn't be a valid list. If {foo bar} is XXX, ${foo bar}{0} expands to XXX{0}. Another example: if foo is XXX, ${foo}{0} is XXX{0}. This is important when simultaneously prepending a namespace and appending an index: ${caller ${name}{0}} expands the list in $name, and the string {0} is appended to the last element.

Yeah, "dictionaries for namespaces" is a blatant Python rip-off. :^) [4]

(Previously I thought ${foo}{0} was equivalent to $foo{0}, but I changed my mind because of the desire to manipulate qualified names using list operations.)

NEM: My current thought is that I'd make all naming be done by dicts: namespaces, stack frames, etc. Thus all names would be immutable. I'd also have a single global reference heap (accessed via a ref/var command) for mutable references which support assignment and traces. Ideally they'd be GC'ed, but to do that efficiently implies a move away from everything is a string which would be a different language.

Non-local name resolution

If a name isn't found in the local namespace, search for it in each parent (not caller) namespace, then search in global. This facilitates cheap-o inheritance and possibly obviates [variable]. Per GPS's suggestion, I might prefer to check global before the parent namespaces, since this avoids having stuff like [set] break just because of where in the namespace hierarchy the code is being executed.

Instead of [upvar], one can explicitly look for the variables in the caller namespace. Looking in global subsumes [global]. [uplevel]'s behavior can be implemented by a command that executes a block of code in a different namespace, then specifying caller or global as the other namespace.

Gotcha: when a variable is being written to, if it doesn't already exist, it will be created. This suppresses non-local name resolution. To work around this, explicitly give the namespace when writing to a variable that's not intended to be local. This is analogous to "this->" in C++ or "self." in Python and may be good programming practice even for reading variables.

Some applications may want more control over name resolution. Perhaps each namespace can have an [unknown] entry to catch this. But would it work only for command invocation or would it also do ordinary substitution and assignment? I suppose it can do both. It can do any internal processing, for example raising an error or looking in a different namespace, and then it would return the sought-after object, which might be a lambda. Question: would [unknown] be consulted after lookup fails in each namespace, or would the [unknown]s only come into play after resolution fails in all namespaces? Question: should there be a [known] (working title) to override lookup that would have otherwise succeeded?

NEM: By the first sentence, do you mean static/lexical scoping? I'd probably remove upvar and the notion of the stack from Tcl, to facilitate things like tail-call optimisation. Instead, I'd adopt something like 3-Lisp's reflective procedures (which I'd put in a package called "meta"):

 meta proc bind {env next name value} {
     next [dict replace $env $name $value]
 }
 bind foo 12

Here the proc gets passed two extra parameters: $env is the environment dictionary of the caller, and $next is the continuation which receives an environment as its first parameter. This is an incredibly powerful construct which allows you to implement almost any possible language construct. (Search for a paper called something like "Reflection and Semantics in Lisp" by Brian Cantwell Smith for details). I'd put it in a separate "meta" package so that the package require documents that crazy meta-programming is coming up. I'd also put a [meta class] construct in this package for the built-in object system... :)

AMG: Static scoping. Remember, the caller namespaces aren't checked, only the local, parent, and global namespaces. A species of dynamic scoping can be done on a case-by-case basis by explicitly using the caller namespace.

I'll have to do some more thinking about tail call optimizations. Maybe I can lie a little about the stack by collapsing/reusing certain frames when they won't be needed again yet maintaining the chain of caller namespaces. Making a new stack frame should be cheap, though. I hope. :^)

Continuations are cool, too, and I can see why tail-calling into them would be important. Don't want an iterative process to eat up stack!

Possibly bad ideas:

Ensemble commands

Ensemble commands could be implemented by nested namespaces (or dictionaries, if you prefer) containing lambdas at the leaf nodes. Command name lookup could be extended to first look inside namespaces, borrowing arguments as subcommand (i.e. child namespace) names. But there's a fundamental problem here. A lambda looks like a single-element dictionary with the parameter list as the key. Well, I guess the lookup could back off one level when it discovers that what it thought was the value is actually a script. Except that the script might also look like a valid dictionary/namespace. Hmm. Hmmmmm.

It's not important and probably not a good idea to put this kind of lookup in the language itself, since ensemble commands can be implemented by making a dispatcher procedure which contains implementations for all the subcommands as arms of a [switch] statement or local variables to be invoked. An "ensemble factory" shouldn't be difficult to make. However, this may slow dispatch, which might be a problem for some object systems.
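Tcl 8.5 also provides [namespace ensemble create] for this. The factory idea, though, can be sketched in a few lines on top of [apply]; make_ensemble and calc are hypothetical names, and the subcommand table is a dict of lambdas as described above:

```tcl
# Hypothetical "ensemble factory": NAME becomes a dispatcher procedure whose
# subcommands are the lambdas stored in the dict SUBCOMMANDS.
proc make_ensemble {name subcommands} {
    proc $name {sub args} [string map [list @TABLE@ [list $subcommands]] {
        set table @TABLE@
        if {![dict exists $table $sub]} {
            error "unknown subcommand \"$sub\": must be one of [join [dict keys $table] {, }]"
        }
        # Each lambda runs in its own call frame via [apply].
        return [apply [dict get $table $sub] {*}$args]
    }]
}

make_ensemble calc {
    add {{a b} {expr {$a + $b}}}
    neg {{a}   {expr {-$a}}}
}

puts [calc add 2 3]   ;# 5
puts [calc neg 4]     ;# -4
```

Note that each lambda in the table, such as {{a b} {expr {$a + $b}}}, is itself a valid one-entry dict, which is exactly the ambiguity mentioned above; this factory sidesteps it by only looking one level deep.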

Variable binding

Tcl's [upvar] command can be used to create an alias for a variable, and [interp alias] is good for making alternate ways of calling a command. I'd like similar functionality, but since I unified namespaces I'll have to do it all with one command: [bind] (or maybe [var bind]). It takes an existing name and a new name to create. Both names will refer to the same object, such that after using one name to change the object, the change will be reflected when accessing the object with the other name. This is distinct from setting one variable equal to another (set a $b), wherein both names refer to the same object only until one of them is used to change the object, after which they point to different objects. Deleting a bound name simply decrements the object's reference count.
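In current Tcl, [upvar] with level 0 gives the aliasing behavior described for [bind], while [set] gives the copying behavior. A minimal comparison:

```tcl
set a 1

# Alias: "b" becomes another name for the same variable as "a".
upvar 0 a b
set b 2
puts $a   ;# 2

# Copy: "c" gets a's current value, but later changes are independent.
set c $a
set c 3
puts $a   ;# 2
```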

Each namespace's "local" element is linked to that namespace again. Similar goes for parent, caller, and global. But these elements can't appear in the namespace's string representation because of the infinite loop. Binding in general makes this problem possible.

How can a namespace's string representation show binding? It must be possible to keep the links even through converting to and from a string, as may happen if writing the interpreter state to disk then reading it back later. I don't know how to do that, though.

I think I can survive without [bind]. It leads to infinite loops and it breaks garbage collection. Code with [bind]:

 proc with_bind {var_name value} {
     bind (caller `$var_name) var
     set var $($var * $value)
 }

Code without:

 proc without_bind {var_name value} {
     set var_name (caller `$var_name)
     set $var_name $(${`$var_name} * $value)
 }

It's not so bad, and it makes it clear that indirection is taking place. For this same reason, some C++ programmers prefer pointers over references.

Hmm, that ${`$var_name} is a little funky. Maybe each element needs to be automatically expanded. But then again, that breaks ${{name with space in it}}. So let's not. Backticks are cheap. :^)
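For comparison, the same two procedures written in Tcl 8.6: [upvar] plays the role of [bind], and explicit [uplevel] calls stand in for naming the caller namespace. The procedure names are illustrative:

```tcl
# Counterpart of with_bind: [upvar] makes "var" an alias for the caller's variable.
proc scale_with_upvar {var_name value} {
    upvar 1 $var_name var
    set var [expr {$var * $value}]
}

# Counterpart of without_bind: every access goes through [uplevel],
# so the indirection stays visible.
proc scale_with_uplevel {var_name value} {
    set current [uplevel 1 [list set $var_name]]
    uplevel 1 [list set $var_name [expr {$current * $value}]]
}

set x 3
scale_with_upvar x 4     ;# x is now 12
scale_with_uplevel x 2   ;# x is now 24
puts $x                  ;# 24
```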

Futures

I'd like for expression evaluation to be put off as long as possible. Doing set x $(2+2) shouldn't trigger an immediate evaluation of 2+2; instead, $x should be set to an expression object whose string representation, when calculated, will be 4.

I don't know if this would offer a performance increase. It gives some opportunity to detect and cache common subexpressions. Maybe that's worth something.
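Tcl has no expression objects, but deferred evaluation can be approximated today with a read trace that computes the value on first access and then removes itself. This is a sketch under stated assumptions: lazy and lazy_force are hypothetical helpers, and the expression is evaluated as text, as [expr] would do for a variable holding an expression.

```tcl
# Hypothetical [lazy]: defer evaluating EXPRESSION until VARNAME is first read.
proc lazy {varName expression} {
    upvar 1 $varName v
    unset -nocomplain v
    trace add variable v read [list lazy_force $expression]
}

# Read-trace callback: compute the value once, store it, and drop the trace.
proc lazy_force {expression name1 name2 op} {
    upvar 1 $name1 v
    trace remove variable v read [list lazy_force $expression]
    set v [expr $expression]
}

lazy x {6 * 7}   ;# nothing is computed yet
puts $x          ;# 42, computed on first read
puts $x          ;# 42, now an ordinary variable
```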

NEM: Futures are cool for concurrency as well as lazy evaluation. See, e.g., Alice ML's various types of future [5]. I'd love to have that in Tcl, but I'm not sure what the best form would be. Colin McCormack did an implementation of lazy futures, which seems about right.

I want so much syntax sugar that I'll list it as a separate category. Hyperglycemia!!

Basic $variable substitution

Even in Tcl, this is syntax sugar for [set], so I list it here. Since I'm considering splitting "set" into [var get] and [var set], it's good to have an abbreviation that's usable in the vast majority of cases, but I'll just say [set] for the rest of this article. Example: $hello

Math $(2+2) substitution

[expr] is needed very often, but it's a bit much to type out. Compare $(2+2) with [expr {2+2}]. Three characters of "overhead" versus nine! That's quite a lot. Many are tempted to drop the braces, and some don't even know that they're important. To avoid the potential security hole, I suggest providing and recommending a shorthand that does the right thing in the majority of cases. Beginners will stick to $(2+2) because it's easy to type, shows up early in the tutorial, and is very common in example code published online. More experienced programmers will use [expr] only for the case of variables containing math expressions rather than simple numeric values. I think it's good language design to make the best practice coincide with the path of least resistance.
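The practical difference between [expr {$x*2}] (which $(...) would abbreviate) and the unbraced [expr $x*2] is when substitution happens relative to parsing. The Python sketch below models this with a tiny arithmetic evaluator built on the standard ast module. The names expr_braced and expr_unbraced are hypothetical, and only + - * / and unary minus are supported.

```python
import ast
import operator
import re
from string import Template

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _as_number(value):
    # Tcl values are strings; an operand must look like a single number.
    for convert in (int, float):
        try:
            return convert(value)
        except ValueError:
            pass
    raise ValueError(f"can't use non-numeric string {value!r} as operand")


def _eval(node, lookup):
    if isinstance(node, ast.Expression):
        return _eval(node.body, lookup)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, lookup)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, lookup), _eval(node.right, lookup))
    if isinstance(node, ast.Name):
        return lookup(node.id)
    raise ValueError("unsupported syntax")


def expr_braced(template, variables):
    # Like [expr {$x * 2}]: parse first, then each $name is exactly one operand.
    text = re.sub(r"\$(\w+)", r"\1", template)
    return _eval(ast.parse(text, mode="eval"), lambda name: _as_number(variables[name]))


def _no_bare_words(name):
    raise ValueError(f"unexpected bare word {name!r}")


def expr_unbraced(template, variables):
    # Like [expr $x * 2]: substitute into the text first, then parse the result,
    # so a variable's contents can change the shape of the expression.
    text = Template(template).substitute(variables)
    return _eval(ast.parse(text, mode="eval"), _no_bare_words)


print(expr_braced("$x * 2", {"x": "3"}))      # 6
print(expr_unbraced("$x * 2", {"x": "2+2"}))  # 6, i.e. 2+2*2, not (2+2)*2
```

With braces, a variable holding 2+2 is rejected as a non-number. Without them, it is silently re-parsed as part of the expression. In real Tcl, that same re-parsing is how a hostile string can get code executed.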

(List construction)

Tcl's [list] command is underutilized by new programmers, who instead tend to construct lists by hand using quoted strings. This can be disastrous when substitution is thrown in the mix. But it does have the advantages of not needing a leading word ("list") and not requiring that newlines and semicolons be backslashed, so it's seductive even to experienced programmers, especially when automatic expansion is desired and {*} is so cumbersome to type.

To solve all these problems, I suggest list construction using parentheses. (foo bar $quux) would be equivalent to Tcl's [list foo bar $quux], with the added bonus that such a list can span multiple lines without backslashes. Also semicolons would not need backslashes, so it can easily be used to produce a script with multiple commands. (Wait, is this business with semicolons a good idea?)

To get expansion, just use a backtick (`). This means list concatenation is accomplished through (`$a `$b). Selective expansion is sometimes needed, as in the case of $a being a program name and $b being a list of arguments, yet programmers often just say [exec $a $b] and hope for the best. Now this can be had through ($a `$b).
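Here is a Python sketch of why real list construction beats hand-built strings, and of what the two expansion forms produce. str.split stands in for Tcl's list parsing; the two differ in details but fail the same way here. The program name and arguments are hypothetical.

```python
quux = "hello world"

# Hand-built list: the words are glued into one string and re-split later,
# so the single value "hello world" falls apart into two elements.
by_string = f"foo bar {quux}".split()
print(by_string)  # ['foo', 'bar', 'hello', 'world']

# Real list construction, like (foo bar $quux), keeps $quux as one element.
by_list = ["foo", "bar", quux]
print(by_list)    # ['foo', 'bar', 'hello world']

# Concatenation, like (`$left `$right): both lists are expanded.
left, right = ["x", "y"], ["z"]
concatenated = [*left, *right]
print(concatenated)  # ['x', 'y', 'z']

# Selective expansion, like ($prog `$args): the program name stays one word
# even though it contains a space, while the argument list is spliced in.
prog = "my editor"                 # hypothetical program name
args = ["-n", "notes file.txt"]    # hypothetical arguments
command = [prog, *args]
print(command)  # ['my editor', '-n', 'notes file.txt']
```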

Lastly, this frees up the word "list" so that it can be used as an ensemble command: [llength] can become [list length], and so on. An empty list can be written as (), "", or {}; only the first is a pure list.

(This list is quite a bit shorter than it was before. I removed the transparent container and namespace qualification syntax sections, instead describing the syntax along with the concept.)

NEM: I quite like most of these, although I'm not sure I'd want that much sugar.

Lars H: The problem with all this sugar you want is that it destroys the very feature of Tcl that you began with: its "stupidity". The more sugar you put into the language, the "smarter" it will become, the less of it will be under your control to redefine as you see fit, and the more will the sugar confine your abilities as a programmer. It's certainly more mainstream, but if we thought that was good then we wouldn't be so fond of Tcl, would we?

AMG: Yeah, I know. I want it both ways. :^) I just presented a bunch of ideas to see which can coexist and which conflict. I actually had several more to put in this category, but I threw them out because I saw that they conflicted syntactically. Now let's weed out those that conflict philosophically. :^) Which pieces of sugar force the language to go in a direction that some programmers might occasionally wish it didn't? I feel each is defensible as a worthwhile feature to have in order to promote readability and good programming practice. They're not essential, but I would miss them if dropped.

Examples

First, let's make a procedure.

 set greet {{} {stdout write hello}}

Next, invoke it in a variety of ways.

 stdout write hello
 `[list get {{} {stdout write hello}} 1]
 `[list get $greet 1]
 `$greet{1}
 greet

If using [set] is cumbersome, make a procedure factory.

 set proc {{name args body} {set (caller `$name) ($args $body)}}

You'll notice that the name "args" isn't special. It never is, the way I'd like for things to work. Instead, prefix a parameter (any one parameter!) with an *asterisk, and it'll receive any extra arguments as a list after all those before it and after it have been assigned. The asterisk isn't part of the parameter name.
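Here is a Python sketch of that parameter-binding rule, assuming the starred parameter may sit anywhere in the list. The helpers bind_params and call are hypothetical, and a Python callable stands in for the script body of the two-element procedure value.

```python
def bind_params(params, args):
    """Map argument values onto parameter names.

    At most one parameter may start with '*'; it collects whatever arguments
    are left after the parameters before and after it have been filled.
    """
    starred = [i for i, p in enumerate(params) if p.startswith("*")]
    if len(starred) > 1:
        raise ValueError("only one parameter may be starred")
    if not starred:
        if len(args) != len(params):
            raise TypeError(f"expected {len(params)} arguments, got {len(args)}")
        return dict(zip(params, args))

    i = starred[0]
    before, after = params[:i], params[i + 1:]
    if len(args) < len(before) + len(after):
        raise TypeError("too few arguments")
    end = len(args) - len(after)  # first argument index owned by `after`
    bound = dict(zip(before, args))
    bound.update(zip(after, args[end:]))
    bound[params[i][1:]] = list(args[len(before):end])  # '*' is not part of the name
    return bound


def call(proc, args):
    # A procedure is just a two-element value: (parameter list, body).
    params, body = proc
    return body(bind_params(params, args))


print(bind_params(["first", "*middle", "last"], [1, 2, 3, 4]))
# {'first': 1, 'last': 4, 'middle': [2, 3]}

greet = ([], lambda env: "hello")  # stand-in for {{} {stdout write hello}}
print(call(greet, []))             # hello
```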

"caller" needs to be prepended to $name so that the procedure is created in the caller's namespace. To make this [proc] even more Tcl'ish, use "global" instead of "caller".

Now let's use it.

 proc greet {} {stdout write hello}

Heh, just like Tcl. Oh wait, that [stdout write] is unfamiliar. Let's fix:

 proc puts {msg} {stdout write $msg}
 proc greet {} {puts hello}

That looks like currying. Let's make a curry factory:

 proc curry {new_name old_name *old_args} {proc (caller `$new_name) {*args} ($old_name `$old_args {`$args})}

Heh, just to be cruel, let's do that without the aid of [proc]:

 set curry {{new_name old_name *old_args} {set (caller `$new_name) ({*args} ($old_name `$old_args {`$args}))}}

This can be broken onto multiple lines:

 set curry {{new_name old_name *old_args} {
     set (caller `$new_name) ({*args} (
         $old_name `$old_args {`$args}
     ))
 }}

Not so tough after all. Notice: (1) parentheses make this a lot easier to read than [list] would; (2) newlines don't need to be backslashed when using parentheses to construct lists; (3) the "args" (extra arguments) parameter doesn't actually need to be called "args"; (4) "caller" needs to be prepended to the new command name or else the command would be created in curry's local namespace; (5) expansion occurs in three places; and (6) the substitution and expansion of $args need to be quoted so that they won't occur until the new procedure is executed.

Now let's do some currying!

 curry puts stdout write
 proc greet {} {puts hello}
 greet

Hehehe, that's awesome. :^) [greet] itself can be seen as the result of a curry, except that it can't accept any additional arguments.

 curry puts stdout write
 curry greet puts hello
 greet
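What this page calls currying is, strictly, partial application: fixing the leading arguments of an existing command. Below is a Python sketch of the same curry factory. It uses a hypothetical command table so that names are resolved when the new command runs. functools.partial would handle the arguments too, but it captures the function object rather than its name.

```python
output = []
commands = {}  # hypothetical command table: name -> callable


def stdout(subcommand, *args):
    # Stand-in for the [stdout write] ensemble used above.
    if subcommand != "write":
        raise ValueError(f"unknown subcommand {subcommand!r}")
    output.append(" ".join(args))


commands["stdout"] = stdout


def curry(new_name, old_name, *old_args):
    def curried(*args):
        # old_name is looked up when the new command runs, not when it is made,
        # much like the generated body ($old_name `$old_args {`$args}).
        return commands[old_name](*old_args, *args)
    commands[new_name] = curried


curry("puts", "stdout", "write")
curry("greet", "puts", "hello")
commands["greet"]()
print(output)  # ['hello']
```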

How about making a control structure? (As if [proc] didn't count, heh.)

 proc do {script noise cond} {
     if {$noise ni {while until}} {
         error "invalid syntax: should be \"do {script} while|until {cond}\""
     }

     if {$noise eq "until"} {
         set cond !($cond)
     }

     namespace caller (
         `$script ;
         while $cond $script
     )
 }

And use it:

 set x 1
 do {
     puts $x
     set x $($x * 2)
 } while {$x < 16}

There's probably something subtly wrong with that use of the semicolon, but I'm not sure. It just makes me a tad uneasy. Any ideas? Maybe it's fine...
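One way to sidestep the semicolon question is to avoid assembling a script string at all. Here is a Python sketch of the same do...while|until control structure. It assumes callables can stand in for the script and the condition, whereas the version above passes them as strings and evaluates them in the caller.

```python
def do(script, noise, cond):
    """Run script once, then keep running it while (or until) cond holds."""
    if noise not in ("while", "until"):
        raise ValueError('invalid syntax: should be "do script while|until cond"')

    def test():
        result = cond()
        # Same inversion as `set cond !($cond)` in the Tcl-style version.
        return not result if noise == "until" else result

    script()
    while test():
        script()


state = {"x": 1}
seen = []

def body():
    seen.append(state["x"])  # puts $x
    state["x"] *= 2          # set x $($x * 2)

do(body, "while", lambda: state["x"] < 16)
print(seen)  # [1, 2, 4, 8]
```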

I put a lot of ideas on this page. Surely many of them are bad. :^) But I may have a few winners, too. Discuss.

AMG: It is far too much to ask that everything have a string representation from which that original thing can be recreated. This works fine for strings, numbers, lists, dicts, scripts, and procedures. Beyond that, things get iffy. Reducing more complex types (e.g. namespaces, Tcl procedures) to these simpler types (dicts, lists) is helpful, but there's a limit. Take the case of an object ([button]) with methods ([invoke]) and data (-state) with accessors ([cget]/[configure]). How would that work? Exactly the way it currently does, I imagine. :^)

Rather than make a million exceptions, change the rule. Relax the doomed requirement. Everything should have a string representation, but that string representation need not contain all information necessary to reconstruct the object. When necessary, the object's data remains hidden behind an opaque pointer, and its string representation is nothing more than a key in a hash table from which that opaque pointer address can be calculated by type-specific commands.
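Here is a minimal Python sketch of that arrangement, much as Tcl channel handles returned by [open] already work. HandleTable and its methods are hypothetical names, and the "cmd" prefix simply mirrors the names used further down this page.

```python
import itertools


class HandleTable:
    """Hypothetical registry: opaque objects are reachable only via string keys."""

    def __init__(self, prefix="cmd"):
        self._prefix = prefix
        self._next = itertools.count(1)
        self._objects = {}

    def register(self, obj):
        # The returned key is the object's entire string representation.
        handle = f"{self._prefix}{next(self._next)}"
        self._objects[handle] = obj
        return handle

    def lookup(self, handle):
        # Type-specific commands turn the key back into the hidden object.
        return self._objects[handle]  # KeyError once destroyed or if never made

    def destroy(self, handle):
        del self._objects[handle]


table = HandleTable()
sock = object()                    # stand-in for something with no useful string form
name = table.register(sock)
print(name)                        # cmd1
print(table.lookup(name) is sock)  # True
table.destroy(name)
```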

How does this interact with objects that are themselves top-level commands? I guess it's possible for an entire procedure to be a hash key, but is this wise? Or maybe it's just a command, that is, a "procedure" not defined in script. And in that case, what would its string representation be? "0xb802012a", I guess, or maybe "cmd318". A name like that would be dreadful to use directly, so of course the user would bind it to a variable or place it in a list or dict. If it's a procedure defined in script (e.g., the product of an object system also defined in script), it can define whatever rules it likes for lookups, so the absurdly-long-hash-key concern doesn't necessarily apply.

Not much can be done to manage the data stored outside the realm of strings and variables, I'm afraid, but a common deletion command would go a long way. This command would unset variables, of course, but for other types a deletion handler must be registered. As nice as this sounds, though, it's fundamentally flawed.

 % set obj [foobar]
 cmd1
 % destroy $obj
 success!
 % set obj [foobar]
 cmd2
 % destroy cmd2
 success!
 % destroy obj
 success!
 % set obj [foobar]
 cmd3
 % set cmd3 hello
 hello
 % destroy cmd3
 uhh?

Too bad. Such is the cost of multiple namespaces.

Lars H: Everything that is classically computable [6] by definition has to have a string representation, because the output of a Turing machine is a string! (Quantum computation is another matter, but for that we have TIP#263 [7], and EIAS is not much of an issue there either.) Hence there can always be a string representation; you only have to be prepared to put in the effort necessary to define and produce it. It's not the case that exotic objects in well-written Tcl programs do a lot of shimmering -- most of the time they are created, live, get used, and die without ever even gaining a string representation.

NEM: Turing machines are not terribly relevant here. They ignore various forms of interaction, I/O, state changes and effects over time. These are all things that are very important in most programming, and are all awkward to represent as strings: what is the string rep of a socket channel, for instance?

That said, I don't understand AMG's comments very well. What's wrong with the destroy example? Many named things in Tcl are commands, and these can generally all be destroyed with rename $cmd {}. You can register deletion callbacks to free any associated state. Surely, in your last example the result would also be success -- the "cmd3" variable would be unset.

AMG: When I made the above comments, I didn't have time to carefully edit them until readable. Sorry. (To tell the truth, I don't have the time now, either, so...)

The problem in the destroy example is that objects and variables have separate namespaces (a foobar object and a variable can both be named "cmd3"), yet destroy takes no argument to specify which namespace should be searched for the name. When asked to destroy cmd1 and cmd2, it finds them in the object namespace and successfully performs the deletion. When asked to destroy obj, it finds it in the variable namespace. (I just added this.) When asked to destroy cmd3, it finds matches in two namespaces, and it has no way of knowing which one the user means. Adding that "which namespace" parameter to destroy would make it plain that unification between variables and objects is a lie. Their only commonalities are that they have names and that they can be destroyed. But if the name must always be qualified with a namespace in order for lookups to succeed, what's the point? There might as well be separate deletion commands for variables and objects, in which case the namespace is implied by which command is used. And at that point, the only naming unification will be between the different types of objects.

(By "object", here I mean something whose string representation is its name rather than its underlying data.)
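Here is a Python sketch that replays the transcript above, with two separate dicts standing in for the variable and object namespaces. foobar and destroy are the transcript's hypothetical commands. Raising an error is only one possible reaction to the ambiguous case.

```python
import itertools

variables = {}  # variable namespace: name -> value
objects = {}    # object namespace: handle -> opaque object
_handles = itertools.count(1)


def foobar():
    # Create an opaque object and return its handle, as in the transcript.
    handle = f"cmd{next(_handles)}"
    objects[handle] = object()
    return handle


def destroy(name):
    # A single deletion command has to guess which namespace `name` lives in.
    matches = [ns for ns in (variables, objects) if name in ns]
    if not matches:
        raise KeyError(name)
    if len(matches) > 1:
        raise LookupError(f"{name!r} is both a variable and an object")
    del matches[0][name]
    return "success!"


variables["obj"] = foobar()  # cmd1
destroy(variables["obj"])    # deletes object cmd1
variables["obj"] = foobar()  # cmd2
destroy("cmd2")              # deletes object cmd2
destroy("obj")               # deletes variable obj
variables["obj"] = foobar()  # cmd3
variables["cmd3"] = "hello"
try:
    destroy("cmd3")          # ambiguous: the "uhh?" case
except LookupError as err:
    print(err)
```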

Well, I need to get dressed and head to work.

AMG: I've given a great deal of thought to language design since writing the above. Someday I'll merge my musings into the above, add new ideas, refute what I've reconsidered, etc.
