Updated 2015-08-01 19:27:56 by dbohdan

Arjen Markus: There has been much discussion about references to data in Tcl, in order to build complex data structures and such. Inevitably, garbage collection pops up.

This page is meant to show that at some mundane level you can let Tcl do the job for you. The script below will create a counter "object" that keeps its internal state hidden. It juggles a bit with the counter object and then throws it away.

The essence: the internal state is stored via the interp alias mechanism that allows extra arguments and the counter itself is destroyed via the trace.
namespace eval Counter {
    variable nextid 0

proc makecounter {name initial} {
    upvar $name vname
    variable nextid

    set vname [namespace current]::$nextid
    uplevel [list trace variable $name u [
        list [namespace current]::deletecounter $vname]]

    interp alias {} $vname {} [
        namespace current]::increasecounter $vname $initial
    incr nextid
}

proc increasecounter {cmdname current} {
    set result [expr {$current+1}]
    interp alias {} $cmdname {} [
        namespace current]::increasecounter $cmdname $result
    return $result
}

proc deletecounter {aliasname counter dummy op} {
    interp alias {} $aliasname {}
    # puts "deleted: $counter"
}

} ;# End of namespace

#
# Create a counter
#
Counter::makecounter count1 0
# puts [trace vinfo count1]
puts "1? [$count1]"
puts "2? [$count1]"

#
# Copy the counter (not a reference, just a harmless alias)
#
set count2 $count1
puts "3? [$count2]"

#
# Deleting the alias has no effect
#
unset count2
puts "4? [$count1]"

#
# Deleting the true counter does!
#
set count3 $count1
unset count1
puts "5? [$count3]"

Result:
1? 1
2? 2
3? 3
4? 4
invalid command name "::Counter::0"
    while executing
"$count3"
    invoked from within
"puts "5? [$count3]"
"
    (file "counter.tcl" line 52)

The following "AI koan" (see The Jargon File for more) points out a fundamental difference between the Tcl and LISP approaches to when unused memory is reclaimed and the implications this has for what can be a value.

One day a student came to Moon and said: "I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons."

Moon patiently told the student the following story:
    "One day a student came to Moon and said: `I understand how to make a better garbage collector ...

(Editorial note: Pure reference-count garbage collectors have problems with circular structures that point to themselves. On the primitive level Tcl avoids that problem by the principle that everything is a string, since strings don't point to anything.)

Tcl has a built-in garbage collector on C level. See Tcl_Obj structure and functions like Tcl_IncrRefCount. So internally Tcl use objects and references. The main problem is that this approach assumes that:

  1. every structure (Tcl_Obj) has string representation
  2. every structure (Tcl_Obj) of certain type can be built from string

This works fine for list, numerics, and strings (string-based types). In most cases the better way is to implement your own special type in C-level (see keylset implementation in TclX) as to simulate it in Tcl (high level).

This does not work if you want to have handles (or something like objects in Tcl). Such handles have no real string representation (like Tk Windows names). In such cases one can use Tcl_Obj only to gain some speed, but garbage collection is not possible here. Handles in Tcl are just strings and you can build strings every time again.

If Tcl would have garbage collector for Tk:
 set win [button .mybutton]
 unset win
 # in line above garbage collector have destroyed .mybutton

If one what such thinks is it easy to implement it with Tcl_Obj.

I was pondering the http package; the need to call http::cleanup when done with a token and the potential for leaking memory just seems wrong. So I was thinking abuot a tcl-level garbage collector, and came up with the following. I suppose it's a mark & sweep collector of sorts, although it doesn't do any marking or recursive sweep.
proc gc-find pattern {
    set vars [info vars $pattern]
    set searchspace [uplevel info vars]
    foreach var $searchspace {
        if {[uplevel array exists $var]} {
            foreach {k v} [array get $var] {
                check-item $v vars
            }
        } else {
            check-item [uplevel set $var] vars
        }
    }
    return $vars
}

proc check-item {item vars} {
    upvar $vars vlist
    catch {
        foreach el $item {
            set s [lsearch -exact $vlist $el]
            if {$s > -1} {
                set vlist [lreplace $vlist $s $s]
            }
        }
    }
}

One would periodically call it as
foreach tok [gc-find {::http::[0-9]*}] {
    ::http::cleanup $tok
}

It assumes that any tokens will be either an individual item in a list, or a variable by itself, and it doesn't search namespaces other than the root.

RLE 2011-09-22: Has anyone considered that with 8.5+'s dict that the http package could return a result dict instead of a handle to an array? This would result in garbage collection happening automatically when the dict's reference count fell to zero.

AM 2007-12-17: In response to a thread on the comp.lang.tcl group, I experimented a bit with procedure traces. The idea I had was that the usual way of creating objects is to create a new procedure/command. If you want to create a local object, i.e. an object that should only exist during the life-time of a procedure, however, there is no way for Tcl to know that that is what you intended. So there is no way to actually remove it when the procedure returns.

Unless you help it a bit.

And that is what is done in the slightly silly script below:
# gc.tcl --
#     An experiment with garbage-collecting "objects"

# localobj --
#     Create a _local_ object
#
# Arguments:
#     name      Name of the object/command
#
# Result:
#     None
#
# Side effects:
#     Creates a new command and a trace on the _calling_
#     procedure if needed
#
proc localobj name {
    global local_objects

    # Create the object
    proc $name {cmd} {
        if {$cmd eq {run}} {
            puts "[lindex [info level 0] 0]: Boo!"
        } else {
            return -code error "Command unknown: $cmd"
        }
    }

    # Administration:
    # - Store the command for later GC
    # - Add a trace to the caller, if this was not done yet
    # (Take care of global objects though!)
    if {[info level] > 1} {
        set caller [lindex [info level -1] 0]
        if {![info exists local_objects($caller)]} {
            trace add execution $caller leave [list localobj_destroy $caller]
        }
        lappend local_objects($caller) $name
    }
}

# localobj_destroy --
#     Destroy the caller's local objects
#
# Arguments:
#     caller    Name of the caller
#     command   Original command (ignored)
#     code      Return code (ignored)
#     result    Result of the command (ignored)
#     ops       Operation (ignored)
#
# Result:
#     None
#
# Side effects:
#     Destroys all objects created in the caller procedure
#
proc localobj_destroy {caller command code result ops} {
    global local_objects

    foreach obj $local_objects($caller) {
        rename $obj {}
    }
    unset local_objects($caller)
}

# main --
#     Test this
#
proc myproc {} {
    localobj cow1

    puts Myproc
    cow1 run
    cow2 run
    myproc2
}
proc myproc2 {} {
    # localobj cow1 ;# Hm, would override the other one

    puts Myproc2
    cow1 run                ;# cow1 was created by the calling procedure - it is still available. This is a slight flaw ;)
    cow2 run                ;# cow2 was created as a _global_ object, is this a flaw?
}

localobj cow2

myproc

puts Main
cow1    ;# Now object "cow1" no longer exists, so we get an error message
cow2

KD: Wouldn't it be better to call localobj with the name of a local variable, in which the name of the object will then be stored? In this way, Tcl's inherent rules for destroying local variables can be used to destroy the object itself too:
proc localobj_destroy {name args} {
    puts "Destroying $name"
    rename $name {}
}

proc localobj &name {
    global handlecounter
    if {![info exists handlecounter]} {set handlecounter 0}
    upvar 1 ${&name} name
    set name handle#[incr handlecounter]
    puts "Creating variable ${&name} = proc $name"
    proc $name {args} {puts "executing: [info level 0]"}
    trace add variable name unset [list localobj_destroy $name]
}

#Testing
proc myproc2 {} {
    localobj foo
    $foo testlocal2
}
proc myproc1 {} {
    localobj foo
    $foo testlocal1
    myproc2
}
localobj foo   ;# this one is in fact global
$foo testglobal
myproc1

Result:
Creating variable foo = proc handle#1
executing: handle#1 testglobal
Creating variable foo = proc handle#2
executing: handle#2 testlocal1
Creating variable foo = proc handle#3
executing: handle#3 testlocal2
Destroying handle#3
Destroying handle#2

AM 2007-12-18: Some discussion on this solution was lost, due to a problem with the disks. However, consider the following fragment:
proc myproc {} {
    localobj foo
    $foo testlocal1
    set bar $foo
    unset foo
    $bar testlocal2
}

If I understand the code correctly, then this won't work as expected: unsetting foo will cause the associated object to disappear, leaving bar to pick up the pieces.

KD: Yes, that's right. By declaring localobj foo, you are signing a contract that the lifetime of the object is tied to the lifetime of the $foo. Usually that's also the lifetime of the procedure call, unless $foo is unset manually.

AM: Hm, it is not a perfect solution, but it does have its attractive points - mine was inspired by a partial/incorrect understanding of incr Tcl. Your solution restricts objects to the procedure that created them (unless you pass them to a called procedure). Good :)

DKF: One solution is to give objects a method that instructs them to "return themselves to the caller", which gives them a chance to manage their reference counting/traces. Another is to allow creators of an object to specify what variable to couple the lifetime to, which allows for management via upvar; NAP does this via the as method IIRC.

Reference  edit

Everybody thinks about garbage collection the wrong way, Raymond Chen, 2010-08-09
Garbage collection is simulating a computer with an infinite amount of memory

See Also  edit

Arts and Crafts of Tcl-Tk programming
Reference counted command objects
linked lists
Tcl and LISP
complex data structures
event processing & garbage collection in Tcl and Tk
RJM: Above, the question was raised regarding garbage collection in Tk. I myself have recently had practical experiences with garbage collection in Tcl/Tk related to timing. Here a new page is referenced with the purpose to document
Reference counting vs. tracing garbage collection