Updated 2015-10-06 18:59:21 by Lucretiel

Iterators

See Also  edit

Iterator Protocol
a generic collection traversal interface
NEM 2006-01-16: another take on how to enumerate the elements of a collection.

Description  edit

CLU's iterators correspond to what Python and other languages call generators. CLU iterators were modeled on Alphard's generators [1].

IPL-v generators and Lisp mapping functions are related to Alphard's generators.

Can someone explain the idea of iterator/generator here?

It seems to me that the idea is that rather than explicitly listing an entire range of values (that a variable can take within a loop, etc.) one lists the beginning and ending value of a range plus the incremental factor and the language generates the value as needed. How is this different than using a for loop though?

[Tocc Coram]: An iterator is richer than a mere index. An iterator carries the container reference and the element's position with it. You can use 'em without for loops. The iterator controls access to the element.

RS: So would the "loop variable" in [foreach] come closer to the iterator concept?

Todd Coram: Yes, but iterators are most useful when you have different container types (or classes). foreach iterates over lists. A useful iterator interface deals with strings, lists, trees, vectors, etc. You need OO, interfaces or inherent polymorphism to take full advantage of iterators. Since Tcl primarily deals with lists and arrays, there is no great need for such polymorphic operations. Right? Now, VFS gives us a (loose) approximate of stream iterators which are very useful abstractions...

Also, iterators/generators control what comes next in an iteration. foreach is predetermined: it has already been decided what element you will get on each iteration. If my container wants to return random values as I step through it, it can determine to do so at anytime. -- Todd Coram

Here is a contrived example of an Itcl based iterator.

Eric Boudaillier: I have begin to write a C extension that deals with iterator (or generator ?). This extension defines the iterate command, which works like foreach, but takes the description of the thing to iterate instead of list. This description is a list containing the type of the iterator and the arguments. At this time, four types are defined: numeric, list, string and channel. Here is a little example:
     Common Tcl construction                     Same using [iterate] command

  for {set i 0} {$i<=$max} {incr i} {        iterate i [list numeric 0 $max] {
     ...                                        ...
  }                                          }

  while {[gets $chan line] != -1} {          iterate line [list channel $chan] {
     ...                                        ...
  }                                          }

  foreach c [split $string {}] {             iterate c [list string $string] {
     ...                                        ...
  }                                          }

  foreach e [lreverse $list] {               iterate e [list list $list end 0] {
     ...                                        ...
  }                                          }

[iterate] provides a cleaner syntax, and can be faster and consume less memory. And also, you can combine different type of iterator. Moreover, some iterator can be infinite:
# returns the number of lines in $chan.
proc wc-l {chan} {
    iterate count [list numeric 1 {}] line [list channel $chan] {}
    return $count
}

RS: This looks like a great unification to me - however, if the list-bundling of the second argument could be given up, it would be even more pleasing to write and read:
iterate i    -numeric "0 $max"   {...}
iterate line -channel $chan      {...}
iterate c    -string $string     {...}
iterate e    -list $list         {...}
iterate rev  -list $list {end 0} {...}
iterate count -numeric 0 {} line -channel $chan {}

Granted that the last argument is the body, a variable number of control args could be easily processed. The dash before the type indicator might set it off more clearly. However, I'm not sure whether all iterations possible with for or foreach can be replaced with this construct.

AvL: surely not: foreach will be "nicer" for the easy cases, and for is much stronger: it allows you to modify the iteration-variable from the body - neither foreach, nor the proposed iterators would allow this. Python unfortunately lacks such a construct.

Todd Coram:

By supplying a type to the iteration, I think you lose a great deal of power that typeless iteration gives you. C++ style iteration is what make generic programming such a nice capability. Typelessness allows me to define generic functions that don't care about type. How about an iterator command that generates an iterator for the supplied type?
set it1 [iterator -string "hello world how are you?"]
set it2 [iterator -chan $chan1]
set it3 [iterator -list {{hello world} {how are you}}

The generator iterator would be a proc: So,
proc wc-w {it} {
    set count 1
    while {[$it] != {}} {
        incr count
    }
    return $count
}

wc-w $it1 # -- returns 5
wc-w $it2 # returns the number of words in the file
wc-w $it3 # -- returns 2 

someone: And the iterator command would be a perfect usage of Closures!

Todd Coram: See Iterator using Closures for an example of the above...

Eric Boudaillier: Iterator are not typed: just see the description list of iterators as a constructor. At the C level, I have a structure like Tcl_ObjType:
typedef struct {
    char                     *name;
    Tcl_IteratorCreateProc   *createProc;
    Tcl_IteratorFreeProc     *freeProc;
    Tcl_IteratorGetProc      *getProc;
    Tcl_IteratorNextProc     *nextProc;
    Tcl_IteratorIsFiniteProc *finiteProc;
} Tcl_IteratorType;

[iterator] just looks for the iterator type, and then call the iterator type functions. This implementation is simple, and make iterator lifetime only in [iterator] command. I have tried other implementation with a Tcl_Obj internal representation, but I did'nt know how to handle some cases:
set it1  [iterator string "Hello, World"]
set it2  [iterator channel $chan]
set it1copy $it1
set it2copy $it2
iterate c $it1 line $it2 {
}
# where is $it1copy ? at the start of the string ?
# at least, $it2copy is at the same position of $it2.

Also, [iterator] don't know the args to pass to Tcl_IteratorCreateProc, so this is difficult to break the description list, which can also take options:
iterate v [list numeric 0 9 -loop true] {e1 e2} [list list $list] {
    ...
}

But of course, all these lists can be created by a command.

Iterators are big in Ruby.

escargo: Generators are fundamental in Icon. Generators need not be finite; they will return a new value if they are resumed after suspending and returning a value.

It would seem to me that the construct is maybe nice or fun in certain ways, but when I measure it against imo important criteria:
 * do I need to type less
 * is it more insightfull
 * does it add strong language possibilities not easily or at all achievable otherwise
 * is it more efficient
 * is there some esteatical or alternatively some computer structure related reason for it

then I don't think it would score all too high neccessarily, which of course maybe is less important than that is sais something important in the real world, though I would say that is dangerous territory.

About on of the examples/remarks, the foreach tcl command is not pre-determined, and fun enough to play around with:
set l {1 2 3}
foreach i $l {puts "$i  $l"; if {[llength $l] < 5} {lappend l 5}; set l [lreplace $l 3 3 [expr [lindex $l 3]+1] ]}

Gives:
1  1 2 3
2  1 2 3 6
3  1 2 3 7 5

I guess a informatics professor wouldn't like this...

NEM: Not sure who wrote the above, but it is worth pointing out that the foreach command is pre-determined in that it captures the values of $l when it starts and doesn't reflect any subsequent changes to that variable. This should be the same for any Tcl command as $-substitution gets done before the command is called, and Tcl is pass-by-value rather than pass-by-reference. You can, of course, update l inside the loop as it is a mutable variable, but note that you still only get 3 iterations through foreach, regardless of the new value of l.

NEM 2005-09-20: Iterators (or streams) are certainly nice things to have. When I first saw EB's suggestion for an iterate command, I thought it was a nice higher-order function, where the [list numeric 0 $max] etc were actual command fragments that generate the next iteration. However, this doesn't seem to be the case. So, here is a version of the iterate command that does just use normal Tcl commands for the iterators. Each iterator is a command that accepts subcommands (i.e. like an object or an ensemble). The subcommands for this interface are "empty" (returns true if there are no more elements) and "next" which returns the next value and a new iterator. This last requirement may sound a little strange, but it allows to have purely-functional, persistent iterators that can be re-used as many times as you need. To demonstrate, here is the iterate command:
proc iterate {varname stream body} {
    upvar 1 $varname var
    while {![eval $stream empty]} {
        lassign [eval $stream next] var stream
        uplevel 1 $body
    }
    return $stream
}

This is straight-forward enough. Now let's define some iterators. First, a simple one for lists:
proc literator {list method} {
    switch $method {
        empty   { expr {[llength $list] == 0} }
        next    { list [head $list] [list literator [tail $list]] }
    }
}
proc head list { lindex $list 0 }
proc tail list { lrange $list 1 end }

Now, we can try this out:
set l {literator {1 2 3 4 5}}
iterate x $l { puts "x = $x" }

This produces the expected output and returns an iterator to the empty list. Notice that this all looks very much like an ordinary stateful iterator. However, we can still re-use the original iterator or any of the intermediate ones, as there are no side-effects. In particular, you could pass the same iterator to a bunch of different procs without them interfering with each other. However, we don't have to limit ourselves to just side-effect free iterators. For instance, here I can wrap a channel as an iterator:
proc citerator {chan method} {
    switch $method {
        empty   { eof $chan }
        next    { list [gets $chan] [list citerator $chan] }
        close   { close $chan }
    }
}

And I can use that to iterate through lines of a file:
set chan [list citerator [open $file]]
iterate line $chan { puts "line = $line" }
eval $chan close

That iterator is, of course, not persistent -- once you have used an iterator it is not generally possible to restore it to its previous state (although you might be able to try with tell and seek). Finally, here is an iterator which can generate infinite sequences, based on the "iterate" function of Haskell (which, along with monads, inspired this implementation):
proc fiterator {func init method} {
    switch $method {
        empty   { return 0 }
        next    { list $init [list fiterator $func [eval $func $init]] }
    }
}

Which I will use with a "take" command which itself returns an iterator to the first n items in another iterator:
proc take {n iter method} {
    switch $method {
        empty   { expr {$n == 0} }
        next    { lassign [eval $iter next] next iter
                            list $next [list take [expr {$n-1}] $iter] }
    }
}

And some uses:
proc succ x { incr x }
set nats [list fiterator succ 1]
# First 10 natural numbers:
iterate n [list take 10 $nats] { puts "n = $n" }

Well, that's just some food for thought. You don't need state or closures to implement iterators, but a little bit of sugar goes a long way. In addition, stateless iterators are persistent pure values so you can use them over and over again. Also, as pure values, you don't have to worry about destroying them as they are reference counted by Tcl like every other value (and like other TOOT types, which are also very similar to these iterator type-commands). Add a bit of memoizing and you get some pretty useful lazy structures. A next step might be to define some useful iterator combinators which can combine different iterators in interesting ways, e.g. a lazy append can be defined:
proc iter_append {it1 it2 method} {
    switch $method {
        empty   { expr {[eval $it1 empty] && [eval $it2 empty]} }
        next    { 
            if {[eval $it1 empty]} {
                return  [eval $it2 next]
            } else {
                lassign [eval $it1 next] next it1
                return [list $next [list iter_append $it1 $it2]]
            }
        }
    }
}
set a [list literator {1 2 3}]
set b [list literator {4 5 6}]
set both [list iter_append $a $b]
iterate x $both { puts "x = $x" }

Which prints the values 1 to 6, but only calculates them as needed. I'll leave other combinators (such as other list operators) as an exercise. For the really adventurous, look into Haskell's List monad, which should be fairly easily implemented on top of this, giving a new framework for non-deterministic computations. That would probably simplify and speed up Parser Combinators (also based on monads) quite a bit.

NEM 2005-09-20: A conversation with Joe English on the Tcl chatroom revealed that the implementation of iterators I describe above is actually pretty near to the implementation of anamorphisms or unfolds, as described in the paper Functional Programming with Bananas, Lenses, Envelopes, and Barbed Wire (Meijer, Fokkinga, Paterson [2]). Basically, where a fold (or catamorphism, to give it its posh name) takes a list and applies a binary operator to successive pairs of the list (thus reducing, or folding, the list to a single value), an unfold takes a value and a function to generate a next value and produces a list (or sequence, more generally). To demonstrate here is an unfold function implemented in Tcl:
proc unfold {done func next seed} {
    if {[apply $done $seed]} {
        return [list]
    } else {
        return [list [apply $func $seed] \
                     [list unfold $done $func $next [apply $next $seed]]]
    }
}

Here, the unfold function actually takes 4 arguments: a "done" function that tests the seed value to see if it is the last in the sequence (equivalent to our "empty" method in the above code); an arbitrary function which is applied to seed to generate the actual next element; a function to generate the next seed; and finally the initial seed itself. This is a generalisation of our iterator code, as it distinguishes between seeds used for iteration, and the actual elements of the resulting sequence. The first part of the code checks the "done" function to see if this is the last element, and if so returns an empty list. If not, then we apply the function to the seed to create this element ([apply $func $seed]) and then call ourselves recursively to generate the next element, using the "next" function to generate the next seed ([unfold $done $func $next [apply $next $seed]]). However, rather than immediately execute this recursive call, we instead quote it (using list) and return the two bundled up together; creating a lazy list in effect (Haskell does this by default). We then need a new "take" function that forces the first n elements of the lazy list to be evaluated and returned, which works by grabbing the head of the list and then evaluating the tail of the list:
proc take {n iter} {
    for {set i 0} {$i < $n} {incr i} {
        lappend ret [lindex $iter 0]
        set iter [eval [lindex $iter 1]]
    }
    return $ret
}

This delayed evaluation of elements of the list also has the nice property of converting our recursive definition of unfold into a simple iteration without having to do anything too fancy. We can now use this unfold to produce our original iterators again in a new form. First, we need one last helper that applies a function with proper list quoting:
proc apply {func args} {
    uplevel #0 $func $args
}

Now we can define our iterators. Starting with the natural numbers:
proc false x { return 0  } ;# never done, i.e. infinite sequence
proc id    x { return $x } ;# identity function, i.e. seed and element are identical
proc succ  x { incr x    } ;# next seed is increment of previous
set  nats [unfold false id succ 1]
take 10 $nats ;# Displays 1 .. 10

Here is a version that returns all the lines from a channel as a lazy list:
proc lines chan { unfold eof gets id $chan }
# e.g. take 20 [lines [open $file]]

Great stuff!

Lucretiel (2015-10-06): I took this iterator concept and ran with it, as I needed something like it for a project that requires Tcl 8.0 support. The major difference is that, instead of requiring a trailing "method" argument (or more abstractly a mechanism for detecting if an iterator is complete without advancing it) the iterator-list is simply evaluated in place, and returns either an empty string for "no more data" or a pair consisting of the element and next iterator:
# Step an iterator. Return the next value, and assign the NEW iterator to the variable name. Return with a break code if there is no next value.
proc step {itername} {
    upvar 1 $itername iter
    set step_result [eval $iter]
    if {$step_result == {}} {
        set iter {}
        return -code break
    } else {
        set iter [lindex $step_result 1]
        return [lindex $step_result 0]
    }
}

proc iterate {varname iter block} {
    upvar 1 $varname var
    while 1 {
        set var [step iter]
        
        # Correctly handle break/continue/return from the block
        switch [set code [catch {uplevel 1 $block} result]] {
            0 {}
            3 break
            4 continue
            default {return -code $code $result
        }
    }
}

Here are some basic iterators- infinite count, list and file iteration:
proc count {{start 0} {step 0}} {
    return [list $start [list count [expr {$start + $step}] $step]]
}

proc iterlist {my_list} {
    if {[llength $my_list] > 0} {
        return [list [lindex $my_list 0] [list iterlist [lrange $my_list 1 end]]]
    } else {
        return {}
    }
}

proc iterfile {handle} {
    if {[gets $handle line] != -1} {
        return [list $line [list iterfile $handle]]
    } else {
        return {}
    }
}

And some helper iterators- chain iterators together, python-style enumerate, shorten an iterator:
proc enumerate {iter {n 0}} {
    while 1 {
        return [list [list $n [step iter]] [list enumerate $iter [incr n]]]
    }
    return {}
}

proc chain {args} {
    while {[llength $args] > 0} {
        set iter [lindex $args 0]
        set args [lrange $args 1 end]

        while 1 {
            return [list [step iter] [concat [list chain $iter] $args]]
        }
    }
    return {}
}

proc take {n iter} {
    while {$n > 0} {
        return [list [step iter] [list take [incr n -1] $iter]]
    }
    return {}
}

Here's a simple example:
set iter {enumerate {chain {iterlist {a b c d}} {take 4 {count}}}}

iterate var $iter {
    lassign $var i element
    puts "$i: $element"
}

Output:
0: a
1: b
2: c
3: d
4: 0
5: 1
6: 2
7: 3