Updated 2018-03-18 14:33:23 by pooryorick

dict for iterates through the items in a dictionary.

Synopsis  edit

dict for keyvalnames dictionary body

Description  edit

Each successive key in dictionary is stored in the variable named by the first item in keyvalnames, its value is stored in the variable named by the second item, and 'body'' is evaluated as a script. After all keys have been processed, the empty string is returned. If the result of an evaluation is break, the command ceases to iterate further and returns. It the result of an evaluation is continue, the iteration continues. The order of iteration is the order in which the keys were inserted into the dictionary.

In contrast with foreach, all but the last key-value pair for any redundant keys are ignored.
# Data for one employee
dict set employeeInfo 12345-A forenames Joe
dict set employeeInfo 12345-A surname   Schmoe
dict set employeeInfo 12345-A street {147 Short Street}
dict set employeeInfo 12345-A city   Springfield
dict set employeeInfo 12345-A phone  555-1234
# Data for another employee
dict set employeeInfo 98372-J forenames Anne
dict set employeeInfo 98372-J surname   Other
dict set employeeInfo 98372-J street {32995 Oakdale Way}
dict set employeeInfo 98372-J city   Springfield
dict set employeeInfo 98372-J phone  555-8765
# The above data probably ought to come from a database...

# Print out some employee info
set i 0
puts "There are [dict size $employeeInfo] employees"
dict for {id info} $employeeInfo {
    puts "Employee #[incr i]: $id"
    dict with info {
        puts "   Name: $forenames $surname"
        puts "   Address: $street, $city"
        puts "   Telephone: $phone"
    }
}
# Another way to iterate and pick out names...
foreach id [dict keys $employeeInfo] {
    puts "Hello, [dict get $employeeInfo $id forenames]!"
}

dict for vs foreach  edit

AMG: [dict for] and [foreach] behave a little differently when applied to a key/value list with duplicate keys:
% set mydata {a 1 a 2 b 3 b 4}
a 1 a 2 b 3 b 4
% foreach {k v} $mydata {puts $k=$v}
a=1
a=2
b=3
b=4
% dict for {k v} $mydata {puts $k=$v}
a=2
b=4

[dict for] converts the list to a dict. In the process it discards all but the last instance of each key. This doesn't mean that data is gone for good; it's still present in the string representation, so it'll reappear when the data is later used as a list or string. Amazingly, this is the case even for pure lists (lists with no string representation): the conversion to dict forces the generation of a string representation when there are duplicate keys.
DKF: That's the only way to make the semantics actually function as documented. When there are no duplicate keys, lists can go directly to dicts without losing information, but when there are duplicates, the only way to stop the Tcl value (conceptually) from mutating is to generate the string representation.

PYK: This difference did not exist in 8.5.7 but does exist in 8.6.1. Which release introduced it? I've made use of the foreach behaviour and wondered if dict for shouldn't also behave the same way. If a dict forall existed, would it be more performant than foreach in this scenario, or is foreach already sufficiently optimized for this?

AMG: [dict for] and [foreach] also disagree in their handling of odd-length lists. [dict for] will give the error "missing value to go with key", whereas [foreach] will behave as if the list had an empty string element appended to it.

RLE: (2014-03-19) Given that both are behaving as documented, and the definition of a "dict" (no repeating keys), requires the dict for result, I fail to see the problem. foreach is defined for lists, dict for is defined for dicts. $mydata is both a valid list and an "almost" valid dict (I say "almost" because of the repeating keys, which will be squashed to only the last key when converting to a dict) and so the results differ because the meaning of $mydata interpreted as a list is indeed different from $mydata interpreted as a dict.

But converting dict for to act like foreach when fed a list of pairs would be counter-intuitive. One would get duplicate keys out when lists with duplicate keys were fed in, but no duplicates out when the dict version of the same list was fed in. That seems wrong.

Think of dict for {key value} $dict {} as operating like this when fed a list:
dict for {key value} [ dict create {*}$mydata ] { }

Given Tcl's auto type conversions behind the scenes, this is probably exactly what happens when dict for is fed a list Tcl_Obj instead of an actual dict Tcl_Obj.

PYK 2014-03-19: The current behaviour of foreach is an improvement. In 8.5.7, it behaved exactly as dict for, which was unexpected. The interesting thing now is that a dict behaves like a specialized list: Its values are ordered and it doesn't lose redundant key-value pairs, making the conversion in either direction lossless, even between pure values. Given these semantics, one begins to wonder how much of a performance penalty there is when lappend or foreach cause a list internal representation to replace as dict internal representation. I'm guessing not much.

AMG: The intrep of a dict value is a hash table mapping from keys to values. The mapped-to structure of the hash table contains not only Tcl_Obj pointers but also doubly-linked list pointers to the previous and next entries in the dict. This is done to maintain ordering when shimmering to list. Consequently, constructing a list from a pure, canonical dict isn't very expensive.

Non-pure and non-canonical dict Tcl_Objs contain string representations from which the list is constructed. This is what makes it possible to have repeated keys which are visible in the string and list representations but not the dict representation.

Making a dict from a list involves constructing the complete hash table, plus threading together all the objects into a linked list. There is some cost there. As is typical of the Tcl hash table, many memory allocations are required.

The most expensive thing is repeated conversion between list and dict. Remember that this conversion happens even when merely reading the value as a list or as a dict. Since neither of the types in question is string, no attempt to modify is necessary to change the internal type.

To mitigate the performance penalty described above, [foreach] contains an optimization to read the dict's linked list pointers rather than force conversion to list. However, this won't work if the dict isn't pure (has a string representation) because it might have duplicate keys which would not be present in the hash table.

A number of years ago I proposed unifying the list and dict intreps to avoid shimmering. However, the generic Tcl hash table facility isn't up to the job, and it can't be amended because its internal structure is present in tcl.h, so the required changes break binary compatibility. The solution would be to make a new, more flexible hash table system with the existing one either coexisting or becoming a compatibility interface. Unfortunately, my attempt to discuss this on the Tcl mailing list instantly devolved into a debate on the relative merits of hashing algorithms, and my proposal was forgotten despite the reference code I had produced. I discuss this unification again in my Brush paper, if you would like a fresh look at it. But the algorithms presented in the paper have the drawback of linear-time dict deletes. Since then I have come up with better algorithms, but I haven't published them anywhere, and I'm afraid I may have forgotten some key points.