Updated 2013-01-20 17:53:05 by pooryorick

File format edit

[...]

Steve Cassidy: "Trf lets you zip up a bit of data but doesn't provide the machinery to zip directories or files. mkZipLib [1] provides such an interface as does zvfs [2]. More up to date is the zipvfs package which is part of tclvfs but that doesn't seem to be working cleanly yet and requires tcl8.4 features."

http://www.equi4.com/critlib/zipper.README http://www.equi4.com/critlib/zlib.README

See also Using a zip file as a Tcl Module.

See also Using zipper to create zip files.

Also, explain pkware vs. winzip vs. zLibDll.

ZIP reader class

DKF: Here's a little TclOO class to read from ZIP files (it only handles very basic cases, but that's good enough for what I wanted it for):
package require Tcl 8.6

oo::class create Zip {
    variable comment directory fd
    constructor {filename} {
        set fd [open $filename rb]
        my ReadDirectory
    }
    destructor {
        close $fd
    }

    method ReadDirectory {} {
        set off -22
        while 1 {
            seek $fd $off end
            binary scan [read $fd 4] i sig
            if {$sig == 0x06054b50} {
                seek $fd $off end
                break
            }
            incr off -1
        }
        binary scan [read $fd 22] issssiis sig disk cddisk nrecd nrec \
                dirsize diroff clen
        if {$clen > 0} {
            set comment [read $fd $clen]
        } else {
            set comment ""
        }
        if {$disk != 0} {
            error "multi-file zip not supported"
        }
        seek $fd $diroff
        for {set i 0} {$i < $nrec} {incr i} {
            binary scan [read $fd 46] issssssiiisssssii \
                sig ver mver flag method time date crc csz usz n m k d ia ea \
                off
            if {$sig != 0x02014b50} {
                error "bad directory entry"
            }
            set name [read $fd $n]
            set extra [read $fd $m]
            if {$k == 0} {
                set c ""
            } else {
                set c [read $fd $k]
            }
            set directory($name) [dict create timestamp [list $date $time] \
                    size $csz disksize $usz offset $off method $method \
                    extra $extra comment $c]
        }
    }

    method names {} {
        lsort [array names directory]
    }
    method comment {{name {}}} {
        if {$name eq ""} {
            return $comment
        }
        return [dict get $directory($name) comment]
    }
    method info {name {field ""}} {
        if {$field ne ""} {
            return [dict get $directory($name) $field]
        }
        return $directory($name)
    }
    method contents {name} {
        dict with directory($name) {}
        seek $fd $offset
        binary scan [read $fd 30] isssssiiiss sig - - - - - - - - nlen xlen
        if {$sig != 0x04034b50} {
            error "not a file record"
        }
        seek $fd [expr {$nlen + $xlen}] current
        set data [read $fd $size]
        if {[string length $data] != $size} {
            error "read length mismatch: $size expected"
        }
        if {$method == 0} {
            return $data
        } elseif {$method == 8} {
            return [zlib inflate $data]
        } else {
            error "unsupported method: $method"
        }
    }
}

An example of use:
set z [Zip new [lindex $argv 0]]
if {$argc > 1} {
    puts Comment:[$z comment [lindex $argv 1]]
    puts Contents:\n[$z contents [lindex $argv 1]]
} else {
    puts Comment:[$z comment]
    foreach n [$z names] {
        puts "$n ([$z info $n disksize] bytes)"
    }
}
$z destroy

Functional operation edit

NEM zip is also the name of another functional programming classic. You can think of it as a function that takes a bunch of "columns" (think relational) and returns a list of "rows":
proc zip {cola colb} {
    set ret [list]
    foreach a $cola b $colb { lappend ret [list $a $b] }
    return $ret
}
zip {1 2 3} {a b c} ;# returns {{1 a} {2 b} {3 c}}

You can generalise zip to zipWith which applies an arbitrary function on each pair of values from the columns:
proc apply {func args} { uplevel #0 $func $args }
proc zipWith {f cola colb} {
    set ret [list]
    foreach a $cola b $colb { lappend ret [apply $f $a $b] }
    return $ret
}
interp alias {} zip {} zipWith list

You could further generalise the function to take an arbitrary number of columns. I'll leave that, and the reverse unzip operation as exercises. See also fold, filter, iterators and map.

Lars H: Isn't that more commonly known as "transposing"? See Transposing a matrix.

NEM: Depends who you ask. zipWith is more general, though.

AMG: "list" isn't a valid lambda, so it's not directly usable with the Tcl 8.5 [apply] command. Here's an 8.5-compatible version. It uses single-argument [lindex] instead of [list] to avoid adding an extra level of list nesting. (Single-argument [lindex] simply returns its argument, even if its argument isn't a valid list.)
interp alias "" zip "" zipWith {{args} {lindex $args}}

AMG: Implementation using lcomp:
proc zip {cola colb} {
    lcomp {[list $a $b]} for a in $cola and b in $colb
}

Here's a version that handles an arbitrary number of columns:
proc zip {args} {
    if {[llength $args]} {
        for {set i 0} {$i < [llength $args]} {incr i} {
            append expression " \$$i"
            lappend operations and $i in [lindex $args $i]
        }
        lset operations 0 for
        lcomp \[list$expression\] {*}$operations
    }
}

Also, I'll take your unzip challenge. ;^)
interp alias "" unzip "" zip

As Lars H pointed out, zip is transpose, so using it twice gives back the original input.

Examples:
% zip {a 1} {b 2} {c 3}
{a b c} {1 2 3}
% unzip {a b c} {1 2 3}
{a 1} {b 2} {c 3}