Updated 2017-09-21 12:30:05 by EMJ

csv, a Tcllib package, provides facilities for working with csv files.

http://core.tcl.tk/tcllib/doc/trunk/embedded/www/tcllib/files/modules/csv/csv.html

Description

Supports multi-line records.

This module provides support for CSV data.

Example: Parsing CSV

#! /bin/env tclsh

package require csv
package require struct::matrix

::struct::matrix data
set chan [open myfile.csv] 
csv::read2matrix $chan data  , auto 
close $chan

set rows [data rows]

for {set row 0} {$row < $rows} {incr row} {
    puts [data get row $row]
}

The expand mode auto is almost always what you want when reading into a matrix, but it isn't the default, so it has to be passed explicitly to [csv::read2matrix], which means that the separator argument before it, usually ,, must also be passed.

The same file can also be read into a queue:
#! /bin/env tclsh

package require csv
package require struct::queue

::struct::queue data
set chan [open myfile.csv] 
csv::read2queue $chan data
close $chan

while {[data size] > 0} {
    puts [data get]
}
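
When a matrix or queue is more than is needed, individual lines can be split directly with [csv::split]. The following is a minimal sketch, assuming the data contains no multi-line records (splitting the text on newlines would break those, whereas [csv::read2matrix] and [csv::read2queue] handle them). The sample text is made up for illustration.

```tcl
package require csv

# Hypothetical sample data, held in a string so the sketch needs no file.
set text "name,comment\nMary,\"Hello, I am Mary\"\nBob,plain"

# Split each line into a list of fields; quoted fields may contain commas.
set records {}
foreach line [split $text \n] {
    lappend records [csv::split $line]
}

# Print the fields of each record separated by " | ".
foreach record $records {
    puts [join $record " | "]
}
```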

Example: Generating CSV

package require csv

# Make the lists to convert to csv-format
set exampleList1 {123 123,521.2}
lappend exampleList1 {Mary says "Hello, I am Mary"}
lappend exampleList1 {}
set exampleList2 {a b c d e f}
set exampleList3 {}
for {set i 0} {$i < 10} {incr i} {
   lappend exampleList3 $i
}
# Make a list of lists...
set exampleLists [list $exampleList1 $exampleList2 $exampleList3]

# Write the data to a file
set f [open exampleLists.csv w]
puts $f [csv::joinlist $exampleLists]
close $f

The result of running this program is a file with 4 lines: one for each example list, and an empty line.

[JDW]: The "empty line" (mentioned above) is a nuisance for some applications. It is the result of [csv::joinlist] including a newline at the end of every line, rather than as a delimiter between lines. Then the [puts] adds another newline. The extra newline can be avoided by using the following construct:
puts -nonewline $f [csv::joinlist $exampleLists]

Of course, the [write_file] command from TclX would make the [open]/[puts]/[close] sequence all one line:
write_file -nonewline exampleLists.csv [csv::joinlist $exampleLists]

However, -nonewline isn't supported on write_file. My first thought was that the extra newline shouldn't be added by [csv::joinlist], but perhaps the real deficiency is that [write_file] should support -nonewline. One way or the other, it would be handy to make [write_file] and [csv::joinlist] work together.

The (very ugly) workaround I've come up with is:
write_file exampleLists.csv [string range [csv::joinlist $exampleLists] 0 end-1]

Of course, that's probably not efficient for writing non-trivial file sizes.

In case this behavior is version-dependent, this was tested using ActiveTcl 8.4.19.1 on Linux.
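
The trailing newline can be seen directly on a small input. This is only a sketch with made-up rows; writeCsv is a hypothetical helper name, not part of the csv package or TclX. It wraps the [puts] -nonewline construct in a proc so that writing a file stays a one-liner without [write_file].

```tcl
package require csv

# Hypothetical sample rows.
set rows {{a b} {c d}}
set out [csv::joinlist $rows]

# Every record is terminated by a newline, including the last one,
# so a plain [puts] would add a second newline at the end of the file.
puts [string equal [string index $out end] "\n"]

# Hypothetical helper: write rows as CSV without the extra empty line.
proc writeCsv {path rows} {
    set f [open $path w]
    puts -nonewline $f [csv::joinlist $rows]
    close $f
}
```

With this helper, writing the example data becomes writeCsv exampleLists.csv $exampleLists.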

Demos

Tcllib also comes with a few sample programs demonstrating the usefulness of the csv package. See the tcllib/examples/csv/ directory for code to convert csv files to html, to cut out csv columns, to join csv data from two files, to sort csv files by column, to do a 'uniq' type function on csv columns, etc. Currently at version 0.0.

These demos are in the tcllib source tree. If you want to use them, however, you have to install them by hand.

The csv utility commands in tcllib/examples/csv/ are

csv2html

csv2html ?-sep sepchar? ?-title string? file...

        Reads CSV data from the files and returns it as an HTML table
        on stdout.

csvcut

csvcut  ?-sep sepchar? LIST file...

        Like "cut", but for CSV files. Print selected parts of CSV
        records from each FILE to standard output.

        LIST is a comma separated list of column specifications. The
        allowed forms are:

        N       numeric specification of single column
        N-M     range specification, both parts numeric,
                N < M required.
        -M      See N-M, N defaults to 0.
        N-      See N-M, M defaults to last column

        If there are no files or file = "-" read from stdin.
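
The LIST forms above can be expanded into explicit column indices as follows. This is a sketch of the documented rules only, not the code csvcut itself uses; expandColumns and its ncols parameter (the number of columns in a record) are names assumed for illustration, and the sketch rejects ranges with N > M.

```tcl
# Hypothetical helper: expand a csvcut-style LIST such as "0,2-4,7-"
# into a list of zero-based column indices.
proc expandColumns {spec ncols} {
    set last [expr {$ncols - 1}]
    set cols {}
    foreach item [split $spec ,] {
        if {[regexp {^(\d*)-(\d*)$} $item -> from to]} {
            # Range form: a missing N defaults to 0, a missing M to the last column.
            if {$from eq ""} {set from 0} else {scan $from %d from}
            if {$to eq ""} {set to $last} else {scan $to %d to}
            if {$from > $to} {
                error "bad range \"$item\": start is greater than end"
            }
            for {set i $from} {$i <= $to} {incr i} {
                lappend cols $i
            }
        } elseif {[regexp {^\d+$} $item]} {
            # Single column; scan avoids treating leading zeros as octal.
            scan $item %d n
            lappend cols $n
        } else {
            error "bad column specification \"$item\""
        }
    }
    return $cols
}

puts [expandColumns 0,2-4,7- 10]
```

For a 10-column record, the specification 0,2-4,7- expands to the indices 0 2 3 4 7 8 9.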

csvdiff

csvdiff ?-n? ?-sep sepchar? ?-key LIST? file1 file2

        Like "diff", but for CSV files. Compare selected columns of CSV
        records from each FILE to standard output.

        -n indicates that line numbers should be output

        -sep sepchar allows one to indicate that, instead of a comma,
        the sepchar will be separating the CSV columns.

        LIST is a comma separated list of column specifications. The
        allowed forms are:

        N       numeric specification of single column
        N-M     range specification, both parts numeric,
                N < M required.
        -M      See N-M, N defaults to 0.
        N-      See N-M, M defaults to last column

        file1 and file2 are the files to be compared.

Example of use:
$ cat > f1
a|b|c|d|e|f|g|h|i|j|
1|2|3|d|e|F|g|h|i|j|
x|y|z|d|e|f|g|h|i|j|
^D
$ cat > f2
a|b|c|d|e|f|g|h|i|j|
1|2|3|d|e|f|g|h|i|j|
x|y|z|d|e|f|g|h|i|j|
^D
$ csvdiff -sep '|' -key '0 5 8 9' f1 f2
-|1|2|3|d|e|F|g|h|i|j|
+|1|2|3|d|e|f|g|h|i|j|

Note that if you want to compare several fields, I find that I have to use spaces to separate them, rather than commas as the comments imply. Also, if there are multiple lines in one of the files that are identical in the columns specified, a warning similar to this will appear:
warning: 0 2942 0000 R occurs multiple times in f1 (lines 2634 and 2633)

Also, all the first file's lines are output first, then the second file's lines are output.

csvjoin

csvjoin ?-sep sepchar? ?-outer? keycol1 file1.in keycol2 file2.in file.out|-

        Joins the two CSV input tables using the specified columns as
        keys to compare and associate. The result will contain all
        columns from both files with the exception of the second key
        column (the result needs only one key column, the other is
        identical by definition and therefore superfluous).

        Options:

        -sep    specifies the separator character used in the input file.
                Default is comma.

        -outer  Flag, perform outer join. Means that if the key is
                missing in file2 a record is nevertheless written,
                extended with empty values.

csvsort

csvsort ?-sep sepchar? ?-f? ?-n? ?-r? ?-skip cnt? column file.in|- file.out|-

        Like "sort", but for CSV files. Sorts after the specified
        column. Input and output are from and to a file or stdin
        and stdout (Any combination is possible).

        Options:

        -sep    specifies the separator character used in the input file.
                Default is comma.

        -n      If specified integer sorting is used.
        -f      If specified floating point sorting is used.
                (-n and -f exclude each other. If both are used the
                last option decides the mode).

        -r      If specified reverse sorting is used (largest first)

        -skip   If specified that number of rows is skipped at the beginning,
                i.e. excluded from sorting. This is to allow sorting of
                CSV files with header lines.
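
The effect of these options can be approximated inside a script with [lsort]. This is a sketch of the documented behaviour, not the csvsort source; sortRecords and its parameters are hypothetical names, the mode values map to the [lsort] options -ascii, -integer (like -n) and -real (like -f), and the {*} syntax assumes Tcl 8.5 or later.

```tcl
# Hypothetical helper: sort a list of records (each a list of fields) by one column.
#   mode    ascii, integer or real
#   reverse true for largest first (like -r)
#   skip    number of leading rows excluded from sorting (like -skip)
proc sortRecords {records col {mode ascii} {reverse 0} {skip 0}} {
    set head [lrange $records 0 [expr {$skip - 1}]]
    set body [lrange $records $skip end]
    set opts [list -index $col -$mode]
    if {$reverse} {
        lappend opts -decreasing
    }
    set sorted [lsort {*}$opts $body]
    # Header rows stay in front, in their original order.
    return [list {*}$head {*}$sorted]
}

# Made-up data with one header row.
set records {{name n} {b 10} {a 9} {c 100}}
puts [sortRecords $records 1 integer 0 1]
```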

csvuniq

csvuniq ?-sep sepchar? column file.in|- file.out|-

        Like "uniq", but for CSV files. Uniq's the specified column.
        Writes the first record it encounters for a value. Input and
        output are from and to a file or stdin and stdout (Any
        combination is possible).

        Options:

        -sep    specifies the separator character used in the input file.
                Default is comma.
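
The "first record for each value" rule can be sketched with the csv package itself. This assumes that duplicates are detected across the whole input rather than only between adjacent records, which the description above does not spell out; uniqByColumn is a hypothetical name and the input lines are made up.

```tcl
package require csv

# Hypothetical helper: keep the first record seen for each distinct
# value in column $col.
proc uniqByColumn {records col} {
    set result {}
    foreach record $records {
        set key [lindex $record $col]
        if {![info exists seen($key)]} {
            set seen($key) 1
            lappend result $record
        }
    }
    return $result
}

# Made-up input lines; the value "a" appears twice in column 0.
set lines [list a,1 b,2 a,3]
set records {}
foreach line $lines {
    lappend records [csv::split $line]
}

set unique [uniqByColumn $records 0]
# joinlist already ends each record with a newline, hence -nonewline.
puts -nonewline [csv::joinlist $unique]
```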

[Examples of how to use the above commands would be helpful]

Alternatives

  • tclcsv is a binary extension for parsing CSV much faster than the pure-Tcl implementation can manage.
  • tsv (part of tcl-hacks) is a TclOO-based parser for CSV-like formats. It is somewhat slower than Tcllib's CSV, but offers a nicer interface and more flexibility while passing the same test suite.
  • csv - this page's "See Also" section links several related projects