Updated 2007-12-26 18:18:39 by dkf

Recently a discussion occurred on the wiki's chat regarding how to compare two files.

While some operating systems come with such utilities, a Tcl developer who wants portable code needs to do the comparison in Tcl itself. Perhaps such code will eventually find its way into fileutil.

westlife       Actually my requirement was to compare two files in binary mode. 
westlife       My friend told me to take them in string using read command and compare them. 
westlife       What do u think, is it an efficient and correct way to compare two files in binary mode?
dkf    Two binary files?  That's easy enough.
dkf    Use read to get the data in (making sure you've [fconfigure $chan -translation binary] first)
dkf    And then use string compare or string equal or whatever.
westlife       how can i do that dkf
westlife       but is it efficient way ?
dkf    proc readBinaryFile {filename} {
           set f [open $filename]
           fconfigure $f -translation binary
           set data [read $f]
           close $f
           # Hand the contents back; without this the proc returns the empty result of [close]
           return $data
       }
dkf    If your files are small enough (e.g. up to a few megabytes) that'll work just fine.
arjen  If you read it in chunks (especially with large files), then quit as soon as you find a difference
arjen  Use: read $f $chunksize
dkf    If they're really big, you'll need to chunk it
westlife       yes that's what wanted to say 
dkf    msg x'ed 
westlife       oh i.c
westlife       thanx dkf
dkf    Using chunks is slower if your files could fit into your (physical) memory, 
dkf    but if they can't it is much faster.
stevel Also, smaller chunks keep your UI responsive
dkf    proc cmpFilesChunked {file1 file2 {chunksize 16384}} {
           # Returns the [string compare] result: 0 if the files match, -1 or 1 otherwise
           set f1 [open $file1]; fconfigure $f1 -translation binary
           set f2 [open $file2]; fconfigure $f2 -translation binary
           while {1} {
               set d1 [read $f1 $chunksize]
               set d2 [read $f2 $chunksize]
               set diff [string compare $d1 $d2]
               # Stop at the first differing chunk, or once either file is exhausted
               if {$diff != 0 || [eof $f1] || [eof $f2]} {
                   close $f1; close $f2
                   return $diff
               }
           }
       }
dkf    That's untested, but I think it'll work...
westlife        thanx dkf
lvirden        westlife, if you return, a suggestion - check the file sizes before beginning 
lvirden        the file reading process - if the sizes are not equal, then the files are not equal.
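
Combining lvirden's size check with the chunked loop from the chat might look like the sketch below. The proc name filesEqual and the 64 KB default chunk size are hypothetical choices, not something from the discussion. It assumes Tcl 8.5 or later (for the rb open mode) and returns a plain boolean rather than the string compare result.

```tcl
# Hypothetical helper: returns 1 if both files hold identical bytes, else 0.
proc filesEqual {file1 file2 {chunksize 65536}} {
    # Files of different sizes cannot be equal, so there is no need to read them.
    if {[file size $file1] != [file size $file2]} {
        return 0
    }
    set f1 [open $file1 rb]
    # If the second open fails, close the first channel before re-raising.
    if {[catch {open $file2 rb} f2]} {
        close $f1
        error $f2
    }
    set equal 1
    while {![eof $f1]} {
        # In binary mode each character is one byte, so this is a byte-wise test.
        if {![string equal [read $f1 $chunksize] [read $f2 $chunksize]]} {
            set equal 0
            break
        }
    }
    close $f1
    close $f2
    return $equal
}
```

Because the sizes are already known to match, the loop only needs to watch for end-of-file on one channel, and it stops at the first differing chunk as arjen suggested.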

Fred Limouzin (2004/03/14): I also use a CRC to compare two (binary) files. It works at least when you are only interested in whether the files are equal or different; if you need to report the differences, that's another story! Here is what I do (it needs a package require crc32, for instance, assuming your specification allows it):

  • if file sizes differ then files differ
  • else if crc32 differ then files differ
  • else files are equal

It dramatically improves the comparison speed.
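
Fred's three steps could be sketched as follows, assuming tcllib's crc32 package is installed; its ::crc::crc32 command accepts a -filename option. The proc name filesProbablyEqual is hypothetical. Two caveats worth keeping in mind: matching CRCs make equality very likely but do not prove it, and computing a CRC still reads each file in full, so the saving comes mainly from the size check or from checksums that are computed once and reused.

```tcl
package require crc32   ;# part of tcllib, not the Tcl core

# Hypothetical helper: returns 0 if the files certainly differ,
# 1 if size and CRC-32 both match (equal, barring a CRC collision).
proc filesProbablyEqual {file1 file2} {
    # Step 1: different sizes mean different files.
    if {[file size $file1] != [file size $file2]} {
        return 0
    }
    # Step 2: different checksums mean different files.
    if {[::crc::crc32 -filename $file1] != [::crc::crc32 -filename $file2]} {
        return 0
    }
    # Step 3: nothing distinguished them.
    return 1
}
```

When certainty matters, a byte-for-byte comparison can be run as a final step on files that pass both checks.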

See also