package require Tclx
set fd [open $bigFlatFile r]
# We know this file is utf-8 encoded, but we want to read a
# certain number of bytes, not chars...
fconfigure $fd -encoding binary
pipe out in
fconfigure $in -encoding binary -blocking 0 -buffering none
fconfigure $out -encoding utf-8 -blocking 0 -buffering none
seek $fd $offset
# puts appends a newline; read -nonewline strips it again below.
puts $in [read $fd $numBytes]
set txt [read -nonewline $out]
close $fd
close $in
close $out

Unfortunately, on big chunks of text (>8192), there seems to be a bug in pipe that obstructs this solution: it makes the Tcl interpreter hang. In any case, Lars H pointed out that this could be done in a much cleaner way using encoding. Here is the final solution (so far):

# This proc is supposed to work just like [read $fileHandle $numChars],
# except that the size of the chunk to read is specified in bytes, not in
# chars. This is useful in connection with [seek] and [tell] which always
# measure in bytes. The proc is supposed to respect the fileHandle's
# configuration w.r.t. encoding, but it will not respect the configuration
# w.r.t. eol convention, I think.
proc readBytes { fileHandle numBytes } {
    # Record the original configuration:
    set enc [fconfigure $fileHandle -encoding]
    # Special treatment of encoding "binary", since this encoding is not
    # accepted by [encoding convertfrom]. But this case is trivial:
    if { $enc eq "binary" } {
        return [read $fileHandle $numBytes]
    }
    # We are going to reconfigure the channel. If anything goes wrong, at
    # least we should restore the original configuration, hence the catch:
    if { [catch {
        # Configure for binary read:
        fconfigure $fileHandle -encoding binary
        set binaryData [read $fileHandle $numBytes]
        set txt [encoding convertfrom $enc $binaryData]
        # And restore the original configuration:
        fconfigure $fileHandle -encoding $enc
    } err] } {
        fconfigure $fileHandle -encoding $enc
        # Re-raise, keeping the original stack trace and error code:
        error $err $::errorInfo $::errorCode
    } else {
        return $txt
    }
}

Older remark: it would be really nice (and quite logical, in view of the functionality provided by seek and tell) if read could accept a -bytes flag. The only thing needed is a convention for the situation where the requested number of bytes does not end on a complete char. One convention could be: finish the char in that case. Another: discard the incomplete char. Or finally, just leave the fractional char as binary debris; it is then up to the caller to make sure this does not happen, and in examples like the above this comes about naturally.
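
The "discard the incomplete char" convention can be sketched in pure Tcl for the UTF-8 case. The helper name trimIncompleteUtf8 is made up for this illustration and is not part of Tcl; it assumes the byte string is otherwise valid UTF-8 and only inspects the last few bytes. It would be applied to the binary data before [encoding convertfrom].

```tcl
# Hypothetical helper (not part of Tcl): drop a trailing incomplete
# UTF-8 sequence from a byte string, so that what remains ends on a
# character boundary. Assumes the data is otherwise valid UTF-8.
proc trimIncompleteUtf8 { bytes } {
    set len [string length $bytes]
    # Walk back over at most three continuation bytes (10xxxxxx):
    set i [expr {$len - 1}]
    set steps 0
    while { $i >= 0 && $steps < 3 } {
        binary scan [string index $bytes $i] cu b
        if { ($b & 0xC0) != 0x80 } break
        incr i -1
        incr steps
    }
    if { $i < 0 } {
        return $bytes
    }
    # Byte $i should now be the lead byte of the last sequence;
    # it tells us how many bytes that sequence needs:
    binary scan [string index $bytes $i] cu b
    if { $b < 0x80 } {
        set need 1
    } elseif { ($b & 0xE0) == 0xC0 } {
        set need 2
    } elseif { ($b & 0xF0) == 0xE0 } {
        set need 3
    } elseif { ($b & 0xF8) == 0xF0 } {
        set need 4
    } else {
        # Not a lead byte at all; leave the data untouched.
        return $bytes
    }
    if { $len - $i < $need } {
        # The last sequence is cut short: discard it.
        return [string range $bytes 0 [expr {$i - 1}]]
    }
    return $bytes
}

# Usage: the i-diaeresis in "naive" takes two bytes in UTF-8, so
# cutting after three bytes ends in the middle of that character.
set bytes [encoding convertto utf-8 "na\u00efve"]
set cut [string range $bytes 0 2]
set txt [encoding convertfrom utf-8 [trimIncompleteUtf8 $cut]]
```

Here txt ends up as "na". The "finish the char" convention would instead read the missing bytes (need minus the number already present) from the channel before converting.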

