Updated 2016-06-07 20:56:07 by lh

gets - Read a single line from a channel http://www.tcl.tk/man/tcl8.5/TclCmd/gets.htm
gets channelId
gets channelId variable

Reads a single line from the specified channel. In the first form, the characters of the line (with the exception of the end-of-line character) are returned as the result of the command. In the second form the characters of the line are written into the variable and the length of the line is returned instead.

When applied to a blocking channel, the command blocks until a complete line is available or EOF is encountered. If the channel is non-blocking and a complete line cannot be read, the first form returns an empty string; the second form returns -1 and leaves an empty string in the variable. Use fblocked and eof to tell insufficient data apart from end of file.
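A minimal sketch of the two forms on a blocking channel. The file name gets_demo.txt and the variable names are made up for illustration, and the script assumes the current directory is writable.

```tcl
# Write a small two-line file to read back (the file name is arbitrary).
set path [file join [pwd] gets_demo.txt]
set out [open $path w]
puts $out "first line"
puts $out "second"
close $out

set in [open $path r]

# First form: the line itself is the result, without the end-of-line.
set first [gets $in]

# Second form: the line goes into the variable, the length is the result.
set len [gets $in second]

# At end of file the second form returns -1.
set atEof [gets $in rest]

close $in
file delete $path

puts "first=$first second=$second len=$len atEof=$atEof"
```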

Do not use this command when working with binary data. gets looks for end-of-line characters no matter what, even inside packets, so binary payloads get split at arbitrary points. Use read instead.
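For binary data, something along these lines is probably closer to what you want: put the channel into binary translation and read a fixed number of bytes. This is only a sketch; the file name and the four-byte payload are made up for illustration.

```tcl
set path [file join [pwd] gets_binary_demo.bin]

# Write four bytes, including a line feed (10) that is data, not a line end.
set out [open $path w]
fconfigure $out -translation binary
puts -nonewline $out [binary format c4 {1 10 2 13}]
close $out

# Read them back without any end-of-line processing.
set in [open $path r]
fconfigure $in -translation binary
set data [read $in 4]
close $in
file delete $path

# Decode the four bytes into a list of integers.
binary scan $data c4 values
puts $values
```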

If you're using gets in a loop, and want to stop when you reach the end of the file, use the following structure:
while {[gets $filestream line] >= 0} {
    # do what you like here
}

When structuring your code this way (and if your channel is blocking – the default) you do not need to use the eof command to detect when you've read all the data.

APN 2013-11-15 The issues in this section (DoS etc.) have been addressed in newer Tcl versions (8.5+) with the chan pending command.

LH 2016-06-05 Unfortunately not. As explained by Andreas Kupries at https://community.activestate.com/node/6942: "The chan pending command reports the number of characters found in Tcl's input buffer. It does not report how much data could be in the buffers held by the operating system and not yet delivered to Tcl. So, when a read command is used (or gets for that matter, LH) Tcl may find its own buffers empty, at which point it asks the OS for more data, if any, fills its buffers, and then delivers what was wanted."

This means that even if chan pending reports a small number of bytes available, and you call gets, you may still get a very long line in return. The man page for chan pending claims that it can be "especially useful in a readable event callback to impose application-specific limits on input line lengths to avoid a potential denial-of-service attack where a hostile user crafts an extremely long line that exceeds the available memory to buffer it", but I don't see how. George Peter Staplin's advice (see below) seems to be still valid, after so many years.

APN 2016-06-06 I think it does fix the issue. Assuming we are talking about sockets, every platform and transport protocol I know of has a limit on how much data it will store on connections within the kernel before it blocks the remote end via flow control. Let us say this limit is 64K. Suppose the max line length you are willing to expect is 128K. As the chan command keeps reading, it will buffer (within Tcl) about 128K + some small delta and suck out at most 64K from the kernel in an attempt to find a newline. At that point, chan pending returns 128K+some and the application aborts the connection. See chan pending for sample code.

LH 2016-06-07 I don't think it does. What if I wanted to limit the line length to 1K instead of 128K? By the way, your numbers are far too low. Windows can buffer megabytes, not kilobytes, so according to your reasoning I could only prevent my socket server from accepting lines several megabytes long, not shorter ones. Also, the sample code you refer to does not work as intended. See my explanations on the chan pending page.

George Peter Staplin: Using gets with a socket is a BAD IDEA. tclhttpd uses gets (as do some of the modules), and it is trivial to make it panic on a unix-like system. On a Windows system, which may not have a ulimit on the memory tclhttpd can allocate, it may be even worse. Sadly, this has been known since 2001 (see below).

20040721 CMcC In defence of tclhttpd: tcl core module http and tcllib modules comm, ftpd, ftp, irc, nntp, pop3, pop3d, smtpd and ident all seem to suffer from precisely the same problem.

20060730 CMcC was thinking about why we don't see this problem in the wild, and remembered that all of the above protocols have per-transaction timeouts. For example, tclhttpd expects a completed header within a defined period after a connection occurs. It is this timeout, not the available address space, which limits the length of a line an attacker can send in most cases. This is not to say that gets shouldn't be fixed, but there is a simple preventative.

From the Tcl'ers chat on Oct 24, 2001:

dgp: I reported long ago that tclhttpd was vulnerable to a DoS due to gets slurping up data until it sees a newline. I guess that weakness in gets has never been addressed.

bbh: is it a weakness in gets or a weakness in an app using gets instead of read ?

dgp: Well, Brent Welch replied and said that the solution he would have to implement would be effectively writing his own safe gets in terms of read.

From the Tcl'ers Wiki Sep 20, 2002:

GPS: This bug with gets could be solved I suspect by adding a -maxchars flag to gets. For example:
 set res [gets -maxchars 100 $chan data]

If more than 100 chars are read then gets should return -1 or something like that. This would only be for the usage of gets with the optional variable argument.
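Since no such flag exists, the "safe gets in terms of read" mentioned in the chat above might look roughly like the following. This is a sketch only: gets_limited is a hypothetical helper, not part of Tcl, and it assumes a blocking channel with the default end-of-line translation. It reads one character at a time, so Tcl should never hold much more than one channel buffer plus maxChars characters for the line.

```tcl
# Hypothetical helper, not part of Tcl: read one line of at most $maxChars
# characters from a blocking channel. Returns the line length, -1 at end of
# file when nothing was read, and raises an error when the limit is exceeded.
proc gets_limited {chan varName maxChars} {
    upvar 1 $varName line
    set line ""
    while {1} {
        set char [read $chan 1]
        if {$char eq ""} {
            # End of file: report -1 only if no characters were collected.
            if {$line eq ""} {return -1}
            return [string length $line]
        }
        if {$char eq "\n"} {
            return [string length $line]
        }
        if {[string length $line] >= $maxChars} {
            error "line longer than $maxChars characters"
        }
        append line $char
    }
}
```

A caller would wrap it in catch and close the channel when the error fires, for example: if {[catch {gets_limited $chan data 100} res]} {close $chan}.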

MC 29 Oct 2006: I've proposed a [chan available] command (TIP #287) to give programmers a tool they can use to introspect the amount of buffered (but as yet unread) data on a channel. This would give applications enough new introspection capability to implement their own policy for handling excessively long input lines, while still retaining the same [gets] semantics. (In a readable fileevent callback, where one should be testing for fblocked already, you could check whether [chan available $sock] > $limit and take appropriate action if it is.)
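In Tcl 8.5 and later this introspection is spelled chan pending input. A readable callback along the lines MC describes could be sketched as follows. The proc name read_line_limited, the limit argument and the onLine command prefix are all hypothetical, and the objection raised by LH earlier on this page still applies: gets may already have pulled far more than the limit out of the operating system's buffers by the time the check runs.

```tcl
# Hypothetical readable-event handler; $limit and $onLine are illustrative.
proc read_line_limited {sock limit onLine} {
    if {[gets $sock line] >= 0} {
        # A complete line arrived; hand it to the caller's command prefix.
        {*}$onLine $line
    } elseif {[eof $sock]} {
        close $sock
    } elseif {[chan pending input $sock] > $limit} {
        # No newline yet, and too much is already buffered inside Tcl.
        close $sock
    }
}

# Typical wiring, assuming $sock is an open socket and handle_line exists:
#   fconfigure $sock -blocking 0
#   fileevent $sock readable [list read_line_limited $sock 1024 handle_line]
```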

RS 2005-08-25: Here's how to temporarily disable echoing of the characters input to gets. You need stty, which is part of Linux and Cygwin, so it works even on Windows: (thanks MNO for the stty tip!)
 proc userpasswd _arr {
    # Link the caller's array to a local array named "" so that (-user) works.
    upvar 1 $_arr ""
    if {![info exists (-user)]} {set (-user) [prompt "username:"]}
    if {![info exists (-passwd)]} {
        exec stty -echo          ;# turn terminal echo off
        set (-passwd) [prompt "password:"]
        exec stty echo           ;# turn it back on
        puts ""
    }
 }
 proc prompt string {
    puts -nonewline "$string "
    flush stdout
    gets stdin
 }

More concentrated, here's the "gets with no echo" functionality by itself:
 proc gets'noecho {} {
     exec stty -echo
     gets stdin line
     exec stty echo
     puts ""
     set line
 }

MHo: I think it should be possible to handle the echo state on Windows without installing Cygwin....

See gets workaround for a solution when [gets stdin] won't work, e.g. on W95 and PocketPC.

From comp.lang.tcl, thanks to Alex, a drop-in replacement for gets with an extra timeout argument:
 proc gets_timeout {ch vline timeout} {
    upvar 1 $vline line
    # The timer sets the flag to 1; a readable event sets it to 2.
    set id [after $timeout [list set ::_gt($ch) 1]]
    set blo [fconfigure $ch -blocking]
    fconfigure $ch -blocking 0
    fileevent $ch readable [list set ::_gt($ch) 2]
    set err NONE
    while {1} {
        vwait ::_gt($ch)
        if {$::_gt($ch) == 1} {
            set err TIMEOUT
            break
        }
        set n [gets $ch line]
        if {$n < 0} {
            # No complete line yet: keep waiting for more data.
            if {[fblocked $ch]} continue
            set err EOF
        }
        after cancel $id
        break
    }
    # Remove the handler, restore the blocking mode and drop the flag.
    fileevent $ch readable {}
    fconfigure $ch -blocking $blo
    unset -nocomplain ::_gt($ch)
    switch $err {
        NONE {return $n}
        TIMEOUT {error TIMEOUT}
        EOF {return -1}
    }
 }

AMG: I'd like to see an option added to [gets] to override the end-of-line characters. When this option is in use, the delimiter character probably should be retained in the output so the program can tell which delimiter was read, or if a delimiter was read at all before hitting EOF. I guess it would work a bit like getdelim().

Here's some code I use right now that comes close.
# Read from $chan until one of the characters in $delims is encountered.
proc read_delim {chan delims} {
    set result ""
    while {1} {
        set char [read $chan 1]
        if {$char eq ""} {
            error EOF
        } elseif {[string first $char $delims] == -1} {
            append result $char
        } else {
            return $result
        }
    }
}

Pie in the sky: Allow the definition of a "line" to be specified as a regular expression. However, I doubt the Tcl RE code is flexible enough to operate on a stream of data as well as a random-access buffer whose size is known in advance.