Updated 2018-02-24 17:31:46 by paul

Purpose

HaO: This page is dedicated to the asynchronous socket connect, started by 'socket -async'. See the socket page for a general description.

It also serves as a communication page for development and compares TCL 8.5.15-16, TCL 8.6.1-3 and future versions.

Async connect became more complicated in TCL 8.6, as multiple destination IPs are supported internally (due to IPv6 or a DNS lookup returning multiple IPs).

See my talk at ETCL 2014 [1].

Use cases

Background connect and notification about connect

The typical use case for a background connect is to install a writable event handler to get notified when the connect finishes. If there is an additional connect timeout, the timer is cancelled in the writable event handler.

Typical code:
proc Connected {aid h} {
    # cancel timeout
    after cancel $aid
    # check connect success or fail
    set error [fconfigure $h -error]
    if {$error ne ""} {
        catch {close $h}
        return
    }
    # disable writable event as it will come again and again if nothing written here
    fileevent $h writable ""
    # do something with the socket
    puts $h "HELO"
    # flush, as the puts may otherwise stay in the channel buffer
    flush $h
    # install readable event to process reply
    fileevent $h readable [namespace code [list Receive $h]]
}
proc Timeout {h} {
    # Connect timeout
    catch {close $h}
}
# Receive function is not shown here and may be derived from the example below
set h [socket -async $host $port]
set aid [after 10000 [namespace code [list Timeout $h]]]
# fileevent appends no arguments, so the channel is passed explicitly
fileevent $h writable [namespace code [list Connected $aid $h]]
vwait forever

Background connect and only readevent

If there is no need to get notified on a successful connect and no connect timeout is needed, a readable event alone is sufficient.

Attention: on Windows this does not work in 8.5.15 and 8.6.1 due to bugs (fixed in 8.5.16 and 8.6.2):
proc Receive {h} {
    # check connect success or fail
    set error [fconfigure $h -error]
    if {$error ne ""} {
        catch {close $h}
        return
    }
    # get read data and process it
    if {[catch {gets $h} data]} {
        # read error
        catch {close $h}
        return
    }
    if {[eof $h]} {
        # other side disconnected
        catch {close $h}
        return
    }
    # now do something with the data...
}
set h [socket -async $host $port]
# fileevent appends no arguments, so the channel is passed explicitly
fileevent $h readable [list Receive $h]
# if a message is needed by the server after the connect, send it now non-blocking
# It will be automatically sent when the connect succeeds
fconfigure $h -blocking 0
puts $h "HELO"
flush $h
vwait forever

async connect and blocking operation

A use case is to start multiple connects, do something else and then process the connect state, all in a linear program without an event queue. An example is a test whether multiple servers are alive.

Example program:
set h [socket -async $host $port]
# do something else which needs time
# check if failed. Start also next try of multiple IPs of $host
set error [fconfigure $h -error]
if {$error ne ""} {
    # connect failed
    catch {close $h}
    return
}
# do something else which needs time
# check if failed. Start also next try of multiple IPs of $host
set error [fconfigure $h -error]
if {$error ne ""} {
    # connect failed
    catch {close $h}
    return
}
# nothing to do, so do the rest synchronously
# this blocks !
if {[catch {
    puts $h "HELO"
    set Data [gets $h]
    close $h
} error]} {
    # connect failed
    catch {close $h}
}

async connect and no event queue

This example requires the option 'fconfigure -connecting', which is specified in TIP 427 and present as a hidden feature since TCL 8.6.2 (public in 8.6.4). It reports whether the connect process is still running. This makes it possible to write the previous example without blocking commands.

Example program:
set h [socket -async $host $port]
while {[fconfigure $h -connecting]} {
    # do something else which needs time
}
# connection process terminated - check if failed
set error [fconfigure $h -error]
if {$error ne ""} {
    # connect failed
    catch {close $h}
    return
}
# do something with the connected socket

Command behaviour

socket -async

'socket -async' first performs a synchronous DNS lookup of the host.

Then the connect is started in the background.

  • In TCL8.5, the connect completes on its own, without requiring the event loop or further commands.
  • In TCL8.6, the event loop or a command invocation on the socket is required to try multiple IPs.

Version      Status
8.5.15       ok
8.5.16+      ok
8.6.1 unix   ok, requires event loop
8.6.1 win    only first IP (broken)
8.6.2        ok*
8.6.3        ok*
8.6.4+       ok
ideas        may be moved to its own thread, so that no event loop is required and there is no pause between connect tries when command driven

* See below: Bug c6ed4acfd8

update,vwait

In TCL8.6, running the event loop allows the connect process to continue with the next try or to fail finally. It is not strictly necessary, as all other socket commands also advance the connect process.

The event queue may also initiate a pending background flush when the socket is successfully opened.
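The role of the event loop described above can be sketched as a small helper that waits for an async connect with a timeout. This is only a sketch: the proc name waitConnect, the helper array waitConnectState and the returned list format are made up for illustration and are not part of Tcl.

```tcl
# Hypothetical helper (not part of Tcl): run the event loop until the async
# connect on channel h has finished or ms milliseconds have passed.
# Returns {ok {}}, {failed <message>} or {timeout {}}.
proc waitConnect {h ms} {
    set var ::waitConnectState($h)
    set $var pending
    set aid [after $ms [list set $var timeout]]
    # the writable event fires when the connect ends with success or error
    fileevent $h writable [list set $var done]
    vwait $var
    after cancel $aid
    fileevent $h writable ""
    set state [set $var]
    unset $var
    if {$state eq "timeout"} {
        return [list timeout {}]
    }
    set err [fconfigure $h -error]
    if {$err ne ""} {
        return [list failed $err]
    }
    return [list ok {}]
}
```

The caller is expected to close the channel when the result is not ok, in line with the advice to close an async socket after the first reported error.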

close on error

As a starting point for all other commands: if the socket of a failed async connect is not closed after the first reported error, bad things such as unreported errors may happen.

Please close an async socket connect after the first reported error.

fileevent writable

Fires when async connect terminates with success or error.

'fconfigure -error' may be used in the event procedure to check if the connect was successful.
Version      Status
8.5.15       ok, see bugs
8.5.16+      ok
8.6.1 win    only first IP (broken)
8.6.1 unix   ok
8.6.2+       ok

fileevent readable

Fires when async connect terminates with error.

On a successful connect, it fires only when data has been received.

'fconfigure -error' may be used in the event procedure to check if the connect was successful.
Version       Status
8.5.15 unix   ok
8.5.15 win    only works when also writable event installed, see bugs
8.5.16+       ok
8.6.1 win     only first IP and only with writable event (broken)
8.6.1 unix    ok
8.6.2+        ok

blocking gets,read,puts,flush

Remark: a puts may be delayed until a following flush.

The operation blocks until the async connect has completed (with success or failure).

On success, the operation is performed.

On connect failure, the error "socket is not connected" is returned. The reason for the connect failure may be investigated using fconfigure -error.
Version        Status
8.5.15 unix    ok. Instead of "socket is not connected", "broken pipe" may be reported.
8.5.15 win     ok
8.5.16+ unix   ok
8.5.16+        ok
8.6.1 win      only first IP tested (broken).
8.6.1 unix     ok. Instead of "socket is not connected", "broken pipe" may be reported.
8.6.2+         ok
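The blocking behaviour described above might be used as in the following sketch. The proc name sendLineBlocking and its return format are hypothetical; the only point shown is that the error of the blocking operation is generic and that 'fconfigure -error' is queried afterwards for the reason.

```tcl
# Hypothetical helper: send one line over a socket opened with -async,
# using a blocking puts/flush which waits for the connect to finish.
# Returns {ok <channel>} or {failed <operation error> <connect error>}.
proc sendLineBlocking {host port line} {
    if {[catch {socket -async $host $port} h]} {
        # e.g. DNS lookup failure; no channel was created
        return [list failed $h ""]
    }
    if {[catch {puts $h $line; flush $h} opErr]} {
        # opErr is generic ("socket is not connected" or "broken pipe");
        # the reason of the connect failure is taken from -error
        if {[catch {fconfigure $h -error} reason]} {
            set reason ""
        }
        # close after the first reported error
        catch {close $h}
        return [list failed $opErr $reason]
    }
    return [list ok $h]
}
```

On success, the caller owns the returned channel and has to close it.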

non blocking gets,read,puts,flush

Remark: a puts may be delayed until a following flush.

The async connect state is checked or continued (next IP) in a non-blocking way.

A pending flush, if any, is executed automatically in the background when the connection is established and the event queue is running.

Possible results:
Number   Condition                         Action
NB1      async connect still in progress   write operation is buffered and scheduled for background flush; read operation returns the empty string
NB2      async connect succeeded           operation is directly executed
NB3      async connect failed              error "socket is not connected" is returned

Implementation status:
Version        Status
8.5.15 unix    ok. Instead of "socket is not connected", "broken pipe" may be reported.
8.5.15 win     ok
8.5.16+ unix   ok. Instead of "socket is not connected", "broken pipe" may be reported.
8.5.16+ win    ok
8.6.1 win      only first IP (broken)
8.6.1 unix     ok. Instead of "socket is not connected", "broken pipe" may be reported.
8.6.2+         ok

close

A close while connection is in progress or after a successful connection should succeed.

A close after a failed connection succeeds.

If a background flush is pending (or already resulted in an internal error), an error is returned.
Version   Status
8.5.15    ok. Empty error message may appear.
8.5.16+   ok
8.6.1     ok. Empty error message may appear.
8.6.2+    ok

eof

eof behaviour:

  • eof should be true after a read on a socket closed by the other side.
  • eof is never true because of an async connect and cannot be used to detect the connection status.

Version   Status
8.5.15    ok
8.5.16+   ok
8.6.1     ok
8.6.2+    ok

fconfigure

Any fconfigure command on the socket continues the connect process.
Version      Status
8.6.1 win    no
8.6.1 unix   no
8.6.2+       ok

fconfigure -error

A final connect error should be returned by 'fconfigure -error'. No error should be reported while the connect is still in progress.

Implementation status:
Version        Status
8.5.15 unix    ok.
8.5.15 win     ok. Small bug: Failed socket connect error is reported indefinitely
8.5.16+ unix   ok.
8.5.16+ win    ok. Small bug: Failed socket connect error is reported indefinitely
8.6.1 win      result of first tested IP (broken)
8.6.1 unix     The errors of all tested IPs show up temporarily. The connect process may be disturbed.
8.6.2+         ok

Fixing the small bug that a connect error is reported indefinitely may introduce compatibility issues for programs which rely on this behaviour.

fconfigure -sockname

The local address of the socket connection. Returns a list of IP, name and port.

The return value is documented as undefined while an async connect is running.

Implementation status:
Version   Status
8.5.x     returns something like "0.0.0.0 0.0.0.0 51063"
8.6.2     returns the addresses of the connect tries which show up temporarily. Typically ::1, then 127.0.0.1
8.6.3     returns the empty string
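Since the result may be empty or temporary while the connect is running, code that reads -sockname may want to guard against that. A minimal sketch, assuming a hypothetical helper name localEndpoint and a dict as return format (neither is part of Tcl):

```tcl
# Hypothetical helper: return the local endpoint of channel h as a dict
# with the keys ip, name and port, or an empty dict if -sockname does not
# (yet) return a complete triple, e.g. while an async connect is running.
proc localEndpoint {h} {
    set info [fconfigure $h -sockname]
    if {[llength $info] < 3} {
        return {}
    }
    # only the first triple is used
    lassign $info ip name port
    return [dict create ip $ip name $name port $port]
}
```

The result is only meaningful once the connect has finished, for example after the writable event has fired; the 8.5.x and 8.6.2 rows above show that a complete but temporary triple may be returned earlier.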

fconfigure -peername

The destination IP. Returns list of IP, Name, Port.

Implementation status:
Version     Status
8.5.x       returns information of tried IP while connecting. Error if connection failed
8.6.1 win   returns information of first tried IP. Error if first connect try failed
8.6.2       reflects connection process, may return temporary IPs or temporary errors
8.6.3       returns the empty string

fconfigure -connecting

Returns 1 if the connect process is still running, 0 otherwise. Introduced and described in TIP 427.

Implementation status:
Version   Status
8.5.x     not supported
8.6.1     not supported
8.6.2     present as hidden feature
8.6.4     public feature
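As the option is missing in older versions, portable code may want to probe for it first. A minimal sketch; the proc name connectPending and the result value "unknown" are assumptions made for illustration:

```tcl
# Hypothetical helper: report the state of an async connect on channel h.
# Returns 1 while connecting, 0 when the connect process has ended, and
# "unknown" if this Tcl version does not support 'fconfigure -connecting'.
proc connectPending {h} {
    if {[catch {fconfigure $h -connecting} pending]} {
        return unknown
    }
    return $pending
}
```

When "unknown" is returned, one of the other methods described on this page, such as a writable event, has to be used instead.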

thread::transfer

I expected that transferring a socket while it is connecting would certainly lead to an undetected connection result (success or error).

But in fact, everything worked on my Windows machine using thread 2.7.1 (current trunk):

  test logs

Successful connection:
% package require Thread
2.7.1
% set t [thread::create]
tid000016FC
% set h [socket -async www.google.com 80]
sock01AE4608
% thread::transfer $t $h
% thread::send $t "fconfigure $h -error"
% thread::send $t "puts $h GETS"
%

and connect error:
% set t [thread::create]
tid000014C0
% #set h [socket -async www.google.com 80]
% set h [socket -async localhost 30001]
sock01AE4708
% #fconfigure $h -unsupported1 1
% fconfigure $h -blocking 0
% thread::transfer $t $h
% thread::send $t "fconfigure $h -error"
connection refused
% thread::send $t "close $h"
%

Bugs

Win TCL8.6.1 only tries first IP

Bug 13d3af3ad5

TCL8.6.1 only tries to connect to the first of possibly multiple IP addresses. This may cause serious connect issues, especially with IPv6.

This is fixed in branch bug-13d3af3ad5 which also serves as main branch to fix all bugs in TCL8.6.1 and to test enhancements too.

Win connect ignored

Bug 336441ed59

Two issues:

  1. When a connect terminates so quickly that the notifier is not ready yet, the connect result is ignored and the socket waits for it forever.
  2. A call of puts, gets or read while connecting briefly switched off the connect notification.

Version   Status
8.5.15    bug present
8.5.16+   fixed
8.6.1     bug present
8.6.2+    fixed

The test for 1 is timing dependent and may not detect the issue on some machines. The test for 2 is to run the teapot client massively, see the bug description.

Empty error message on close on failed background flush

Bug 97069ea11a
Version   Status
8.5.15    bug present
8.5.16+   fixed
8.6.1     bug present
8.6.2+    fixed

  Test proposal

The test is difficult, as an async connect must fail after a puts is issued on the channel.

Idea: write a dummy channel driver which may be set to an error state by fconfigure -seterror and where the readable/writable state may be set. So one could:
set h [open dummy]
fconfigure $h -seterror EWOULDBLOCK
fileevent $h writable {set x writable}
fconfigure $h -blocking 0
puts $h abc
fconfigure $h -setwritable 1
vwait x
catch {close $h} e d

No readable event on async socket connect failure

Bug 581937ab1e
Version       Status
8.5.15 win    bug present
8.5.16+ win   Fixed
8.6.1 win     bug present
8.6.2+ win    Fixed

TCL hangs in event queue when an async socket tries next IP and there is already a connected socket

Bug c6ed4acfd8
Version     Status
8.6.1       ok
8.6.2 win   Bug introduced
8.6.3 win   Bug present
8.6.4 win   Fixed

ToDo's

Robust tests

(Bug 42d50ebd) Many tests have become timing dependent. Here is my proposal, for discussion, to possibly cure that:

As found out yesterday, it is no longer possible to pin down the moment when a socket connect fails. Example:
set h [socket -async localhost [randport]]
# This needs two "updates" to fail, one for ::1, one for 127.0.0.1
# Background connect to ::1 started
fconfigure $h -blocking 0
# if connect to ::1 already failed, connect to 127.0.0.1 starts
puts $h Hi
flush $h
# if connect to 127.0.0.1 already failed, this shows the error "connection refused"
# if connect to ::1 already failed, connect to 127.0.0.1 starts

For most tests, we need the connect procedure to fail after the flush.

So I propose to:

  • Add a test command "testsocket testflags $h bool" which sets a channel flag to not continue the connect on any command.

Thus, the test above may look like this:
# Switch auto-continue off
set h [socket -async localhost [randport]]
testsocket testflags $h 1
close $h
# Now do the test setup:
set h [socket -async localhost [randport]]
fconfigure $h -blocking 0
puts $h Hi
flush $h
fileevent $h writable {set ::x 0}
# switch auto-continue on to have normal operation
testsocket testflags $h 0
vwait x

This is implemented in fossil branch robust-async-connect-tests

The same way, we could also do the test for tclIO.c "background error but no error message".

There are still a couple of test failures on CentOS and on FreeBSD documented in ticket 13d3af3ad5.

prioritize connect errors and return most appropriate

If a socket connect fails, the error in the latest connect stage should be returned. This would prioritize "access denied" (e.g. socket in use) before "network unreachable" (no route).

Project stage for Win and Unix.

This is already implemented for Unix server sockets.
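The idea of returning the error of the latest connect stage could be sketched as a ranking over the collected error messages. The proc names and the ranking values below are assumptions for illustration only; the text above only states that "access denied" should win over "network unreachable", and this is not how the Tcl core implements it.

```tcl
# Hypothetical ranking: a higher number means that a later connect stage
# was reached. Only "denied before unreachable" is taken from the text;
# the remaining ranks are assumptions.
proc connectErrorRank {msg} {
    switch -glob -- [string tolower $msg] {
        *denied* - "*in use*" { return 4 }
        *refused*             { return 3 }
        "*timed out*"         { return 2 }
        *unreachable*         { return 1 }
        default               { return 0 }
    }
}

# Return the error message with the highest rank from a list of messages
# collected over all tried IPs, or the empty string for an empty list.
proc mostRelevantError {errors} {
    set best ""
    set bestRank -1
    foreach e $errors {
        set r [connectErrorRank $e]
        if {$r > $bestRank} {
            set best $e
            set bestRank $r
        }
    }
    return $best
}
```

With equal ranks, the first message in the list is kept.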

TclWinGetSockOpt() stubs entry may return wrong state

The Win TCL stubs table contains an entry for TclWinGetSockOpt() which returns the info from getsockopt().

In TCL8.5, the result of fconfigure -error was always the return value of the system call getsockopt(). In TCL8.6, a connect failure is cached in a variable and returned by fconfigure -error. Possibly, this should also be done by the routine called by the TclWinGetSockOpt stubs entry.

This stubs entry seems to date from the times of Windows 98, where a WinSock2.dll might not be present. There is no known usage of it. Thus it was decided to leave it as deprecated and to remove it for Tcl9.0.

  send pending data when connected

When one puts pending data while connecting:
set h [socket -async $host $port]
fconfigure $h -blocking 0
puts $h "HELO"
flush $h
vwait forever

this data is automatically sent when the connection is available.

I have no idea how this works, but it seems to work.

If there is a writable event, ok, I see the entry point for the framework, but without one?

This is a marker for me to investigate this.

A proposed test looks roughly like this (translated from a German e-mail to rmax):
We need:
- a machine with IPv4 and IPv6.
- information on whether IPv4 or IPv6 is tried first.
In the following, IPv6 is tried first (as on my machine).

Server and checker:
proc accept {s m p} {
    set ::s $s
    set ::x [gets $s]
    # no close here, as it may also trigger processing
}
set server [socket -server accept -myaddr 127.0.0.1 30000]
vwait x
set x
# -> x must be "Hi"
close $server
close $s

Client in a separate process:
set h [socket -async localhost 30000]
fconfigure $h -blocking 0
puts $h Hi
# possibly triggers the second connect attempt, but first returns
# EWOULDBLOCK to the framework...
flush $h
# nothing is sent up to here, as the second connect attempt is still running.
# The connect now completes and the flush is executed automatically in the
# background. How? No idea, but it works for me...
after 2000 {set w 1}
vwait w

# no close, the data must arrive without a close...
# no fileevent either, as that would also trigger a background flush.

  (obsolete, only for documentation) On Windows, use list connect and remove all the looping if Vista+ is present

In the Windows API, there are two system functions (the WSAConnectBy*() functions) which basically do all the work done in the Tcl connect loop.

Those functions are only available for Vista+ (Desktop Applications) and Windows 8.1 (all Applications) (whatever that means).

Jan Nijtmans has sent me the following pointer to make code dependent on the availability of windows features: [2]

Using them would increase performance within the connect procedure, as no wait for the event loop etc. is necessary. In addition, the loop over the IPs could be removed, which makes a whole bunch of things easier.

rmax - I don't think this is The Right Thing to do for now, because the WSAConnectBy*() functions don't seem to allow non-blocking operation, so they would only be usable for blocking connects. This means we'd still need the looping and event loop stuff for [socket -async]. Another reason why we couldn't drop the loops even from the synchronous case is that we probably still want to support Windows versions before Vista.

So the bottom line is that we wouldn't save any of the current code, but would add a lot of complexity, because the code would have to decide when these convenience functions can be used and load them. It would also increase testing effort, because different Windows versions are needed to test the different code paths.

I'd rather suggest to invest that time into unifying the two loops from the Windows and Unix platform code into a portable convenience function that goes into generic/tclIOSock.c, so that future changes don't have to be done twice.

HaO Thanks Reinhard (also for the chat session). After reading the docs, it seems to be usable only for synchronous operation and lacks the options '-myaddr/-myport'. So using those commands is not an option.

For me, the final goal for the command 'socket -async' is:

  • it returns immediately (not after DNS lookup)
  • it works fully in the background and only gives a status when connected or completely failed
  • it has highest possible performance

and I hoped I could reach those aims without putting the whole connect process in its own thread. Apparently, this is not possible using those functions. They block and they don't have the full functionality.

Thanks

Thanks to Wojciech Kocjan for his book Tcl 8.5 Network Programming and for the discussions which taught me this network stuff.