Updated 2015-09-01 13:54:03 by pooryorick

Benny Riefenstahl 2003-01-05:

Tk uses X11-style keysyms in the generic parts of its implementation and at the Tcl level. Because keysyms are a pre-Unicode concept, this causes a disconnect on platforms such as Microsoft Windows and MacOS X, which support Unicode input but know nothing about X11 keysyms. This page tries to present the issues.

See also keysyms for Tcl code using keysyms.

As with all Wiki pages, please feel free to correct the text or add to it.

Basic idea and X11 implementation

Keys on a keyboard have a platform- and hardware-specific designation called a "keycode." The keycode for a key does not change when the labels on the keyboard change. The same keyboard can be marketed in the US and in Germany with just different labels on the keys: the key that has the "Z" label on the US version has the "Y" label on the German version, yet on both versions it has the same keycode. On the other hand, the escape key and the space bar always have the same keycode and the same meaning on a given physical keyboard, because it wouldn't make sense to change their meaning.

To abstract from hardware- and platform-specific keycodes, X11 represents keys on a keyboard with "keysyms." There are two types of keysyms:
Character keys
Keys that represent normal printable characters. These are determined from keycodes using a keyboard mapping table. This mapping of keycodes to character keysyms can change while the application runs (see below).
Special keys
Keys that represent some function on the keyboard. Some of these can be represented with ASCII control characters (e.g. ENTER, SPACE, ESCAPE, BACKSPACE), while others cannot (e.g. function keys, cursor keys). On MacOS X the keycodes for these keys have a fixed, well-known mapping to keysyms. On other platforms, or with more "exotic" keyboards, these keys also have to be mapped from keycodes using a mapping table.
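The relationship between keycodes, layouts and keysyms described above can be sketched as a pair of lookup tables. This is only an illustration: the keycode numbers and the names KEYCODE_*, SPECIAL_KEYS, LAYOUTS and keycode_to_keysym are made up for the example and are not taken from Tk or Xlib. Only the keysym names (Escape, space, z, y) are real.

```python
# Hypothetical keycodes, chosen for illustration only. Real values depend
# on the platform and the hardware.
KEYCODE_ESCAPE = 9
KEYCODE_SPACE = 65
KEYCODE_LOWER_LEFT_LETTER = 52  # labelled "Z" in the US, "Y" in Germany

# Special keys: the mapping does not depend on the active layout.
SPECIAL_KEYS = {
    KEYCODE_ESCAPE: "Escape",
    KEYCODE_SPACE: "space",
}

# Character keys: one table per keyboard layout. Switching the layout at
# run time amounts to switching the table that is consulted.
LAYOUTS = {
    "us": {KEYCODE_LOWER_LEFT_LETTER: "z"},
    "de": {KEYCODE_LOWER_LEFT_LETTER: "y"},
}


def keycode_to_keysym(keycode, layout):
    """Return the keysym name for a keycode, or None if it is unmapped."""
    if keycode in SPECIAL_KEYS:
        return SPECIAL_KEYS[keycode]
    return LAYOUTS[layout].get(keycode)
```

The same keycode yields "z" under the "us" table and "y" under the "de" table, while the escape key and the space bar give the same keysym under both.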

Modifier keys are also encoded as keysyms. In addition they are usually encoded separately as a modifier mask in key events, so one knows which modifiers are pressed at the occurrence of other key events.
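As a small illustration of the modifier mask, the following sketch decodes the state field of a key event into modifier names. The bit positions are those of the core X11 protocol (ShiftMask, LockMask, ControlMask, Mod1Mask through Mod5Mask). The function name pressed_modifiers is made up for the example.

```python
# Modifier bits as defined by the core X11 protocol.
MODIFIER_BITS = [
    ("Shift", 1 << 0),
    ("Lock", 1 << 1),
    ("Control", 1 << 2),
    ("Mod1", 1 << 3),
    ("Mod2", 1 << 4),
    ("Mod3", 1 << 5),
    ("Mod4", 1 << 6),
    ("Mod5", 1 << 7),
]


def pressed_modifiers(state):
    """Return the names of all modifiers whose bit is set in state."""
    return [name for name, bit in MODIFIER_BITS if state & bit]
```

A state of 0b101, for example, decodes to Shift and Control. Which physical keys are attached to Mod1 through Mod5 is again a matter of the keyboard mapping.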

For C programming the X11 keysyms have names that are assigned in the include file <X11/keysymdef.h> [1].

Except for the ASCII and ISO-8859-1 ranges, the keysym codes are not compatible with Unicode. To get from keysyms to Unicode characters, an additional mapping has to be performed. Also, the keysym system in its traditional form, with approximately 920 symbols, does not cover the whole Unicode range. Because nobody wanted to maintain tables with names for all the characters in the Unicode standard, an algorithmic mapping was invented for characters that are not covered by the existing tables [2], [3].
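A minimal sketch of the keysym-to-Unicode direction might look as follows. It assumes the convention documented in keysymdef.h, where keysyms from 0x01000100 to 0x0110ffff stand for the Unicode code point plus 0x01000000. The legacy table here is a three-entry excerpt for illustration, not the full table, and the names LEGACY_KEYSYMS and keysym_to_unicode are made up for the example.

```python
# Tiny excerpt of the traditional (pre-Unicode) keysym tables.
LEGACY_KEYSYMS = {
    0x01A1: 0x0104,  # Aogonek     -> U+0104
    0x07E1: 0x03B1,  # Greek_alpha -> U+03B1
    0x20AC: 0x20AC,  # EuroSign    -> U+20AC
}


def keysym_to_unicode(keysym):
    """Return the character for a keysym, or None if there is none."""
    # ASCII and ISO-8859-1 printable ranges: keysym equals code point.
    if 0x20 <= keysym <= 0x7E or 0xA0 <= keysym <= 0xFF:
        return chr(keysym)
    # Algorithmic range: keysym is the code point plus 0x01000000.
    if 0x01000100 <= keysym <= 0x0110FFFF:
        return chr(keysym - 0x01000000)
    # Everything else needs a table lookup.
    code_point = LEGACY_KEYSYMS.get(keysym)
    return chr(code_point) if code_point is not None else None
```

Note that a character such as Greek alpha is reachable in two ways, through its traditional keysym 0x07e1 and through the algorithmic keysym 0x010003b1. Special keys such as Escape (0xff1b) have no character and yield None in this sketch.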

Other platforms

The keysym concept in its X11 form is not known natively on MacOS X and Microsoft Windows. Windows has "virtual key codes" instead, which serve a similar function. MacOS X (Carbon) uses the keycodes themselves for special keys. For the printable characters it uses characters in MacOS-specific encodings or in Unicode.

Tk, on the other hand, uses the X11 keysym codes even on Windows and MacOS X in the implementation of keyboard events. They are used in the internal keyboard handling at the boundary between generic code and platform-specific code, as well as in Tcl commands like bind.

The MacOS X "Input Menu" and the "Keyboard Layout/IME" menu on Microsoft Windows allow the user to change the keyboard mapping on an application-by-application basis, on the fly, while the application runs. So the mapping from keycodes to keysyms can change at any time. The fact that Tk uses keycodes in its internal interfaces in addition to keysyms doesn't make this any easier.

The main user-visible use of keysyms in Tk is the use of keysym names for portable key bindings. On X11, Tk uses Xlib to convert between names and codes. On other platforms, Tk derives the names from <X11/keysymdef.h> and creates a runtime mapping table from that header.
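To give an idea of what deriving a name table from the header involves, here is a rough sketch that extracts name and code pairs from "#define XK_..." lines. It is not Tk's actual implementation. The sample text is a four-line excerpt in the style of keysymdef.h, and SAMPLE_HEADER, DEFINE_RE and build_tables are names invented for the example.

```python
import re

# A few lines in the style of <X11/keysymdef.h>.
SAMPLE_HEADER = """\
#define XK_BackSpace                     0xff08  /* Back space, back char */
#define XK_Escape                        0xff1b
#define XK_space                         0x0020  /* U+0020 SPACE */
#define XK_a                             0x0061  /* U+0061 LATIN SMALL LETTER A */
"""

# Match "#define XK_<name> 0x<hex>" at the start of a line.
DEFINE_RE = re.compile(r"^#define\s+XK_(\w+)\s+0x([0-9a-fA-F]+)", re.MULTILINE)


def build_tables(header_text):
    """Return (name_to_code, code_to_name) dictionaries for the header text."""
    name_to_code = {}
    code_to_name = {}
    for name, hex_code in DEFINE_RE.findall(header_text):
        code = int(hex_code, 16)
        name_to_code[name] = code
        # Several names may share one code; keep the first one seen.
        code_to_name.setdefault(code, name)
    return name_to_code, code_to_name
```

With the sample above, looking up "Escape" gives 0xff1b and looking up 0x0020 gives "space". The names without the XK_ prefix are the ones used in bind patterns such as <Key-Escape>.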

On Windows and MacOS X Tk only supports keysyms correctly for a limited number of keys, namely special keys, and the ranges of ASCII and ISO-8859-1 (support of ISO-8859-1 on MacOS X since 8.4.2).

How to go forward for a better implementation

To extend keysym support on Windows and MacOS X, generic mapping functions between Unicode and keysyms could be created to cover all characters. We could probably just copy the XFree86 version (see [4]).
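A generic Unicode-to-keysym function along these lines could be sketched as below. This is not a copy of the XFree86 code. It assumes the usual order of preference (Latin-1 identity, then a table of traditional keysyms, then the algorithmic code point plus 0x01000000), and the table is a three-entry excerpt. The names LEGACY_BY_CODE_POINT and unicode_to_keysym are made up for the example.

```python
# Tiny excerpt of a code point -> traditional keysym table.
LEGACY_BY_CODE_POINT = {
    0x0104: 0x01A1,  # U+0104 -> Aogonek
    0x03B1: 0x07E1,  # U+03B1 -> Greek_alpha
    0x20AC: 0x20AC,  # U+20AC -> EuroSign
}


def unicode_to_keysym(char):
    """Return a keysym code for a single character, or None."""
    code_point = ord(char)
    # ASCII and ISO-8859-1 printable ranges: keysym equals code point.
    if 0x20 <= code_point <= 0x7E or 0xA0 <= code_point <= 0xFF:
        return code_point
    # Prefer a traditional keysym where one exists.
    legacy = LEGACY_BY_CODE_POINT.get(code_point)
    if legacy is not None:
        return legacy
    # Otherwise fall back to the algorithmic keysym.
    if code_point >= 0x100:
        return code_point | 0x01000000
    # Control characters have no character keysym in this sketch.
    return None
```

With a function like this, every Unicode character gets some keysym code, so the internal interfaces could stay keysym-based while covering the whole Unicode range.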

Another possibility would be to drop support for keysyms in general, except for special keys and ASCII. Instead, Unicode code points and Unicode strings could be allowed wherever keysym codes and keysym names are used right now. Input of keysym codes and keysym names could still be allowed as a platform extension on those systems, as far as it works right now.