Updated 2011-07-15 02:34:34 by AMG

This page describes how to convert a string to or from the GSM 03.38 character set.

Requirements

The non-null 0 character "@"

In the GSM 03.38 character set, the character "@" is encoded as 0x00. The encoding convertto command treats a character that maps to 0x00 as "not convertible" and substitutes the fallback character "?".

Example:
    set gsmString [encoding convertto gsm0338 "me@domain.com"]
    # will return me?domain.com instead of me\x00domain.com

Workarounds
 # The string is split at each "@", and the pieces are converted
 # separately.
 proc toGsm {aString} {
  set partsWithoutAt [split $aString "@"]

  set convertedPartsWithoutAt {}
  foreach part $partsWithoutAt {
   lappend convertedPartsWithoutAt [encoding convertto gsm0338 $part]
  }

  return [join $convertedPartsWithoutAt "\x00"]
 }
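
To check the result, the converted bytes can be dumped as hexadecimal. The following is a minimal sketch: it repeats the split-based toGsm so that it runs on its own, and gsmHex is a hypothetical helper name, not a Tcl command.

```tcl
# Same split-based workaround as above, repeated so the snippet is self-contained.
proc toGsm {aString} {
    set converted {}
    foreach part [split $aString "@"] {
        lappend converted [encoding convertto gsm0338 $part]
    }
    return [join $converted "\x00"]
}

# gsmHex is a hypothetical helper: it returns the GSM bytes as a hex string.
proc gsmHex {aString} {
    binary scan [toGsm $aString] H* hex
    return $hex
}

puts [gsmHex "me@domain.com"]
# → 6d6500646f6d61696e2e636f6d
```

The third byte is 00 rather than 3f (the GSM code for "?"), which shows that the "@" survived the conversion.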

An alternative workaround, which is shorter but relies on an implementation detail of encoding convertto:
 proc toGsm {aString} {
  return [encoding convertto gsm0338 [string map {@ \x00} $aString]]
 }

Lars H, 4 July 2005: I wouldn't label that as an implementation detail of encoding convertto; aren't you simply using the fact that gsm0338 maps NUL to itself? The string map maps \x40 to \x00.

As for the encoding error, is this a shortcoming in the encoding mechanism (impossible to map non-NUL characters to NUL) or an error in the particular encoding definition file? I wouldn't be entirely surprised if other languages have trouble with NULs in strings, but Tcl handles it correctly AFAIK.

schlenk, 5 July 2005: One could simply add the text file from http://www.unicode.org/Public/MAPPINGS/ETSI/GSM0338.TXT to the encoding directory under tools in the Tcl source distribution and rebuild the Tcl encoding files from scratch, including gsm0338, as Tcl does for all the other encoding tables. That text file also contains some extra hints on how to use the translation table.
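
For the reverse direction, the same splitting idea can be applied to the encoded bytes. This is a minimal sketch, assuming only that encoding convertfrom gsm0338 decodes ordinary characters correctly. The name fromGsm is hypothetical, and whether convertfrom actually mishandles 0x00 is not established on this page; the sketch simply sidesteps the question.

```tcl
# fromGsm is a hypothetical counterpart to toGsm: the byte string is split
# at each 0x00, the pieces are decoded separately, and "@" is put back.
proc fromGsm {gsmBytes} {
    set decoded {}
    foreach part [split $gsmBytes "\x00"] {
        lappend decoded [encoding convertfrom gsm0338 $part]
    }
    return [join $decoded "@"]
}

puts [fromGsm "me\x00domain.com"]
# → me@domain.com
```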

willdye For what it's worth, the following (edited) conversation about GSM encoding took place in the Tcl chatroom on 2005-10-14:

Cameron_: Why would a telephone handset, which seems to be using 7-bit ASCII for the most part, encode a '$' as decimal 2 (where blank is 32, '.' is 46, and so on--the normal ascii table)?

schlenk: Maybe some SMS/GSM specification stuff? Something like this: http://wiki.tcl.tk/14441 . "$" is "02" in that specification.

Cameron_: Wow! What a great answer. Thank you!

BAS - yeah, from what I remember, gsm03.38 looks like ASCII, except they took out a lot of the control characters, and replaced them with chars from LATIN-1 upper table.
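
The "$" case from this conversation can be checked directly. A small sketch, assuming the gsm0338 encoding shipped with Tcl follows the GSM 03.38 default alphabet for these three characters:

```tcl
# Encode "$", a blank and "." and dump the resulting bytes as hex.
binary scan [encoding convertto gsm0338 "\$ ."] H* hex
puts $hex
# → 02202e
```

The blank (0x20, decimal 32) and "." (0x2e, decimal 46) keep their ASCII values, while "$" becomes 0x02.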