Updated 2011-07-07 22:23:36 by RLE

Richard Suchenwirth 2002-12-15 - Unique item identifiers can get pretty lengthy, especially numeric ones. For instance, the EAN code (European Article Number) found on many food and non-food articles is 13 digits long. When barcodes are scanned, or occasionally typed in by hand, errors may occur. These can be detected by requiring the code to have a certain property, most often that a checksum computed by some rules, modulo some number, gives zero. Albrecht Beutelspacher's Pasta all'infinito (München: dtv 2002) taught me some of these rules, amid much other reading pleasure (highly recommended if you're interested in math and/or Italy), so here are my Tcl implementations, which return 1 if the input code is valid, else 0.

Digits in the EAN code enter the checksum weighted alternately by 1 and 3. Take, for example, the EAN of that book:
 EAN:    9  7 8 3 4 2 3 3 3 0 6  9 5
 weight: 1  3 1 3 1 3 1 3 1 3 1  3 1
 gives:  9+21+8+9+4+6+3+9+3+0+6+27+5 = 110, 110 % 10 = 0 -> ok
 proc EANvalid ean {
     regsub -all {[^0-9]} $ean "" ean ;# remove all non-digits
     if {[string length $ean] != 13} {return 0} ;# an EAN-13 has exactly 13 digits
     set weight 1
     set sum 0
     foreach digit [split $ean ""] {
         set sum    [expr {$sum + $digit * $weight}]
         set weight [expr {$weight == 1? 3: 1}] ;# alternate 1, 3, 1, 3, ...
     }
     expr {($sum % 10) == 0}
 }
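The same weights also work the other way round, to compute the 13th (check) digit from the first twelve. Here is a minimal sketch, assuming a helper called EANcheckdigit (my own name, not from the book). Since the check digit gets weight 1, it is simply whatever rounds the weighted sum up to the next multiple of 10.

```tcl
# Hypothetical helper (not from the book): compute the EAN-13 check digit
# for the first 12 digits, using the same 1,3,1,3,... weights as EANvalid.
proc EANcheckdigit prefix {
    regsub -all {[^0-9]} $prefix "" prefix ;# remove all non-digits
    if {[string length $prefix] != 12} {
        error "expected 12 digits, got \"$prefix\""
    }
    set weight 1
    set sum 0
    foreach digit [split $prefix ""] {
        set sum    [expr {$sum + $digit * $weight}]
        set weight [expr {$weight == 1? 3: 1}]
    }
    # the 13th digit has weight 1: pick it so the total becomes a multiple of 10
    expr {(10 - $sum % 10) % 10}
}

puts [EANcheckdigit 978342333069] ;# prints 5
```

For the example above, the first twelve digits sum to 105 with these weights, so the check digit is 5, as printed on the book.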

This procedure detects all single-digit errors, but it misses a swap of two neighbouring digits that differ by 5 (e.g. 3 8 typed instead of 8 3), which is roughly 10% of all possible swaps. A more robust scheme is used in the ISBN (International Standard Book Number), a ten-digit sequence consisting of the following fields, shown here for the Beutelspacher book:
 country/language  publisher  number  check digit
 3 (German)        423 (dtv)  33069   4

Weights in this scheme decrease from 10 to 1. The special twist is that the check digit is "undecimal", meaning both "non-decimal" (besides the digits 0-9, it may be X) and "base 11" (Latin undecim, eleven; the Roman numeral X stands for 10). For example, an older book that I also really like has the ISBN
 ISBN:    3 - 5  4 0 - 1 0  3  5 2 - X
 weight: 10   9  8 7   6 5  4  3 2   1
 gives:  30+ 45+32+0+  6+0+12+15+4+ 10 = 154, 154%11 = 0 -> ok
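 proc ISBNvalid isbn {
     # keep digits and the check character X (a lowercase x is accepted too)
     regsub -all {[^0-9X]} [string toupper $isbn] "" isbn
     # nine digits, then a digit or X: X is only allowed as the check digit
     if {![regexp {^[0-9]{9}[0-9X]$} $isbn]} {return 0}
     set weight 10
     set sum 0
     foreach digit [split $isbn ""] {
         if {$digit eq "X"} {set digit 10}
         set sum [expr {$sum + $digit * $weight}]
         incr weight -1
     }
     expr {($sum % 11) == 0}
 }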
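The difference in robustness against digit swaps can be checked by brute force. Swapping neighbouring digits a and b, which carry weights w1 and w2, changes the weighted sum by (w1 - w2)(b - a), so the swap goes unnoticed exactly when that is 0 modulo the check modulus. The helper undetectedSwaps below is a sketch of my own, not from the book. It only looks at plain digits, not at the X check character.

```tcl
# Hypothetical helper: list the adjacent swaps "ab" -> "ba" (a != b) that a
# checksum with neighbouring weights w1, w2 and the given modulus cannot see.
proc undetectedSwaps {w1 w2 modulus} {
    set missed {}
    for {set a 0} {$a <= 9} {incr a} {
        for {set b 0} {$b <= 9} {incr b} {
            if {$a == $b} continue
            # swapping changes the weighted sum by (w1 - w2) * (b - a)
            if {(($w1 - $w2) * ($b - $a)) % $modulus == 0} {
                lappend missed $a$b
            }
        }
    }
    return $missed
}

# EAN: neighbouring weights 1 and 3, modulo 10
puts "EAN misses: [undetectedSwaps 1 3 10]"
# ISBN: neighbouring weights w and w-1 (here 10 and 9), modulo 11
puts "ISBN misses: [undetectedSwaps 10 9 11]"
```

For EAN this lists 05 16 27 38 49 50 61 72 83 94, i.e. the 10 of 90 ordered digit pairs (about 11%) that differ by 5. The ISBN list is empty: with neighbouring weights w and w-1 the change is just b - a, which is never a multiple of 11 for two different digits.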

Note also that modern books carry a bar-coded EAN, which consists of the prefix 978, followed by the ISBN without its check digit (EAN is decimal, so an X would be a problem), followed by the EAN check digit. So the Beutelspacher book has the EAN
 EAN:  978 3 423 33069 5
 ISBN:     3-423-33069-4
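Going the other way, from a 978-prefixed EAN back to the ISBN, means dropping the prefix and the EAN check digit and recomputing the undecimal ISBN check digit. Here is a minimal sketch with EANtoISBN as a hypothetical name. Note that it does not verify the EAN check digit it throws away.

```tcl
# Hypothetical helper: recover an ISBN-10 from a 978-prefixed EAN-13 by
# dropping the prefix and the EAN check digit, then recomputing the
# ISBN check digit (weights 10..2, modulo 11, X standing for 10).
proc EANtoISBN ean {
    regsub -all {[^0-9]} $ean "" ean ;# remove all non-digits
    if {![regexp {^978([0-9]{9})[0-9]$} $ean -> body]} {
        error "not a 978-prefixed EAN-13: \"$ean\""
    }
    set weight 10
    set sum 0
    foreach digit [split $body ""] {
        set sum [expr {$sum + $digit * $weight}]
        incr weight -1
    }
    # the check digit has weight 1: choose it so the total is divisible by 11
    set check [expr {(11 - $sum % 11) % 11}]
    if {$check == 10} {set check X}
    return $body$check
}

puts [EANtoISBN "978 3 423 33069 5"] ;# prints 3423330694
```

The EAN check digit 5 and the ISBN check digit 4 differ because they come from different weights and moduli, which is why the converter has to recompute rather than copy it.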

See also ISBN, UIC vehicle number validator, and Validating Credit Card Check Digits.

Comment (Goran): The weight sequence above is in the wrong order. It should be 1 2 3 4 5 6 7 8 9 10.

RS: I just re-checked the Beutelspacher book: the weights from left to right are indeed given there as descending 10 9 ... 2 1. However, I tested several books, and both sequences seem to work. Funny.

Lars H: Through the miracle of modular arithmetic, both weight sequences are equivalent. Since the weighted sum is supposed to be 0, it doesn't matter if we switch sign on all the weights (-0=0), and as it happens, -1=10, -2=9, -3=8, ..., -10=1 (modulo 11).
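This can also be seen numerically. For each position, the descending weight (11 - i) and the ascending weight i add up to 11, so the two weighted sums of the same ISBN always add up to 11 times its digit sum. Modulo 11 one is therefore the negative of the other, and they are zero together. ISBNsum is an illustrative helper of my own. The last ISBN in the loop has a deliberately wrong check digit.

```tcl
# Illustrative helper: weighted ISBN-10 sum modulo 11, with weights descending
# (10 9 ... 1) when order is "desc", ascending (1 2 ... 10) otherwise.
proc ISBNsum {isbn order} {
    regsub -all {[^0-9X]} $isbn "" isbn
    set sum 0
    set i 0
    foreach digit [split $isbn ""] {
        incr i ;# position 1..10 from the left
        if {$digit eq "X"} {set digit 10}
        set weight [expr {$order eq "desc" ? 11 - $i : $i}]
        set sum [expr {$sum + $digit * $weight}]
    }
    expr {$sum % 11}
}

foreach isbn {3-540-10352-X 3-423-33069-4 3-423-33069-5} {
    puts "${isbn}: desc [ISBNsum $isbn desc], asc [ISBNsum $isbn asc]"
}
```

This prints 0 and 0 for the two real ISBNs, and 1 and 10 (that is, 1 and -1 modulo 11) for the altered one.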

ECS 2005-03-18: A really interesting page is "The Laws of Cryptography: Coping with Decimal Numbers" by Neal R. Wagner [1].

See also the check digit code at bpay. BPAY is an Australian bill payment system.