Updated 2011-06-10 12:36:47 by RLE

Richard Suchenwirth - A harmless question on news:comp.lang.tcl, whether Tcl has an equivalent of C's isdigit(), led to the following code, which checks whether a string is a well-formed English number.
```
proc isnumber s {
    set re ^((one|two|three|four|five|six|seven|eight|nine|ten
    append re |eleven|twelve|thir|fif|teen|twen|ty|forty|fifty
    append re |eighty|hundred|thousand|and
    append re {)[ -]?)+$}
    regexp $re $s
}

isnumber {nineteen hundred and seventy-six}
1
isnumber twenty-six
1
isnumber twenty-something
0
```

Dan Smart commented:
Hmm,

```
isnumber {teen thousand and ty nine}
1
isnumber teen
1
isnumber ty
1
isnumber fif
1
isnumber {one and thousand and two and fif and thir and hundred}
1
```

Oops, I guess it's back to the drawing board...

It was. Here's version 0.2, written at midnight:
```
proc en2num s {
    array set dic {
        zero 0 one 1 two 2 three 3 four 4 five 5 six 6 seven 7
        eight 8 nine 9 ten 10 eleven 11 twelve 12 thirteen 13
        fifteen 15 eighteen 18 twenty 20 thirty 30 forty 40
        fifty 50 eighty 80 score 20 hundred 100 thousand 1000
        million 1000000 millions 1000000
    }
    # "and" and hyphens only separate words
    regsub -all " and |-" [string trim $s] " " s
    set res [list] ;# will become the translation to math
    foreach i [split $s] {
        if {[info exists dic($i)]} {
            if {$dic($i)>=1000 && [llength $res]} {
                # Scale word: parenthesize everything so far, so it is multiplied
                # as a whole. Removing the first "000" turns an earlier 1000000
                # into 1000, so "one million two hundred thousand" comes out as
                # ((+ 1) * 1000 + 2 * 100) * 1000.
                regsub 000 $res "" res ;# will multiply by 1000 later
                set res "($res)"
            }
            # hundred, thousand, million and score multiply; everything else adds
            if {($dic($i)>99 || $i eq "score") && [llength $res]} {
                lappend res *
            } else {lappend res +}
            lappend res $dic($i)
        } elseif {[regexp (.+)teen $i -> t] && [info exists dic($t)]} {
            lappend res + [expr {$dic($t)+10}] ;# fourteen, sixteen, nineteen, ...
        } elseif {[regexp (.+)ty $i -> t] && [info exists dic($t)]} {
            lappend res + [expr {$dic($t)*10}] ;# sixty, seventy, ninety, ...
        } else {return -code error "$s is not a number: $i"}
    }
    expr $res ;# unbraced on purpose: $res holds the expression to evaluate
}
proc en:isnum s {expr {![catch {en2num $s}]}}
```

en2num is a parser: it extracts the value of an English number, if possible, by building up an arithmetic expression and finally evaluating it with expr. It passes Dan's test cases (each of them now raises an error, so en:isnum returns 0), and it even supports some outdated formats (backwards compatible ;-)
```
en2num {four score and seven}
87
en2num {one and twenty}
21
```
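To make the "build up, then evaluate" step concrete, here are the expression strings en2num should hand to its final `expr $res` for the two examples above, plus two inputs with scale words. These strings were traced by hand from reading the code (en2num does not print them); the block only evaluates them the same way en2num does. The last one shows the `regsub 000` step turning the earlier 1000000 into 1000 before the whole group is multiplied by 1000.

```tcl
# Expression strings traced by hand from en2num (not captured output)
set traces {
    {four score and seven}             {+ 4 * 20 + 7}
    {one and twenty}                   {+ 1 + 20}
    {two hundred thousand}             {(+ 2 * 100) * 1000}
    {one million two hundred thousand} {((+ 1) * 1000 + 2 * 100) * 1000}
}
foreach {words expression} $traces {
    # Unbraced expr evaluates the string, just like en2num's final [expr $res]
    puts "$words -> $expression = [expr $expression]"
}
```

Evaluating them gives 87, 21, 200000 and 1200000. Note that a leading `+` is simply unary plus to expr, which is why en2num can start every expression with one.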

However, en2num still accepts more than it should, so it acts like a language-driven adding machine:
```
en2num {one two three}
6
en2num {fifty fifty}
100
```

To fix this, one would need slots for ones, tens, and hundreds, each of which could be filled at most once, and which would be shifted along for thousands, millions, ...
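The slot idea can be sketched roughly as follows. This is a hypothetical `en2num_strict` (with a hypothetical helper `en_group`), not code from this page: within each three-digit group the hundreds, tens and ones slots can be filled at most once, and "thousand" or "million" closes the group (multiplying it by the scale) and must come in decreasing order. As a simplifying assumption it rejects the outdated forms en2num accepts ("four score and seven", "one and twenty"), as well as year-style "nineteen hundred" and "twenty-five hundred", and like the dictionary above it stops at millions.

```tcl
# Hypothetical helper: value of one three-digit group, empty slot ("") = 0
proc en_group {H T O} {
    foreach slot {H T O} {
        if {[set $slot] eq ""} {set $slot 0}
    }
    expr {100*$H + 10*$T + $O}
}

# Hypothetical strict parser: each slot of the current group can be filled
# at most once; "thousand" and "million" close the group, largest first.
proc en2num_strict s {
    # 1-9 fill the ones slot; 10-19 fill both the tens and the ones slot
    array set small {
        one 1 two 2 three 3 four 4 five 5 six 6 seven 7 eight 8 nine 9
        ten 10 eleven 11 twelve 12 thirteen 13 fourteen 14 fifteen 15
        sixteen 16 seventeen 17 eighteen 18 nineteen 19
    }
    # These fill only the tens slot (the value stored is the tens digit)
    array set tens {
        twenty 2 thirty 3 forty 4 fifty 5 sixty 6 seventy 7 eighty 8 ninety 9
    }
    array set scale {thousand 1000 million 1000000}

    # Blanks and hyphens separate words: "seventy-six" -> seventy six
    set words [regexp -all -inline {[^[:space:]-]+} [string tolower $s]]
    if {$words eq "zero"} {return 0}
    if {[llength $words] == 0} {error "not a number: empty input"}

    set total 0                    ;# sum of the groups closed by a scale word
    set last 0                     ;# value of the last scale word, 0 = none yet
    set H ""; set T ""; set O ""   ;# hundreds, tens, ones slots ("" = empty)
    set prev ""
    foreach w $words {
        if {$w eq "and"} {
            # "and" may only follow "hundred" or a scale word
            if {$prev ne "hundred" && ![info exists scale($prev)]} {
                error "misplaced and"
            }
        } elseif {[info exists small($w)]} {
            set v $small($w)
            if {$O ne "" || ($v >= 10 && $T ne "")} {error "slot taken: $w"}
            if {$v >= 10} {set T 1}
            set O [expr {$v % 10}]
        } elseif {[info exists tens($w)]} {
            if {$T ne "" || $O ne ""} {error "slot taken: $w"}
            set T $tens($w)
        } elseif {$w eq "hundred"} {
            # Only a lone digit 1-9 can be multiplied by a hundred
            if {$H ne "" || $T ne "" || $O eq ""} {error "misplaced hundred"}
            set H $O
            set O ""
        } elseif {[info exists scale($w)]} {
            # Needs a non-empty group and a smaller scale than the previous one
            set group [en_group $H $T $O]
            if {$group == 0 || ($last && $scale($w) >= $last)} {error "misplaced $w"}
            incr total [expr {$group * $scale($w)}]
            set last $scale($w)
            set H ""; set T ""; set O ""
        } else {
            error "not a number word: $w"
        }
        set prev $w
    }
    if {$prev eq "and"} {error "trailing and"}
    expr {$total + [en_group $H $T $O]}
}
```

An en:isnum-style predicate is then `expr {![catch {en2num_strict $s}]}`. With this sketch, {one two three}, {fifty fifty} and all of Dan's examples raise errors, while {nine hundred and ninety-nine thousand nine hundred and ninety-nine} gives 999999.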

For translating numbers into natural languages, see also Bag of number/time spellers.