Updated 2012-08-31 17:42:46 by RFox

Richard Suchenwirth 2004-02-11 - Here's a routine for formatting non-negative integers (e.g. file sizes), so that the resulting field width does not exceed 5 characters, of which max. three significant digits, and K(ilo = 1024), M(ega = 1024^2) etc. modifiers are used for scaling. This may be useful on small displays.
 proc fixform n {
    if {wide($n) < 1000} {return $n}
    foreach unit {K M G T P E} {
        set n [expr {$n/1024.}]
        if {$n < 1000} {
            set n [string range $n 0 3]
            regexp {(.+)\.$} $n -> n
            return $n$unit
        }
    }
    return Inf ;# :)
 }

HJG Test:
 proc Demo {n} { puts "$n = [fixform $n]" }

 catch {console show}
 Demo 1
 Demo 20
 Demo 300
 Demo 4000
 Demo 50000
 Demo 600000
 Demo 7000000
 Demo 80000000
 Demo 900000000
 Demo 1000000000
 Demo 11000000000
 Demo 120000000000
 Demo 1300000000000
 Demo 14000000000000
 Demo 150000000000000
 Demo 1600000000000000
 Demo 17000000000000000
 Demo 180000000000000000
 Demo 1900000000000000000

Testing:
 % fixform 1
 1
 % fixform 10
 10
 % fixform 100
 100
 % fixform 1000
 0.97K
 % fixform 10000
 9.76K
 % fixform 100000
 97.6K
 % fixform 1000000
 976K
 % fixform 10000000
 9.53M
 % fixform 100000000
 95.3M
 % fixform 1000000000
 953M

In the Tcl chatroom I was informed that according to http://physics.nist.gov/cuu/Units/binary.html , the units based on powers of 2 should be abbreviated
 Ki Mi Gi Ti Pi Ei

but I haven't seen that used yet... change if you wish.

DKF: Some standards are just destined to be ignored...

slebetman: The standard is no longer being ignored. New RFC and IEEE documents now use the proper SI form. The documentation for the Linux kernel including comments in the code also use the proper form (kiB = 1024 bytes, kB = 1000 bytes). Still some people insist on ignorance: Americans are still ignoring metres and grams.

Shouldn't the tests be
  if {wide($n) < 1024} {...

and
  if {$n < 1024} {...

slebetman: Here's a version supporting both binary and decimal units. It defaults to decimal but may be given the option -binary to change it:
 proc fixform {args} {
    set n [lindex $args end]
    set div 1000.0
    switch [llength $args] {
      1 {}
      2 {
        if {[lindex $args 0] == "-binary"} {
          set div 1024.0
        }
      }
      default {
        error {wrong # args: should be "fixform ?-binary? number"}
      }
    }
    if {wide($n) < $div} {return $n}
    foreach unit {k M G T P E} {
        set n [expr {$n/$div}]
        if {$n < $div} {
            set n [string range $n 0 3]
            regexp {(.+)\.$} $n -> n
            return $n$unit
        }
    }
    error "number is too large"
 }

 # Test:
 fixform 50000
 fixform -binary 50000

EKB A slight modification to include the "i":
 proc fixform {args} {
    set n [lindex $args end]
    set div 1000.0
    set c ""
    switch [llength $args] {
      1 {}
      2 {
        if {[lindex $args 0] == "-binary"} {
          set div 1024.0
          set c "i"
        }
      }
      default {
        error {wrong # args: should be "fixform ?-binary? number"}
      }
    }
    if {wide($n) < $div} {return $n}
    foreach unit {k M G T P E} {
        set n [expr {$n/$div}]
        if {$n < $div} {
            set n [string range $n 0 3]
            regexp {(.+)\.$} $n -> n
            return $n$unit$c
        }
    }
    return Inf ;# :)
 }

See also a simple download progress widget which contains code similar to RS's original proc (it uses 1024bytes == 1KB etc).

See also Engineering Notation for numbers smaller than 1.

MB : There are limitations while processing large integers with the "expr" subcommands. These limitations are so that processing large files may generate problems in the algorithm. On my Tcl 8.4 installation, the epta byte file size cannot be computed with the current version of "fixform". If I type :
  fixform 10000000000000000000

it returns :
  10000000000000000000

as expected, since the "expr" command is so that the "wide" operator :
  expr {wide(10000000000000000000)}

returns :
  -8446744073709551616

My Tcl manual says that "All internal computations involving integers are done with the C type long, and all internal computations involving floating-point are done with the C type double.". If I suppose that my current system is based on 32 bits long integers, that means that the expr command cannot process files larger than 2 GB. See the "hrfilesize" package for a solution based on the Tcllib package "bigfloat".

[BEO] Faster versions:
 proc format_1024_units {value} {
    set len [string length $value]
    if {$value < 1024} {
      format "%s B" $value
    } else {
      set unit [expr {($len - 1) / 3}]
      format "%.1f %s" [expr {$value / pow(1024,$unit)}] [lindex \
        [list B KiB MiB GiB TiB PiB EiB ZiB YiB] $unit]
    }
 }

 proc format_1000_units {value} {
    set len [string length $value]
    if {$value < 1000} {
      format "%s B" $value
    } else {
      set unit [expr {($len - 1) / 3}]
      format "%.1f %s" [expr {$value / pow(1000,$unit)}] [lindex \
        [list B KB MB GB TB PB EB ZB YB] $unit]
    }
 }

See also


RFox - 2012-08-31 17:42:46

" Americans are still ignoring metres and grams"

We don't ignore them...we just don't use them ;-)