Updated 2016-11-22 02:04:32 by gam

ELF (Executable and Linkable Format) is a platform-independent binary format for object files, libraries, and executables. It is widely used on UNIX systems.

See Also

Object Dive
Extracts symbols from objects, producing a graph of relationships.

elfdecode is a package to read an ELF file and query its components. It supports both 32-bit and 64-bit ELF files.

Reading an ELF file

AMG: Here's code to dump the contents of an ELF file, by section. At present, it only supports 32-bit little-endian ELF, since I don't have any other kind of ELF file on hand to check it against. It uses code adapted from Dump a file in hex and ASCII. There are many section attributes and other headers it could print, but doesn't; it's easy to add support for what you need. It might not work on stripped binaries.

Reference: [1]
#!/usr/bin/env tclsh

if {[llength $argv] == 0 || [llength $argv] > 2} {
    puts stderr "Usage: [file tail $argv0] FILENAME ?PATTERN?"
    puts stderr "FILENAME: Name of a 32-bit LE ELF file to dump"
    puts stderr "PATTERN: Glob-style section name match pattern"
    puts stderr "All section names are printed if PATTERN is omitted"
    exit 1
}

proc hex {data} {
    set result ""
    for {set i 0} {$i < [string length $data]} {incr i 16} {
        set row [string range $data $i [expr {$i + 15}]]
        binary scan $row H* hex
        set hex [regsub -all {(.{4})} [format %-32s $hex] {\1 }]
        set row [regsub -all {[^[:print:]]} $row .]
        append result [format "%08x: %s %-16s\n" $i $hex $row]
    }
    string range $result 0 end-1
}

proc unsigned {bits args} {
    foreach varname $args {
        upvar 1 $varname var
        set var [expr {$var & ((1 << $bits) - 1)}]
    }
}

proc sections {chan} {
    seek $chan 32
    binary scan [read $chan 4] i shoff
    unsigned 32 shoff

    seek $chan 46
    binary scan [read $chan 6] sss shentsize shnum shstrndx
    unsigned 16 shentsize shnum shstrndx

    seek $chan [expr {$shoff + 16 + $shstrndx * $shentsize}]
    binary scan [read $chan 8] ii strtaboff strtabsize
    unsigned 32 strtaboff strtabsize

    seek $chan $strtaboff
    set strtab [read $chan $strtabsize]

    seek $chan $shoff
    set result {}
    for {set i 0} {$i < $shnum} {incr i} {
        binary scan [read $chan $shentsize] ix12ii name offset size
        unsigned 32 name offset size

        if {[string index $strtab $name] ne "\0"} {
            set end [expr {[string first \0 $strtab $name] - 1}]
            lappend result [string range $strtab $name $end] $offset $size
        }
    }
    return $result
}

proc dumpsections {filename {pattern ""}} {
    set chan [open $filename rb]
    if {[read $chan 7] ne "\177ELF\1\1\1"} {
        error "unsupported format"
    }
    foreach {name offset size} [sections $chan] {
        if {$pattern eq ""} {
            puts [format "%-32s %08x %08x" $name $offset $size]
        } elseif {[string match $pattern $name]} {
            seek $chan $offset
            puts [format "%-32s %08x %08x" $name $offset $size]
            puts [hex [read $chan $size]]
        }
    }
    close $chan
}

dumpsections [lindex $argv 0] [lindex $argv 1]

Example:
[andy@toaster|~/dwarf]$ ./dumpsections.tcl test.o
.text                            00000034 0000001c
.rel.text                        00001688 00000018
.data                            00000050 00000000
.bss                             00000050 00000000
.debug_abbrev                    00000050 000000d9
.debug_info                      00000129 00000160
.rel.debug_info                  000016a0 000000a8
.debug_line                      00000289 0000003a
.rel.debug_line                  00001748 00000008
.debug_macinfo                   000002c3 00000cd7
.debug_loc                       00000f9a 0000002c
.debug_pubnames                  00000fc6 0000002b
.rel.debug_pubnames              00001750 00000008
.debug_aranges                   00000ff1 00000020
.rel.debug_aranges               00001758 00000010
.debug_str                       00001011 0000005c
.comment                         0000106d 00000012
.note.GNU-stack                  0000107f 00000000
.debug_frame                     00001080 0000002c
.rel.debug_frame                 00001768 00000010
.shstrtab                        000010ac 000000d4
.symtab                          00001540 00000130
.strtab                          00001670 00000015
[andy@toaster|~/dwarf]$ ./dumpsections.tcl test.o .debug_str
.debug_str                       00001011 0000005c
00000000: 756e 7369 676e 6564 2069 6e74 006c 6f6e  unsigned int.lon
00000010: 676e 616d 6500 6c6f 6e67 2069 6e74 0061  gname.long int.a
00000020: 7267 7600 6172 6763 006d 6169 6e00 6368  rgv.argc.main.ch
00000030: 6172 006e 6578 7400 2f68 6f6d 652f 616e  ar.next./home/an
00000040: 6479 2f64 7761 7266 0074 6573 742e 6300  dy/dwarf.test.c.
00000050: 474e 5520 4320 342e 342e 3400            GNU C 4.4.4.
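
The dumper above rejects anything but 32-bit little-endian input. As a first step toward lifting that limit, the identification bytes can be decoded rather than compared against a fixed string. The following is only a sketch: the proc name elfident and the keys of the returned dict are made up for illustration, and it needs Tcl 8.5 or later for dict and the "u" flag of binary scan.

```tcl
# Hypothetical helper: classify an ELF file from the first bytes of its header.
# $header must hold at least the first 6 bytes of the file, read in binary mode.
proc elfident {header} {
    if {[string length $header] < 6
     || [string range $header 0 3] ne "\177ELF"} {
        error "not an ELF file"
    }
    # Byte 4 is EI_CLASS (1 = 32-bit, 2 = 64-bit).
    # Byte 5 is EI_DATA (1 = little endian, 2 = big endian).
    binary scan $header "x4 cu cu" class data
    switch -- $class {
        1       {set bits 32}
        2       {set bits 64}
        default {error "unknown EI_CLASS $class"}
    }
    switch -- $data {
        1       {set order little}
        2       {set order big}
        default {error "unknown EI_DATA $data"}
    }
    # Choose binary scan codes for 16-bit and 32-bit fields in the file's
    # byte order: s/i are little endian, S/I are big endian, "u" is unsigned.
    if {$order eq "little"} {
        set half su
        set word iu
    } else {
        set half Su
        set word Iu
    }
    dict create bits $bits order $order half $half word $word
}
```

Calling elfident on the first six bytes of test.o from the example above should report 32 bits and little-endian order, since the dumper accepts that file. The half and word entries could then be passed to binary scan in place of the hard-coded "s" and "i". A 64-bit file would also need different header offsets, which this sketch does not cover.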

AMG: Here is code to read and modify the symbol table of a 32-bit little-endian ELF file. It finds every global common symbol whose name is listed in an input file and makes it undefined instead. This works around some assorted strangeness with Fortran and the dynamic linker. You may find the code useful as an example of reading and writing a symbol table.
#!/usr/bin/env tclsh

package require Tcl 8.4

# Read an 8-bit unsigned value.
proc read8 {chan} {
    binary scan [read $chan 1] c result
    expr {$result & 0xff}
}

# Read a 16-bit unsigned value.
proc read16 {chan} {
    binary scan [read $chan 2] s result
    expr {$result & 0xffff}
}

# Read a 32-bit unsigned value.
proc read32 {chan} {
    binary scan [read $chan 4] i result
    expr {$result & 0xffffffff}
}

# Edit $objfile to change SHN_COMMON symbols listed in $commonfile to SHN_UNDEF.
proc common_to_undef {objfile commonfile} {
    # Read the common symbol list file.
    set chan [open $commonfile]
    set commons [lsort -unique [split [read $chan] \n]]
    if {[lindex $commons 0] eq ""} {
        set commons [lrange $commons 1 end]
    }
    close $chan

    # Open the ELF object file read/write.
    set chan [open $objfile r+]
    fconfigure $chan -translation binary -buffering none

    # Read initial header and confirm type is supported.
    # Bytes 0-3: EI_MAG0-3 "\177ELF"
    # Byte 4: EI_CLASS "\1" 32-bit object
    # Byte 5: EI_DATA "\1" ELFDATA2LSB two's complement little endian
    # Byte 6: EI_VERSION "\1" ELF 1
    if {[read $chan 7] ne "\177ELF\1\1\1"} {
        error "unsupported format; must be 32-bit little-endian ELF"
    }

    # Read section header table location, entry size, and entry count.  Also
    # read index of section name string table.
    seek $chan 32
    set shoff [read32 $chan]
    seek $chan 46
    set shentsize [read16 $chan]
    set shnum [read16 $chan]
    set shstrndx [read16 $chan]
    if {$shoff == 0} {
        error "no section header table"
    } elseif {$shentsize != 40} {
        error "bad section header table entry size $shentsize: must be 40"
    } elseif {$shstrndx == 0} {
        error "no section name string table"
    }

    # Read section name string table location and size.
    seek $chan [expr {$shoff + 16 + $shstrndx * 40}]
    set strtaboff [read32 $chan]
    set strtabsize [read32 $chan]

    # Read section name string table.
    seek $chan $strtaboff
    set shstrtab [read $chan $strtabsize]

    # Read section header table.  Search for the SHT_SYMTAB section named
    # ".symtab" and the SHT_STRTAB section named ".strtab".
    seek $chan $shoff
    for {set i 0} {$i < $shnum} {incr i} {
        # Process the section according to the name and type.
        set name [read32 $chan]
        set type [read32 $chan]
        set end [expr {[string first \0 $shstrtab $name] - 1}]
        set name [string range $shstrtab $name $end]
        if {$type == 2 && $name eq ".symtab"} {
            # Found SHT_SYMTAB named ".symtab".
            seek $chan 8 current
            set symtab_offset [read32 $chan]
            set symtab_size [read32 $chan]
            seek $chan 12 current
            set symtab_entsize [read32 $chan]
        } elseif {$type == 3 && $name eq ".strtab"} {
            # Found SHT_STRTAB named ".strtab".
            seek $chan 8 current
            set strtab_offset [read32 $chan]
            set strtab_size [read32 $chan]
            seek $chan 16 current
        } else {
            # Ignore all other sections.
            seek $chan 32 current
        }
    }
    if {![info exists symtab_offset]} {
        error "no symbol table"
    } elseif {![info exists strtab_offset]} {
        error "no symbol string table"
    } elseif {$symtab_entsize != 16} {
        error "bad symbol table entry size $symtab_entsize: must be 16"
    }

    # Read symbol string table.
    seek $chan $strtab_offset
    set strtab [read $chan $strtab_size]

    # Read symbol table.  Search for STB_GLOBAL symbols in SHN_COMMON whose
    # names match those in the common file.  Modify these symbols to have zero
    # value and be in SHN_UNDEF.
    seek $chan $symtab_offset
    for {set i 0} {$i < $symtab_size} {incr i 16} {
        # Get symbol information.
        set name [read32 $chan]
        seek $chan 4 current
        set size [read32 $chan]
        set info [read8 $chan]
        seek $chan 1 current
        set shndx [read16 $chan]
        set end [expr {[string first \0 $strtab $name] - 1}]
        set name [string range $strtab $name $end]

        # Modify selected symbols to be undefined.
        if {$info >> 4 == 1 && $shndx == 0xfff2
         && [lsearch -sorted $commons $name] != -1} {
            seek $chan -12 current
            puts -nonewline $chan \0\0\0\0
            seek $chan 6 current
            puts -nonewline $chan \0\0
        }
    }

    # Close the ELF file.
    close $chan
}

# First two arguments must be object file and common file names.
common_to_undef [lindex $argv 0] [lindex $argv 1]

# vim: set sts=4 sw=4 tw=80 et ft=tcl:

The "-buffering none" setting works around a bug in Tcl 8.4. Without it, I get this error:
andy@slack:~/elf$ ./common_to_undef.tcl test.o commons
error during seek on "file5": bad address in system call argument
    while executing
"seek $chan 4 current"
    (procedure "common_to_undef" line 94)
    invoked from within
"common_to_undef [lindex $argv 0] [lindex $argv 1]"
    (file "./common_to_undef.tcl" line 140)

Tcl 8.6 does not have this bug.
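
Since a newer Tcl is already in the picture: Tcl 8.5 added a "u" flag to the integer conversions of binary scan, so the masking done in read8, read16, and read32 is not needed there. Below is a sketch of equivalent helpers, assuming Tcl 8.5 or later (the script above declares 8.4, where the flag is unavailable). The _u suffix on the names is only there to avoid clashing with the originals.

```tcl
# Assumes Tcl 8.5 or later; the "u" flag does not exist in 8.4.
package require Tcl 8.5-

# Read an 8-bit unsigned value.
proc read8_u {chan} {
    binary scan [read $chan 1] cu result
    return $result
}

# Read a 16-bit unsigned little-endian value.
proc read16_u {chan} {
    binary scan [read $chan 2] su result
    return $result
}

# Read a 32-bit unsigned little-endian value.
proc read32_u {chan} {
    binary scan [read $chan 4] iu result
    return $result
}
```

Like the originals, these expect a channel configured with -translation binary.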

Testing...
andy@slack:~/elf$ cat test.c
#define EXIT_SUCCESS 0
struct foo {
        unsigned x, y[5], *z[2][3][4];
        struct foo *next;
} foo;
union bar {
        struct foo foo;
        long a;
        int b;
} bar;
extern int zzz;
int data = 22;
int main(int argc, const char *const *argv)
{
        foo.x = argc;
        bar.a = foo.x;
        zzz = 999;
        return EXIT_SUCCESS;
}

andy@slack:~/elf$ objdump -t test.o
test.o:     file format elf32-i386
SYMBOL TABLE:
00000000 l    df *ABS*  00000000 test.c
00000000 l    d  .text  00000000 .text
00000000 l    d  .data  00000000 .data
00000000 l    d  .bss   00000000 .bss
00000000 l    d  .note.GNU-stack        00000000 .note.GNU-stack
00000000 l    d  .eh_frame      00000000 .eh_frame
00000000 l    d  .comment       00000000 .comment
0000007c       O *COM*  00000020 foo
0000007c       O *COM*  00000020 bar
00000000 g     O .data  00000004 data
00000000 g     F .text  00000026 main
00000000         *UND*  00000000 zzz

andy@slack:~/elf$ cat commons
foo

andy@slack:~/elf$ ./common_to_undef.tcl test.o commons

andy@slack:~/elf$ objdump -t test.o
test.o:     file format elf32-i386
SYMBOL TABLE:
00000000 l    df *ABS*  00000000 test.c
00000000 l    d  .text  00000000 .text
00000000 l    d  .data  00000000 .data
00000000 l    d  .bss   00000000 .bss
00000000 l    d  .note.GNU-stack        00000000 .note.GNU-stack
00000000 l    d  .eh_frame      00000000 .eh_frame
00000000 l    d  .comment       00000000 .comment
00000000       O *UND*  0000007c foo
0000007c       O *COM*  00000020 bar
00000000 g     O .data  00000004 data
00000000 g     F .text  00000026 main
00000000         *UND*  00000000 zzz

As shown by the second objdump, foo has become undefined.
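
To see the same change at the byte level, here is a sketch that decodes a single 16-byte symbol table entry, using the field layout the script relies on (name, value, size, info, other, section index). The proc name decodesym and the dict keys are invented for this example, the two sample entries are synthetic rather than copied from test.o, and the code assumes Tcl 8.5 or later and little-endian data.

```tcl
# Hypothetical helper: decode one 16-byte little-endian Elf32_Sym entry.
proc decodesym {entry} {
    if {[string length $entry] != 16} {
        error "Elf32_Sym entries are 16 bytes"
    }
    # st_name, st_value, and st_size are 32-bit; st_info and st_other are
    # 8-bit; st_shndx is 16-bit.
    binary scan $entry iuiuiucucusu name value size info other shndx
    # The upper four bits of st_info are the binding, the lower four the type.
    set bind [expr {$info >> 4}]
    set type [expr {$info & 0xf}]
    # Give names to the reserved section indexes that matter here.
    switch -- $shndx {
        0       {set section UND}
        65521   {set section ABS}
        65522   {set section COM}
        default {set section $shndx}
    }
    dict create name $name value $value size $size \
        bind $bind type $type section $section
}

# Synthetic entries resembling a global common object before and after the
# edit: st_value and st_shndx are zeroed, everything else is left alone.
# The name offset (1) and the value 0x20 are arbitrary choices.
set before [binary format iiiccs 1 0x20 0x7c 0x11 0 0xfff2]
set after  [binary format iiiccs 1 0    0x7c 0x11 0 0]
puts [decodesym $before]
puts [decodesym $after]
```

The section key reads COM for the first entry and UND for the second, while size, bind, and type are unchanged. This mirrors what common_to_undef writes: four zero bytes over st_value and two over st_shndx.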

Common and undefined symbols work almost the same way; the linker will replace either with a defined symbol if it finds one. But if it doesn't, the behaviors differ when making a shared object (dynamic library). Undefined symbols remain undefined, but common symbols become BSS symbols. For my application, I require undefined symbols, but Fortran does not have "extern" variables, only commons, so I add this feature using the above script.

If anyone reading this says I shouldn't want Fortran code to communicate via commons across shared objects, you are right, I shouldn't want this, and I don't. But that doesn't matter; I am constrained by circumstances I can't change.