Updated 2012-09-28 10:50:08 by RLE

[ Scripted Compiler :: Code generation ]

Nothing concrete here yet.

References

  • compilers & interpreters archive -- runtime code generation subject area [1]
  • compilers & interpreters archive -- general code generation techniques subject area [2]
  • Cameron Laird's Regular Expressions Column [3] talked in April 2004 about 'Rapid Development of An Assembler Using Python' [4]. Also coded in Python, and therefore scriptable in it, is Atul Varma's "Mini-C Compiler" [5].
  • See Playing Assembler as well.
  • And GPS' IA-32/x86 assembler in Tcl.
  • CorePy: Synthetic Programming in Python [6]. Written for Python, geared towards numeric work, and restricted to Cell/AltiVec, but interesting nevertheless.

Keyword: BURS (Bottom-Up Rewrite System), related to LR parsing.

Discussions

AK: The runtime assembler/code generator Lightning (written in C) is GPL. The concept, however, can be used to generate machine code from within Tcl. Add some bits to write ELF, COFF, etc. libraries, and the low-level backend would be done.

jcw: Too much machinery - I'd generate raw into a bytearray. Store it that way too. ELF is for linkers and other arcane concepts <wink>

AK: Do not forget pre-compiling of binaries. Even if the compiler is in an extension, I do not want to deliver it with every application. Especially not with wrapped applications.

jcw: Absolutely. Which is why I said: "store it that way too".
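The "generate raw into a bytearray, store it that way too" idea can be sketched as follows. This is only an illustration in Python, not anyone's actual implementation: the helper names (`emit_mov_eax_imm32`, `save_code`, and so on) are made up for this sketch. The two encodings used (0xB8 plus a little-endian imm32 for `mov eax, imm32`, and 0xC3 for `ret`) are standard x86. The code is only generated and stored here, never executed.

```python
import os
import struct
import tempfile

def emit_mov_eax_imm32(buf: bytearray, value: int) -> None:
    """Append x86 `mov eax, imm32`: opcode 0xB8 followed by a little-endian imm32."""
    buf.append(0xB8)
    buf += struct.pack("<I", value & 0xFFFFFFFF)

def emit_ret(buf: bytearray) -> None:
    """Append x86 near `ret`: opcode 0xC3."""
    buf.append(0xC3)

def generate_return_constant(value: int) -> bytes:
    """Generate the body of a procedure that returns `value` in eax."""
    buf = bytearray()
    emit_mov_eax_imm32(buf, value)
    emit_ret(buf)
    return bytes(buf)

def save_code(path: str, code: bytes) -> None:
    """Store the raw bytes as they are: no ELF, no headers (an assumption of this sketch)."""
    with open(path, "wb") as f:
        f.write(code)

def load_code(path: str) -> bytes:
    """Read the raw bytes back; a deployed application needs only this half."""
    with open(path, "rb") as f:
        return f.read()

# Demo: generate, store, reload. The file name is arbitrary.
code = generate_return_constant(0x44)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "return44.bin")
    save_code(path, code)
    assert load_code(path) == code
print(code.hex())  # b844000000c3
```

The point of the split between `generate_return_constant` and `load_code` is the one AK raises: the generator can stay on the developer's machine, while the application ships only the stored bytes and a loader.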

AK: Hm, sort of like tbcload, but for machine code? ... A mixture of bytecode and machine code might be of interest. A loader format containing binary data in a script language wrapper ... Oh, I just remembered Oberon's Slim Binaries. These store a portable AST of the input (not bytecode) and use a simple compiler/mapper to machine code when loading. They are lower-level than Tcl bytecodes, yet still as efficient as machine code, because in the end they are machine code. I think I will start a new thread below to collect and merge the thoughts into something cohesive.

DKF: Irrespective of what's stored on the disk, I'd do a two-phase code generation. It makes instruction-level optimization much easier if you don't have to worry about what you're going to do about relative jump offsets.
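A minimal sketch of the two-phase approach DKF describes, in Python. The instruction set and its byte encoding are invented for this illustration (they belong to no real CPU), and all function names are hypothetical. Phase one works on a symbolic instruction list in which jumps name labels, so an optimization can delete instructions freely. Only phase two lays out addresses and computes relative offsets.

```python
import struct

# Hypothetical toy encoding, invented for this sketch:
#   nop         -> 0x00            (1 byte)
#   ld r, imm8  -> 0x10, r, imm8   (3 bytes)
#   jmp label   -> 0x20, rel8      (2 bytes; rel8 is relative to the next instruction)
SIZES = {"nop": 1, "ld": 3, "jmp": 2}

def peephole(program):
    """Phase 1: optimize the symbolic form. Jumps still name labels,
    so dropping instructions cannot invalidate any offset."""
    return [ins for ins in program if ins[0] != "nop"]

def layout(program):
    """Assign an address to every label once the instruction list is final."""
    labels, addr = {}, 0
    for ins in program:
        if ins[0] == "label":
            labels[ins[1]] = addr
        else:
            addr += SIZES[ins[0]]
    return labels

def encode(program, labels):
    """Phase 2: emit bytes, turning label names into relative offsets."""
    out = bytearray()
    for ins in program:
        op = ins[0]
        if op == "label":
            continue
        if op == "nop":
            out.append(0x00)
        elif op == "ld":
            out += bytes([0x10, ins[1], ins[2]])
        elif op == "jmp":
            next_addr = len(out) + SIZES["jmp"]
            out += b"\x20" + struct.pack("b", labels[ins[1]] - next_addr)
    return bytes(out)

def assemble(program):
    program = peephole(program)
    return encode(program, layout(program))

program = [
    ("label", "top"),
    ("ld", 3, 0x44),
    ("nop",),
    ("jmp", "end"),
    ("nop",),
    ("ld", 1, 0x06),
    ("label", "end"),
    ("jmp", "top"),
]

print(assemble(program).hex())  # 100344200310010620f6
```

Removing the two `nop`s changes both jump offsets (from 4 and -12 to 3 and -10), yet the optimization step never had to know that. With a variable-length jump encoding the layout step would need to iterate until sizes settle, which this sketch does not attempt.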

Related software

  • Softwire @ http://softwire.sourceforge.net/ (LGPL).
  • See also tcc, which can compile and run C in core, but also knows about ELF (LGPL).
  • Small Device C Compiler @ http://sdcc.sourceforge.net/ (GPL).
  • QEMU [7]. Actually a CPU emulator, but it seems to generate native machine code at runtime for the code being emulated, to improve performance. This goes in the direction of things like Transmeta's JIT, which emulates x86 on their VLIW processor.

A Tcl-coded assembler that processes input with this sort of appearance
            ld   r3 0x44
            div  r1 r6
    label:  call subr_a
            ...

looks to be the fun sort of thing RS turns out in a weekend, and having Tcl computational capabilities at hand would make it actually useful. Think how much nicer such an assembler would be just in its "macro" functionality.

RS replies: Sure enough - part of it has long been at Playing Assembler, but that executes the code directly instead of generating real machine code. Feel free to modify it! As I haven't done real assembler for 10 years now, it would involve a bit more work for me than usual for a fun project - and I remember from x86 ASM how easily one can crash the box :(
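To make the "macro" point concrete, here is a small front-end sketch for input of the shape shown above. It is written in Python purely for illustration and is not the code at Playing Assembler. The `;` comment syntax, the two macros (`clear`, `repeat`) and all function names are assumptions made up for this example. The idea it illustrates is that a macro is just an ordinary function of the host language that returns more source lines.

```python
def parse_line(line):
    """Split one source line into (label, mnemonic, operands), or None if blank."""
    line = line.split(";", 1)[0].strip()      # ';' comments are an assumption
    if not line:
        return None
    label = None
    first, _, rest = line.partition(" ")
    if first.endswith(":"):                   # leading "name:" is a label
        label, line = first[:-1], rest.strip()
    if not line:                              # a label on a line of its own
        return (label, None, [])
    mnemonic, *operands = line.split()
    return (label, mnemonic, operands)

# Hypothetical macros: plain host-language functions that return source lines.
MACROS = {
    "clear": lambda reg: [f"ld {reg} 0x0"],
    "repeat": lambda count, *ins: [" ".join(ins)] * int(count, 0),
}

def expand(source):
    """Parse a whole source text, expanding macros into ordinary statements."""
    out = []
    for raw in source.splitlines():
        parsed = parse_line(raw)
        if parsed is None:
            continue
        label, mnemonic, operands = parsed
        if mnemonic in MACROS:
            if label is not None:             # keep the label, attached to nothing
                out.append((label, None, []))
            for line in MACROS[mnemonic](*operands):
                out.append(parse_line(line))
        else:
            out.append(parsed)
    return out

SOURCE = """
            ld   r3 0x44
            div  r1 r6
    label:  call subr_a
            clear r1
            repeat 2 div r1 r6
"""

for statement in expand(SOURCE):
    print(statement)
```

The output is a list of `(label, mnemonic, operands)` tuples with the macro calls replaced by plain `ld` and `div` statements, ready to be handed to whatever encoder follows. Macros here expand only one level deep; nested macro calls would need `expand` to recurse.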