Summary
HJG 2016-02-02: This is another attempt at quick, easy, ad-hoc printing of plain text files.
I often have some information in a text file, and when I need to print it, I want a nice-looking page, e.g. with a few headers, some text in bold etc., but I don't want to use an 'Office' word processor for that.
The idea is to convert the text file to an HTML file, then print that with the web browser.
The basic operation of the converter is to copy the input file x.txt to x.html,
add some lines like "<html>", "<head>", "<body>" etc.,
then wrap the first line of text in <h1>-tags to get a big header,
and put the rest of the text lines in <pre>- or <p>-tags.
Also, replace some chars (like &, <, >) with HTML entities.
Then drop the resulting file x.html into the web browser and do print-preview / print.
Or, with a fixed location for the output file, use a bookmark in the browser.
Add some CSS to taste, and extend the converter's "basic operation"
to cover more markup (headers, lists, etc.), as the need arises.
There are some programs available that work like that, e.g. Markdown.
But Markdown uses Perl, and I want even more minimal markup.
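The "basic operation" described above can be sketched in a few lines of Tcl. This is only a minimal illustration, not the full program listed under Code: the proc names escapeHtml, txt2html and convertFile are hypothetical, and every line after the first is simply wrapped in <p>-tags.

```tcl
# Minimal sketch of the "basic operation" (hypothetical proc names).

proc escapeHtml {text} {
    # string map works in a single pass, so the "&" that is part of
    # an inserted "&lt;" is not escaped a second time.
    return [string map {& &amp; < &lt; > &gt;} $text]
}

proc txt2html {lines} {
    # First line becomes title and H1-header, the rest become paragraphs.
    set title [escapeHtml [lindex $lines 0]]
    set out [list "<html>" "<head>"]
    lappend out "<title>$title</title>" "</head>" "<body>"
    lappend out "<h1>$title</h1>"
    foreach line [lrange $lines 1 end] {
        lappend out "<p>[escapeHtml $line]</p>"
    }
    lappend out "</body>" "</html>"
    return [join $out "\n"]
}

proc convertFile {inFile outFile} {
    # Read the whole input file, convert it, write the result.
    set fh [open $inFile r]
    set lines [split [string trimright [read $fh] "\n"] "\n"]
    close $fh
    set fh [open $outFile w]
    puts $fh [txt2html $lines]
    close $fh
}

# Demo without touching any files:
puts [txt2html {"Shopping & Co" "milk < 2 l" "bread"}]
```

With files, the call would be something like: convertFile x.txt x.html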
With ideas and code from the following pages:
- Markdown
- A little demon - Script that can work as a print-spooler.
- A grep-like utility
- Scan and modify text files
- Wiki format to HTML
- ...
Code
# EasyTextPrint012.tcl - HaJo Gurt - 2016-02-14
# http://wiki.tcl.tk/42409

set progVersion "EasyTextPrint v0.12"
puts "# $progVersion"

set fn1 "Todo.txt"
set fn1 "City.txt"
set msg "$progVersion - Select inputfile"
set fn1 [tk_getOpenFile -title $msg \
          -filetypes {{TEXT .txt} {"All files" *}} \
          -defaultextension .txt -initialfile $fn1]

set fn2 "EasyTextPrint.html"

set LineNr  0;    # count non-comment lines from inputfile
set Skip    0
set Title   "EasyTextPrint"
set Cmd     "Hdr"
set H       1
set Prev    "H";
set Default "p";  # Default: wrap inputline in paragraph-tags

catch {console show}
catch {wm withdraw .}
proc e {} { exit }
proc q {} { exit }

#---+----1----+----2----+----3----+----4----+----5----+----6----+----7----+---

proc repl {T S1 S2} {
  #: replace the first occurrence of S1 in T with S2
  set p1 [string first $S1 $T]
  set p2 [expr { $p1 + [string length $S1] -1 } ]
  set T2 [string replace $T $p1 $p2 $S2]
  return $T2
}

proc tagReplace {T0 S1 S2 S3} {
  #: change "**bold**" to "<b>bold</b>", etc.
  set p1 [string first $S1 $T0]
  set p9 [string last  $S1 $T0]
  if {$p1==$p9} {return $T0};   # only 1 tag found
  incr p9 -1
  if {$p1==$p9} {return $T0};   # overlapping tags, e.g. "***"
  set T1 [repl $T0 $S1 $S2 ]
  set T2 [repl $T1 $S1 $S3 ]
  incr ::Changes
  return $T2
}

proc Out {T} {
  puts $::fh2 $T
}

proc Head {T} {
  Out "<!DOCTYPE HTML>"
  Out "<html>"
  Out "<HEAD>"
  Out "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />"
  Out "<style type=\"text/css\"> "
  Out "* {"
  Out "  margin: 0;"
  Out "  margin-left: 10px;"
  Out "  padding: 0; }"
  Out "body {"
  Out "  background: silver;"
 #Out "  font-family: Verdana, Helvetica, sans-serif;"
  Out "  font-family: \"DejaVu Sans\", Helvetica, sans-serif;"
  Out "  font-size: 12px; }"
  Out "h1,h2,h3,h4,h5,h6,p,ul,li,hr,blockquote {"
  Out "  padding: 1px;"
  Out "  background: #eeEEee; } "
  Out "h1 { background: #ffFF80; text-align: center; } "
  Out "h2 { background: #80FFFF; text-decoration: underline; } "
  Out "h3 { background: #80FF80; } "
  Out "h4 { background: #FF8080; } "
  Out "li { margin-left: 13px; }"
  Out "blockquote {"
  Out "  margin-top: 2px;
  margin-right: 16px; margin-bottom: 2px; margin-left: 24px; }"
  Out "code,kbd {"
  Out "  font-family: \"Lucida Console\", \"DejaVu Sans Mono\", monospace;"
  Out "  font-size: 10px; background: orange; }"
 #Out "  ...more style-css..."
  Out "</style>"
  Out "<TITLE>$T</TITLE>"
  Out "</HEAD>\n" ;
}

proc Footer {} {
  Out "</BODY>"
  Out "</HTML>"
}

#---+----1----+----2----+----3----+----4----+----5----+----6----+----7----+---

proc main {} {
  #...
  return
}

puts "# Read file $fn1 ..."
if {![file exists $fn1] || [catch { set fh1 [open $fn1 r] } ] } {
  puts "# Error: open $fn1"
  return 1
}

puts "# Write to file $fn2 ..."
set fh2 [open $fn2 w]        ;# w / w+ update

set i 0
while {![chan eof $fh1]} {   ;# needs Tcl 8.5
  gets $fh1 line
  incr i 1
  #puts "$i $Cmd : ($line)"; ##

  set c1 [string index $line 0]
  set c2 [string range $line 0 1]
  set c4 [string range $line 0 3]

  if {$c4 eq "##__"} { set Cmd "EOF"; break; }
  if {$c4 eq "##++"} { set Skip 1; continue; };  # Skip following text / Stop printing
  if {$c4 eq "##(("} { set Skip 1; continue; };  #
  if {$c4 eq "##--"} { set Skip 0 };             # Continue printing
  if {$c4 eq "##))"} { set Skip 0 };             #
  if {$Skip > 0}     { continue };

  if {$c4 eq "##::"} { set Default [string range $line 4 end] };
  if {$line eq ""  } { Out $line; continue };
  if {$c1 eq "\t"}   { Out $line; continue };
  if {$c4 eq "##!!"} { Out "<DIV style=\"page-break-after:always\"></DIV>"; set Cmd "FF"; continue };
  if {$c1 eq "#"   } { continue };               # comment
  if {$line eq "_" } { Out " "; continue };

  if {$c2 eq "^^"  } { set ::H 1; set Cmd "Hdr"; continue }
  if {$c2 eq "=="  } { set ::H 2; set Cmd "Hdr"; continue }
  if {$c2 eq "--"  } { set ::H 3; set Cmd "Hdr"; continue }

  # Replace special chars with html-entities (same set as in the awk version)
  set line [string map {< &lt; > &gt; & &amp; – &dash; } $line]
  set line [string map {Ä &Auml; Ö &Ouml; Ü &Uuml; ² &sup2; } $line]
  set line [string map {ä &auml; ö &ouml; ü &uuml; ß &szlig; } $line]
  set line [string map {é &eacute; è &egrave; ê &ecirc; ç &ccedil;} $line]

  if {$LineNr == 0} {
    Head $line
    Out "<BODY>"
  }

  # Repeat until no more tag-pairs are found on this line
  set Changes 1
  while {$Changes>0} {
    set Changes 0
    set line [tagReplace $line "**" "<B>" "</B>"]
    set line [tagReplace $line "//" "<I>" "</I>"]
    set line [tagReplace $line "__" "<U>" "</U>"]
    set line [tagReplace $line "%%" "<center>" "</center>"]
  }

  set line1 [string range $line 1 end ]
  if {$c1 eq " "} { Out "<PRE>$line1</PRE>"; continue };

  if {$Cmd eq ""} {
    if {$c1 eq "*"} { Out "<UL><LI>$line1</LI></UL>"; continue };
    # c1 was taken before the entity-replacement, so skip the 4 chars of "&gt;"
    if {$c1 eq ">"} { Out "<BLOCKQUOTE>[string range $line 4 end ]</BLOCKQUOTE>"; continue};
  } else {
    if {$Cmd eq "Hdr"} {
      if {$Prev ne "H"} { Out "<hr>\n" }
      Out "<H$H>$line</H$H>"
      incr LineNr;
      set Prev "H"
      set Cmd  ""
      continue
    }
  }

  #Out "<P>$line</P>";   # Default: wrap inputline in paragraph-tags
  Out "<$Default>$line</$Default>";
  set Prev "P"
};  # while

if {$LineNr > 0} { Footer }

close $fh1
close $fh2
puts "# Output written to file: $fn2"
puts "# Done."
#exit

#---+----1----+----2----+----3----+----4----+----5----+----6----+----7----+---
#.
Code - awk
I did a first prototype of this program using awk, and this script already has the 'basics' plus a few additional features implemented:

#!/usr/bin/awk -f
# txt2html.awk - gurt.gmx@de - 2016-02-15
#
#: Read plain text, output as html, marked up for printing via webbrowser
#: Markup - String at start of line determines type of header in next line:
#  ^^  H1-header in next line (implicit just before first line of inputfile)
#  ==  H2-header in next non-comment, non-blank line
#  --  H3-header in next non-comment, non-blank line
# Usage:
#  gawk -f txt2htm.awk Tel.txt
#  gawk -f txt2htm.awk City.txt > City.html
# See also: https://css-tricks.com/almanac/properties/p/page-break/
#
#-##+####1####+####2####+####3####+####4####+####5####+####6####+####7####+###
#
function chr(c) \
{
  return sprintf( "%c", c+0 );    # make c numeric by adding 0
}

BEGIN {
  Q1 = "'"; Q2 = "\"";    # Quotes
  A  = "\\&";
  LineNr = 0;
  Skip   = 0
  Title  = "EasyTextPrint"
  Cmd    = "Hdr"; H = 1;
  Prev   = "H";
}

function Head(T) \
{
  print("<!DOCTYPE HTML>")
  print("<html>")
  print("<HEAD>")
  print("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />")
  print("<style type=\"text/css\"> ")
  print("* {")
  print("  margin: 0;")
  print("  margin-left: 10px;")
  print("  padding: 0; }")
  print("body {")
  print("  background: silver;")
  print("  font-family: verdana, helvetica, sans-serif;")
  print("  font-size: 12px;")
  print("}")
  print("h1,h2,h3,h4,h5,p,ul,li,hr {")
  print("  padding: 1px;")
  print("  background: #eeEEee; } ")
  print("h1 { background: #ffFF80; text-align: center; } ")
  print("h2 { background: #80FFFF; text-decoration: underline; } ")
  print("h3 { background: #80FF80; } ")
  print("h4 { background: #FF8080; } ")
 #print("  ...more style-css...")
  print("</style>")
  print("<TITLE>" T "</TITLE>")
  print("</HEAD>\n");
  return
}

/^##__/  { exit }
/^##!!/  { print "<DIV style=\"page-break-after:always\"></DIV>"; Cmd="FF"; next }
/^_$/    { print " "; next }
/^##\++/ { Skip=1 }    ##++ skip
/^##--/  { Skip=0 }    ##--Start--
Skip>0   { next }
/^#/     { next }
/^\^+/   { Cmd = "Hdr"; H=1; next }
/^==/    { Cmd = "Hdr"; H=2; next }
/^--/    { Cmd = "Hdr"; H=3; next }
NF<1     { print; next }

{ gsub( "&", A"amp;"); }
{ gsub( "<", A"lt;" ); }
{ gsub( ">", A"gt;" ); }
{ gsub( "Ä", A"Auml;" ); }
{ gsub( "Ö", A"Ouml;" ); }
{ gsub( "Ü", A"Uuml;" ); }
{ gsub( "ä", A"auml;" ); }
{ gsub( "ö", A"ouml;" ); }
{ gsub( "ü", A"uuml;" ); }
{ gsub( "ß", A"szlig;"); }
{ gsub( "²", A"sup2;"); }
{ gsub( "–", A"dash;"); }
#{ gsub( "-", A"ndash;"); }
#
{ sub( "[*][*]", "<B>"); }
{ sub( "[*][*]", "</B>"); }
{ sub( "//", "<I>"); }
{ sub( "//", "</I>"); }
{ sub( "__", "<U>"); }
{ sub( "__", "</U>"); }
{ sub( "%%", "<center>"); }    # ^^
{ sub( "%%", "</center>"); }

/^ /  { print("<pre>" $0 "</pre>" ); next }

LineNr==0 {
  Title = $0; LineNr++;
  Head(Title);
  print("<BODY>");
  #next
}

Cmd=="Hdr" {
  Hdr = $0; LineNr++; Cmd="";
  if (Prev!="H") { print("<hr>\n"); }
  print("<h" H ">" Hdr "</h" H ">");    # H1..H3
  Prev="H";
  next
}

/^\*/ { T = $0; T = substr( $0,2 ); print("<UL><LI>" T "</LI></UL>" ); Prev="u"; next }

{ print("<p>" $0 "</p>" ); Prev="p"; next }
# { print }

END {
# print "# Done."
  print("</BODY>")
  print("</html>")
}
#.
Input
This is an example of a plain text file used as input.
It shows pretty much all features implemented for now, along with some of the more common special chars.
With the slimmed-down CSS above, the result should be 2 printed pages (DIN A4, with margins set at 10 mm left and right, and at 6 mm for top and bottom).
# comment - This is the file: City.txt # 2* H1-header: Großstädte in Deutschland ^^ Kommunalverband besonderer Art == ##++ skip: don't print the following lines of text, until reaching a line starting with "##--" # Test1: == Test-H2 Umlaute: < ÄÖÜ & äöüß > Textstyle: **bold** //italic// __underline__ **bold2** //italic2// __underline2__ *** /// -- Test-H3 Text-Paragraph Text=P Text-Pre Text=Pre * Text-UL * Text=UL > Text-BQ -- Jäger, Müller & Förster GmbH & Co. KG Erzhäuser Straße. 90, 88662 Überlingen Tel. 07773 74 75 76 Internet: www.nospam.de - [email protected] -- Lorem ipsum # show blockquote, and wrapping of long lines >ubique nostro singulis in vix, vis eu doctus scripserit ullamcorper. His quidam detraxit referrentur ei, affert adolescens intellegam sea in. Eros phaedrum imperdiet vim ei, ex amet voluptatum efficiendi eos, nihil sanctus intellegebat at nec. Adipisci theophrastus ei duo, eos cu conceptam percipitur, an dicta eripuit similique his. Graeci convenire in sit, eum errem laoreet ancillae ut, qui at facilisi periculis. ##-- Start/continue printing here == Niedersachsen -- Göttingen Niedersachsen Einwohner: 117.665 Postleitzahlen: 37001–37099 Vorwahl: 0551 Kfz-Kennzeichen: GÖ 37083 Göttingen -- Hannover Niedersachsen Höhe: 55 m ü. NHN Fläche: 204,14 km² Einwohner: 523.642 Postleitzahlen: 30159–30659 Vorwahl: 0511 Kfz-Kennzeichen: H 30159 Hannover -- == Baden-Württemberg -- Reutlingen Baden-Württemberg Regierungsbezirk: Tübingen Landkreis: Reutlingen Einwohner: 112.452 Postleitzahlen: 72760–72770 Vorwahlen: 07121, 07072 und 07127 Kfz-Kennzeichen: RT 72764 Reutlingen -- == Saarland -- Saarbrücken Saarland Einwohner: 180.047 Postleitzahlen: 66001–66133 Vorwahlen: 0681, 06893, 06897, 06898, 06805, 06806, 06881 Kfz-Kennzeichen: SB 66111 Saarbrücken -- ##!! 
page-break == Nordrhein-Westfalen -- Aachen Nordrhein-Westfalen Einwohner: 243.336 Postleitzahlen: 52056–52080 Vorwahlen: 0241, 02403, 02405, 02407, 02408 Kfz-Kennzeichen: AC, MON 52062 Aachen -- Bergisch Gladbach Nordrhein-Westfalen Einwohner: 109.697 Postleitzahlen: 51427–51469 Vorwahlen: 02202, 02204, 02207 Kfz-Kennzeichen: GL 51465 Bergisch Gladbach -- Moers Nordrhein-Westfalen Einwohner: 102.923 Postleitzahlen: 47441–47447 Vorwahl: 02841 Kfz-Kennzeichen: WES, DIN, MO 47441 Moers -- Neuss Nordrhein-Westfalen Einwohner: 152.644 Postleitzahlen: 41460–41472 Vorwahlen: 02131, 02137, 02182 Kfz-Kennzeichen: NE, GV 41460 Neuss -- Paderborn Nordrhein-Westfalen Einwohner: 145.176 Postleitzahlen: 33098–33109 Vorwahlen: 05251, 05252, 05254, 05293 Kfz-Kennzeichen: PB, BÜR 33098 Paderborn -- Recklinghausen Nordrhein-Westfalen Einwohner: 114.147 Postleitzahlen: 45601–45665 Vorwahl: 02361 Kfz-Kennzeichen: RE, CAS, GLA 45657 Recklinghausen -- Siegen Nordrhein-Westfalen Einwohner: 100.325 Postleitzahlen: 57072–57080 Vorwahlen: 0271, 02732 (Meiswinkel), 02737 (Feuersbach) Kfz-Kennzeichen: SI, BLB 57072 Siegen == Code ##::kbd # this cannot have whitespace at start of line (that would result in <pre>-formatted text): awk '{sub(/[ \t]+$/,"")}; 1'; # delete trailing whitespace ##::P # back to standard <p>-paragraphs Done :-) -- _ Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi Hi _ %%End%% ##__EOF__ don't print this bla blah
Comments
HJG 2016-02-13: Change of plan: there is no need to use H6 as pagebreak, and I want to use all the headers H1, H2, H3 directly.
The demo-inputfile has been modified.
Markup
- # : Comments: lines starting with a '#' don't get printed.
- ## : Commands: some special comments are used as commands:
- ##__ : End-of-file. Stop printing, end the program.
- ##!! : Pagebreak. Continue printing on a new page.
- ##++ : Start-marker: pause printing, and skip the following lines, until the endmarker '##--' is found.
- ##-- : Endmarker: resume printing.
- the same: ##(( ...ignore lines... ##))
- ##::kbd : Set default-tag for wrapping lines (standard is 'p', for paragraph). Only a single tag please!
- The first non-comment line of the textfile will be used as title and H1-header.
- ^^ : The text in the following (non-comment, non-blank) line will be used as a H1-header.
- == : Ditto, H2-header to follow.
- -- : Ditto, H3-header to follow.
The lines after that header will be formatted as 'normal' text.
Normal text gets wrapped in <p>-tags (can be changed via '##::', e.g. to q, kbd, code, pre).
- Textstyles: **bold** //italic// __underline__ %%centered%%
- Lines starting with a blank: the line gets wrapped in <pre>-tags ==> preformatted text
- Lines starting with a '*' : the line gets wrapped in <UL><LI>-tags ==> unnumbered list
- Lines starting with a '>' : the line gets wrapped in <BLOCKQUOTE>-tags ==> text is indented
- A line with a single '_' : it gets replaced with a ==> blank line
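The markup rules above boil down to looking at the first one, two or four characters of each line. The following sketch shows that dispatch in isolation. The proc name classifyLine and the returned kind-names are hypothetical, and the state handling of the real program (skip-ranges, "header follows in next line") is deliberately left out.

```tcl
# Sketch: classify one input line by its prefix (hypothetical names).
proc classifyLine {line} {
    if {$line eq ""} { return empty }
    set c1 [string index $line 0]
    set c2 [string range $line 0 1]
    set c4 [string range $line 0 3]

    # Commands are special comments, so test them before plain comments.
    switch -exact -- $c4 {
        "##__"          { return eof }
        "##!!"          { return pagebreak }
        "##++" - "##((" { return skipStart }
        "##--" - "##))" { return skipEnd }
        "##::"          { return setDefault }
    }
    if {$c1 eq "#"}   { return comment }
    if {$c1 eq "\t"}  { return raw }
    if {$line eq "_"} { return blank }

    # Header markers: the header text itself is on the following line.
    switch -exact -- $c2 {
        "^^" { return h1 }
        "==" { return h2 }
        "--" { return h3 }
    }
    switch -exact -- $c1 {
        " " { return pre }
        "*" { return listItem }
        ">" { return blockquote }
    }
    return text
}

foreach line {"# a comment" "==" " indented" "* item" "plain text"} {
    puts [format "%-14s -> %s" "\"$line\"" [classifyLine $line]]
}
```

The order of the tests matters: '##--' must be recognized as a command before the generic '#' comment test, and before '--' could be mistaken for an H3-marker.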
Features
- Comments, <pre>, <UL>, H1..H3, EOF, and skip-ranges are extensions to the "basic operation" of the converter.
- Blank lines are not used for headers. The formatting of the inputfile can be as spaced-out as you want.
- Markup for **bold**, //italic//, __underline__ is done only when a pair of '**' etc. is found on the same line.
So, a single '***' or '///' remains unchanged.
- The special chars I use most commonly are replaced with html-entities (ÄÖÜ, dashes, etc.)
- Easy to extend.
- Textsize, line-height, margins, padding are set to minimal values, to fit as much text on a page as possible.
To see how much space a normal print would need, use the browser's "Inspect element", and uncheck 'margins' in the Rules-tab.
- Light background-colors, to show the structure of the text - and to make it easy to spot errors...
- Pagebreak is a CSS-feature that only works when printing.
To see the position of the break, change the empty DIV, e.g. to '<DIV style="page-break-after:always">-</DIV>'.
- Print-Preview in the browser allows you to customize headers and footers, e.g. filename, pagenumbers, etc.
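The pair-rule for textstyles can also be expressed with regsub. Note that this is an alternative sketch, not what the program above does (it uses string first / string last in tagReplace). The proc name styleInline is hypothetical.

```tcl
# Sketch: replace paired style-markers on one line (hypothetical proc name).
proc styleInline {line} {
    # Each marker is written as a regular expression; "*" must be escaped.
    foreach {mark tag} {{\*\*} B // I __ U %% center} {
        # Non-greedy match, so "**a** **b**" gives two separate bold spans.
        # At least one char is required between the markers,
        # so a lone "***" or "///" stays unchanged.
        regsub -all -- "${mark}(.+?)${mark}" $line "<$tag>\\1</$tag>" line
    }
    return $line
}

puts [styleInline "**bold** //italic// __underline__ %%centered%%"]
puts [styleInline "*** and /// stay as they are"]
```

Like the original, this only looks at a single line, so a style cannot span several lines.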
Quirks & Todo
- No ordered-lists: I rarely use these, so I have no plans to implement them here, and I wanted '#' as comment-char.
- No links, no images, no forms. Well, this is for printing fairly short notes etc., not for browsing.
- (low-priority todo)
- Center: uses the obsolete tag '<center>'.
Also, I wanted the markup as '^^center^^', but ^ is a very special char - This might get fixed.
- Unnumbered-lists: only first level is supported for now - Todo.
- No tables - Todo.
- More ideas/todos:
- detect and underline links and eMails.
- 2 or 3 columns, to fit more short text-snippets on a single page - without organizing them into a table.
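For the first idea, detecting links and eMails, a possible starting point is sketched below. This is not part of the program above: the proc name underlineLinks is hypothetical, and both patterns are deliberately loose assumptions, not complete URL or eMail grammars.

```tcl
# Sketch for the todo "detect and underline links and eMails" (hypothetical).
proc underlineLinks {line} {
    # URLs: http:// or https:// up to the next whitespace or "<".
    # In the replacement, "&" stands for the whole matched text.
    regsub -all -- {https?://[^\s<]+} $line {<U>&</U>} line
    # eMail addresses: loose pattern, good enough for short notes.
    regsub -all -- {[\w.+-]+@[\w-]+(\.[\w-]+)+} $line {<U>&</U>} line
    return $line
}

puts [underlineLinks "See http://wiki.tcl.tk/42409 for details"]
puts [underlineLinks "Mail to joe@example.org"]
```

One thing to watch: a line with two URLs contains '//' twice, which the //italic// markup would treat as a pair, so such a step would have to be coordinated with the textstyle-replacement.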
See also:
- Printing text files under Windows - several ideas to print via copy/type, or external programs - such as notepad or printraw etc.
Peter Newman suggested using the webbrowser, and wrapping the text in <PRE>-tags.
- How do I read and write files in Tcl
- Additional file commands
- Text processing tips
- awk
- https://css-tricks.com/almanac/properties/p/page-break/
- http://www.w3schools.com/css/css_inline-block.asp - Float
- http://www.w3schools.com/css/css3_multiple_columns.asp
- [..]