Updated 2017-10-28 22:54:43 by bll

Based on the code in wiki-reaper, I put together a wish interface so that I could run it on my Windows system. I was able to use it to reap the TkMapper code automagically (quite an impressive program by the way).

GPS - The reaping of the Wiki seems dangerous to me. Perhaps you should add checks for strings like "file delete" (possibly a glob del* pattern if file uses strncmp), and "rm "

escargo 5 Jun 2003 - If you want to be safe, then you could do any textual analysis on the code before you run it. Reaping only collects the code; you don't have to execute it if you do not want to. You can always read the code on the Wiki before you reap it, since "As you see, so shall ye reap." - RS 2004-03-06: Also, Tcl has safe interpreters for exactly this purpose: running untrusted code without danger...

Many of the applications in Category Whizzlet are reapable and entertaining.

Pages that it can reap successfully include:

Pages that it cannot reap successfully include:

  • A little file searcher (3751, two program versions in one page that don't play nice together)
  • Click me (8408, two program versions in one page that don't play nice together)
  • cursors (1425, more than one version on the page that don't play nice together)

cannot reap successfully means that the file resulting from reaping the specified page will not run when started from a Windows file explorer without being edited. The contents of the page will still be collected.

RS: Hm - the second time you "misreaped" a page of mine.. In the case of Click me, it's so ridiculously small, that it doesn't deserve splitting up; a little file searcher will split off A little file searcher (iPaq) ;-)

escargo 16 Mar 2003 - The code for a little file searcher can be reaped, but I can't run it under Windows at least because it assumes a command line interface. - RS has a command line interface on Windows, the infamous DOS prompt ;-)

This program is certainly able to be improved, but it works for now.
 #!/usr/bin/env wish
 # wish-reaper --
 # This script replaces the command line interface of wiki-repeater.tcl
 # with a wish GUI.
 
 # Create a user interface to collect the necessary parameters
 # and control actions.
 # We need an entry for a number field (the page to reap).
 #    Page 6277 provides TkMapper.
 #    Page 4718 prvides the original wiki-reaper.
 # We need an entry for a filename field (to write the page to).
 # We need a button to tell when to do the job.
 # We need a button to tell when to quit.
 
 wm title . "Wish Reaper"
 set font {Helvetica 14}
 frame .reaper
 pack .reaper -expand 1 -fill both -side left
 label .reaper.banner -font $font -text "Wish Reaper"
 pack .reaper.banner -expand 1 -fill x -side top
 
 frame .reaper.data
 pack .reaper.data -expand 1 -fill both -pady 5 -side top
 label .reaper.data.page -anchor w -justify left -text "Page Number"
 grid .reaper.data.page -column 1 -row 1 -sticky w
 entry .reaper.data.pageno -textvariable pageno
 grid .reaper.data.pageno -column 2 -row 1
 label .reaper.data.file -anchor w -justify left -text "Output file name"
 grid .reaper.data.file -column 1 -row 2 -sticky w
 entry .reaper.data.filename -textvariable filename
 grid .reaper.data.filename -column 2 -row 2
 button .reaper.data.browse -text Browse... -command [list select_file filename]
 grid .reaper.data.browse -column 3 -row 2
 
 button .reaper.reap -command reap_now -padx 4 -pady 4 -text Reap
 pack .reaper.reap -side left
 button .reaper.quit -command exit -padx 4 -pady 4 -text Quit
 pack .reaper.quit -side right
 foreach {pageno filename} $argv break
 focus .reaper.data.pageno
 .reaper.data.pageno icursor end

 if {[catch {package require Tcl 8.3}]} {
     tk_messageBox -icon error \
         -message "Tcl version 8.3 or later is required." \
         -type ok -parent .reaper
     exit
 }
 
 if {![catch { package require nstcl-html }] &&
     ![catch { package require nstcl-http }]} {
     namespace import nstcl::*
 } else {
     # Catch a failure here and pop up a dialog box before exiting.
     # A more precise statement about which version of http is needed
     # would be helpful. http::cleanup needs to be provided.
     if {[catch {package require http}]} {
         tk_messageBox -icon error \
             -message "Package http not found." \
             -type ok -parent .reaper
         exit
     } else {
        catch {package require autoproxy}
     }
     
     proc ns_geturl {url} {
         set conn [http::geturl $url]
         set html [http::data $conn]
         http::cleanup $conn
         return $html
     }
     
     proc ns_striphtml {-tags_only html} {
         regsub -all -- {<[^>]+>} $html "" html
         return $html ;# corrected a typo here
     }
     
     proc ns_urlencode {string} {
         set allowed_chars  {[a-zA-Z0-9]}
         set encoded_string ""
         
         foreach char [split $string ""] {
             if {[string match $allowed_chars $char]} {
                 append encoded_string $char
             } else {
                 scan $char %c ascii
                 append encoded_string %[format %02x $ascii]
             }
         }
         
         return $encoded_string
     }
 }
 
 proc output {data} {
     global out
     # we don't want to throw an error if output channel has been closed
     catch { puts $out $data }
 }
 
 # Factoring out the postamble for clarity and ease of change.
 proc postamble { } {
     output \n
     output "# EOF"
     output \n
 }
 
 # Factoring out the preamble for clarity and ease of change.
 proc preamble {title url now updated} {
     output "#####"
     output "#"
     output "# \"$title\""
     output "# [string map [list mini.net/tcl wiki.tcl.tk] $url]"
     output "#"
     output "# Tcl code harvested on:  $now GMT"
     output "# Wiki page last updated: $updated"
     output "#"
     output "#####"
     output \n
 }
 
 proc reap {page} {
     set url  http://wiki.tcl.tk/[ns_urlencode $page]
     set now  [clock format [clock seconds] -format "%e %b %Y, %H:%M" -gmt 1]
     set html [ns_geturl $url]
     
     # can't imagine why these characters would be in here, but just to be safe
     set html [string map [list \x00 "" \x0d ""] $html]
     set html [string map [list {<pre class='sh_tcl'>} \x00 </pre> \x0d] $html]
     
     if {![regexp -nocase {<title>([^<]*)</title>} $html => title]} {
         set title "(no title!?)"
     }
     
     set re {Updated ([^G]+) by}
     if {![regexp -nocase $re $html => updated]} {
         set updated "???"
     }
     
     preamble $title $url $now $updated
     set html [ns_striphtml -tags_only $html]
     
     foreach chunk [regexp -inline -all {\x00[^\x0d]+\x0d} $html] {
         set chunk [string range $chunk 1 end-1]
         set chunk [string map [list "&quot;" \x22 "&amp;" & "&lt;" < "&gt;" >] $chunk]
         
         foreach line [split $chunk \n] {
             if {[string index $line 0] == " "} {
                 set line [string range $line 1 end]
             }
             output $line
         }
     }
     postamble
 }
 
 proc reap_now { } {
     global filename
     global out
     global pageno
     if {![string is integer -strict $pageno]} {
         # put up a dialog box here.
         tk_messageBox -icon error \
             -message "An integer page number is required." \
             -type ok -parent .reaper
     } elseif {$filename == ""} {
         # put up a dialog box here
         tk_messageBox -icon error -message "A nonempty filename is required." \
             -type ok -parent .reaper
     } elseif {[file exists $filename]} {
         # put up a dialog box here
         tk_messageBox -icon error \
             -message "File $filename is not allowed to be overwritten." \
             -type ok -parent .reaper
     } elseif {[catch {open $filename w} out]} {
         # put up a dialog box here
         tk_messageBox -icon error \
             -message "File $filename could not be opened." \
             -type ok -parent .reaper
     } else {
         # everything is good.
         reap $pageno
         close $out
         # completion dialog.
         tk_messageBox -icon info -message "Reaping complete." -type ok\
         -parent .reaper
     }
 }

 proc select_file {varName} {
    upvar $varName filename
    set filename [tk_getSaveFile -title "Save to file ..."]
 }

Created by escargo 16 Jan 2003

And my thanks to the people who keep adding reapable applications to the wiki and then pointing to them from here. -- escargo 24 Sep 2003

escargo 12 Mar 2003 - Maybe I should create a reapable category to make it easy to find all the code that can be extracted; it would be a subset of Category Application.

Luciano ES 19 July 2003 - How about this: edit all of the reapable applications' pages, adding the hidden tag <!-- reap-ready --> to all of them. Quite some work, but quite feasible too.

MJL 30 May 2005 - Fixed the regexp to extract the "page last updated" value. <i>Updated on ... didn't match the HTML. I've split this into two lines to avoid exceeding 80 characters.

daapp 07 Mar 2006 - I add optional support for http proxy and button to browse for file name.

[ATK] 04.2011 - adapted for new wiki