Summary
HJG: Someone has uploaded a lot of pictures to Flickr, and I want to show them someplace where no internet is available. The pages at Flickr have a lot of links, icons etc., so a simple recursive download with e.g. wget would fetch lots of unwanted stuff. Of course, I could tweak the parameters for calling wget (-accept, -reject, etc.), or get the html-pages, then filter their contents with awk or perl, but doing roughly the same thing in Tcl looks like more fun :-) Moreover, with a Tcl-script I can also get the titles and descriptions of the images.

So the first step is to download the html-pages from that person, extract the links to the photos from them, then download the photo-pages (containing titles and complete descriptions), and the pictures in the selected size (Thumbnail=100x75, Small=240x180, Medium=500x375, Large=1024x768, Original=as taken). Then we can make a Flickr Offline Photoalbum out of them, or just use a program like IrfanView [1] to present the pictures as a slideshow.

Code
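The five sizes map onto suffixes in Flickr's static image URLs: the album page links the Small (_m) variant, and the script below rewrites that suffix to _t, nothing, _b, or _o for the other sizes. A minimal sketch of the idea (the proc name and the example URL are invented for illustration):

```tcl
# Sketch: derive the URL for any size from the Small (_m) URL that
# appears on the album page. Suffix scheme, as used by the script below:
# _t=100x75, _m=240x180, (none)=500x375, _b=1024x768, _o=original.
proc SizeUrl {smallUrl size} {
    array set suffix {100x75 _t 240x180 _m 500x375 {} 1024x768 _b MaxSize _o}
    return [string map [list _m $suffix($size)] $smallUrl]
}

set u http://static.flickr.com/42/87654321_8888_m.jpg
puts [SizeUrl $u 1024x768] ;# http://static.flickr.com/42/87654321_8888_b.jpg
```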
This is the beta-version of the downloader:

#!/bin/sh
# Restart with tcl: -*- mode: tcl; tab-width: 4; -*- \
exec wish $0 ${1+"$@"}
# FlickrDownload.tcl - HaJo Gurt - 2006-01-20 - http://wiki.tcl.tk/15303
#: Download webpages and images for a photo-album from flickr.com
#
# 2005-11-22 First Version
# 2005-11-23 entry
# 2005-11-24 checkbuttons
# 2005-11-25 save data to file
# Todo:
# * Save infos to file for next stage (album-maker)
# * expand Analyze1 to recognize set-pages, search-result-pages etc.
# * Bug: !! End of Multiline-Descriptions not detected
# * ?? FetchImage: check status
package require Tk
package require Img
package require http
proc Init {} {
#: Initialize Values
global Prg Const
set Prg(Title) "Flickr-Download"
set Prg(Version) "v0.32"
set Prg(Date) "2006-01-26"
set Prg(Author) "Hans-Joachim Gurt"
set Prg(Contact) [string map -nocase {: @ ! .} gurt:gmx!de]
set Prg(About) "Download pictures from a photo-album at Flickr.com" ;#:
#%%
set Const(Prefix1) "s"
#set Const(Prefix1) "page" ;# page01.html
set Const(Datafile) slides.txt
}
#########1#########2#########3#########4#########5#########6#########7#####
proc Print { Str {Tag ""} } {
#: Output to text-window
#puts $Str
.txt1 insert end "\n"
.txt1 insert end "$Str" $Tag
.txt1 see end ;# scroll to bottom
update
}
proc Log { Str {Tag ""} } {
##: Debug-Output
#Print $Str $Tag ;##
}
proc ShowOpt {} {
##: Debug: Show Options
global Opt
return ;##
Print ""
foreach key [array names Opt] {
Print "$key : $Opt($key)"
}
}
#########1#########2#########3#########4#########5#########6#########7#####
proc GetPage { url } {
#: Fetch a webpage from the web
set token [::http::geturl $url]
set page [::http::data $token]
::http::cleanup $token
return $page
}
proc FetchImage { url fname } {
#: Fetch a picture from the web
# See also: [Polling web images with Tk]
#puts -nonewline "Fetch: \"$url\" "
Print "Fetch: \"$url\" " DL
global Opt
set Stat "skip"
## Skip the download when "Get from Web" is unchecked (re-use local files):
if { $Opt(Get_from_Web) } {
set f [open $fname w]
fconfigure $f -translation binary
set imgtok [http::geturl $url -binary true -channel $f]
#set Stat [::http::error $imgtok]
set Stat [::http::status $imgtok]
# ?? Errorhandling ??
flush $f
close $f
http::cleanup $imgtok
}
Print " Status: $Stat " Ok ;# ?? true status
}
#########1#########2#########3#########4#########5#########6#########7#####
proc Analyse1 { url1 page } {
#: Analyse flickr album-webpage,
# like http://www.flickr.com/photos/PERSON
# or http://www.flickr.com/photos/PERSON/page2
global PicNr Const Opt Data
set filename [format "%s%02d.html" $Const(Prefix1) $page ]
if {$page == 1} {
set url $url1
} else {
set url [format "$url1/page%d" $page ]
}
set base $url1
set p1 [ string first "//" $url 0 ]; incr p1 2
set p2 [ string first "/" $url $p1 ]; incr p2 -1
set p1 0
set base [ string range $url $p1 $p2 ]
#Log "$base: $p1 $p2: '$base'"
Log "# filename: $filename" ;##
Log "# url : $url" ;##
Log "# base: $base" ;##
## Skip the download when "Get from Web" is unchecked (re-use local files):
if { $Opt(Get_from_Web) } {
set page [ GetPage $url ]
#puts "$page" ;##
set fileId [open $filename "w"]
puts -nonewline $fileId $page
close $fileId
}
set fileId [open $filename r]
set page [read $fileId]
close $fileId
foreach line [split $page \n] {
# <title>Flickr: Photos from ALBUM</title>
if {[regexp -- "<title>" $line]} {
#Log "1: $line";
set p1 [ string first ":" $line 0 ]; incr p1 14
set p2 [ string first "</title" $line $p1 ]; incr p2 -1
set sA [ string range $line $p1 $p2 ]
Log "Album: $p1 $p2: '$sA'"
Print "Album: '$sA'"
set Data(0.Album) $sA
set Data2(Album) $sA
}
# <h4>stilles Seitental</h4>
if {[regexp -- "<h4>" $line]} {
#Log "2: $line";
incr PicNr
set p1 [ string first "<h4>" $line 0 ]; incr p1 4
set p2 [ string first "</h4>" $line $p1 ]; incr p2 -1
set sH [ string range $line $p1 $p2 ]
Log "\n"
Log "$PicNr - Header: $p1 $p2: '$sH'" Hi
Print "$PicNr - Header: '$sH'" Hi
set Data($PicNr.Head) $sH
set Data($PicNr.Desc) ""
}
# <p class="Photo"><a href="/photos/PERSON/87654321/">
# <img src="http://static.flickr.com/42/87654321_8888_m.jpg" width="240" height="180" /></a></p>
if {[regexp -- (class="Photo") $line]} {
#Log "3: $line";
#incr n
set p1 [ string first "href=" $line 0 ]; incr p1 6
set p2 [ string first "img" $line $p1 ]; incr p2 -4
set sL [ string range $line $p1 $p2 ]
Log "Link : $p1 $p2: '$sL'"
#set Data($PicNr.Link) $sL
set p1 [ string first "src=" $line 0 ]; incr p1 5
set p2 [ string first "jpg" $line $p1 ]; incr p2 2
set sP [ string range $line $p1 $p2 ]
Log "Photo: $p1 $p2: '$sP'"
#Print "Photo: '$sP'"
#set Data($PicNr.Photo) $sP
set url2 $sL ;# /photos/PERSON/87654321/
set nr $url2 ;# /photos/PERSON/87654321/
#Log "#> '$nr'"
set p2 [ string last "/" $url2 ]; incr p2 -1
set p1 [ string last "/" $url2 $p2 ]; incr p1 1
set nr [ string range $url2 $p1 $p2 ]
Log "#>Nr: $p1 $p2 '$nr'"
#set filename [format "p%04d.html" $PicNr ]
set filename [format "%s.html" $nr ] ;# Filename for local photo-page
set Data($PicNr.Name) $nr
Print "Name : $nr"
set filename0 [format "%s_0100.jpg" $nr ] ;# 100x75 - Thumbnail
set sP0 [ string map {_m _t} $sP ]
if { $Opt(100x75) } { FetchImage $sP0 $filename0 }
set filename1 [format "%s_0240.jpg" $nr ] ;# 240x180 - Small
if { $Opt(240x180) } { FetchImage $sP $filename1 }
set filename2 [format "%s_0500.jpg" $nr ] ;# 500x375 - Medium
set sP2 [ string map {_m ""} $sP ]
if { $Opt(500x375) } { FetchImage $sP2 $filename2 }
set filename3 [format "%s_1024.jpg" $nr ] ;# 1024x768 - Large
set sP3 [ string map {_m _b} $sP ]
if { $Opt(1024x768) } { FetchImage $sP3 $filename3 }
#break ;##
set filename4 [format "%s_2048.jpg" $nr ] ;# Original Size, e.g. 2560x1920
set sP4 [ string map {_m _o} $sP ]
if { $Opt(MaxSize) } { FetchImage $sP4 $filename4 }
}
# <p class="Desc">im Khao Sok</p>
# <p class="Desc">Figuren aus dem alten China, auf<a href="/photos/PERSON/87654321/">...</a></p>
if {[regexp -- (class="Desc") $line]} {
#Log "4: $line";
set p1 [ string first "Desc" $line 0 ]; incr p1 6
#set p2 [ string first "</p>" $line $p1 ]; incr p2 -1
set p2 [ string first "<" $line $p1 ]; incr p2 -1
set sD [ string range $line $p1 $p2 ]
Log "Descr: $p1 $p2: '$sD'"
#Print "Descr: '$sD'"
set Data($PicNr.Desc) $sD ;# gets replaced again in Analyse2
}
# <a href="/photos/PERSON/page12/" class="end">12</a>
# <a href="/photos/PERSON/" class="end">1</a>
if {[regexp -- (page.*class="end") $line]} {
#Log "5: $line";
#incr n;
set p1 [ string first "page" $line 0 ]; incr p1 4
set p2 [ string first "/" $line $p1 ]; incr p2 -1
set s9 [ string range $line $p1 $p2 ]
Log "End: $p1 $p2: '$s9'"
return [incr s9 0]
#break
}
# <p class="Activity">
if {[regexp -- (class="Activity") $line]} { ;# now get photo-page
Analyse2 $base $sL $filename
#break
}
# <!-- ### MAIN NAVIGATION ### -->
if {[regexp -- "<!-- ### MAIN" $line]} {
break ;# Nothing interesting beyond this point
}
}
return 0
}
#########1#########2#########3#########4#########5#########6#########7#####
proc Analyse2 { url1 url2 filename } {
#: Analyse a flickr photo-webpage (which shows a single photo),
# like http://www.flickr.com/photos/PERSON/87654321/
#
# @url1 : first part of the url, e.g. "http://www.flickr.com/"
# @url2 : 2nd part of the url, e.g. "/photos/PERSON/87654321/"
# @filename: filename for local copy of webpage
global PicNr Data Opt
set url "$url1$url2"
## Skip the download when "Get from Web" is unchecked (re-use local files):
if { $Opt(Get_from_Web) } {
set page [ GetPage $url ]
#Log "$page" ;##
set fileId [open $filename "w"]
puts -nonewline $fileId $page
close $fileId
}
set fileId [open $filename r]
set page [read $fileId]
close $fileId
foreach line [split $page \n] {
# page_current_url
if {[regexp -- "page_current_url" $line]} {
#Log "1>> $line";
}
# <li class="Stats">
# Taken with an Olympus C5050Z.
if {[regexp -- "Taken with an" $line]} {
#Log "2>> $line";
set p1 [ string first "with" $line 0 ]; incr p1 8
set p2 [ string first "<br /" $line $p1 ]; incr p2 -3
set sC [ string range $line $p1 $p2 ]
Log ">> Camera: $p1 $p2: '$sC'"
}
# <p class="DateTime"> Uploaded on <a href="/photos/PERSON/archives/date-posted/2006/01/07/"
# style="text-decoration: none;">Jan 7, 2006</a></p>
if {[regexp -- "Uploaded on" $line]} {
#Log "3>> $line";
set p1 [ string first "date-posted" $line 0 ]; incr p1 12
set p2 [ string first "style" $line $p1 ]; incr p2 -4
set sU [ string range $line $p1 $p2 ]
set sU [ string map {/ -} $sU ]
Log ">> Upload: $p1 $p2: '$sU'"
}
# Taken on <a href="/photos/PERSON/archives/date-taken/2006/01/10/"
# style="text-decoration: none;">January 10, 2006</a>
if {[regexp -- "archives/date-taken" $line]} {
#Log "4>> $line";
set p1 [ string first "date-taken" $line 0 ]; incr p1 11
set p2 [ string first "style" $line $p1 ]; incr p2 -4
set sS [ string range $line $p1 $p2 ]
set sS [ string map {/ -} $sS ]
set Data($PicNr.Date) $sS
Log ">> Shot: $p1 $p2: '$sS'"
Print "Date: '$sS'"
}
# <h1 id="title_div87654321">stilles Seitental</h1>
if {[regexp -- "<h1" $line]} {
#Log "H1: $line";
set p1 [ string first ">" $line 0 ]; incr p1 1
set p2 [ string first "</h1>" $line $p1 ]; incr p2 -1
set sH [ string range $line $p1 $p2 ]
Log ">> $PicNr - Header: $p1 $p2: '$sH'"
}
# <div id="description_div87654321" class="photoDescription">im Khao Sok</div>
# <div id="description_div73182923" class="photoDescription">Massiert wird überall und immer...,
# viel Konkurrenz bedeutet kleine Preise: 1h Fußmassage = 120Bt (3€)<br />
# Es massieren die Frauen, die tragende Säule der Gesellschaft.</div>
#
if {[regexp -- (class="photoDescription") $line]} {
#Log "D: $line";
set p1 [ string first "Desc" $line 0 ]; incr p1 13
set p2 [ string first "</div>" $line $p1 ]
# !! Multiline-Descriptions: get at least the first line:
if {$p2 > $p1} { incr p2 -1 } else { set p2 [string length $line] }
set sD [ string range $line $p1 $p2 ]
set Data($PicNr.Desc) $sD
Log ">> Descr: $p1 $p2: '$sD'"
Print "Descr: '$sD'"
}
# Abort scanning of current file (nothing of interest below):
if {[regexp -- "upload_form_container" $line]} {
Print "-"
Log "##> $PicNr : $Data($PicNr.Name) #\
$Data($PicNr.Date) #\
$Data($PicNr.Head) #\
$Data($PicNr.Desc)"
global Data2
#%%
set key $Data($PicNr.Name)
set Data2($key.Date) $Data($PicNr.Date)
set Data2($key.Head) $Data($PicNr.Head)
set Data2($key.Desc) $Data($PicNr.Desc)
break
}
}
}
#########1#########2#########3#########4#########5#########6#########7#####
proc Go {url} {
#: Start processing after user entered url
global PicNr Const Opt Data
set StartPage 1
Print ""
Print "Flickr-Download from $url" Hi
set PicNr 0
set filename [ format "%s%02d.html" $Const(Prefix1) $StartPage ] ;# page01.html
set MaxPage [ Analyse1 $url $StartPage ]
incr StartPage 1
#set MaxPage 2 ;##
if { $Opt(All_Pages) } {
for {set page $StartPage} {$page <= $MaxPage} {incr page} {
Analyse1 $url $page
}
}
Print ""
Print "Done !" Hi
#: Show collected Data about pictures:
Print ""
set line -1
#%%
global Data2
set line 0
foreach key [lsort -dictionary [ array names Data2 ]] {
Print "$key : $Data2($key)" [expr [incr line]%3]
}
arr'dump Data $Const(Datafile)
arr'dump Data2 data2.txt
Print ""
}
proc arr'dump { _arr fn } {
#: Dump array to file, in a format ready to be loaded via 'source'
upvar 1 $_arr arr
set f [open $fn w]
puts $f "array set $_arr \{"
foreach key [ lsort [array names arr] ] {
puts $f [ list $key $arr($key) ]
}
puts $f "\}"
close $f
}
#########1#########2#########3#########4#########5#########6#########7#####
#: Main :
Init
#catch {console show} ;##
pack [frame .f1]
pack [frame .f2]
label .lab1 -text "URL:"
entry .ent1 -textvar e -width 80
text .txt1 -yscrollcommand ".scr1 set" -width 100 -height 40 -bg white -wrap word
scrollbar .scr1 -command ".txt1 yview"
button .but0 -text "Clear Log" -command { .txt1 delete 0.0 end }
button .but1 -text "Go" -command { Go $e }
pack .lab1 .ent1 .but0 .but1 -in .f1 -side left -padx 2
label .lab2 -text "Options:"
pack .lab2 -in .f2 -side left
set AllPages "All Pages"
lappend Options 100x75 240x180 500x375 1024x768 MaxSize Get_from_Web All_Pages
foreach size $Options {
set cl [label .sz$size -text $size ]
set cc [checkbutton .cb$size -variable Opt($size) -command ShowOpt ]
pack $cl -in .f2 -side left -anchor e
pack $cc -in .f2 -side left -anchor w
}
.txt1 tag configure "Hi" -background red -foreground white
.txt1 tag configure "DL" -background lightblue -underline 1
.txt1 tag configure "Ok" -background green -underline 0
.txt1 tag configure 1 -background cyan
Print " $Prg(Title) $Prg(Version) - $Prg(Date) " Hi
Print "$Prg(About)"
Print "(c) $Prg(Author) - $Prg(Contact)" Ok
set Opt(100x75) 0
set Opt(All_Pages) 0
set Opt(Get_from_Web) 1
ShowOpt ;##
set Data(0.Album) "Flickr"
pack .scr1 -side right -fill y
pack .txt1 -side right
bind .ent1 <Return> { Go $e }
bind . <Key-F1> { console show }
set e http://www.flickr.com/photos/
#set e http://www.flickr.com/photos/siegfrieden
#set e http://www.flickr.com/photos/siegfrieden/page2
wm title . $Prg(Title)
focus -force .ent1

Comments
Now with a nice GUI: enter the URL of the first album-page, check the options you want, then press the "Go"-button. The checkboxes for the image-sizes to download are obvious. When "Get from Web" is not checked, no internet-access happens and local files (from a previous download) are used. When "All Pages" is not checked, processing stops after the first page.

CJL wonders whether the Flickr-generated RSS feeds for an album might be a quicker way of getting at the required set of image URLs.

HJG: I don't think so - the data in the RSS lists only the most recently uploaded images, it misses some details (e.g. the date when a picture was taken), and the description-field looks messy.

Here is a more quick'n'dirty way to get just the pictures, using wget and awk:
- Visit the first page of the album with a browser [2]
- Save this page as s01.html (html-only is enough)
- At the bottom of the album-page, right-click each "Page X"-link, and save-link-as s02.html, etc. (ok, more than about a dozen of these would get tiresome...)
- awk -f flickr.awk s*.html > links.txt
- wget -w1 -i links.txt

The awk-script flickr.awk extracts the image-links from the saved pages:
BEGIN { FS="\""
Found=0;
print "# flickr-Download:"
}
/class="Photo/ { Found++
sub( "^.*http", "http", $0)
sub( "_m", "_b", $1) # _b : large picture = 1024x768
print $1
next
}
END { print "# Found:", Found }

Next step: Flickr Offline Photoalbum.

schlenk wonders if using htmlparse or tdom in html mode would make the page parsing code look nicer.

HJG: Are there any examples of these tools here on the wiki (or elsewhere), with a demo of how to parse a fairly complex webpage? Of course, it is hard to see what the webpage to be parsed looked like when only the parsing code is there. I admit that my code is more "working" than "elegant"...

2006-02-02: After the first successful use of this program, some problems showed up:
- Descriptions for photos can be longer than one line
- Flickr-pages with defined "sets" have extra title-entries (should be filtered out)
- For some pictures, the selected size might not be available (e.g. only 640x480).
- No checks yet if the download of an image is successful
- There are other types of webpages at flickr (e.g. Set, Calendar, Tags...) that cannot be parsed yet.
- I have not yet decided on the design for the data to pass to the viewer.
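Regarding schlenk's suggestion above: a minimal sketch of what the title-extraction could look like with tcllib's htmlparse (assuming tcllib is installed; the HTML fragment is a made-up sample in the style of the album pages, and proc/variable names are invented):

```tcl
package require htmlparse

# htmlparse::parse calls the -cmd callback with four arguments:
# tag name, slash ("/" for closing tags), tag parameters, and the
# text that follows the tag.
proc Collect {tag slash param text} {
    global Titles
    if {$slash eq "" && [string tolower $tag] eq "h4"} {
        lappend Titles [string trim $text] ;# text after <h4> is the photo title
    }
}

set Titles {}
set html {<p class="Photo"><a href="/photos/PERSON/87654321/"></a></p>
<h4>stilles Seitental</h4>}
::htmlparse::parse -cmd Collect $html
puts [join $Titles \n]
```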
See also:
- http - Download file via HTTP - Polling web images with Tk
- A little file searcher - owh - a fileless tclsh
- Serializing an array
- Parsing HTML - Regexp HTML Attribute Parsing
- Stephen Uhler's HTML parser in 10 lines
- websearch

