Unix Technical Forum

How do I do this in Wget?

#1
02-21-2008, 08:34 AM
Choo-choo-train
How do I do this in Wget?

There's a picture site I am a member of. Not wanting to save 32,000 images
manually, I want to just download everything with Wget. Directories are
forbidden, so you can't just point Wget at the base picture directories.
It is necessary to use recursive fetching, i.e., pass Wget the link to
the gallery index HTML page and have it follow the links. I tried the -p
switch, but that didn't follow the thumbnails to the full-size images. So
I tried -r -l 1, but at the top of each page is a link to the next
gallery, which Wget then follows and downloads. So I tried -r -l 1 -R
.htm, but then Wget doesn't follow the links to the full-size images, as
it has to discard the page passed to it. So I tried --ignore-tags="<a
href="*.htm"*>" but it still downloads the linked .htm documents. Maybe
I'm misusing the --ignore-tags= switch, but I couldn't find any examples
of its proper usage.
How do I make Wget download the .htm passed to it, follow all the links
going to files ending in .jpg, and ignore links going to files ending in
.htm?
#2
02-21-2008, 08:34 AM
ghost
Re: How do I do this in Wget?

Choo-choo-train wrote:

> There's a picture site I am a member of. Not wanting to save 32,000 images
> manually, I want to just download everything with Wget. Directories are
> forbidden, so you can't just point Wget at the base picture directories.
> It is necessary to use recursive fetching, i.e., pass Wget the link to
> the gallery index HTML page and have it follow the links. I tried the -p
> switch, but that didn't follow the thumbnails to the full-size images. So
> I tried -r -l 1, but at the top of each page is a link to the next
> gallery, which Wget then follows and downloads. So I tried -r -l 1 -R
> .htm, but then Wget doesn't follow the links to the full-size images, as
> it has to discard the page passed to it. So I tried --ignore-tags="<a
> href="*.htm"*>" but it still downloads the linked .htm documents. Maybe
> I'm misusing the --ignore-tags= switch, but I couldn't find any examples
> of its proper usage.
> How do I make Wget download the .htm passed to it, follow all the links
> going to files ending in .jpg, and ignore links going to files ending in
> .htm?


I would try option -m (--mirror, which is similar to the options
-r -N -l inf --no-remove-listing):
wget -m URL

--
greeting
ghost
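
For the narrower goal in the question (fetch one gallery page, keep the .jpg files it links to, and skip the linked .htm pages), a possible alternative to mirroring is an accept list combined with a recursion depth of 1. The sketch below is untested against the site in question and rests on assumptions: the URL is a hypothetical placeholder, the function name is made up for illustration, and the full-size images are assumed to be linked directly from the gallery page on the same host. Wget should still fetch the starting page in order to extract its links, even though that page does not match the accept list.

```shell
#!/bin/sh
# Sketch only: the URL below is a hypothetical placeholder, not from the thread.
GALLERY_URL="http://example.com/gallery/index.htm"

# fetch_gallery_images URL  (hypothetical helper name)
#   -r -l 1      recurse, but only one level below the page given
#   -A jpg,jpeg  accept list: keep only files with these suffixes
#   -nd          save files into the current directory, no host/dir tree
fetch_gallery_images() {
    wget -r -l 1 -A jpg,jpeg -nd "$1"
}

# Usage (left commented out so that sourcing this file downloads nothing):
# fetch_gallery_images "$GALLERY_URL"
```

If the images live on a different host than the gallery page, -H (span hosts) may be needed as well, and since the site requires membership, session cookies may have to be passed with --load-cookies. Both are guesses about the site, not something the thread confirms.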