This is a discussion on How do I do this in Wget? within the Gentoo Linux Support forums, part of the Unix Operating Systems category.
There's a picture site I am a member of. Not wanting to save 32,000 images manually, I want to just download everything with Wget. Directories are forbidden, so you can't just point Wget at the base picture directories. It is necessary to use recursive fetching, i.e., pass Wget the link to the gallery index html and have it follow the links.

I tried the -p switch, but that didn't follow the thumbnails to the full-size images. So I tried -r -l 1, but at the top of each page is a link to the next gallery, which Wget then follows and downloads. So I tried -r -l 1 -R .htm, but then Wget doesn't follow the links to the full-size images, as it has to discard the page passed to it. So I tried --ignore-tags="<a href="*.htm"*>" but it still downloads the linked .htm documents. Maybe I'm misusing the --ignore-tags= switch, but I couldn't find any examples of its proper usage.

How do I make Wget download the .htm passed to it, follow all the links going to files ending in .jpg, and ignore links going to files ending in .htm?
Choo-choo-train wrote:
> There's a picture site I am a member of. Not wanting to save 32,000 images
> manually, I want to just download everything with Wget. Directories are
> forbidden, so you can't just point wget at the base picture directories.
> It is necessary to use recursive fetching, i.e., pass Wget the link to
> the gallery index html, and have it follow the links. I tried the -p
> switch, but that didn't follow the thumbnails to the full-size images. So
> I tried -r -l 1, but at the top of each page is a link to the next
> gallery, which Wget then follows and downloads. So I tried -r -l 1 -R
> .htm, but then Wget doesn't follow the links to the full-size images, as
> it has to discard the page passed to it. So I tried --ignore-tags="<a
> href="*.htm"*>" but it still downloads the linked .htm documents. Maybe
> I'm misusing the --ignore-tags= switch, but I couldn't find any examples
> of its proper usage.
> How do I make Wget download the .htm passed to it, follow all the links
> going to files ending in .jpg, and ignore links going to files ending in
> .htm?

I would try option -m (mirror is similar to options -r -N -l inf -nr)

wget -m

--
greeting ghost
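Another thing that might be worth a try is an accept list (-A) instead of a reject list (-R). The sketch below is untested against the site in question. The function name fetch_gallery and the DRY_RUN variable are made up for illustration, and the URL is whatever gallery index page you pass in. As far as I know, Wget still fetches the start page to read its links even when it does not match the accept list, and deletes it afterwards.

```shell
#!/bin/sh
# Sketch only: fetch one gallery index page and keep just the JPEGs it links to.
# fetch_gallery and DRY_RUN are hypothetical names, not part of Wget.
fetch_gallery() {
    url=$1
    # -r -l 1   recurse, but only one level below the page given
    # -nd       do not recreate the remote directory tree locally
    # -A ...    accept list: keep only files with these suffixes
    set -- wget -r -l 1 -nd -A 'jpg,jpeg' "$url"
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Call it as fetch_gallery followed by the gallery index URL, or set DRY_RUN=1 first to see the command without touching the network. If the full-size images live on a different host than the index page, -H (span hosts) would probably be needed as well, and a members-only site may also want your login cookies (--load-cookies). Whether this picks up the full-size images at all depends on the thumbnails linking straight to the .jpg files rather than to another .htm page.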