On 16 Dec 2004 03:58:21 -0800,
guruteck@gmail.com <guruteck@gmail.com> wrote:
> Thanx u very much .It works greatly for me..
> But I didnt completely understand why it is working for me now??
> I didnt get complete information from man pages
The man page says very little about robots.txt. (Perhaps this is
intentional.) It's worth reading /etc/wgetrc, as that will give you
some more ideas as to what wget can do.
If the '-erobots=off' option worked for you, then it told wget to
ignore the http://some.example.com/robots.txt file on the website.
(The -e option runs a wgetrc-style command, in this case robots=off.)
robots.txt is part of the Robots Exclusion Standard that well-behaved
web robots (like wget) will follow. It tells web robots what part(s) of
the site should not be downloaded or indexed. There is more information
on the robotstxt site:
<http://www.robotstxt.org/>
<http://www.robotstxt.org/wc/norobots.html#introduction>
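To make that a bit more concrete, here is a rough sketch of what a
robots.txt can look like. The file contents and paths are made up for
illustration, not taken from any real site. The snippet writes the
sample to a scratch file and lists the path prefixes that robots are
asked to stay out of:

```shell
# Write a made-up robots.txt to a scratch file (all paths are hypothetical).
robots_file=$(mktemp)
cat > "$robots_file" <<'EOF'
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
EOF

# Extract the path prefixes listed on Disallow lines.
disallowed=$(sed -n 's/^Disallow: *//p' "$robots_file")
echo "$disallowed"
```

With robots left on, a recursive wget would be expected to skip
anything under those prefixes.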
If the '-U' option worked for you, the website you're downloading from
is probably blocking requests from clients that identify themselves as
"Wget/1.9.1", which is the User-Agent string wget sends by default.
The -U option lets wget identify itself as another client, Firefox for
instance:

    wget -U \
      "Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0" \
      http://some.example.com
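Both settings can also live in a wgetrc file instead of on the command
line. The following is only a sketch: it writes the two wgetrc
commands (robots and user_agent) to a scratch file, and the wget
invocation is left commented out since it needs network access. The
scratch file name and the target URL are placeholders.

```shell
# Write a minimal wgetrc to a scratch file (placeholder location).
rc=$(mktemp)
cat > "$rc" <<'EOF'
robots = off
user_agent = Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0
EOF

# Show what was written.
cat "$rc"

# If WGETRC is set, wget loads that file in place of ~/.wgetrc:
# WGETRC="$rc" wget -r http://some.example.com/
```

Putting the same two lines in ~/.wgetrc would make them the default for
every run, so a scratch file is the safer way to experiment.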
--
Mark Hill