How to use Wget to recursively download files from a web directory?

Updated on September 3, 2017

Question: My company’s intranet website hosts gigabytes of software tools and applications. I have been trying to download all of them with the wget command while keeping the same directory structure (the source directory organizes the tools into various categories, and I’d like to keep it that way). Here’s an example of the intranet web directory: http://intranet.company.com/software-tools/. The ‘software-tools’ directory contains plenty of subdirectories and files, and I would like to download all of them recursively using wget, but I can’t seem to find the right options. Can someone from techglimpse help me out? – Naveen.

Answer:

wget is a command-line utility for downloading files from a remote web server. It can download a single file, or walk through a directory recursively and download the files in its subdirectories as well.


wget recursive download

# wget -r http://intranet.company.com/software-tools/

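The above command downloads the files and directories inside the ‘software-tools’ directory and recreates the same layout locally, under a folder named after the host (for example ./intranet.company.com/software-tools/). Note that wget follows links only five levels deep by default; for a deeper tree, add ‘-l inf’. But remember, it will also save the ‘index.html’ listing page of the ‘software-tools’ directory as well as of each subdirectory. To avoid that, use the reject option with the pattern ‘index.html*’; the trailing * also catches sort-order variants such as ‘index.html?C=M;O=D’ that many directory listings link to.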

wget reject option

# wget -r --reject "index.html*" http://intranet.company.com/software-tools/

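Note: wget cannot list a remote directory on its own; it only follows the links in the HTML page the server returns. Recursive download therefore works only if the web server has directory listing enabled (or an index page links to every file).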
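Before starting a multi-gigabyte download, it can help to confirm that the URL really returns a server-generated listing. The sketch below is a heuristic, not a wget feature: the function name has_dir_listing is hypothetical, and it assumes the listing page contains "Index of" (the default title of Apache and nginx autoindex pages) or "Directory listing for" (Python’s http.server). Customized listing pages will not be detected.

```shell
#!/bin/sh
# has_dir_listing URL  (hypothetical helper, heuristic only)
# Fetches URL to stdout with wget and succeeds if the page looks like a
# server-generated directory listing. The pipeline's exit status is grep's,
# so a failed download (empty input) also counts as "no listing".
has_dir_listing() {
    wget -q -O - "$1" | grep -Eqi 'Index of|Directory listing for'
}
```

Usage: has_dir_listing http://intranet.company.com/software-tools/ && echo "listing enabled"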

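Sample output (abridged). wget still fetches each listing page so it can follow the links inside it, then deletes the page because it matches the reject pattern:

 ...
 Removing intranet.company.com/software-tools/index.html?C=M;O=D since it should be rejected.
 ...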

Do not ascend to the parent directory while downloading recursively using wget

If you don’t want wget to follow the parent-directory link in the listing and climb above ‘software-tools’, add the ‘--no-parent’ option:

# wget -r --no-parent --reject "index.html*" http://intranet.company.com/software-tools/
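To see how this command lays out the files before pointing it at the intranet, the sketch below rehearses it against a throwaway tree served locally. Everything in it is an assumption for illustration: python3 (3.7 or later) and wget must be installed, port 8765 must be free, and the directory and file names are made up. The extra options -nH (--no-host-directories) and -P (--directory-prefix) are added only so the copy lands in a predictable folder.

```shell
#!/bin/sh
# Rehearse "wget -r --no-parent --reject 'index.html*'" against a local
# server so the resulting directory layout can be inspected safely.
# Assumptions: python3 >= 3.7 and wget installed, port 8765 free;
# all paths and file names are hypothetical sample data.

SRC=$(mktemp -d)     # fake web root
DEST=$(mktemp -d)    # where wget saves the copy

# Build a small tree resembling the intranet's software-tools/ directory.
mkdir -p "$SRC/software-tools/compilers" "$SRC/software-tools/editors"
echo "gcc" > "$SRC/software-tools/compilers/gcc.tar.gz"
echo "vim" > "$SRC/software-tools/editors/vim.tar.gz"

# Python's built-in server generates a directory listing for every folder.
python3 -m http.server 8765 --bind 127.0.0.1 --directory "$SRC" >/dev/null 2>&1 &
SERVER_PID=$!
trap 'kill "$SERVER_PID" 2>/dev/null' EXIT

# Wait (up to about 10 s) until the server answers.
for i in 1 2 3 4 5 6 7 8 9 10; do
    wget -q -O /dev/null http://127.0.0.1:8765/ && break
    sleep 1
done

# The command from the article; -nH drops the "host:port" folder and
# -P sets the download directory.
wget -q -r --no-parent --reject "index.html*" -nH -P "$DEST" \
    http://127.0.0.1:8765/software-tools/

# List what was saved: the tree mirrors software-tools/, and the listing
# pages (index.html) have been removed after wget followed their links.
find "$DEST" -type f | sort
```

On a typical run the find should print only the two .tar.gz files under software-tools/compilers/ and software-tools/editors/; dropping -nH and -P gives the default host-named folder described above.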

For the full list of options, see man wget.
