The Unix wget command

wget is a great Unix tool for retrieving files from the World Wide Web using HTTP and FTP.

It is especially useful for downloading entire directories, not just single files.

Example:

wget -r http://asap.ahabs.wisc.edu/~glasner/EnteroFams/alignments
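
In its simplest form, wget takes a single URL and saves that one file into the current directory. For instance (the URL below is a made-up placeholder, not part of the original example):

wget http://example.com/data/alignment1.fasta

Adding -r, as in the example above, makes wget follow links recursively, so the whole alignments directory is mirrored into a local directory tree.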

The official GNU implementation, "GNU Wget", is a free software package for retrieving files using HTTP, HTTPS and FTP.

 

Use the wget command to download files to a remote Unix/Linux workstation

wget is especially useful for bioinformaticians working with NCBI, which maintains an FTP site with major resources for researchers. If you are working on a Unix/Linux machine remotely through an SSH session and need a resource (such as a tar or gzip file) that lives on the NCBI FTP site, the traditional approach takes two steps:

  1. Download the file to your local machine.
  2. Use scp to copy it to the remote Unix box (see the sketch after this list).
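
As a concrete sketch of that second step (the user name, host name, and destination path here are placeholders, not values from the original post), the scp copy would look something like:

scp gene2refseq.gz username@remote-server:/home/username/data/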

You can bypass this two-step process using the wget command:

Copy the URL for the resource on your local workstation, then run wget on the remote workstation to download the file directly to that machine.

In my SSH terminal I typed this:
wget ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz

As soon as the download completed, I had the file I needed on my remote server, with no need for the extra scp step.
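
Because gene2refseq.gz is a gzip-compressed file, a natural follow-up on the remote server is to decompress it in place, which leaves a plain gene2refseq file in the same directory:

gunzip gene2refseq.gz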

You can also explore the different options available for wget.

Example:

wget -r -l1 --no-parent -A.html http://asap.ahabs.wisc.edu/~glasner/EnteroFams/alignments/
Here, -r -l1 means retrieve recursively with a maximum depth of 1, --no-parent means references to the parent directory are ignored, and -A.html means download only the HTML files. -A "*.html" would have worked too.
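
A few other options worth knowing for this kind of job (the URL is just the NCBI file from the earlier example, and the alternate file name is a placeholder):

wget -c ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz
wget -O gene2refseq_latest.gz ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz
wget --limit-rate=500k ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz
wget -b ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz

-c resumes a partially downloaded file, -O saves the download under a different name, --limit-rate caps the transfer speed, and -b runs the download in the background (progress goes to wget-log). See the wget manual page for the full list.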

One more thing to know: the download will show up in the access logs of the server you fetched from, with the hit recorded as coming from the machine where you ran the wget command (your remote workstation), not from your local machine.
