wget is a great Unix tool for retrieving files from the World Wide Web using HTTP and FTP. It is especially useful for downloading whole directories of files from the web.
Example:
wget -r http://asap.ahabs.wisc.edu/~glasner/EnteroFams/alignments
The GNU project distributes it as "GNU Wget", a free software package for retrieving files using HTTP, HTTPS and FTP.
Use the wget command to download files to a remote Unix/Linux workstation
wget is especially useful for bioinformaticians working with NCBI. NCBI has an FTP site with major resources for researchers. If you are working on a Unix/Linux machine remotely through an SSH session and you need a resource (like a tar or gzip file) that's on the NCBI FTP site, the obvious approach is a two-step process:
- Download the file to your local machine.
- Use scp to copy it to your remote Unix box (sketched below).
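For concreteness, here is a minimal sketch of that two-step route; the remote hostname and destination path are hypothetical placeholders, and the NCBI URL is the same one used later in this post.
# Step 1: download the file to your local machine
wget ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz
# Step 2: copy it to the remote Unix box (hostname and path are hypothetical)
scp gene2refseq.gz user@remote.example.edu:/home/user/data/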
You can bypass this two-step process using the wget command:
Copy the URL of the resource on your local workstation, then run wget on the remote workstation to download the file directly to that machine.
wget ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz
As soon as the download completed, I had the file I needed on my remote server, with no need for the extra scp step.
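Put together, a typical session looks something like the following; the hostname and working directory are hypothetical, and the URL is the NCBI resource from above.
# Log in to the remote workstation (hostname is hypothetical)
ssh user@remote.example.edu
# Fetch the NCBI resource directly onto the remote machine
cd ~/data
wget ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz
# Unpack it in place; no scp step needed
gunzip gene2refseq.gz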
You can also experiment with the different options available for wget.
Example:
wget -r -l1 --no-parent -A.html http://asap.ahabs.wisc.edu/~glasner/EnteroFams/alignments/
Here, -r -l1 means retrieve recursively, with a maximum depth of 1. --no-parent means that references to the parent directory are ignored. And -A.html means download only the HTML files; -A "*.html" would have worked too.
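A few other standard wget options are also worth knowing; the examples below simply reuse the NCBI URL from earlier in this post for illustration.
# Resume a partially downloaded file after an interrupted connection
wget -c ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz
# Run the download in the background (progress goes to wget-log by default)
wget -b ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz
# Save the download under a different local filename
wget -O gene2refseq.latest.gz ftp://ftp.ncbi.nih.gov/gene/DATA/gene2refseq.gz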
One more thing to know: the download will leave a record in the server's access log files, showing the hit as coming from the remote system where you ran the wget command.