Wget tricks and hacks with Linux part 1
wget stands for "web get". Wget is a terminal-based download manager: it lets you download content from the web to your system efficiently. It supports the HTTP, HTTPS, and FTP protocols, and files can be retrieved through proxies too. Wget is open-source software, and it is very powerful, with many awesome features:
Features of wget
One of its cool features is that it keeps working in the background even if you are not logged in. It does not need your constant presence like many other download managers do.
It has resume support.
Wget has been designed for robustness over slow or unstable network connections. If a download fails due to a network problem, it will keep retrying until the whole file has been retrieved. If the server supports regetting, it will instruct the server to continue the download from where it left off.
Let's explore its awesome features with real examples. So fire up your terminal and type the following command.
The command below will show you all the available options that you can use with wget. There are a lot of cool things you can do with it.
root@seven:~# wget --help
GNU Wget 1.16, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...

Mandatory arguments to long options are mandatory for short options too.

Startup:
  -V,  --version           display the version of Wget and exit.
  -h,  --help              print this help.
  -b,  --background        go to background after startup.
  -e,  --execute=COMMAND   execute a `.wgetrc'-style command.

Logging and input file:
  -o,  --output-file=FILE    log messages to FILE.
  -a,  --append-output=FILE  append messages to FILE.
  -d,  --debug               print lots of debugging information.
  -q,  --quiet               quiet (no output).
  -v,  --verbose             be verbose (this is the default).
  -nv, --no-verbose          turn off verboseness, without being quiet.
       --report-speed=TYPE   Output bandwidth as TYPE.  TYPE can be bits.
  -i,  --input-file=FILE     download URLs found in local or external FILE.
  -F,  --force-html          treat input file as HTML.
  -B,  --base=URL            resolves HTML input-file links (-i -F) relative to URL.
       --config=FILE         Specify config file to use.
       --no-config           Do not read any config file.
Download a single web page with wget
Just pass the address or URL to wget and it will download the page for you.
root@seven:~# wget www.google.com
The file will be saved in your present working directory, so check your directory and you will find a downloaded file named index.html.
Download Entire website with Wget
In order to download a full website we need to give the -r option. Then wget will download recursively, meaning it will fetch everything linked from the web page: images, pages, and so on.
root@seven:~# wget -r http://www.demoweb.com
The -r option tells wget to download recursively: it downloads all the images, pages, and data linked from the front page.
But many sites do not want you to download their entire site. To prevent this, they check how clients identify themselves. Many sites refuse the connection or send a blank page if they detect you are not using a web browser. You might get a message like:
Sorry, but the download manager you are using to view this site is not supported. We do not support use of such download managers as flashget, go!zilla, or getright.
The -U option can be used to tell the site that you are using a commonly accepted browser.
So follow the below command:
root@seven:~# wget -r -p -U Mozilla http://www.demosite.com
-U: with -U you specify the user-agent string wget sends, so the site thinks a normal browser is connecting.
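Some sites check for a full browser identification string rather than just the word "Mozilla". A minimal sketch, using a typical Firefox-on-Linux user-agent string as an example (the exact string and the site URL are placeholders):

```shell
# A fuller user-agent string; the exact value here is only an example.
UA='Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0'
echo "Sending user agent: $UA"

# The real download would then be (commented out here; it needs network access):
# wget -r -p -U "$UA" http://www.demosite.com
```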
Save file under different name and location
When you download a web page or any other file, it is always saved in your present working directory, and its name is inherited from the source. We can change both the name and the default location. Type the following command.
root@seven:~# wget -O /root/Desktop/googlesourcecode www.google.com
-O specifies the path and file name under which you want your file to be saved.
Resuming files with Wget
Now let's say you are downloading a large file and you stopped it for some reason. Don't worry, wget takes good care of you. Even a file you started downloading with another download manager can be resumed.
root@seven:~# wget http://kali2.mirror.garr.it/mirrors/kali-images/kali-2.0/kali-linux-2.0-amd64.iso
When you hit Enter it will start downloading the Kali Linux ISO image, which is obviously quite large. Note down the size of the file and exit by pressing Ctrl+C. Now we will resume the same file with the following command:
root@seven:~# wget -c http://kali2.mirror.garr.it/mirrors/kali-images/kali-2.0/kali-linux-2.0-amd64.iso
In order to resume we have to give the -c option, which tells wget to continue the file. So just insert -c after wget and re-enter the same URL. It will start where you left off.
Limit the download bandwidth
When you start downloading a file, wget will use your full bandwidth. If you want to cap the download speed at a certain rate, you can do so with --limit-rate:
root@seven:~# wget --limit-rate=220k http://kali2.mirror.garr.it/mirrors/kali-images/kali-2.0/kali-linux-2.0-amd64.iso
The --tries option with wget
If you have a slow connection, it can take a while to reach servers. In such cases your download stops and you have to start over, but wget can reconnect automatically; you just need to specify the number of times you want wget to try.
root@seven:~# wget --tries 10 www.google.com
So if you supply the --tries parameter with a value of 10, wget will attempt to download the file 10 times before giving up.
Send files to background with wget
The -b option sends the download to the background. Now the file will be downloaded in the background without interrupting your workflow. You can check the log file for progress information.
root@seven:~# wget -b www.google.com
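By default, wget -b writes its progress to a file called wget-log in the current directory, which you can watch with tail. A quick sketch; the log line below is a hand-written placeholder so the example works offline, not real wget output:

```shell
# Stand-in for the wget-log file that `wget -b` would create;
# the content is a hand-written placeholder, not real wget output.
echo 'download in progress...' > wget-log

# Show the last lines of the log (use `tail -f wget-log` for live updates).
tail -n 5 wget-log
```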
Kill wget processes
root@seven:~# killall wget
It will kill all running wget processes. To stop a specific download instead, you have to supply its process ID.
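To stop one specific download, look up its process ID and kill only that PID. A minimal sketch; a background sleep stands in for a running wget so the example works without a network:

```shell
# A background sleep stands in for a long-running wget download.
sleep 60 &
pid=$!                          # shell variable holding the new process ID

# For a real wget you would look the PID up with: pgrep wget
kill "$pid"                     # stop only that one process
wait "$pid" 2>/dev/null || true # reap it; exit status reflects the kill
echo "stopped process $pid"
```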
Bulk Downloading with wget
This is an awesome feature. You can put URLs in a single text file, one per line, and give it to wget. The syntax goes like this:
root@seven:~# wget -i bulkdownload.txt
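The list file is plain text with one URL per line. A minimal sketch using placeholder URLs; the actual wget call is commented out because it needs network access:

```shell
# Build the URL list, one address per line; these URLs are placeholders.
printf '%s\n' \
    'http://example.com/file1.iso' \
    'http://example.com/file2.iso' > bulkdownload.txt

# Hand the whole list to wget (commented out here; it needs network access):
# wget -i bulkdownload.txt

wc -l < bulkdownload.txt    # shows how many URLs are queued
```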
Check for broken links
This feature is especially useful for web developers: it will check for broken links. You can use wget as a spider too.
root@seven:~# wget --spider -o broken.txt -r -p http://www.zeeroseven.com
--spider tells wget to behave like a web spider: it crawls the pages without saving them.
-o writes the log output to a file (broken.txt here).
-r is for recursive crawling.