Tuesday, June 17, 2014

cURL Super Powers

cURL is an extremely powerful tool, available on many platforms including Linux and Windows (via environments like Cygwin). It's powered by libcurl. Wget is a like-minded utility that can download files recursively.
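As a quick taste of each, here are two minimal one-liners (the URL is a placeholder, and both are just sketches):

# Fetch a single file with curl, following redirects (-L) and keeping
# the remote filename (-O). The URL here is hypothetical.
curl -L -O http://example.com/files/report.pdf

# Mirror part of a site recursively with wget, staying below the
# starting directory (-np) and limiting the recursion depth (-l 2).
wget -r -np -l 2 http://example.com/files/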

Here's an example that downloads, in parallel, all 150+ free Microsoft eBooks available from this page:
curl http://blogs.msdn.com/b/mssmallbiz/archive/2013/06/28/almost-150-free-microsoft-ebooks-covering-windows-7-windows-8-office-2010-office-2013-office-365-office-web-apps-windows-server-2012-windows-phone-7-windows-phone-8-sql-server-2008-sql-server-2012-sharepoint-server-2010-s.aspx \
  | grep -o -E "http://ligman.me/[0-9A-Za-z]+" \
  | uniq \
  | xargs -P 5 -n 1 wget --content-disposition

So what's it doing?
First, curl downloads the page at the specified URL and writes it to standard output.
Second, grep -o -E searches that output for every URL matching the regular expression (the http://ligman.me short links).
Third, uniq collapses adjacent duplicate URLs in the result set.
Fourth, xargs turns the long list of URLs into arguments, passing one at a time (-n 1) and running up to five processes in parallel (-P 5).
Fifth, wget downloads each URL, with --content-disposition naming each file from the server's suggested filename.
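If wget isn't available, a rough curl-only equivalent of the same pipeline is sketched below. It assumes your curl build supports -J (--remote-header-name), curl's counterpart to wget's --content-disposition, and it adds -L so the ligman.me redirects are followed.

page="http://blogs.msdn.com/b/mssmallbiz/archive/2013/06/28/almost-150-free-microsoft-ebooks-covering-windows-7-windows-8-office-2010-office-2013-office-365-office-web-apps-windows-server-2012-windows-phone-7-windows-phone-8-sql-server-2008-sql-server-2012-sharepoint-server-2010-s.aspx"
# Same harvest-and-download idea, only with curl doing the downloading too.
curl -s "$page" \
  | grep -o -E "http://ligman.me/[0-9A-Za-z]+" \
  | uniq \
  | xargs -P 5 -n 1 curl -s -L -O -J

Drop the xargs stage and pipe into head first if you want to sanity-check the list of links before kicking off 150 downloads.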

Why use both tools when they appear to do similar things?
http://stackoverflow.com/questions/636339/what-is-better-curl-or-wget
If you're a Windows user and can't install Cygwin, PowerShell can do some of the things that wget can do.
http://superuser.com/questions/25538/what-is-the-windows-equivalent-of-wget
(new-object System.Net.WebClient).DownloadFile('http://www.xyz.net/file.txt','C:\tmp\file.txt')
Here's a ton of GNU manuals, including the one for wget.
