This article will give you one way to filter out all the unwanted versions of the zipped picture sets.
For this to work, you have to visit every month you wish to download and save each month's page on the site to an .htm file. There is no need to save the graphics along with it; HTML only is enough.
Next, you have to remove all newlines, carriage returns and tabs from the .htm files.
You can accomplish that by running the tr command on the command line of the OS of your choice.
The tr command should look like the one below.
tr -d "\n\r\t" < onteXYZ.htm > onteXYZ.html
Do that for every page you saved. Once each page has been converted to its .html counterpart, run the command below.
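If you saved many months, typing the tr command once per file gets tedious. The following is a minimal sketch of a loop that does the same thing for every saved page. It assumes the pages are named onte*.htm, as in the example above; the function name flatten_pages and its optional directory argument are made up for illustration.

```shell
# Hypothetical helper: flatten every saved page in a directory.
# Assumes the pages were saved as onte*.htm (as in the tr example).
flatten_pages() {
    dir="${1:-.}"                      # default to the current directory
    for f in "$dir"/onte*.htm; do
        [ -e "$f" ] || continue        # skip if the glob matched nothing
        # Delete newlines, carriage returns and tabs; write the .html twin.
        tr -d '\n\r\t' < "$f" > "${f%.htm}.html"
    done
}
```

Calling flatten_pages in the folder with the saved pages (or flatten_pages /path/to/pages) leaves the original .htm files untouched and writes one .html file next to each. After that, the grep step that follows applies unchanged.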
grep -Po "http://[\w.]+/members/(.....)?zips/[-\w]+\.zip(?=..[\w ]+.?\([\d]+.[\d]+ MB\)./div../div.)" onte*.html
That will extract the links to all the picture zip files in the highest quality available. There is no need to filter anything manually, and no need to clean up after the download.
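What the grep actually does is look for any http:// link that points to a zip file under /members/ and is followed by a title, a size in MB in parentheses, and two closing div tags. The -o option prints only the matched link, and the (?=...) lookahead, which requires the -P (Perl-compatible regex) option of GNU grep, checks that trailing text without including it in the output.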
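To see the effect of the trailing check, here is a small sketch that runs the same pattern on a made-up, already flattened fragment. The host example.com, the file names and the surrounding markup are invented for the demonstration and are not taken from the real site; only the first link is followed by a size and two closing div tags. It needs a grep that supports -P, such as GNU grep.

```shell
# Hypothetical flattened page: host, file names and markup are made up.
sample='<div><div><a href="http://example.com/members/zips/set-01.zip">Set 01 (12.5 MB)</div></div><div><a href="http://example.com/members/zips/set-01-small.zip">Set 01 (3.1 MB)</div><span>'

# Same pattern as the grep command in the article, reading from stdin.
extract_zip_links() {
    grep -Po 'http://[\w.]+/members/(.....)?zips/[-\w]+\.zip(?=..[\w ]+.?\([\d]+.[\d]+ MB\)./div../div.)'
}

printf '%s\n' "$sample" | extract_zip_links
# prints: http://example.com/members/zips/set-01.zip
```

The second link is dropped because only one closing div follows its size, so the lookahead fails there.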
Have a look below to better understand the result.