Let's say you saved the page as bangbros.html. Here are a few commands you can run against your HTML file to get a nice, clean directory structure for your clips and pics.
tr -d "\n\r\t" < bangbros.html | sed -e "s/  //g" > bb2.txt
grep -Po "(..\d{4}).html.\>\<b\>(.*?)\</b\>\</a\>\</p\>(.*?)Added: (\w*?) (\d{2}), (\d{4})" bb2.txt >bb3.txt
sed -e "s/January/01/" -e "s/February/02/" -e "s/March/03/" -e "s/April/04/" -e "s/May/05/" -e "s/June/06/" -e "s/July/07/" -e "s/August/08/" -e "s/September/09/" -e "s/October/10/" -e "s/November/11/" -e "s/December/12/" bb3.txt >bb4.txt
sed -e "s/.html.>//" bb4.txt>bb5.txt
sed -e "s/<b>/-/" -e "s/<\/b><\/a><\/p>Added: /-/" -e "s/,//" bb5.txt>bb6.txt
sed "s/\(......\)-\(.*\)-\(..\) \(..\) \(....\)/mkdir \"\5-\3-\4 - (\1) - \2\"/" bb6.txt
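If you'd rather skip the temp files, the same steps can probably be chained into one pipeline. Treat this as a rough sketch, not a polished tool: make_mkdir_list is a name I made up, it assumes GNU grep (for -P), and it assumes the page markup matches the pattern exactly. Like the separate commands, it will trip over a title that contains a month name or a comma.

```shell
# Hypothetical helper: HTML file in, list of mkdir commands out.
make_mkdir_list() {
  # 1. Flatten the file to one line and strip two-space runs (indentation).
  #    Single spaces inside titles and dates must survive.
  # 2. Pull out "<id>.html"><b>Title</b></a></p>Added: Month DD, YYYY".
  # 3. Turn month names into numbers and strip the leftover markup.
  # 4. Rearrange the pieces into a mkdir command.
  tr -d '\n\r\t' < "$1" |
    sed -e 's/  //g' |
    grep -Po '(..\d{4}).html.><b>(.*?)</b></a></p>(.*?)Added: (\w*?) (\d{2}), (\d{4})' |
    sed -e 's/January/01/' -e 's/February/02/' -e 's/March/03/' \
        -e 's/April/04/' -e 's/May/05/' -e 's/June/06/' \
        -e 's/July/07/' -e 's/August/08/' -e 's/September/09/' \
        -e 's/October/10/' -e 's/November/11/' -e 's/December/12/' \
        -e 's/.html.>//' -e 's/<b>/-/' \
        -e 's/<\/b><\/a><\/p>Added: /-/' -e 's/,//' |
    sed 's/\(......\)-\(.*\)-\(..\) \(..\) \(....\)/mkdir "\5-\3-\4 - (\1) - \2"/'
}
```

You would call it as make_mkdir_list bangbros.html and redirect the output to a file if you want to look it over first.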
Those few commands take you from HTML which looks like this:
...
<td align="left" valign="top" width="24%">
<p><a href="http://members.bangbrosnetwork.com/bangbus/intro/bb4222.html"><b>Spring Break Hottie</b></a></p>
Added: March 12, 2008<br>
<p>Website: <a href="http://members.bangbrosnetwork.com/bangbus/main-1.html">bangbus.com</a></p>
<div><img src="bangbus_files/small_7.gif" alt="bar 7" border="0" height="12" width="58"></div>
<p><small>Rating: 6.78 (674 votes)</small></p></td>
...
into a nice, clean command list that you can run, which looks like this:
mkdir "2008-03-12 - (bb4222) - Spring Break Hottie"
mkdir "2008-03-05 - (bb4197) - Kangaroo spotting"
mkdir "2008-02-27 - (bb4173) - Cock Hungry SaraJay"
...
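Running a generated command list is fine for a quick job, but a title containing a double quote, a dollar sign or a backtick would break the command or be interpreted by the shell. Here is a sketch of an alternative that creates the directories directly from bb6.txt. The name make_dirs is my own invention, and it assumes every line has the form "bb4222-Spring Break Hottie-03 12 2008"; lines that don't fit are silently skipped.

```shell
# Hypothetical helper: read bb6.txt-style lines and create the directories
# directly, without generating shell code first.
make_dirs() {
  # Build "YYYY-MM-DD - (id) - Title"; -n plus the p flag drops non-matching lines.
  sed -n 's/^\(......\)-\(.*\)-\(..\) \(..\) \(....\)$/\5-\3-\4 - (\1) - \2/p' "$1" |
    tr '/' '_' |                    # a slash in a title would create nested dirs
    while IFS= read -r name; do     # IFS= and -r keep spaces and backslashes intact
      mkdir -p -- "$name"           # -p: no error if the directory already exists
    done
}
```

Run it as make_dirs bb6.txt from the folder where you want the directories to end up.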