Wednesday, September 2, 2009

Understanding the URL / links





URLs, or links as they are often called, can be quite complex. This article will dig into URLs and try to explain them. Let's look at the 4 url examples below.

1)hxxp://aaabbbcccd.n1.brazzers.com/members/?action=scenes&site_id=5&order_by=date&order_order=desc
2)hxxp://aaabbbcccd.n1.brazzers.com/members/?action=scene&tab=home&scene_id=2776
3)hxxp://members.bangbros.com/milfsoup/updates-1.html
4)hxxp://members.bangbros.com/membercheck?path=ms4276/streaming&fname=ms4276500k.wmv


There are several parts to a url. The easiest is no. 3, which is a plain and simple url. It consists of 4 parts.

hxxp://www.sitename.com/path_on_server/web_page.html

Part 1
The first part is the protocol; in our case the protocol is http (hxxp). You may have come across other protocols such as mms:, https:, ftp:, rtsp: or pnm:. But let us concentrate on http.

Part 2
The second part is the name of the server. A server name consists of a server and a domain name. For example, in members.bangbros.com, members is the server and bangbros.com is the domain. Together they define a unique webserver.

Part 3
The third part is the path on the server, which boils down to the directory on the server. This directory is a virtual directory. The real physical path could very well be something like /user/bangbros/www/milfsoup/ (for example, if a Unix-like server is used) or c:\inetpub\www\www.bangbros.com\milfsoup\ (if Windows is used as the web server). But you should not care about the real physical path.

Part 4
This is the last part of the simple url. It tells you the name of the html file on the hard disk of the server. A plain .html file is often created by hand, which basically means that some guy or girl has taken the time in DreamWeaver or FrontPage (God forbid) or some other tool to write and compose the page, and has then saved it to the proper directory on the server.
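The four parts can also be pulled apart by a program. Below is a minimal sketch in Python, assuming the url is a plain one like example 3). The function name split_simple_url is made up for this illustration, and treating the first dot-separated label as the server is a simplification that matches the examples in this article, not a general rule.

```python
from urllib.parse import urlsplit
import posixpath


def split_simple_url(url):
    """Split a plain url into the parts described above (hypothetical helper)."""
    parts = urlsplit(url)
    # Part 2: take the first label as the server and the rest as the domain.
    server, _, domain = parts.netloc.partition(".")
    # Parts 3 and 4: the directory on the server and the file name.
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,
        "server": server,
        "domain": domain,
        "path": directory,
        "file": filename,
    }


# The placeholder url from the article, kept in its hxxp spelling.
print(split_simple_url("hxxp://www.sitename.com/path_on_server/web_page.html"))
```

The parser does not care that the protocol is spelled hxxp; it only splits on the :// and the slashes.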

When the humans leave
Well, a plain html url is where it all began. But human-edited pages would require billions of people to constantly sit and rewrite webpages every time you visit the TV Guide or Google or Wired or somewhere else. The content is simply too alive (dynamic) to have people edit html files and put them on servers. So what to do?
Well, instead we get a programmer, preferably a web programmer, to write a program which can generate a webpage based on certain details we specify for the program. The details we specify are the ones which make up the dynamic content.
The programmer then writes a program which reads those small details and, based on them, does some lookups in database tables, takes the content from there and puts it on the webpage in a specific pre-defined style. This is done through e.g. CGI, which means Common Gateway Interface. A webserver can use CGI to run programs instead of just sending .html pages to the user.
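To make the idea concrete, here is a small sketch in Python of what such a program does. It is not how any of the sites mentioned here actually work. The dictionary SCENES stands in for a database table, and the ids, titles, template and function name are all invented for the example.

```python
from html import escape

# Stand-in for a database table; ids and titles are made up.
SCENES = {
    "1": "First example title",
    "2": "Second example title",
}

# The pre-defined style every generated page shares.
TEMPLATE = "<html><body><h1>{title}</h1></body></html>"


def render_page(params):
    """Build a page from the details (parameters) handed to the program."""
    scene_id = params.get("scene_id", "")
    # Look the detail up in the "database"; fall back if it is unknown.
    title = SCENES.get(scene_id, "Not found")
    return TEMPLATE.format(title=escape(title))


print(render_page({"scene_id": "2"}))
```

The same program produces a different page for every value it is given, which is exactly why nobody has to edit html files by hand.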


PHP, ASP, PL, JSP ? = and & etc
URLs 1), 2) and 4) are all CGI based urls, and those are the most common way to do things. CGI based urls can use a horde of programs. In the beginning we saw real PC based .EXE files which were run by the webserver. Today it is more common to see ASP if the webserver is Windows based or PHP if it is Unix based. If you run into JSP then the webserver is hooked up to a Java based engine to execute the programs (Windows or Unix). PL (Perl) is used on both Windows and Unix, but most commonly we see it on the Apache webserver, as opposed to ASP, which works only on the Windows webserver (IIS, Internet Information Server).

CGI based URLs
Have a look at url 4) and take notice of the part which spells membercheck. This is the name of the program, written in PHP, ASP, PL, JSP or something else. Webmasters tend to hide the language used inside the webserver, because that makes it harder for an exploiter to find his way in. Look at the url below and I bet you can tell which language is used and which part is the program.

hxxp://nikkisplaymates.com/sd3.php?show=recent_gal_updates

Correct, PHP is the language and sd3 is the name of the program which generates the webpage.
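The same guess can be made by a few lines of Python. This is only a sketch: the function name program_and_extension is made up, and the file extension is merely a hint about the language, since a webmaster can hide or fake it.

```python
from urllib.parse import urlsplit
import posixpath


def program_and_extension(url):
    """Return (program name, extension) from the last part of the url path."""
    path = urlsplit(url).path
    name, ext = posixpath.splitext(posixpath.basename(path))
    # An empty extension means the language is hidden, as in url 4).
    return name, ext.lstrip(".")


print(program_and_extension("hxxp://nikkisplaymates.com/sd3.php?show=recent_gal_updates"))
print(program_and_extension(
    "hxxp://members.bangbros.com/membercheck?path=ms4276/streaming&fname=ms4276500k.wmv"))
```

For the first url this gives sd3 and php; for url 4) it gives membercheck and an empty extension.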

If we look at the url in case 4) we have a part after the domain name. It looks like this:


/membercheck?path=ms4276/streaming&fname=ms4276500k.wmv


We already agreed that membercheck is the name of a program written in some language. Now this program needs some details to do whatever it does to find the proper information for our webpage. Those details are called parameters, and in our case we have:

2 parameters:

path
fname


The parameters are separated by the & sign, and each parameter name is separated from its value by the = sign. Both parameters have values. The value for path is ms4276/streaming and the value for the fname parameter is ms4276500k.wmv.

2 values for the 2 parameters

ms4276/streaming
ms4276500k.wmv
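The split into program name, parameters and values can be done with Python's standard library. A minimal sketch follows; the helper name split_cgi_url is made up, and it assumes each parameter appears only once in the url.

```python
from urllib.parse import urlsplit, parse_qsl


def split_cgi_url(url):
    """Return the program name and a dict of parameter -> value."""
    parts = urlsplit(url)
    # Everything after the last / of the path is taken as the program name.
    program = parts.path.rsplit("/", 1)[-1]
    # parse_qsl splits the query on & and then on =.
    parameters = dict(parse_qsl(parts.query))
    return program, parameters


program, parameters = split_cgi_url(
    "hxxp://members.bangbros.com/membercheck?path=ms4276/streaming&fname=ms4276500k.wmv")
print(program)
for name, value in parameters.items():
    print(name, "=", value)
```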


Now, we can only guess that the program (membercheck) uses the path parameter to find the proper directory on the webserver, and furthermore that it finds the video file using the information in the fname parameter. Lastly, there is a good chance that the program checks that the user is authenticated, as the program name is membercheck.

It is the VALUES of the PARAMETERS which you have to figure out when you make a download list for ReGet Deluxe or OEEE, or whatever program you choose.
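Once the values are known, putting them back into a url is mechanical. The sketch below is in Python and uses made-up names throughout: the base url, the parameter values and the two helper functions are all hypothetical and only show the pattern of keeping some parameters fixed while one of them changes.

```python
from urllib.parse import urlencode


def build_url(base, params):
    """Join a base url and its parameters with ? & and =."""
    # safe="/" keeps the slash inside a value readable instead of %2F.
    return base + "?" + urlencode(params, safe="/")


def url_list(base, fixed, name, values):
    """One url per value of the parameter that changes."""
    return [build_url(base, {**fixed, name: value}) for value in values]


# Hypothetical base url and values, for illustration only.
for url in url_list("hxxp://www.sitename.com/program",
                    {"path": "dir/streaming"},
                    "fname",
                    ["clip1.wmv", "clip2.wmv"]):
    print(url)
```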

Invisible program ?
Well, take a look at url 1) and 2). We know that the program name is located after the last / and before the first ?. In these two urls there is nothing between the / and the ?, so the program name is not shown. It is simply invisible.

Of course the program is not really missing. There is a program there; you just do not know its name. The reason is that any webserver has some default pages it loads if the user does NOT specify which page to load. The most common default pages are:


index.html
index.htm
default.asp
index.jsp
index.php


The web administrator can configure this as he sees fit. So if you find a url like 1) or 2), the webserver is simply using the configured default file as the program which generates the real webpages.
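How a webserver picks the default page can be sketched in a few lines of Python. This is a simplified model, not the code of any real webserver: the function name, the set of files on the server and the order of the list are assumptions made for the example.

```python
# The default pages listed above, in the order they would be tried here.
DEFAULT_PAGES = ("index.html", "index.htm", "default.asp", "index.jsp", "index.php")


def resolve_request(path, files_on_server, default_pages=DEFAULT_PAGES):
    """Return the file the server would load, or None if nothing matches."""
    if not path.endswith("/"):
        # A file or program was named explicitly.
        return path if path in files_on_server else None
    # Only a directory was given, so try each configured default in turn.
    for page in default_pages:
        candidate = path + page
        if candidate in files_on_server:
            return candidate
    return None


# Hypothetical server content: the program behind /members/ is index.php.
print(resolve_request("/members/", {"/members/index.php"}))
```

With this made-up server content, a request for /members/ ends up at /members/index.php even though the url never mentions that name.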
