Chapter 1 Resources
Perhaps the most familiar part of the web is the HTTP address. When I want to find a recipe for a dish featuring broccoli, which is almost never, then I might open my web browser and enter http://food.com in the address bar to go to the food.com website and search for recipes. My web browser understands this syntax and knows it needs to make an HTTP request to a server named food.com. We'll talk later about what it means to "make an HTTP request" and all the networking details involved. For now, we just want to focus on the address: http://food.com.
Resource Locators
The address http://food.com is what we call a URL—a uniform resource locator. It represents a specific resource on the web. In this case, the resource is the home page of the food.com website. Resources are things I want to interact with on the web. Images, pages, files, and videos are all resources.
There are billions, if not trillions, of places to go on the Internet—in other words, there are trillions of resources. Each resource will have a URL I can use to find it. http://news.google.com is a different place than http://news.yahoo.com. These are two different names, two different companies, two different websites, and therefore two different URLs. Of course, there will also be different URLs inside the same website. http://food.com/recipe/broccoli-salad-10733/ is the URL for a page with a broccoli salad recipe, while http://food.com/recipe/grilled-cauliflower-19710/ is still at food.com, but is a different resource describing a cauliflower recipe.
We can break the last URL into three parts:
1. http, the part before the ://, is what we call the URL scheme. The scheme describes how to access a particular resource, and in this case it tells the browser to use the hypertext transfer protocol. Later we'll also look at a different scheme, HTTPS, which is the secure HTTP protocol. You might run into other schemes too, like FTP for the file transfer protocol, and mailto for email addresses.
Everything after the :// will be specific to a particular scheme. So, a legal HTTP URL may not be a legal mailto URL—those two aren't really interchangeable (which makes sense because they describe different types of resources).
2. food.com is the host. This host name tells the browser the name of the computer hosting the resource. The computer will use the Domain Name System (DNS) to translate food.com into a network address, and then it will know exactly where to send the request for the resource. You can also specify the host portion of a URL using an IP address.
3. /recipe/grilled-cauliflower-19710/ is the URL path. The food.com host should recognize the specific resource being requested by this path and respond appropriately.
Sometimes a URL will point to a file on the host's file system or hard drive. For example, the URL http://food.com/logo.jpg might point to a picture that really does exist on the
12
food.com server. However, resources can also be dynamic. The URL http://food.com/recipes/brocolli probably does not refer to a real file on the food.com server. Instead, some sort of application is running on the food.com host that will take that request and build a resource using content from a database. The application might be built using ASP.NET, PHP, Perl, Ruby on Rails, or some other web technology that knows how to respond to incoming requests by creating HTML for a browser to display.
In fact, these days many websites try to avoid having any sort of real file name in their URL. For starters, file names are usually associated with a specific technology, like .aspx for Microsoft's ASP.NET technology. Many URLs will outlive the technology used to host and serve them. Secondly, many sites want to place keywords into a URL (like having /recipe/broccoli/ in the URL for a broccoli recipe). Having these keywords in the URL is a form of search engine optimization (SEO) that will rank the resource higher in search engine results. Descriptive keywords, not file names, are important for URLs these days.
Some resources will also lead the browser to download additional resources. The food.com home page will include images, JavaScript files, CSS, and other resources that will all combine to present the "home page" of food.com.
Figure 1: food.com home page
Ports, Query Strings, and Fragments
Now that we know about URL schemes, hosts, and paths, let's also look at a URL with a port number:
http://food.com:80/recipes/broccoli/
13
The number 80 represents the port number the host is using to listen for HTTP requests. The default port number for HTTP is port 80, so you generally see this port number omitted from a URL. You only need to specify a port number if the server is listening on a port other than port 80, which usually only happens in testing, debugging, or development environments. Let's look at another URL.
http://www.bing.com/search?q=broccoli
Everything after ? (the question mark) is known as the query. The query, also called the query string, contains information for the destination website to use or interpret. There is no formal standard for how the query string should look as it is technically up to the application to interpret the values it finds, but you'll see the majority of query strings used to pass name–value pairs in the form name1=value1&name2=value2.
For example:
http://foo.com?first=Scott&last=Allen
There are two name–value pairs in this example. The first pair has the name "first" and the value "Scott". The second pair has the name "last" with the value "Allen". In our earlier URL (http://www.bing.com/search?q=broccoli), the Bing search engine will see the name "q" associated with the value "broccoli". It turns out the Bing engine looks for a “q” value to use as the search term. We can think of the URL as the URL for the resource that represents the Bing search results for broccoli.
Finally, one more URL:
http://server.com?recipe=broccoli#ingredients
The part after the # sign is known as the fragment. The fragment is different than the other pieces we've looked at so far, because unlike the URL path and query string, the fragment is not processed by the server. The fragment is only used on the client and it identifies a particular section of a resource. Specifically, the fragment is typically used to identify a specific HTML element in a page by the element's ID.
Web browsers will typically align the initial display of a webpage such that the top of the element identified by the fragment is at the top of the screen. As an example, the URL http://odetocode.com/Blogs/scott/archive/2011/11/29/programming-windows-8-the-sublime-to-the-strange.aspx#feedback has the fragment value "feedback". If you follow the URL, your web browser should scroll down the page to show the feedback section of a particular blog post on my blog. Your browser retrieved the entire resource (the blog post), but focused your attention to a specific area—the feedback section. You can imagine the HTML for the blog post looking like the following (with all the text content omitted):
...