HTTP, the HyperText Transfer Protocol, is used to format and transmit messages on the Internet. It defines what web servers and clients have to do to retrieve (and store) data on the Web. For example, when we enter a web address in our browser, the browser sends an HTTP-formatted request to the web server, which in turn processes that request and returns an HTTP-formatted response to our browser.
In this article we will take a look at how the HTTP protocol works at a low level. To do that we will use the command line. We encourage you to follow along: if you are on Windows 10, you can use the bash shell in Windows Subsystem for Linux. On earlier versions of Windows, you can use the Git Bash terminal program from Git. On Mac OS, you can use the built-in Terminal program, or another one such as iTerm. On Linux, you can use any common terminal program such as gnome-terminal or xterm.
We will also use the ncat tool which reads and writes data across networks from the command line. In other words, it is used for sending and receiving messages over a network connection. We’ll be using it to see how web servers and browsers communicate.
Ncat is part of the Nmap network testing toolkit. So, if you are on Windows download and run Nmap setup from https://nmap.org/download.html. If you are on Mac, in the terminal, run ‘brew install nmap’ (or if without Homebrew, download and install Nmap DMG from https://nmap.org/download.html). If on Debian/Ubuntu/Mint, in the terminal, run ‘sudo apt-get install nmap’. If on Fedora, run ‘sudo dnf install nmap’.
To check if ncat is working, we will open up two terminals. In the first terminal, we will enter the command ‘ncat -l 9999’. This will initialize a simple network server which will listen on port 9999. In the second terminal, we will then enter the command ‘ncat localhost 9999’. This will initialize a network client which will connect to our server on port 9999.
After that we can type whatever we want in either terminal. For example, let’s enter “HTTP rules the web” and hit the Enter key. The text we typed should then show up in the other terminal.
This communication actually happened on the TCP networking layer (not HTTP), but we will use that TCP connection to shape and then send real HTTP requests.
If you get an error about the port 9999 already being in use, just pick another port and make sure to use the same port on both the server and the client side. To exit the ncat tool, type Control+C in the terminal.
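The same exchange can be sketched in Python with the standard socket module. This is only an illustration of what the two ncat commands do for us, not how ncat itself is implemented. It asks the operating system for a free port (port 0) instead of using 9999, and the helper name run_once_server is made up for this example.

```python
import socket
import threading

def run_once_server(listener, received):
    """Accept one connection and record the first line the client sends."""
    conn, _addr = listener.accept()
    with conn, conn.makefile("r", encoding="utf-8") as stream:
        received.append(stream.readline())

# The server side, comparable to "ncat -l <port>".
# Port 0 means "any free port"; the article uses 9999.
listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]

received = []
server_thread = threading.Thread(target=run_once_server, args=(listener, received))
server_thread.start()

# The client side, comparable to "ncat localhost <port>".
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"HTTP rules the web\n")

server_thread.join()
listener.close()
print(received[0].strip())
```

Nothing in this sketch is HTTP yet. It only moves a line of text over a TCP connection, which is exactly the layer we will build on.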
HTTP Protocol, Requests and Responses
Every HTTP transaction involves a server and a client. Whenever we surf the web using our browser, we are using a client (the browser) which sends HTTP requests to web servers. Web servers in turn send responses back to our browser. Whenever we click a link on some page, we send an HTTP request to some server.
Keep in mind that browsers are not the only HTTP clients available. Many applications use HTTP clients under the hood to send HTTP requests to servers (not only web servers) and receive data from them. For example, an app on your phone can send an HTTP request to fetch some data from a server. HTTP is widely supported, and we are using it practically all the time.
Originally, HTTP was used to serve hypertext documents (text documents interconnected by hyperlinks). But today, in addition to serving HTML documents, it is also used for images, videos, or any other media that the page needs.
In general, a server is a program which accepts network connections. An HTTP server handles incoming HTTP requests and returns HTTP responses. When we start an HTTP server, it waits for connections on a specific port. When our browser sends a request to the server on that port (for example, to get some page), the server handles that request (runs some code) and then returns the response (the result of the operation) back to our browser. Specialized web server programs, like Apache, Nginx, or IIS, can serve static content from disk storage very quickly and efficiently. They can also provide access control, allowing only authenticated users to download particular static content, and they offer advanced features such as load balancing.
For the purposes of this article, we will use the simple web server that is included with the Python interpreter. For this to work, we have to install Python. If you use Windows or Mac, install it from python.org (https://www.python.org/downloads/) and choose version 3.*. If you have a Mac with Homebrew, in the terminal run: ‘brew install python3’. If you have Debian/Ubuntu/Mint, in the terminal run: ‘sudo apt-get install python3’.
To check if it is successfully installed, run:
Depending on your system, the Python 3 command may be called python or python3. Make sure to check it and use the correct one from now on. Once you have it installed, navigate to a directory that has some files in it, like text files, images or similar. Then, in terminal, enter:
python -m http.server 8000
This will start an HTTP server which will listen for requests on port 8000. We have to keep it running and leave the terminal open. Now, in our browser we can enter the URL: http://localhost:8000/. This will send a request to the web server and the server will return a response. In this case, the response will be a listing of the files in the directory.
Any computer on the network could open this address and access our files this way (if we allow access through the firewall and use our IP address instead of the ‘localhost’ part of the URL), so it’s actually pretty neat.
If we take a look at the HTML source (right click > view page source) of the Directory listing page, we will see something like this.
Note that we didn’t create that HTML file. The Python web server generated that HTML itself and sent it to our browser, because it didn’t find any other, default HTML file to serve when we open the http://localhost:8000/ URL. Let’s try and change that. In the same directory we started the Python web server, let’s create a new file called ‘index.html’. The content of that file will simply be ‘Hello from index.html file’.
Now try to open http://localhost:8000/ again.
So, if we have a file called index.html in the root directory, we’ll see the contents of that file instead of the directory listing.
Let’s see what happens if we try to open some file which doesn’t exist, for example http://localhost:8000/i-dont-exist.html:
When we try to open a page which doesn’t exist, the HTTP server will return an HTTP status 404, which means “Not Found”. We’ll talk more about HTTP status codes later.
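The two browser experiments above (an existing index.html and a missing page) can also be reproduced from a script. The following is a minimal sketch, assuming a throwaway temporary directory and a free port chosen by the operating system rather than port 8000. The helper name fetch is made up for this example. It starts the same kind of server that ‘python -m http.server’ runs, inside the script itself.

```python
import functools
import http.client
import http.server
import pathlib
import tempfile
import threading

# A throwaway directory with one file, standing in for the directory
# in which the article starts the server.
root = tempfile.mkdtemp()
pathlib.Path(root, "index.html").write_text("Hello from index.html file")

handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=root)
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

def fetch(path):
    """Send one GET request and return (status code, body text)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", path)
    response = conn.getresponse()
    result = (response.status, response.read().decode("utf-8"))
    conn.close()
    return result

ok_status, ok_body = fetch("/")
missing_status, _error_page = fetch("/i-dont-exist.html")

server.shutdown()
server.server_close()
print(ok_status, ok_body)
print(missing_status)
```

While it runs, the server writes one log line per request to the terminal, just like the standalone server does.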
Note that in our terminal we can see a log of all requests to the server:
Uniform Resource Identifier (URI) is what we often call a web address. We also often use the term Uniform Resource Locator (URL), which is actually a type of URI. URL means that we refer to the URI which identifies a resource on the network. We do that by using the “http://” or “https://” part in the URI / URL. For more in depth explanation, read this article: https://danielmiessler.com/study/url-uri/
URIs are made out of different parts, with some parts being optional. Let’s take a look at this example URI:
This URI has three parts:
- scheme: “https”
- hostname: “www.utilizewindows.com”
- path: “category/web”
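To see this split done by a program, here is a small sketch using Python’s urllib.parse module. The URI in the code is an assumption, put together from the three parts listed above.

```python
from urllib.parse import urlsplit

# Assumed example URI, rebuilt from the three parts listed above.
parts = urlsplit("https://www.utilizewindows.com/category/web")

print(parts.scheme)    # https
print(parts.hostname)  # www.utilizewindows.com
print(parts.path)      # /category/web
```

Note that the parser reports the path with its leading slash.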
The first part of a URI is the scheme. The scheme tells the client how to access the resource. Common URI schemes include “http”, “https”, “ftp”, and “file”. HTTP and HTTPS URIs point to resources served by a web server, FTP URIs point to resources served by an FTP server, while file URIs tell the client to access a file on the local filesystem.
The difference between HTTP and HTTPS URIs is that when a client accesses a resource with an HTTPS URI, it uses an encrypted connection to do it. Encrypted Web connections were originally used to protect passwords and credit-card transactions, but today they are used on all kinds of pages to help protect users’ privacy. There are many other URI schemes: http://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
The next thing that appears in URI, after the scheme, is a hostname. In our case that was “www.utilizewindows.com”. The hostname tells the client which server to connect to. A hostname can only appear after a URI scheme that supports it, such as http or https. In these URIs, there will always be a “://” between the scheme and hostname.
Note that not every URI has a hostname part. For instance, a “mailto” URI just has an email address, for example: “mailto:firstname.lastname@example.org”. This is a well-formed “mailto” URI. This tells us that the “:” goes after the scheme, but the “//” goes before the hostname. Mailto links don’t have a hostname part, so they don’t have a “//”.
In an HTTP URI the next thing that appears is the path, which identifies a particular resource on a server. A server can have many resources on it, such as different web pages, videos, or APIs. The path tells the server which resource the client is looking for. In some cases, paths correspond to files on the filesystem, and in other cases URI paths don’t equate to specific filenames. For instance, in our case we have a path “category/web”, but this doesn’t mean that there is literally a file on the server with the filename “category/web”, or a file called “web” in a directory called “category”. The server interprets the path to figure out what resource to send.
When we write a URI without a path, such as http://www.utilizewindows.com, the browser fills in the default path, which is written with a single slash. That’s why http://www.utilizewindows.com is the same as http://www.utilizewindows.com/ (with a slash on the end). The path written with just a single slash is also called the root. It’s the root of the resources served by the web server, and the server won’t let a web browser access files outside the directory that it’s running in.
If you have ever written any HTML code, you may have used a relative URI when defining a link to some resource, for example:
<a href="/category/web">Web articles</a>
<a href="category/web">Web articles</a>
Note that we didn’t include a scheme or a hostname, just a path. This is a relative URI reference. It’s “relative” to the context in which it appears. In the first example it is relative to the root path, and in the second example it is relative to the page it’s on. The browser can figure out scheme and hostname from context. If we click on one of those links, the browser knows from context that it needs to fetch it from the same server that it got the original page from.
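The way a browser resolves these two kinds of relative references can be imitated with urljoin from Python’s urllib.parse module. This is a sketch only. The address of the page that contains the links is hypothetical and was chosen just to make the difference visible.

```python
from urllib.parse import urljoin

# Hypothetical page on which the two links appear.
page = "https://www.utilizewindows.com/articles/index.html"

# href="/category/web" is resolved against the root of the site.
root_relative = urljoin(page, "/category/web")

# href="category/web" is resolved against the directory of the current page.
page_relative = urljoin(page, "category/web")

print(root_relative)  # https://www.utilizewindows.com/category/web
print(page_relative)  # https://www.utilizewindows.com/articles/category/web
```

In both cases the scheme and hostname are taken from the page that the link appears on.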
When we want to designate the port in a URI, we can do it after the hostname part, like this:
We don’t have to designate it here because the browser does it automatically, because we are using the HTTPS scheme (default port for HTTPS is 443). If we were to use the HTTP scheme (non secure version), the browser would use the port 80 (default port for HTTP is 80). If we ever need to use any other port, we can simply designate it like in the example above.
Other URI parts
There are other parts that can occur in a URI. Consider the difference between these two Wikipedia URIs:
If we follow these links in our browser, it will fetch the same page from the web server, but the second one displays the page scrolled to the article title. The part of the URI after the # sign is called a fragment, and it lets a link point to a specific named part of a resource. In HTML pages it links to an element by id. Note that the browser doesn’t even send the fragment to the web server; it only uses it locally.
In contrast, let’s take a look at this URI:
The “?s=smtp&d=extend” is a query part of the URI. Query parameters are key value pairs. Query part starts after the path using the “?”. Key value pairs are separated using the “&”. So, in our case this means that we have query parameter “s” with value “smtp” and “d” with value “extend”. All those query parameters do get sent to the server.
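As a small sketch of how a program takes the query and the fragment apart, here is the same query string parsed with Python’s urllib.parse module. The hostname, path and fragment in the code are hypothetical; only the query string comes from the text above.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URI; only the query string is taken from the article.
uri = "https://www.example.com/search?s=smtp&d=extend#results"

parts = urlsplit(uri)
params = parse_qs(parts.query)

print(parts.query)     # s=smtp&d=extend
print(params)          # {'s': ['smtp'], 'd': ['extend']}
print(parts.fragment)  # results
```

Each value comes back as a list because the same key is allowed to appear more than once in a query.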
More on hostnames and ports
A full HTTP or HTTPS URI includes the hostname of the web server, like www.utilizewindows.com or google.com. An IP address can be used in place of the hostname: if we put http://22.214.171.124/ in our browser, we’ll end up at Google.
In network terminology, a host is a computer on the network, one that hosts services. Computers tell each other apart by their IP addresses. Every piece of network traffic on the Internet is labeled with the IP addresses of the sending and receiving computers (more precisely, of their network interfaces). In order to connect to a web server such as www.utilizewindows.com, a client needs to translate the hostname into an IP address. Our operating system’s network configuration uses the Domain Name System (DNS) to look up hostnames and get back IP addresses. DNS refers to a set of servers maintained by Internet Service Providers (ISPs) and other network users.
In the terminal, we can use the command “nslookup” (Windows) or “host” (Linux) to look up hostnames in DNS, like this:
$ host utilizewindows.com
utilizewindows.com has address 126.96.36.199
utilizewindows.com mail is handled by 10 mxa.mailgun.org.
utilizewindows.com mail is handled by 10 mxb.mailgun.org.

$ nslookup utilizewindows.com
Server: dns.google.com
Address: 188.8.131.52

Non-authoritative answer:
Name: utilizewindows.com
Address: 184.108.40.206
IP addresses come in two different versions: the older IPv4 and the newer IPv6. IPv4 addresses look like this: 127.0.0.1 or 220.127.116.11. IPv6 addresses are much longer, such as 2001:0db8:85a3:0000:0000:8a2e:0370:7334, although they can also be abbreviated.
The IPv4 address 127.0.0.1 and the IPv6 address ::1 are special addresses that mean “this computer”. When we enter “localhost” in our browser, it will connect to those special addresses, or in other words, it will connect to our own computer.
Another special address is 0.0.0.0. This is not a regular IP address, but a special code for “every IPv4 address on this computer”. That includes the localhost address, but it also includes our computer’s regular IP address.
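Python’s standard ipaddress module can be used to check these properties. The snippet below is a small sketch that looks at the addresses mentioned above and shows the abbreviated form of the long IPv6 example.

```python
import ipaddress

v4 = ipaddress.ip_address("127.0.0.1")
v6 = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")

print(v4.version, v4.is_loopback)  # 4 True
print(v6.version)                  # 6
print(v6.compressed)               # 2001:db8:85a3::8a2e:370:7334

# "::1" is the IPv6 way of saying "this computer".
print(ipaddress.ip_address("::1").is_loopback)         # True
# 0.0.0.0 is the "unspecified" address rather than a regular one.
print(ipaddress.ip_address("0.0.0.0").is_unspecified)  # True
```

The abbreviated form drops leading zeros in each group and replaces the run of all-zero groups with a double colon.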
When we want to connect to a specific port on the server, we need to put that port in the URI, for example:
This URI has a port number of 8000. But most of the web addresses we use on the Internet don’t have a port number on them. This is because the client usually figures out the port number from the URI scheme. For instance, HTTP URIs imply a port number of 80, whereas HTTPS URIs imply a port number of 443. If the server uses another port to receive requests, we have to put the port in the URI.
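The rule a client follows here can be sketched in a few lines of Python. The function name effective_port and the DEFAULT_PORTS table are made up for this illustration, and only the two schemes discussed in this article are covered.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed in this article.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(uri):
    """Return the port written in the URI, or the default for its scheme."""
    parts = urlsplit(uri)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]

print(effective_port("http://localhost:8000/"))           # 8000
print(effective_port("http://www.utilizewindows.com/"))   # 80
print(effective_port("https://www.utilizewindows.com/"))  # 443
```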
When talking about ports, we need to understand the following. All of the network traffic that computers send and receive is split up into messages called packets. Each packet has the IP addresses of the computer that sent it, and the computer that receives it, and it also has the port number for the sender and recipient. In this scenario, IP addresses are used to distinguish computers, while port numbers are used to distinguish programs on those computers.
We say that a server “listens on” a port, such as 80 or 8000. “Listening” means that when the server starts up, it tells its operating system that it wants to receive connections from clients on a particular port number. When a client (such as a web browser) “connects to” that port and sends a request, the operating system knows to forward that request to the server that’s listening on that port.
Every HTTP request begins with a verb. The verb tells the server what a client wants to do. The most common HTTP verbs are:
GET requests are used to ask a server to send back a copy of a resource. Let’s take a look at the request log of our local python web server we already mentioned in this article.
When we request a page from the local python web server, an entry appears in the logs, like this:
127.0.0.1 - - [06/Jul/2018 14:30:47] "GET / HTTP/1.1" 200 -
The part after the date and time is “GET / HTTP/1.1”. This is the actual text of the request that the browser sent to the server. This request has three parts:
- GET – the word GET is the method or HTTP verb being used. This says what kind of request is being made. GET is the verb that clients use when they want a server to send a resource, such as a web page or image.
- / – the path of the resource being requested (in this case it is the root path). Notice that the client does not send the whole URI of the resource here. It doesn’t say http://localhost:8000/. It just sends the path.
- HTTP/1.1 – the protocol version of the request. Over the years, there have been several changes to the way HTTP works. Clients have to tell servers which dialect of HTTP they’re speaking. HTTP/1.1 is the most common version today, but the migration to HTTP/2 is under way.
To see a bit more detailed request according to HTTP protocol, let’s take a look at another GET request which has headers displayed:
Again, we see the actual request line “GET / HTTP/1.1”, and then we see a bunch of HTTP headers. In HTTP/1.1, the only mandatory header is the Host header, which gives the hostname of the web server the request is meant for. All other headers are optional. That means that the bare minimum HTTP GET request can look like this:
GET / HTTP/1.1
Host: www.utilizewindows.com
In our example, the User-Agent header describes the client which made the request. Accept header defines the format of the data the client will accept in the response. We will talk more about HTTP headers later.
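As a sketch of what such a minimal request looks like as bytes on the wire, the following Python function builds one. The function name build_get_request is made up for this example. Each line ends with a carriage return and a line feed, and an empty line marks the end of the headers.

```python
def build_get_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "",  # the empty line that ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

request = build_get_request("www.utilizewindows.com")
print(request)
# b'GET / HTTP/1.1\r\nHost: www.utilizewindows.com\r\n\r\n'
```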
Other HTTP verbs in short
We have seen what a GET request looks like, but we have mentioned that there are also other HTTP methods we can use.
The POST method is used to send data to the server. The data is located in the body of the request, and the type of that data is indicated by the Content-Type header.
The PUT method is used to either replace an existing resource, or to create a new resource if it doesn’t exist.
The PATCH method is used to edit an existing resource, but not the whole resource (only a partial modification). In contrast, the PUT method is used to replace the whole resource.
The DELETE method is used to delete the specified resource.
The HEAD method is used to return only the headers from the server (without the body). It can be used to check the size of the content before issuing a GET request to download the data, or to check if a locally cached resource is outdated.
The OPTIONS method is used to check which HTTP methods the server supports for a resource.
Sending GET request manually
An interesting thing about HTTP is that it is a textual protocol, which means that we can read it. It also means that we can write our own HTTP requests by hand. Let’s try and do that now. First we will start the Python web server:
We will leave this terminal open. Next, we will open a new terminal and use the ncat command to connect to our Python server and send it an HTTP request by hand. First, we will enter the command
ncat 127.0.0.1 8000
This will connect to our Python server. Next, we will start writing the HTTP request. We will enter these two lines:
GET / HTTP/1.1
Host: localhost
After the second line, we have to press Enter twice. As soon as we do, the response from the server will be displayed on our terminal.
In our case the response is relatively short. If yours is longer, you will probably need to scroll up to see the beginning of the response. In our case, at the beginning of the response we see a status line that says “HTTP/1.0 200 OK”. After that we see several lines of headers (Server, Date, Content-type and Content-Length). After the headers we see the body of the response, which is HTML code. All those parts make up the HTTP response that the server sends.
After we typed “Host: localhost” and pressed Enter twice, we sent the request and the server sent back a response. This request and response exchange is happening every time the browser asks a server for a page, an image, or anything else.
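What we just typed by hand can also be sent from a script over a plain TCP socket. The sketch below assumes a throwaway temporary directory and a free port chosen by the operating system instead of port 8000. It starts Python’s built-in web server inside the script, sends the same two request lines, and reads until the server closes the connection.

```python
import functools
import http.server
import socket
import tempfile
import threading

# Serve an empty throwaway directory on a free port (the article uses 8000).
handler = functools.partial(
    http.server.SimpleHTTPRequestHandler, directory=tempfile.mkdtemp()
)
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The two lines we typed into ncat, followed by an empty line.
request = b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"

with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
    sock.sendall(request)
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # the server closed the connection
            break
        chunks.append(data)

server.shutdown()
server.server_close()

response = b"".join(chunks)
head, _, body = response.partition(b"\r\n\r\n")
status_line = head.split(b"\r\n")[0].decode("ascii")
print(status_line)
print(len(body), "bytes of body")
```

The status line printed here should match the one we saw in the terminal, and the body is the HTML of the directory listing.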
Response status codes
In the response we got from our python server, the status line said “HTTP/1.0 200 OK”. The status line tells the client whether the server understood the request, whether the server has the resource the client asked for, and how to proceed next. It also tells the client which dialect of HTTP the server is speaking.
The number 200 here is the HTTP status code. There are many different HTTP status codes. The first digit of the status code indicates the general outcome of the request. As a shorthand, web developers describe all of the codes starting with 2 as “2xx” codes, where the x’s mean “any digit”. Here is a short breakdown of status codes:
- 1xx – Informational. The request is in progress or there’s another step to take.
- 2xx – Success. The request succeeded. The server is sending the data the client asked for.
- 3xx – Redirection. The server is telling the client a different URI it should redirect to. The headers will usually contain a Location header with the updated URI. Different codes tell the client whether a redirect is permanent or temporary.
- 4xx – Client error. The server didn’t understand the client’s request, or can’t or won’t fulfill it. Different codes tell the client whether it was a bad URI, a permissions problem, or another sort of error.
- 5xx – Server error. Something went wrong on the server side.
You can find out more about HTTP status codes here: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
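The breakdown by first digit is easy to express in code. Here is a small sketch that combines it with the reason phrases known to Python’s http.HTTPStatus. The names STATUS_CLASSES and describe are made up for this example.

```python
from http import HTTPStatus

# The five classes from the breakdown above, keyed by the first digit.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def describe(code):
    """Return a short description such as '404 Not Found (Client error)'."""
    status_class = STATUS_CLASSES[code // 100]
    return f"{code} {HTTPStatus(code).phrase} ({status_class})"

print(describe(200))  # 200 OK (Success)
print(describe(307))  # 307 Temporary Redirect (Redirection)
print(describe(404))  # 404 Not Found (Client error)
```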
An HTTP response (like a request) can include many headers. Each header is a line that starts with a keyword, such as Location or Content-type, followed by a colon and a value. Headers are a sort of metadata for the request or response. They aren’t displayed by browsers or other clients; instead, they give the client various information about the response.
Many features of the Web are implemented using headers. For instance, cookies are a Web feature that lets servers store data on the browser, for instance to keep a user logged in. To set a cookie, the server sends the Set-Cookie header. The browser will then send the cookie data back in a Cookie header on subsequent requests.
A Content-type header indicates the kind of data that the server is sending. It includes a general category of content as well as the specific format. For instance, a PNG image file will come with the Content-type image/png. If the content is text (including HTML), the server will also tell what encoding it’s written in. UTF-8 is a very common choice. This way the browser knows which parsing engine to use.
Headers will often contain more metadata about the response body. For instance, we will usually see a Content-Length header, which tells the client how long (in bytes) the response body will be. This way the browser knows how many bytes it can expect to receive after the header section and can show us a meaningful progress bar when downloading a file.
Last-Modified is a header that contains the date when the document was last changed. There is also the ETag header. ETag stands for entity tag, an identifier that changes whenever the content of the resource changes. How the ETag is calculated is up to the server; some servers hash the content with a function like SHA-256, while others derive it from metadata such as the file’s size and modification time.
Cache-Control allows the server to control how and for how long the client will cache the response it received.
If-Modified-Since permits the server to skip sending the actual content of the document if it hasn’t been changed since the date provided in that header. For ETags, the corresponding header is called If-None-Match and works the same way: if the ETag of the document still matches the ETag sent in the If-None-Match header, the server won’t send the actual document and replies with a 304 Not Modified status instead. Both If-None-Match and If-Modified-Since can be present in the same request, but the ETag takes precedence over If-Modified-Since, as it is considered more accurate.
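The decision the server makes for If-None-Match can be sketched as a small Python function. This is a simplified illustration, not how any particular web server implements it. The function names are made up, and computing the ETag as a SHA-256 hash of the content is an assumption made for the example.

```python
import hashlib

def make_etag(content):
    """Assumption for this sketch: the ETag is a quoted hash of the content."""
    return '"' + hashlib.sha256(content).hexdigest() + '"'

def respond(request_headers, content):
    """Return (status code, response headers, body) for a GET request."""
    etag = make_etag(content)
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is still current: send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, content

page = b"<p>Hello!</p>"

# First request: the client has nothing cached yet.
status1, headers1, body1 = respond({}, page)

# Second request: the client sends back the ETag it received.
status2, headers2, body2 = respond({"If-None-Match": headers1["ETag"]}, page)

# The page changes on the server, so the old ETag no longer matches.
status3, headers3, body3 = respond({"If-None-Match": headers1["ETag"]}, b"<p>Bye!</p>")

print(status1, status2, status3)  # 200 304 200
```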
For more information about headers, you can refer to this link: https://en.wikipedia.org/wiki/List_of_HTTP_header_fields
The headers end with a blank line. Everything after that blank line is part of the response body. If the request was successful (a 200 OK status, for instance), this is a copy of whatever resource the client asked for — such as a web page, image, or other piece of data. But in the case of an error, the response body will contain the error message. For example, if we request a page that doesn’t exist, and we get a 404 Not Found error, the actual error message shows up in the response body.
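The structure described above (status line, headers, blank line, body) can be taken apart with a few lines of Python. The response bytes below are a made-up example, and parse_response is a simplified sketch: it keeps only the last value if a header appears twice, which real clients handle more carefully.

```python
# A made-up response, written out as the bytes a server would send.
raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/html; charset=utf-8\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello!</p>"
)

def parse_response(raw_bytes):
    """Split a raw HTTP response into status line, headers and body."""
    head, _, body = raw_bytes.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body

status_line, headers, body = parse_response(raw)
print(status_line)                # HTTP/1.0 200 OK
print(headers["content-type"])    # text/html; charset=utf-8
print(headers["content-length"])  # 13
print(body)                       # b'<p>Hello!</p>'
```

The Content-Length value matches the number of bytes in the body, which is how the client knows when it has received everything.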
Sending HTTP responses manually
Let’s try to write and send an HTTP response manually in terminal. To do that, first we will run ncat in terminal to listen for connections, like this:
ncat -l 9999
Next, we will open up a browser and enter the following URI: http://localhost:9999
When we press enter in browser, in our terminal we will now see the text of the request that the browser sent:
At this point, we can send an HTTP response back to our browser by typing it into the terminal, right after the headers the browser sent, like this:
HTTP/1.1 307 Temporary Redirect
Location: https://www.utilizewindows.com/
Note that we sent a 307 Temporary Redirect status code, together with a Location header. This will make our browser go to the https://www.utilizewindows.com/ site.
Let’s try something else. Again we will run ncat by using “ncat -l 9999” and point our browser to “http://localhost:9999”. This time we will return the following response (keep the empty line between the headers and the body):
HTTP/1.1 200 OK
Content-type: text/plain
Content-length: 6

Hello!
So, this time we send the status code 200 OK and provide the exact body of the response. If we now go to our browser, we can see the content of the response body.
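Playing the server by hand can be imitated with a short Python script that accepts one connection, reads the request, and writes back the same response text. This is a sketch only. It uses a free port chosen by the operating system instead of 9999, a Python HTTP client stands in for the browser, and the function name answer_one_request is made up.

```python
import http.client
import socket
import threading

# The same response we typed into ncat, as bytes.
RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-type: text/plain\r\n"
    b"Content-length: 6\r\n"
    b"\r\n"
    b"Hello!"
)

def answer_one_request(listener, seen):
    """Accept one connection, record the request, send the fixed response."""
    conn, _addr = listener.accept()
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:  # read until the end of the headers
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        seen.append(data.decode("iso-8859-1"))
        conn.sendall(RESPONSE)

listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
seen = []
thread = threading.Thread(target=answer_one_request, args=(listener, seen))
thread.start()

# A Python HTTP client plays the role of the browser here.
client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
client.request("GET", "/")
reply = client.getresponse()
status, body = reply.status, reply.read()
client.close()

thread.join()
listener.close()
print(seen[0].splitlines()[0])  # the request line the client sent
print(status, body)
```

The text collected in seen is the request exactly as the client sent it, comparable to what ncat showed us in the terminal.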
Great! We have seen how we can manually play the part of an HTTP client or server.
More on Caching
Imagine a web service that does a lot of complicated processing for each request. Pretty often, users make the same request repeatedly, so it’s useful if the service can avoid recalculating something it just figured out a second ago. It’s also great if the service can avoid re-sending a large resource to the client if it doesn’t have to.
One way that web services avoid this is by making use of a cache, a temporary storage for resources that are likely to be reused. Web systems can perform caching in a number of places, but all of them are under control of the server that serves up a particular resource. That server can set HTTP headers indicating that a particular resource is not intended to change quickly, and can safely be cached.
There are a few places that caching usually can happen. Every user’s browser maintains a browser cache of cacheable resources like images. The browser can also be configured to pass requests through a web proxy, which can perform caching on behalf of many users. Finally, a web site can use a reverse proxy to cache results so they don’t need to be recomputed by a slower application server or database.
More on Cookies
Cookies are a way for a server to ask a browser to remember a piece of information, and send it back to the server when the browser makes subsequent requests. Every cookie has a name and a value. It also has rules that specify when the cookie should be sent back.
Cookies are used for several things. For example, the server can send each client a unique cookie value, and in that way tell clients apart (when they send those cookies back). This can be used to implement things like sessions and login. Cookies are often used by analytics and advertising systems to track user activity from site to site. They are also sometimes used to store user preferences for a site.
The first time the client makes a request to the server, the server sends back the response with a Set-Cookie header. This header contains three things: a cookie name, a value, and some attributes. Every subsequent time the browser makes a request to the server, it will send that cookie back to the server. The server can update cookies, or ask the browser to expire them.
For example, in the browser the cookies look like this:
There are eight different fields used in HTTP cookies. The first two are the cookie’s name and value, and both are sent back to the server. There are some syntactic rules for which characters are allowed in a cookie name; for instance, names can’t have spaces in them. The value of the cookie is where the “real data” of the cookie goes, for instance, a unique token representing a logged-in user’s session.
The next two fields, Domain and Path, describe the scope of the cookie. This indicates which requests will include it. By default, the domain of a cookie is the hostname from the URI of the response that set the cookie. But a server can also set a cookie on a broader domain, within limits. For instance, a response from www.utilizewindows.com can set a cookie for utilizewindows.com, but not for com.
The expiration field indicates a time when the server wants the browser to stop saving the cookie. There are two different ways a server can set this: it can set an Expires field with a specific date and time, or a Max-Age field with a number of seconds. If no expiration field is set, then the cookie expires when the browser closes.
The SameSite field allows servers to assert that a cookie ought not to be sent along with cross-site requests, which provides some protection against cross-site request forgery (CSRF) attacks.
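To see how these fields end up in actual headers, here is a sketch using Python’s standard http.cookies module (the samesite attribute needs Python 3.8 or newer). The cookie name “session”, its value, and the attribute values are hypothetical choices for this example.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie header.
# The cookie name and value are hypothetical.
outgoing = SimpleCookie()
outgoing["session"] = "abc123"
outgoing["session"]["domain"] = "utilizewindows.com"
outgoing["session"]["path"] = "/"
outgoing["session"]["max-age"] = 3600  # seconds
outgoing["session"]["samesite"] = "Lax"
set_cookie_value = outgoing["session"].OutputString()
print("Set-Cookie:", set_cookie_value)

# On later requests the browser sends back only the name and value,
# in a Cookie header, which the server can parse again.
incoming = SimpleCookie()
incoming.load("session=abc123")
print(incoming["session"].value)  # abc123
```

The attributes such as Domain, Path and Max-Age travel only from the server to the browser; the browser uses them to decide when to send the cookie back.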