"What new thing will you tell me about today?"
"Many things. But to start, I think we'll discuss working with a network and the Internet. Interested?"
"Yep. The Galactic Internet is super cool."
"Okay, but let's start with some history. At the beginning of the 21st century, the situation was this…"
"Every computer connected to the Internet had a unique number. This was an ordinary 4-byte number. It is called the IP address."
"But humans have poor memory and struggle to remember something like 2108458776, so they often write each byte separately."
"If we split the four-byte number 2108458776 into separate bytes, we get 22.214.171.124. As you will recall, each byte consists of 8 bits and can contain numbers from 0 to 255."
"So, that's how we're writing the number?"
"Yes. It's just easier (for humans) to remember four-byte numbers when they are written this way."
"As it happened, the choice to use only 4 bytes soon played a cruel trick on them. The number of devices connected to the Internet grew so quickly that they soon ran out of numbers."
"How did they get around that?"
"They did what humans typically do."
"They came up with a new standard for IP addresses and proudly named it IPv6."
"Unlike a normal IP address (called IPv4) that uses 4 bytes to form a unique number, the new standard uses 16."
"Just think of it, humans couldn't remember 10 digits in an ordinary number (like 2108458776), so they had to divide them into 4 parts, but then they thought to use numbers consisting of 16 bytes."
"Yeah, sometimes humans are weird."
"Yep. Humans are humans."
"That said, they did get out of their predicament."
"They got tired of remembering the numbers and decided to replace them with words."
"How's that? Could you give me an example?"
"Of course, web.mail.com, google.com, new.books.amazon.com, …"
"This sort of name is called a domain."
"In order for this Internet to work properly, they created a special table called the Domain Name System (DNS) that stores the IP address of each domain name."
"Here's how it works."
1) A user enters an address in a browser, for example, web.mail.com.
2) The browser accesses the DNS and uses the domain name to get the IP address.
3) A request with the required URL is sent to this IP address.
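"Step 2 can be tried from Java. This is only a sketch: the DnsLookup class is a made-up name, while InetAddress.getByName is the standard library call that asks the system resolver for an address."

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

class DnsLookup {
    // Asks the system resolver for an address of the given host.
    // For a domain name this normally triggers a DNS query;
    // a literal such as "127.0.0.1" is only parsed, with no lookup.
    static String resolve(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The printed address depends on the machine, the network,
        // and the DNS records at the moment you run this.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + " -> " + resolve(host));
    }
}
```

"Pass a domain name as the first argument to look it up. The answer may differ from one run to the next, because DNS records can change."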
"That doesn't look very simple."
"But this approach does have several benefits:"
"1) Humans find it easy to remember names that can be verbalized."
"2) Domain names can be built hierarchically by adding subdomains to the beginning of a name. Exactly like a package name in Java."
"3) If you need to change the IP address of the web server, you only need to change the DNS record, and everything will work as it did before — users don't have to remember a new address."
"The DNS looks something like this:"
|Domain name||IP address|
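"If it helps, picture that table as a map from names to addresses. The sketch below is a toy model, not how real DNS servers are implemented; the ToyDns class and the 192.0.2.x addresses are invented for the example."

```java
import java.util.HashMap;
import java.util.Map;

class ToyDns {
    // Domain name -> IP address, like the two columns of the table.
    private final Map<String, String> records = new HashMap<>();

    void setRecord(String domain, String ip) {
        records.put(domain, ip);
    }

    // Returns null when the domain is unknown.
    String lookup(String domain) {
        return records.get(domain);
    }

    public static void main(String[] args) {
        ToyDns dns = new ToyDns();
        dns.setRecord("web.mail.com", "192.0.2.10");
        System.out.println(dns.lookup("web.mail.com")); // 192.0.2.10

        // The web server moves to a new address: only the record changes,
        // and users keep typing the same name.
        dns.setRecord("web.mail.com", "192.0.2.20");
        System.out.println(dns.lookup("web.mail.com")); // 192.0.2.20
    }
}
```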
"Anyway, a domain is the name of a computer, but we don't need the computer — we need what's on the computer. This is what URLs are for."
"Initially, a URL was actually a link to a file on another computer. For example:"
|http://info.codegym.cc/user/info/profile.html|
|http is the protocol for client-server communication|
|info.codegym.cc is the computer's domain name|
|user/info/profile.html is the path to the file on the computer|
"At the very beginning of the network development, a web server was only able to use a URL to serve files that it was storing somewhere. The URL was actually a global path to the file: computer name + path."
"Later, when web servers started generating files themselves, URLs changed a bit and became a request to the web server. Request parameters were also added."
"Today it's rare to see a file extension at the end of a URL. "A modern URL is just a unique link with parameters. More like a method call rather than a global file path."
"A classic modern URL looks like this:"
|Parsing the URL|
|http://codegym.cc/alpha/api/contacts?userid=13&filter=none&page=3|
|Description of parts of the URL|
|codegym.cc is the domain name — the unique name (address) of a computer on the Internet|
|http is the protocol for client-server communication|
|alpha/api/contacts is the web server request or request for a webpage on the server|
|userid=13 & filter=none & page=3 is a string with the request parameters|
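"The same breakdown can be done in code. Here's a sketch that uses the standard java.net.URI class. The URL string is assembled from the parts listed above, and UrlParts and parseQuery are names I made up. The query parsing is deliberately simple and doesn't decode percent-escaped characters."

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class UrlParts {
    // Splits "a=1&b=2" into a name -> value map that keeps the original order.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) {
        URI uri = URI.create(
            "http://codegym.cc/alpha/api/contacts?userid=13&filter=none&page=3");
        System.out.println(uri.getScheme());            // http
        System.out.println(uri.getHost());              // codegym.cc
        System.out.println(uri.getPath());              // /alpha/api/contacts
        System.out.println(parseQuery(uri.getQuery())); // {userid=13, filter=none, page=3}
    }
}
```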
"Yeah, I remember. You told me about URLs recently."
"And about ports too. You used the example of an apartment building."
"It'd be better to tell me what 'http' is. I see 'protocol' written everywhere, but I'm not clear on what that is."
"OK. I'll tell you without further ado."
"HTTP stands for HyperTextTransportProtocol and is for transferring hypertext."
"Roughly speaking, a protocol is a set of rules for communication. It describes the requests that can be sent to a web server, and in what format, as well as how the web server should respond."
"In short, the situation is this. Ordinary text files or, if you like, large chunks of text are sent between the client and server."
"A request comes to the server, and server provides a response to each request."
"Here are examples of such a request and response:"
GET /alpha/api/contacts HTTP/1.1
Host: codegym.cc
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en; rv:1.9b5) Gecko/2008050509 Firefox/3.0b5
Accept: text/html
Connection: close
GET - request method
/alpha/api/contacts - request to the web server
HTTP/1.1 - protocol version
Host: codegym.cc - domain name
User-Agent: Mozilla/5… - identifies the browser that sent the request
Accept: text/html - requested document type: HTML
Connection: close - close the server connection after processing the request
"The first line is the actual request. What follows are additional request parameters, also known as 'header fields'."
"And here's an example of a response:"
HTTP/1.1 200 OK
Date: Wed, 11 Feb 2009 11:20:59 GMT
Server: Apache
X-Powered-By: PHP/5.2.4-2ubuntu5wm1
Last-Modified: Wed, 11 Feb 2009 11:20:59 GMT
Content-Language: en
Content-Type: text/html; charset=utf-8
Content-Length: 1234
Connection: close

<html><body><a href="http://ample.com/about.html#contacts">Click here</a></body></html>
HTTP/1.1 200 OK - "200 OK" means everything is okay
Date: Wed, 11 Feb 2009 - Date on which the request was processed
Server: Apache - Name of the web server
X-Powered-By: PHP - The server uses PHP
Last-Modified: Wed, 11 Feb 2009 - The time of the last update of the requested file
Content-Language: en - The language of the file
Content-Type: text/html; charset=utf-8 - This is an HTML file with UTF-8 encoding
Content-Length: 1234 - The response is 1234 bytes long
Connection: close - The connection will be closed after the request is handled
<html><body><a href="http://ample - The HTML file itself
"I want to draw your attention to two things:"
"First, no matter what you request, it looks like a file request to the server. It doesn't matter whether the file is on the server or the server generates it in response to the request."
"Second, the file itself is sent as part of the HTTP response. In other words, we see some additional data at the beginning of the server's response, and then the body of the file being served."
"How interesting! I'm not sure I understood everything. I'll read this again later."
"Oh, I do want to tell you about one other small, but interesting, thing: cookies."
"What are those?"
"According to the HTTP protocol, cookies are small pieces of information that the server sends to the client for storage on the client. And they are sent back to the server as part of subsequent requests."
"And what's the point of that?"
"Suppose a user signs in on the home page of a website. The server creates a session object on the server for this user, and a unique session number is sent to the client as a cookie. During the next request from the client to the server, this session number, together with other cookies, will be sent back to the server. This means the server can recognize the user who sent the new request."
"Yep. When you write your own servlets, we'll take a closer look at this topic. But for now, let's take a break."
"Whatever you say."