The DNS (Domain Name System) translates a domain name (e.g. www.grokit.ca) into an IP address (e.g. 74.125.20.121). This is necessary because data is routed on the Internet using IP addresses, not domain names.
Even though you type a human-readable domain name in your browser, it cannot be used directly to route data on the Internet (IP addresses are used for that purpose). Domain names exist because we humans remember words more easily than long series of numbers. Therefore, we built a system that lets us type a human-readable domain name in a browser and have it resolved into an IP address by the Domain Name System. This IP address can then be used to communicate on the Internet.
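From a program's point of view, this resolution step is usually a single call into the operating system's resolver. Here is a minimal sketch using Python's standard socket.getaddrinfo. The helper name resolve_ipv4 is made up for this illustration, and the addresses you get back depend on when and where you run it.

```python
import socket

def resolve_ipv4(hostname):
    """Return the IPv4 addresses the system resolver gives for hostname."""
    # getaddrinfo asks the OS resolver, which performs a DNS lookup if needed.
    infos = socket.getaddrinfo(hostname, None,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    addresses = []
    for info in infos:
        # Each entry is (family, type, proto, canonname, sockaddr);
        # for IPv4, sockaddr is an (ip, port) pair.
        ip = info[4][0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Calling resolve_ipv4("www.grokit.ca") on a machine with Internet access should return a list of dotted-decimal addresses. An input that is already an IP address is returned as-is, without any lookup.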
A DNS request is typically sent using UDP (User Datagram Protocol) on port 53.
To observe the mechanics of DNS, let's capture network traffic and see what happens when a domain name needs to be resolved. To do that, I'll start a Wireshark capture and issue a ping to www.grokit.ca (ping uses the ICMP protocol, which will be covered in a separate article. The thing to remember here is that ping needs to know the IP address associated with the queried domain name, so it will trigger a DNS lookup). Here it is:
43 1.161347000 192.168.1.8 192.168.1.1 DNS 73 Standard query 0x338a A www.grokit.ca
52 1.259355000 192.168.1.1 192.168.1.8 DNS 123 Standard query response 0x338a CNAME ghs.googlehosted.com A 74.125.28.121
You probably noticed that the query goes to 192.168.1.1, which is my local router. DNS is a hierarchically cached system, with caches at the router, the ISP, and the closest DNS cache, all the way up to the root servers. This way, if many people connected to my router issue a DNS request for the same name, the request only has to travel as far up the hierarchy as the first cache that knows the answer. Since DNS entries do not change very quickly, caching works remarkably well.
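To make the caching idea concrete, here is a toy sketch of one level of that hierarchy. The names TtlCache, upstream and clock are hypothetical, and a real resolver does far more than this. The point is only that an answer is reused until its TTL (time to live, in seconds) runs out, and only then is the next level up asked again.

```python
import time

class TtlCache:
    """Toy DNS cache: remembers answers until their TTL (in seconds) expires."""

    def __init__(self, upstream, clock=time.monotonic):
        self.upstream = upstream    # callable: name -> (ip, ttl_seconds)
        self.clock = clock          # injectable clock, handy for experiments
        self.entries = {}           # name -> (ip, expiry_time)
        self.upstream_calls = 0     # how many times we had to go "up"

    def lookup(self, name):
        now = self.clock()
        cached = self.entries.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]        # cache hit: no upstream traffic
        # Cache miss or expired entry: ask the next level up and remember it.
        self.upstream_calls += 1
        ip, ttl = self.upstream(name)
        self.entries[name] = (ip, now + ttl)
        return ip
```

With an upstream that answers ("74.125.28.121", 68), repeated lookups of the same name within 68 seconds hit the upstream only once.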
If you are curious to see all the bytes, here is the request:
43 1.161347000 192.168.1.8 192.168.1.1 DNS 73 Standard query 0x338a A www.grokit.ca
==>
0000 00 26 62 ae 71 b4 a0 88 b4 e4 31 64 08 00 45 00 .&b.q.....1d..E.
0010 00 3b 10 a9 00 00 80 11 a6 af c0 a8 01 08 c0 a8 .;..............
0020 01 01 c1 de 00 35 00 27 4d 15 33 8a 01 00 00 01 .....5.'M.3.....
0030 00 00 00 00 00 00 03 77 77 77 06 67 72 6f 6b 69 .......www.groki
0040 74 02 63 61 00 00 01 00 01 t.ca.....
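Everything up to "33 8a ..." is just the standard Ethernet, IP and UDP headers of a message from 192.168.1.8 (my computer) to 192.168.1.1 (the local DNS cache, which is my router). Note the destination port "00 35" (53 in decimal). The DNS query itself starts at "33 8a", which is the query ID 0x338a shown by Wireshark.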
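The DNS part of the request (from "33 8a" onward) is small enough to rebuild by hand. The sketch below is an illustration, not a full DNS client: the function name build_query is made up, and it only handles a plain question for an A record with the "recursion desired" flag set, which is what the capture shows.

```python
import struct

def build_query(name, query_id):
    """Build the bytes of a DNS query asking for the A record of name."""
    # Header: ID, flags (0x0100 = recursion desired), then four counters:
    # 1 question, 0 answers, 0 authority records, 0 additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is sent as length-prefixed labels, terminated by a zero byte.
    qname = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        qname += bytes([len(encoded)]) + encoded
    qname += b"\x00"
    # Question type 1 (A record), class 1 (IN, the Internet).
    return header + qname + struct.pack("!HH", 1, 1)

payload = build_query("www.grokit.ca", 0x338a)
print(" ".join("{:02x}".format(b) for b in payload))
```

The printed bytes should match the tail of the captured request: 33 8a 01 00 00 01 00 00 00 00 00 00 03 77 77 77 06 67 72 6f 6b 69 74 02 63 61 00 00 01 00 01.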
Response:
00000000 33 8a 81 80 00 01 00 02 00 00 00 00 03 77 77 77 3....... .....www
00000010 06 67 72 6f 6b 69 74 02 63 61 00 00 01 00 01 c0 .grokit. ca......
00000020 0c 00 05 00 01 00 00 07 08 00 16 03 67 68 73 0c ........ ....ghs.
00000030 67 6f 6f 67 6c 65 68 6f 73 74 65 64 03 63 6f 6d googleho sted.com
00000040 00 c0 2b 00 01 00 01 00 00 00 44 00 04 4a 7d 1c ..+..... ..D..J}.
00000050 79
Look at the last four hexadecimal values: 4a 7d 1c 79. Translated to decimal (using a Python shell):
>>> 0x4a, 0x7d, 0x1c, 0x79
(74, 125, 28, 121)
That is 74.125.28.121, the IP address corresponding to www.grokit.ca.
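The rest of the response can be decoded the same way. Below is a rough sketch of a parser for the response dump above. The function names read_name and parse_response are made up, only A and CNAME records are understood, and there is no error checking. The one subtle part is name compression: a byte pair like "c0 0c" is a pointer meaning "the name continues at offset 0x0c of this message".

```python
import struct

# DNS payload of the response, copied from the hex dump above.
RESPONSE = bytes.fromhex(
    "33 8a 81 80 00 01 00 02 00 00 00 00 03 77 77 77 "
    "06 67 72 6f 6b 69 74 02 63 61 00 00 01 00 01 c0 "
    "0c 00 05 00 01 00 00 07 08 00 16 03 67 68 73 0c "
    "67 6f 6f 67 6c 65 68 6f 73 74 65 64 03 63 6f 6d "
    "00 c0 2b 00 01 00 01 00 00 00 44 00 04 4a 7d 1c "
    "79"
)

def read_name(msg, offset):
    """Decode a possibly compressed name; return (name, offset just after it)."""
    labels = []
    end = None                        # where to resume after the first pointer
    while True:
        length = msg[offset]
        if (length & 0xC0) == 0xC0:   # compression pointer to an earlier name
            pointer = struct.unpack_from("!H", msg, offset)[0] & 0x3FFF
            if end is None:
                end = offset + 2
            offset = pointer
        elif length == 0:             # zero-length label ends the name
            offset += 1
            break
        else:                         # ordinary length-prefixed label
            labels.append(msg[offset + 1:offset + 1 + length].decode("ascii"))
            offset += 1 + length
    return ".".join(labels), (end if end is not None else offset)

def parse_response(msg):
    """Return (query_id, answers); each answer is (name, type, ttl, data)."""
    query_id, _flags, qdcount, ancount, _ns, _ar = struct.unpack_from("!HHHHHH", msg, 0)
    offset = 12                       # the header is always 12 bytes
    for _ in range(qdcount):          # skip the echoed question section
        _, offset = read_name(msg, offset)
        offset += 4                   # question type + class
    answers = []
    for _ in range(ancount):
        name, offset = read_name(msg, offset)
        rtype, _rclass, ttl, rdlength = struct.unpack_from("!HHIH", msg, offset)
        offset += 10
        if rtype == 1:                # A record: four raw address bytes
            data = ".".join(str(b) for b in msg[offset:offset + rdlength])
        elif rtype == 5:              # CNAME record: another name
            data, _ = read_name(msg, offset)
        else:                         # anything else is left undecoded
            data = msg[offset:offset + rdlength]
        answers.append((name, rtype, ttl, data))
        offset += rdlength
    return query_id, answers

query_id, answers = parse_response(RESPONSE)
print(hex(query_id))
for answer in answers:
    print(answer)
```

This should print the same query ID as the request (0x338a), then a CNAME record pointing www.grokit.ca to ghs.googlehosted.com, then an A record giving 74.125.28.121 for ghs.googlehosted.com. That matches the one-line summary Wireshark showed for the response.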
You could dig into the DNS RFC (or here), but for the scope of this article I think it's sufficient to show that you needed to know the IP address for www.grokit.ca, and you got 74.125.28.121. Nice, now you can issue an HTTP request and communicate with the website.
The astute observer may have noticed that visiting http://74.125.28.121 brings you to a generic Google page, which is different from what you get if you point your browser to http://www.grokit.ca. Is DNS lying? No, but there is one small detail left to complete our understanding of DNS in the context of visiting a website.
Although in some cases the IP address belongs directly to the computer that serves the website, in other cases it points to a server that hosts many websites. For example, my website is hosted on Google App Engine. Like most large web services, it uses virtual hosting. In simple terms, this means it uses the same infrastructure / VIP / computer / IP address to serve many websites, which makes a lot of sense if you are a large company with beefy computers and a limited set of IP addresses.
To disambiguate which website that beefy computer should serve, it simply relies on the Host HTTP header. The browser inserts the Host header into the HTTP request before sending it to the server (74.125.28.121). mitmproxy (use Fiddler if on Windows) lets us capture the HTTP request:
Host: www.grokit.ca
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: __utma=137660006.823405121.1434689563.1434694212.1440612362.3;
__utmz=137660006.1234689563.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)
Connection: keep-alive
So that machine sees www.grokit.ca, and probably routes the traffic internally to whichever VM happens to currently be running my service. Just for fun, trigger an HTTP GET to ghs.googlehosted.com with and without the proper Host header, and you will see that without the header it replies 404 Not Found.
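On the server side, the idea can be sketched in a few lines. This is only an illustration of Host-based dispatch, not how Google's infrastructure actually works. The SITES table, its contents and the function handle_request are all made up for the example.

```python
# Hypothetical table mapping Host header values to the site that should answer.
SITES = {
    "www.grokit.ca": "<title>List of All Content</title>",
    "www.example.org": "<title>Another site on the same IP</title>",
}

def handle_request(headers):
    """Pick a site based on the Host header; return (status, body)."""
    host = ""
    for key, value in headers.items():
        # Header names are case-insensitive, and an optional ":port" suffix
        # is not part of the site name.
        if key.lower() == "host":
            host = value.strip().lower().split(":")[0]
    if host in SITES:
        return 200, SITES[host]
    # No site is registered under that name on this server.
    return 404, "<title>Error 404 (Not Found)</title>"

print(handle_request({"Host": "www.grokit.ca"}))
print(handle_request({"Host": "ghs.googlehosted.com"}))
```

The first call returns a 200 with the site's page, the second a 404, even though both requests would arrive at the same IP address.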
We can use curl to issue a simple GET without the proper Host header (curl then sends Host: ghs.googlehosted.com, taken from the URL):
$ curl ghs.googlehosted.com
<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
<title>Error 404 (Not Found)!!1</title>
[...]
... and with the proper Host header:
$ curl -H 'Host: www.grokit.ca' ghs.googlehosted.com
<!DOCTYPE html>
<html lang="en">
<head>
<link rel="stylesheet" href="./default.css" type="text/css">
<title>List of All Content</title>
[...]
Success, it returns my website!
The tool dig allows you to get information about DNS servers:
$ dig
[...]
ADDITIONAL SECTION:
a.root-servers.net. 493617 IN A 198.41.0.4
a.root-servers.net. 493633 IN AAAA 2001:503:ba3e::2:30
b.root-servers.net. 493804 IN A 192.228.79.201
b.root-servers.net. 544839 IN AAAA 2001:500:84::b
[...]
It can also resolve URL -> IP:
$ dig www.grokit.ca
; <<>> DiG 9.9.6 <<>> www.grokit.ca
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34283
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;www.grokit.ca. IN A
;; ANSWER SECTION:
www.grokit.ca. 865 IN CNAME ghs.googlehosted.com.
ghs.googlehosted.com. 214 IN A 74.125.28.121
;; Query time: 55 msec
;; SERVER: 10.221.228.12#53(10.221.228.12)
;; WHEN: Fri Dec 11 17:53:03 PST 2015
;; MSG SIZE rcvd: 92
You can also use dig to do an IP -> URL lookup:
$ dig -x 74.125.28.121 +short
pc-in-f121.1e100.net.
nslookup also allows URL -> IP and IP -> URL translation. For URL -> IP:
$ nslookup -query=any -debug www.yahoo.com
Server: 127.0.1.1
Address: 127.0.1.1#53
------------
QUESTIONS:
www.yahoo.com, type = ANY, class = IN
ANSWERS:
-> www.yahoo.com
canonical name = fd-fp3.wg1.b.yahoo.com.
ttl = 293
AUTHORITY RECORDS:
ADDITIONAL RECORDS:
------------
Non-authoritative answer:
www.yahoo.com canonical name = fd-fp3.wg1.b.yahoo.com.
A fun thing to do with nslookup is to point it at a specific DNS server: nslookup www.domainname.com dns_server.
IP -> URL:
$ nslookup 207.241.224.2
Server: 127.0.1.1
Address: 127.0.1.1#53
Non-authoritative answer:
2.224.241.207.in-addr.arpa name = www.archive.org.
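The odd-looking name in that answer, 2.224.241.207.in-addr.arpa, is how a reverse lookup is expressed. The IP address is turned into a special domain name, with the bytes in reverse order, and that name is then looked up like any other. Python's standard ipaddress module can build that name. The wrapper function reverse_lookup_name is made up for this example.

```python
import ipaddress

def reverse_lookup_name(ip):
    """Return the domain name that is queried for a reverse lookup of ip."""
    return ipaddress.ip_address(ip).reverse_pointer

print(reverse_lookup_name("207.241.224.2"))   # 2.224.241.207.in-addr.arpa
print(reverse_lookup_name("74.125.28.121"))   # 121.28.125.74.in-addr.arpa
```

To perform the lookup itself from Python, socket.gethostbyaddr is the standard-library call. Its result depends on the resolver you are using.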
If there are only 13 root servers, it must be trivial to DDoS the DNS service, right? Actually, no. Because: