You open your web browser, type google.com, and hit enter. What happens? This is an interview question at Google, actually. I'll try to hit the right mix between technical details and general concepts.

So, what happens in your browser? The answer is: a lot. A lot of electrons move over a lot of wires. A lot of people own these wires. A lot of people agreed on how those signals are organized into data that can be read by a computer. Then that data is read and understood by the operating system and web browser. A lot of people agreed on the protocols that are used to read that data. There are layers upon layers of meaning in the electrical signal across that wire. And everyone has to agree on how things work on every level, or else traffic just doesn't flow.

There are technical problems, like how data should get from one side of the country to the other, but there are also complex issues of authority and ownership. We'll focus on these. We'll find that they often have technical solutions, as they were built by a technical community.

We're going to cover two parts of this gigantic picture tonight: names and physical media (wires). Names are important because every machine on the internet must be uniquely identifiable. The physical media is interesting because wires are tangible goods that are owned by companies, but all their value comes from the data moved over them, which is not material.

# So just to load a web page:
# * you cross property owned by several companies
# * you utilize dozens of protocols and formats. Alphabet soup: TCP/IP, UTF-8, DNS, HTTP, HTML, PNG

Very basic assumptions:
* You have a computer.
* It has an IP address. This is something that looks like 98.237.252.207. It is the only computer on the internet with that address.
* There is a computer somewhere on the internet with the webpage you want.
* It has a unique IP like 66.102.7.103.
* You don't want to type in 66.102.7.103, though.
  You want to type in google.com, which is a "hostname".
* The only way to contact a particular computer on the internet is via its unique IP address.
* We'll talk more about IP addresses later.

Hit Enter in your browser, and let's see what happens.
* The browser needs to find an IP address for google.com.
* This is analogous to looking someone up by name in what they used to call 'phonebooks'.
* The browser asks the operating system to perform a "DNS lookup": to "resolve" google.com to an IP address that can then be contacted.
* The operating system doesn't know, so it needs to ask someone.
* Your computer has been configured, probably automatically by your ISP, to know of at least one other computer on the internet that can act as a DNS server. So your computer knows at least one IP address in the world. My DNS server is "8.8.8.8".
* Your computer contacts your specified DNS server and sends a question: What is the IP for google.com?
* We'll talk about how the two computers speak later, but first...
* This is our first major diversion: Who has the authority to say google.com should point to the machine 66.102.7.103?

DNS YO!
* Essential enough to the internet that it is considered part of the "Internet Suite", TCP/IP. By contrast, the World Wide Web and email are not part of TCP/IP.
* Without DNS we wouldn't know about "google.com", we'd be talking about 66.102.7.103. Which is a lot harder to market.
* Created in 1983 and standardized over the next 5 years.
* We've asked a friendly DNS server to resolve google.com.
* It doesn't know, so it needs to start the lookup process.

1. Let's talk about "google.com". It's composed of "google" and "com". "com" is its TLD, top-level domain. To proceed, the server needs to find someone who is an authority for the ".com" TLD. So who does it ask? The almighty "root servers". Every DNS server has these servers' IP addresses built in. Otherwise, it would not know where to start.
   * There are 13 root servers out there.
   * Who owns those?
     * VeriSign
     * USC-ISI
     * Cogent Communications
     * University of Maryland
     * NASA
     * Internet Systems Consortium
     * Defense Information Systems Agency
     * U.S. Army Research Lab
     * Autonomica
     * RIPE NCC
     * ICANN
     * WIDE Project
   * Why do they do it? They have to serve 10 billion requests a day. Each request is tiny (half a kilobyte), but that's still a lot.
     * RIPE NCC: a not-for-profit whose mission is to support the infrastructure of the internet. They are funded by their members, for whom they provide various services.
     * VeriSign: They make a lot of money off the internet and want to give back. Their brand is boosted by the prestige of having a hand in the root-most infrastructure of the internet. And they get a seat at meetings.
     * University of Maryland hosts a root server as a public service. Like some other hosts, they receive grants to keep the servers running.
     * Various defense or governmental organizations host root servers as a public service and because the infrastructure of the internet is now critical to military and government operation.
   * What would happen without them?
     * In about 48 hours, with the system left as it is, no DNS queries would resolve. Nothing would work.
     * Allegedly, in 2002, half of them were taken down by a cyberattack. No one on the internet noticed.
     * In reality, they are 200 servers masquerading as 13, and even if they were all somehow held down for 48 hours, there are contingency plans. Don't worry.
     * They even run different software where feasible; they don't share their operational knowledge, so that each setup is fairly unique.
   * The servers are coordinated by ICANN, the Internet Corporation for Assigned Names and Numbers (actually through the ICANN subsidiary, IANA, the Internet Assigned Numbers Authority).
     * Not-for-profit.
     * Until 2007, Vint Cerf, the "creator of the internet", was Chairman of the ICANN board.
     * ICANN has control over all the TLDs. They fund their operations by charging for access to .com/.uk/etc.
     * Members of the ICANN board are put through a nominating process and, in theory, ICANN takes significant input from the community. For instance, ICANN went through the process of trying

2. We've found out what server is authoritative for ".com". So, ask it for the authoritative DNS server that handles "google.com".
   * It turns out the .com server is also administered by VeriSign.
   * VeriSign thus has the power to tell you which DNS server to go ask about google.com. They hold a lot of power here, so what do they do with it?
     * Make a lot of money.
     * The US Department of Commerce legally has authority over .com. They contract the administration of it to VeriSign.
     * A domain costs about $50. About $35 goes to VeriSign. $15 goes to a government fund. (as of many years ago)
     * When you buy a domain from godaddy.com or networksolutions.com, etc., you are buying an entry in VeriSign's DNS server.

3. Contact Google's authoritative DNS server. This is a server designated by Google to be authoritative for its domain. It is run by Google or by someone hired by Google to host their DNS information.
   * The DNS server acting on your behalf asks this third server for the IP of "google.com" and it finally answers.
   * Why all the effort to distribute the records? Why not trust a central authority with all of them?
     * There is so, so much data. So many domain names. So much traffic. Must be reliable.
     * Who would we trust? Microsoft? Google? Each has its own motives and shouldn't be trusted.
     * No one is to be trusted to handle all of a certain kind of traffic. Thus a system is devised to distribute it. Cool pattern.
     * DNS works this way. Email works this way (and a significant part of mail is built atop DNS). The massive amounts of cooperation and work that must go into such systems make these the only two systems that are so distributed.
     * Twitter does not work this way. Facebook is the only way to message people you know on Facebook. How do you feel about that?
     * DNS is the only central distributed database on the internet. It's all we need to build everything else on top of it.

Thanks to DNS, we now have google.com's IP address and we can go ask it for the webpage we wanted. But first, let's discuss just what the deal is with IP addresses. This is simpler.
* Who gives out IP addresses?
  * In the USA, it's ARIN, the American Registry for Internet Numbers.
  * These registries are always non-profits; they manage the 'stewardship' of the internet.
  * An ISP buys blocks of IP addresses from ARIN at about $1250/yr for a few thousand.
  * Then your ISP gives an IP to you when you pay them.

We just have to figure out how to move the right electrons from Seattle to Mountain View. More basic information:
* Data on the internet is broken up into "packets", small discrete chunks that move independently.
* Packets generally travel several 'hops' to get from one place to another. There's no single line from Google's office to this house.
* In a long communication composed of many packets, two packets next to each other might take completely different paths.
* This is packet-switching. It's different from how old telephones used to work: circuit-switching. In circuit switching, a path is set up and held open for the entirety of the communication. I think that's what's happening when you see old movies of operators switching plugs at a switchboard. Gross.
* So, I pay Comcast $40/month for my pipe to the internet. Google pays some bigger ISP a lot of money for their pipe.
* But Comcast still doesn't have wires in the ground that connect directly to Google, and it might not even have wires that connect to Google's ISP (though it probably does).
* Some companies that have fiber optic cable in the ground:
  * Level 3
  * Verizon
  * AT&T
  * Sprint
* Suppose a user of small ISP A needs to download an MP3 via Napster from a user on small ISP B. These small ISPs maybe have cables laid in their respective cities, but they probably just lease them from a huge telephone-type company.
* A can't get to B, but A has a peering agreement with e.g. Level 3.
* There is a physical location where A and Level 3 have cables plugged into the same router.
* Then Level 3 has long-range fiber cables across the country.
* If A is a very small ISP, then A is paying Level 3 for the right to peer with them. This is called paying transit.
* Level 3 has never heard of ISP B, but Level 3 routers can consult routing tables using the Border Gateway Protocol and toss packets from A closer to B.
  * BGP is another kind of distributed database, but it is not nearly as governed as DNS. It is more of an implementation detail.
* Suppose the path Level 3 picks goes across Verizon-owned cables and routers. Then the Level 3 routers hand off the packets to Verizon routers.
  * This is a true peering agreement, as neither Level 3 nor Verizon is paying the other for transit.
  * They are Tier 1 networks. The definition of a Tier 1 network is that it can route to any other Tier 1 network without paying transit. They're what you would generally consider the global internet backbone.
* Verizon hands the packet to ISP B, who is probably paying Verizon transit for the privilege.
* Here's a sample traceroute of getting a packet from my house in Capitol Hill to a machine at MIT.
     1  10.0.1.1 (10.0.1.1)  0.673 ms  0.626 ms  0.474 ms
     2  * * *
     3  68.87.205.65 (68.87.205.65)  10.104 ms  8.816 ms  9.978 ms
     4  po-10-ur02.seattle.wa.seattle.comcast.net (68.85.240.110)  8.482 ms  10.867 ms  8.480 ms
     5  be-70-ar01.seattle.wa.seattle.comcast.net (68.85.240.105)  16.977 ms  10.123 ms  10.983 ms
     6  pos-0-10-0-0-cr01.seattle.wa.ibone.comcast.net (68.86.90.209)  13.980 ms  8.894 ms  11.955 ms
     7  te-3-3.car1.seattle1.level3.net (4.79.104.109)  12.474 ms  8.377 ms  15.980 ms
     8  ae-32-52.ebr2.seattle1.level3.net (4.68.105.62)  20.978 ms  19.592 ms  17.978 ms
     9  ae-2-2.ebr2.denver1.level3.net (4.69.132.54)  44.464 ms  37.062 ms  35.961 ms
    10  ae-3-3.ebr1.chicago2.level3.net (4.69.132.62)  61.962 ms  65.891 ms  59.729 ms
    11  ae-6-6.ebr1.chicago1.level3.net (4.69.140.189)  61.436 ms  66.618 ms  71.660 ms
    12  ae-1-5.bar1.boston1.level3.net (4.69.140.93)  113.426 ms  91.546 ms  93.671 ms
    13  ae-7-7.car1.boston1.level3.net (4.69.132.241)  90.930 ms  112.296 ms  92.206 ms
    14  * * *
    15  oc11-rtr-1-backbone-2.mit.edu (18.168.1.41)  99.675 ms  100.044 ms  102.168 ms
    16  * * *

* Note that Comcast is not in the business of getting packets from Seattle to Denver, as they hand off the packet while still in Seattle.
* But, all this is changing. 5 years ago, peering agreements and payments depended on the amount of data being moved across the pipes.
* ISPs are quickly conglomerating, however. More traffic is going through bigger ISPs, and also to bigger destinations.
* Google is responsible for between 6% and 10% of all traffic on the internet.
* YouTube is essential to what people consider to be the functioning of the internet; thus ISPs want better and faster connections to YouTube, driving down the price for Google's internet access.
* Down so much, in fact, that YouTube doesn't pay for bandwidth. This is unprecedented. But now one could imagine a situation where Comcast approaches Google offering to pay them transit for a better connection to YouTube.
  Then Comcast could advertise that it plays YouTube in HD better than its competitors. This is a very recent development and it's highly unclear where this trend will go.

We've looked at stakeholders involved in the most basic levels of how the internet works, but the pattern continues all the way up to protocols like the World Wide Web, languages like HTML, video formats like MPEG, and every other level of communication on the internet. At each level there are interests represented by stewardship organizations, government organizations, and corporate entities.

Remember that for anything of meaning to happen on the internet, there is a lot that must be agreed upon. Luckily, systems have been established that seem to get this balance right. There is fairly little dissent about the core architectural issues of the internet.
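
For anyone who wants to poke at the lookup described earlier (ask a root server, get referred to the .com servers, get referred to Google's own authoritative server, finally get an answer), here is a toy model in Python. It is only a sketch, not real DNS: the server labels in `ZONES` and the `resolve` function are made up for illustration, nothing is sent over the network, and the only value taken from these notes is the 66.102.7.103 address.

```python
# Toy model of the DNS delegation walk. Nothing here touches the network.
# The server labels ("root", "com-tld", "google-auth") are hypothetical.

ZONES = {
    # The root only knows who is responsible for each TLD.
    "root": {"referrals": {"com": "com-tld"}},
    # The .com server only knows which server is authoritative per domain.
    "com-tld": {"referrals": {"google.com": "google-auth"}},
    # The domain's own authoritative server holds the actual address.
    "google-auth": {"answers": {"google.com": "66.102.7.103"}},
}


def resolve(hostname, start="root"):
    """Follow referrals from the root until some server answers.

    Returns (ip, path), where path lists the servers asked, in order.
    Raises LookupError if a server has neither an answer nor a referral.
    """
    server = start
    path = []
    while True:
        path.append(server)
        zone = ZONES[server]

        # An authoritative server answers directly.
        answers = zone.get("answers", {})
        if hostname in answers:
            return answers[hostname], path

        # Otherwise try suffixes of the name, longest first
        # ("google.com", then "com"), looking for a referral.
        referrals = zone.get("referrals", {})
        labels = hostname.split(".")
        next_server = None
        for i in range(len(labels)):
            suffix = ".".join(labels[i:])
            if suffix in referrals:
                next_server = referrals[suffix]
                break

        if next_server is None:
            raise LookupError(f"{server} cannot help with {hostname}")
        server = next_server
```

Calling `resolve("google.com")` walks all three servers and hands back the address along with the list of servers it asked. The point of the sketch is that no single table holds everything: each level only knows who to ask next.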