How is data transferred over the internet?
September 21, 2020 5:19 PM

Is there a relatively simple but also detailed breakdown of how data is transferred over the internet? It's not just a series of tubes, right?

For example, if I send a text message from one person in the US to another person in, say, Europe, presumably the following happens:

*A sensor in my cell phone records the letters I'm pressing on my phone and turns them into data.
*The software in my phone takes the digital data of the message I've written (perhaps encrypting it first if it's via an app like Whatsapp), and sends it to a cell phone tower.
*The cell phone tower connects to the Internet, which is a series of interconnected computers across the world.
*Within the internet, the data gets sent through a transatlantic fiber optic cable.
*In Europe, the data gets sent to a Wifi router in someone's home.
*The wifi sends the data to a cell phone.
*The software on the cell phone decrypts the data and then displays it by lighting up some very, very small LCD lights.

I'd imagine what I just wrote above is very simplistic at times, skipping key steps all over the place, and wrong at other times. And of course the key points would be different depending on where/what the data is coming from, going to, and I'm sure much more. Does every step happen at the speed of light? Famously the internet is NOT just a series of tubes... but how does it work?

I'm interested in the physical infrastructure as much as what's happening digitally - computer servers, cell towers, data cables, server farms, satellites (e.g. Starlink and similar, to the extent that is or could be real in the years to come) -- that sort of thing.

If not an explain-like-I'm-5 explanation, I'd love an ELI in high school explanation! Perhaps a Wired article or similar -- something readable but comprehensive.
posted by lewedswiver to Technology (27 answers total) 35 users marked this as a favorite
Andrew Blum's Tubes: Behind the Scenes at the Internet makes this topic a fun, engaging read, and yes he gets right in amongst the aerials/dishes, pipes, tubes, tunnels, underpasses and easements - all the hard-infrastructure places, plus enough bits and bytes to give you a feel, along with meeting a range of people that run things.
posted by unearthed at 5:38 PM on September 21, 2020 [10 favorites]

Does every step happen at the speed of light?

Very few steps happen at the speed of light. A material's velocity factor indicates how fast a signal travels through it compared to the speed of light in a vacuum. Even fiber doesn't operate at the speed of light: light travels more slowly in glass than in a vacuum (the refractive index is roughly 1.5), and internal reflection lengthens the path slightly, so propagation is more like 2/3 the speed of light. Radio communication (wireless) is pretty close to the speed of light, but distances are significantly limited.
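To put rough numbers on the velocity factor, here is a minimal sketch. The 5,570 km New York to London distance and the 0.67 velocity factor are assumed round figures for illustration, not measurements of any real cable (real cable routes are longer than the straight-line distance).

```python
# Rough one-way propagation delay over a long link.
C_KM_PER_S = 299_792.458  # speed of light in a vacuum


def one_way_delay_ms(distance_km, velocity_factor=1.0):
    """Time for a signal to cover distance_km, in milliseconds."""
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity factor must be in (0, 1]")
    return distance_km / (C_KM_PER_S * velocity_factor) * 1000


# Assumed figures, for illustration only.
NY_TO_LONDON_KM = 5_570
FIBER_VELOCITY_FACTOR = 0.67

vacuum_ms = one_way_delay_ms(NY_TO_LONDON_KM)
fiber_ms = one_way_delay_ms(NY_TO_LONDON_KM, FIBER_VELOCITY_FACTOR)
print(f"vacuum: {vacuum_ms:.1f} ms, fiber: {fiber_ms:.1f} ms")
```

This counts only propagation. Routers, queues, and the radio links at either end add their own delays on top.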

I think you'd be interested in Ingrid Burrington's Networks of New York which gets into the details of the invisible infrastructure of the internet with very specific examples.

You might also be interested in internet exchange points, where major telecommunications networks physically connect with each other. Essentially, IXPs are where the "interconnected networks" of the internet actually connect. They are generally not-too-interesting buildings with rooms that contain equipment from multiple telecommunications companies, with wires connecting each rack of equipment from each company.
posted by saeculorum at 5:50 PM on September 21, 2020 [9 favorites]

Also, fun fact ... signals in metallic and fiber optic cables typically travel closer to 2/3 the speed of light!

Edit: Damn. Looks like saeculorum already pointed this out!
posted by ZenMasterThis at 5:51 PM on September 21, 2020 [1 favorite]

Oh gosh, I haven't read Blum's book mentioned above, but some of the things that come to mind are the development of the Internet from its ARPA foundations (communications in case of nuclear war, which influenced its packet structure, redundancy, flexible routing, and TTL or "time to live", which ensures a packet dies if it doesn't reach its destination within a certain number of hops).
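The TTL idea can be sketched in a few lines. This is a toy model, not real router code: the router names are made up, and the only rule modeled is that each router subtracts one from the TTL and drops the packet when it reaches zero.

```python
def send(ttl, routers):
    """Walk a packet through a list of routers, decrementing TTL at each."""
    for name in routers:
        ttl -= 1
        if ttl <= 0:
            # The router discards the packet instead of forwarding it.
            return f"dropped at {name}"
    return "delivered"


# Hypothetical three-router path.
path = ["r1", "r2", "r3"]
print(send(64, path))  # plenty of TTL to spare
print(send(2, path))   # runs out at the second router

# A routing loop: the packet bounces between two routers forever,
# except that TTL guarantees it eventually dies.
loop = ["r1", "r2"] * 100
print(send(64, loop))
```

The loop case is the point of the mechanism: without a hop limit, a misrouted packet could circulate indefinitely.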

Back in the early days there were RFC (Request for Comment) documents describing elements of the Internet, and even an RFC TCP Tutorial.

There are even humorous RFCs such as RFC 1149.

It was also a kinder, gentler time, and that unfortunately meant that the Internet left itself open to DoS (Denial of Service) attacks, where third-party sites would be attacked by "sorry, wrong number" responses to messages that party never sent.

Of course there are also many different forms the data takes, such as for Web pages, file downloads, remote terminal, video camera, Usenet (first rule of Usenet, don't talk about Usenet), etc.

There's speed of light in a vacuum, through copper, out to a geosynchronous satellite and back to a ground station, etc. There's proxy servers and CDNs to mediate some of this.

I am sure others will do a better job than I, but I have always found this interesting, and I especially wanted to make sure the humorous RFC was mentioned.
posted by forthright at 5:54 PM on September 21, 2020 [4 favorites]

What you're asking for is basically an entire college level networking course (or at least, a pretty significant chunk of one; I know this because I've TAed such classes many years ago). I can't cram all of that into a comment, but I can give you some jumping off points (in no particular order).

* Traditionally the various protocols and such have been modeled using 7 layers, from the physical transmission mechanisms (how a signal on a wire is converted to 0/1) all the way up to the application layer (what the user sees).

* Here's an explainer whitepaper from 20 or so years ago that is pretty much still all true.

* On the specific topic of undersea fiber, Neal Stephenson wrote an article for Wired back in the 90s where he followed an undersea fiber project from end to end. One of my all-time favorite Wired pieces.

* Staying just within, say, the US (thus avoiding the issue of transcontinental cable), the internet isn't just a bunch of computers, it's actually a tiered system. Big networks like, say, AT&T are termed Autonomous Systems (AS), and they communicate between themselves (using Border Gateway Protocol [BGP]) to get data packets where they need to go. Within each AS is a subnetwork of routers and switches that move traffic either in from the border to a customer or from a customer to the border (or sometimes from customer to customer). This is accomplished using an Interior gateway protocol.

* You can think of an AS sort of like Fedex or UPS, except imagine that instead of Fedex handling the whole delivery from point A to point B, some people were in Fedex territory and some in UPS territory, so sometimes a Fedex package gets handed over to UPS at some point for delivery to their customer. In the real world, networks often either try to offload packages onto the other guy's system as fast as possible (making him do all the work) or keep it for as long as possible (to retain control); this is referred to as either hot potato or cold potato routing.

* Autonomous systems and their constituent networks interact in physical places called points of presence, which are typically big air conditioned data centers chock-a-block with computers and cables and some generators out back (or on the roof).

* At the edges of a network are folks like you and me. The local connection to my house (or yours) is often referred to as the last mile.
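The hot potato versus cold potato choice described in the list above can be sketched as picking a hand-off point. Everything here is hypothetical: the cities, the distances, and the idea that distance is the only cost. Real networks weigh many more factors.

```python
from typing import NamedTuple


class Handoff(NamedTuple):
    city: str
    my_km: int    # distance the packet travels on my own network
    peer_km: int  # distance left for the other network to carry


def hot_potato(options):
    """Hand the packet off as early as possible (least work for me)."""
    return min(options, key=lambda h: h.my_km)


def cold_potato(options):
    """Carry the packet as far as possible (most control for me)."""
    return min(options, key=lambda h: h.peer_km)


# Made-up hand-off points between two networks.
options = [
    Handoff("Chicago", my_km=50, peer_km=3000),
    Handoff("Denver", my_km=1500, peer_km=1600),
    Handoff("Los Angeles", my_km=3000, peer_km=100),
]

print(hot_potato(options).city)
print(cold_potato(options).city)
```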

That's a pretty big swath of the internet I've touched on, though I am sure I missed some interesting nooks and crannies, but hopefully that's a good point to go down some rabbit holes and read about the parts that interest you.
posted by axiom at 6:02 PM on September 21, 2020 [16 favorites]

A related question that's become such an interview cliché that there's a rather comprehensive answer for it:

What happens when you type into your browser's address box and press enter?
posted by CrystalDave at 6:22 PM on September 21, 2020 [9 favorites]

So I hate to say this, but pick anything else, like Whatsapp as an example, because SMS messages don't necessarily get routed over the internet. They are very old phone company infrastructure that was deployed at a time when yes, technically the internet existed, but it did not overlap with phones at all. So sometimes, yes, SMS messages go over the internet, but sometimes they go over an older telephone network protocol called SS7. And SS7 is an internetwork of sorts, but it is absolutely 100% not The Internet as it is commonly understood (UDP, TCP, etc).

Blum's book is as good a place to start as any but as axiom says, this is several courses of university-level engineering curriculum. Every single one of your bullet points has immense near-bottomless rabbit holes. For example: wifi and all mobile phone WAN protocols (3G, 4G etc) use some form of spread spectrum signalling which is in some cases pretty straightforward (TDM) and in some cases is the result of incredible world-altering mathematical breakthroughs which are hard to understand and moderately hard to explain - code-division multiple access (if you recall the days of Verizon being advertised as a "CDMA" network) works because we understand Shannon's limit. CDMA has in turn been superseded in practice by OFDM, used in modern 4G networks.

Anyway, all that is just radio transmissions! You ask a question to which the answer has no end.
posted by GuyZero at 6:25 PM on September 21, 2020 [4 favorites]

*Within the internet, the data gets sent through a transatlantic fiber optic cable.

I also feel compelled to point out that I took an entire senior-level engineering course on just this and failed it, although I was pretty close! This covers the physics of light transmission in a bounded medium, the material science of making glass fibers a fraction the diameter of a human hair and choosing which wavelengths it transmits or rejects, the operational aspect of laying and managing optical signals over long distances (like optical amplifiers) and the entire associated field of making laser diodes, narrowband photo detectors, etc. Not to mention the dirty details of splicing two fiber optic cables where you need to fuse two pieces of glass, each a quarter millimeter in diameter, while ensuring the joint remains optically clear.

(Those labs were terrible and weren't even why I did so bad! It was all the triple integrals!)

Once again, this one line opens a vast amount of things to explain.
posted by GuyZero at 6:34 PM on September 21, 2020 [4 favorites]

* On the specific topic of undersea fiber, Neal Stephenson wrote an article for Wired back in the 90s where he followed an undersea fiber project from end to end. One of my all-time favorite Wired pieces.
Came in to mention this. It's called "Mother Earth Motherboard" and it's very 90s what with the "hacker tourist" framing and everything, but oh boy is it a good read. It's like 50 thousand words, so be warned. But I surely recommend it. I think it'll scratch the itch as described.
posted by Horkus at 6:47 PM on September 21, 2020 [5 favorites]

Something that's been mentioned in a roundabout way, but not directly yet, is that all data strings contain headers and footers which say: hi, I'm this type of data, I came from z, I'm going to y, here's the data; and then the same kind of thing at the end, noting the end of the message. All devices have unique identifiers so the packets don't get sent off to the wrong place, but user error can make this happen.
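As a rough illustration of a header in front of some data, here is a toy packet format. The layout (source id, destination id, payload length, each a 16-bit number) is invented for this sketch and is not any real protocol's header.

```python
import struct

# Hypothetical toy header: source id, destination id, payload length.
# "!" means network (big-endian) byte order; "H" is a 16-bit unsigned integer.
HEADER = struct.Struct("!HHH")


def build_packet(src, dst, payload):
    """Prefix the payload bytes with the toy header."""
    return HEADER.pack(src, dst, len(payload)) + payload


def parse_packet(packet):
    """Read the header, then use its length field to slice out the payload."""
    src, dst, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    return src, dst, payload


packet = build_packet(7, 42, b"hello")
print(packet.hex())
print(parse_packet(packet))
```

The receiver never has to guess where the message ends, because the header told it how many bytes to expect.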
posted by AlexiaSky at 8:39 PM on September 21, 2020 [1 favorite]

Those legacy telco networks are quite interesting in their own right. The difference between the two is a fantastic example of multiple technologies with vastly different engineering philosophies behind them where the notionally "better" one loses out due to economic forces that drive down the cost of the worse, but more widespread, system. (Better in the sense of *guaranteeing* reliability as opposed to working around expected unreliability.)

In practice, of course, both models work very well, which highlights that there is always more than one way to do things even though hindsight often makes it look like whatever ended up reaching ubiquity was the one true way and patently obvious from the very beginning.

As you dive down the rabbit hole of the nuts and bolts of the hardware and protocols you can really see how the two underlying philosophies are so radically different in how they approach the transmission of information yet converge on the same approximate appearance to the end user in nearly every respect.
posted by wierdo at 8:41 PM on September 21, 2020

Yeah, so the thing with telcos is that historically they were circuit switched networks, which allocate guaranteed resources to a call until it is complete, which results in better quality but worse efficiency. So it's not surprising that when you look at SS7 or really anything that would be built by telcos, their first instinct is going to be to think in terms of circuit switching (n.b.: this is not a shortcoming necessarily, and there are a lot of historical factors at play).

The internet is a best-effort packet switching system, which utilizes its bandwidth very efficiently but relies on higher level protocols to improve reliability/quality.
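Here is a minimal sketch of a higher-level protocol adding reliability on top of a best-effort channel. It is a simplified stop-and-wait scheme, not TCP: the loss rate is made up, and acknowledgements are assumed never to get lost.

```python
import random


def lossy_send(packet, rng, loss_rate):
    """Best-effort channel: returns the packet, or None if it was lost."""
    return None if rng.random() < loss_rate else packet


def reliable_transfer(chunks, loss_rate=0.3, seed=1, max_tries=50):
    """Stop-and-wait: resend each chunk until it gets through."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    received, attempts = [], 0
    for seq, chunk in enumerate(chunks):
        for _ in range(max_tries):
            attempts += 1
            arrived = lossy_send((seq, chunk), rng, loss_rate)
            if arrived is not None:  # stands in for getting an ACK back
                received.append(arrived[1])
                break
        else:
            raise RuntimeError(f"chunk {seq} never arrived")
    return received, attempts


chunks = ["how", "does", "it", "work"]
received, attempts = reliable_transfer(chunks)
print(received, f"({attempts} sends for {len(chunks)} chunks)")
```

The channel itself promises nothing. The retry loop layered on top is what turns "probably arrives" into "arrives".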

If I were going to point at one thing that's really powerful about the internet (and maybe computer science in general), it's that it's all about abstraction. If you think of packets just like envelopes, you start to realize that you can put one envelope inside another, and so long as everyone handles their type of envelope right, everything will work out.

Imagine it's, say, 1851, California just became a state, and you want to send a letter to cousin Joe in Sacramento from your house in NYC. Say you live in an alternate universe where each state has its own separate postal service. Well, you'd just take your letter and put it in a CA state letter addressed to Joe, then that inside a Nevada one addressed to CA, etc., working backward to NY on the outside, and drop it in the mailbox. Each successive state would unwrap its letter and do its thing. Well, the same thing can happen with packets, so you can take one messaging format and stuff it inside another for transit across the medium that requires that other format. Indeed, the networking stack does the same thing, but instead of physical mail, you put a higher-level format packet inside the kind of packet used by the lower level (perhaps then inside another even lower level), so your computer passes your letter down through its stack, across a wire, then another and another, and up through Joe's computer's stack so he can read your message.
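The envelope-inside-an-envelope idea can be sketched like this. The bracket notation is invented for the example. Real headers are binary, and the three layer names are just labels here.

```python
def encapsulate(payload, layers):
    """Wrap the payload in one envelope per layer, innermost first."""
    for layer in layers:
        payload = f"[{layer}|{payload}]"
    return payload


def decapsulate(packet, layer):
    """Open one envelope, checking that it is the kind we expect."""
    prefix = f"[{layer}|"
    if not (packet.startswith(prefix) and packet.endswith("]")):
        raise ValueError(f"not a {layer} envelope")
    return packet[len(prefix):-1]


# Sender: pass the letter down through the stack.
layers = ["TCP", "IP", "Ethernet"]
wire = encapsulate("hi Joe", layers)
print(wire)  # [Ethernet|[IP|[TCP|hi Joe]]]

# Receiver: unwrap in the opposite order, outermost envelope first.
msg = wire
for layer in reversed(layers):
    msg = decapsulate(msg, layer)
print(msg)
```

Each layer only needs to understand its own envelope and can treat whatever is inside as opaque contents.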
posted by axiom at 9:04 PM on September 21, 2020 [4 favorites]

encapsulation; decapsulation

headers being added and removed

it served me well to keep those concepts in mind as i learned about internetworking.
posted by armoir from antproof case at 11:43 PM on September 21, 2020

Is there a relatively simple but also detailed breakdown of how data is transferred over the internet?

Short answer: No.

Long answer:

Data transfer over the internet is a process of absolutely mind-mangling complexity compared to transfer over, say, AM radio - which was the dominant technological mode of information transfer when I was a kid, and allowed anybody to make their own receiving station using stuff you could find lying around.

The only way to wrap your head around how the internet works is to build your understanding in abstracted chunks that deliberately ignore as much detail as possible in order to free up cognitive resources for grasping the main ideas. This will give you the basics, but in order to understand how the internet fails you need to start digging deeper into each of those chunks in order to work out how and why your abstractions are leaking.

And there is simply no end to the depth of that digging. Given the state of the art in 2020 you would literally need many lifetimes to get it all done. But there are many many many more people than you working on changing and updating and (by and large) improving the state of the art, so by the time you'd expended the required number of lifetimes, you'd need literally exponentially more of them to catch up with the new state of the art.

The best way I've yet seen to organize my own knowledge of how the internet works is to start with a structure akin to the OSI seven layer model and hang everything else off of a generalization of that.

So in your question, you told a small story about how some sensor impressions on your phone made a trip across the internet and wound up affecting some LCD lights on somebody else's. The hero in that story was "the data", and the story gives the impression that the storyteller thinks of "the data" as something identifiable in its own right, created at the point where finger meets sensor, and with some kind of object persistence akin to that of a mark that a pencil might make on a piece of paper.

But to my way of thinking that's not a particularly illuminating way to think about what's happening. Rather than thinking of "data" as something that starts with touches on a sensor on your phone, you'd be better off thinking about "the message" as a thing-in-itself that starts in your mind and which you're trying to copy, whole, to somebody else's mind.

It then becomes clear that the internet itself is a detail that you can and should abstract away in the first instance. It becomes just one of many comms methods you could use to implement the top-level task of transferring an abstracted message from an abstracted sender to an abstracted receiver. Moving your finger over a phone sensor then becomes recognized as an activity several levels down in one particular protocol stack, and you'll realize that you also had to move that finger over that sensor rather earlier than when you were transcribing the first part of the message itself, because the protocol you're using required you to define and/or choose some kind of envelope to contain that message before you got anywhere near worrying about its actual content.

At which point it becomes apparent that moving the finger over the sensor is really not something you need to be dealing with at all right now, because what you really need to understand is something rather further up the stack: how does your phone know how to send anything to somebody else's? And does it actually matter that it's a phone, and not, say, a tablet? Or a desktop computer? Or a spam bot?

And now you've stepped far enough away from your message to be asking the right kind of questions and you're on a solid path to understanding how some particular underlying comms pathway gets its many many jobs done.
posted by flabdablet at 11:45 PM on September 21, 2020 [8 favorites]

Middling answer: what armoir from antproof case said.
posted by flabdablet at 11:51 PM on September 21, 2020 [1 favorite]

A few years ago I wrote a page that tried to describe in layman's terms how data is sent over the internet. It kind-of got away from me but you might find the first half of it useful: How You Are Reading This Page.

Briefly: the internet IS a series of tubes. The clever part is the machines on the ends of those tubes that break data into sendable chunks (called packets) and reassemble them. In the linked document I use the analogy of sending a novel via postcards. The postcards are packets - the Internet is the system of sending the postcards through a succession of hops plus the scheme for reassembling the novel at the receiver's end.
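The postcard analogy translates almost directly into code. This sketch numbers each chunk, shuffles the chunks to mimic postcards arriving out of order, and puts them back together. The chunk size and the sample text are arbitrary choices.

```python
import random


def to_postcards(text, size):
    """Cut the text into numbered chunks of at most `size` characters."""
    return [
        (number, text[start:start + size])
        for number, start in enumerate(range(0, len(text), size))
    ]


def reassemble(postcards):
    """Sort by sequence number and glue the chunks back together."""
    return "".join(chunk for _, chunk in sorted(postcards))


novel = "It was the best of times, it was the worst of times."
cards = to_postcards(novel, 8)

# Postcards take different routes and arrive in any order.
random.Random(0).shuffle(cards)

print(reassemble(cards) == novel)
```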
posted by AndrewStephens at 5:35 AM on September 22, 2020 [3 favorites]

Oldie but goodie: 8-minute YouTube video, in German but found one with English subtitles. From the (to German speakers) iconic kids TV series Die Sendung mit der Maus (the show with the mouse). Used to show it in the first lecture when teaching networking to computer science students.
posted by meijusa at 8:45 AM on September 22, 2020

There are actually devices that do each of those conversions that are mentioned above - it is complex, but then again not. Each of the functions people are describing above is just a different specialized computer.

For example, a Packet Data Network Gateway changes the message data your cell phone transmits into an internet message, and then there is another in Europe to change it back.

Cisco PDNGW, or PGW for short. They aren't that different from the router sitting in your home.
posted by The_Vegetables at 8:49 AM on September 22, 2020

Each step in the process of communication is either transformation or transportation. Data is either changed from one form to another, or moved from one place to another. The whole point of the internet is that you don't have to care about how it happens, because anything that accepts data at one point and reproduces it at another point can be part of the internet (and sometimes is). A pigeon could be part of the internet, sending packets back and forth to a lighthouse on a rock. It would be slow, and would need some error correction, but it would be perfectly valid.

It's just fun to think about. A container full of hard drives is often a faster way to get a huge amount of data from one place to another.
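The hard-drive claim is easy to check with arithmetic. All the numbers below (1,000 drives of 10 TB each, two days in transit, a 1 Gbit/s link) are assumptions picked for the example.

```python
def effective_gbps(total_bytes, seconds):
    """Average throughput in gigabits per second."""
    return total_bytes * 8 / seconds / 1e9


# Assumed shipment: 1,000 drives of 10 TB each, two days in transit.
drives = 1000
bytes_per_drive = 10e12
transit_seconds = 48 * 3600

shipment_bytes = drives * bytes_per_drive
container_gbps = effective_gbps(shipment_bytes, transit_seconds)

# Assumed alternative: pushing the same data through a 1 Gbit/s link.
link_gbps = 1.0
link_days = shipment_bytes * 8 / (link_gbps * 1e9) / 86400

print(f"container: about {container_gbps:.0f} Gbit/s on average")
print(f"1 Gbit/s link: about {link_days:.0f} days")
```

The container loses badly on latency (nothing arrives for two days) but wins on throughput, which is why the trade-off depends on how much data there is.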

So anyway, yeah, it's all tubes, and importantly also adapters that let different kinds of tubes connect to each other.
posted by seanmpuckett at 11:02 AM on September 22, 2020 [1 favorite]

Understanding the 7-layer OSI model, cited by axiom above, is the best place to start. It abstracts the different levels of communication implied by any device-to-device interaction without getting bogged down in the details of this or that protocol. Once this general model is clear in your head, understanding the various implementations that are used at every level becomes a lot more comprehensible.
posted by SPrintF at 11:44 AM on September 22, 2020

The OSI 7-layer model is indeed very conceptually useful, but it's also important to point out that it was written by the losers of the networking wars and that the modern internet has numerous protocols that operate on multiple levels at the same time. The 7-layer model does not model TCP/IP.

I think something more useful might be to just understand packet-switched networking in general, which helps set the stage for things like network routing, name resolution, connection-oriented vs connectionless protocols etc.
posted by GuyZero at 12:37 PM on September 22, 2020 [2 favorites]

If you like the idea of starting with learning about electrical pulses representing ones and zeros going over a wire and working up from there, this video series by Ben Eater is just under two hours long and is both clear and pretty thorough for something that short. It might be a good next step after you've read some of the excellent introductory articles linked above.
posted by Busy Old Fool at 3:04 PM on September 22, 2020 [2 favorites]

The 7-layer model does not model TCP/IP

More nearly true to say that TCP/IP doesn't map terribly cleanly onto the model. That reflects conceptual muddiness in the design of TCP/IP, not anything terribly wrong with the model.

The reasons that TCP/IP "won" the networking wars were always more about politics and ego than engineering merit.

If you want to understand why a "loser" model remains absolutely fundamental to a solid grasp of internetworking, get yourself a copy of Interconnections: Bridges, Routers, Switches, and Internetworking Protocols (2nd Edition) by Radia Perlman.
posted by flabdablet at 5:49 PM on September 22, 2020 [3 favorites]

It might be worth noting here that the research Stephenson did for his article at least partially inspired his novel Cryptonomicon, which involves Nazi super-submarines, gold, cryptography, a lot of caves, and undersea cables. Among other things.
posted by lhauser at 7:25 PM on September 22, 2020

"displays it by lighting up some very, very small LCD lights"

Actually, it's the reverse of this. An LCD works by blocking [or reflecting*] light, so normally you'll have an LED powered backlight and this is selectively blocked/unblocked by very, very small LCD cells, forming an image.

[*] You don't see this type much these days.
posted by HiroProtagonist at 10:09 PM on September 22, 2020 [1 favorite]

You might check out this: CCNA Routing & Switching 200-125 - Free CCNA Study Guide. That's for one part of the Cisco Certified Network Associate certification exams.

Bits are specific to Cisco devices, but the rest is pretty generic networking stuff that applies to almost everything.

If you really want to fool around with networking... check out ns-3 | a discrete-event network simulator for internet systems, or the prettier version GNS3 Free Network Emulator Tool, or one of the other network simulator programs.

I was a University network person for some 16 years for a small city's worth of people with fingers deep into a bunch of stuff. But I learned by the seat-of-my-pants from other network people so I don't have much in the way of crash-course help. I think axiom and others are more helpful. :)

Networking is broad and deep in some parts, but sorta mostly it's bits and pieces where you only need to know the basics *and* the other bits that you need to use in your environment. And each bit on its own really isn't that hard to understand. It's mostly just matching numbers and labels (at least until something breaks or you find a bug).
posted by zengargoyle at 9:19 AM on September 23, 2020

The OSI 7-Layer Burrito
posted by rhizome at 11:09 AM on September 23, 2020
