What are the phases in website infrastructure development?
March 13, 2014 5:04 AM
I am curious about the process of scaling a website from a small, shared hosting site all the way up to "one of the biggest 10 sites on the Internet" scale. I've heard terms like "dedicated hosting," "sharding the database" etc., but I'm sure there's a lot more that I haven't heard about. What changes on the backend through this process, and is there a typical order that those changes occur?
Well, nowadays there's a lot that happens more granularly, of course, moving from shared servers to individual servers to servers + CDN to one's own datacenter (heh), but a truly significant number of both small and very large sites have moved to AWS: Amazon has a $3 billion web hosting business, with customers including Netflix and, possibly, the CIA. That's an incredible consolidation, and it also means a single host can scale from zero visitors to millions; the client moves through equipment rather seamlessly. (Some of us disapprove, if only structurally, of the idea that a single company should host a significant percentage of the web. It's gonna be a real event if/when there's a hiccup.) But this has pretty much up-ended the way we used to understand companies migrating from small to larger systems.
posted by RJ Reynolds at 5:14 AM on March 13, 2014
Best answer: There are three general categories of Stuff Websites Need:
* The web server itself
* Data and file storage (and retrieval)
* Bandwidth
...and the upgrade path for each of those differs depending on what type of website you're dealing with. A site that serves a lot of high-bandwidth material like videos or large downloads is going to have a very different structure from one that needs to do small, simple operations for many simultaneous users (a newspaper website, for example), which in turn is very different from one that has to do a lot of data processing or complex database lookups for each user session (web-based games, for example).
This is a massive oversimplification, because they're generally interrelated, but broadly: you upgrade the web server when you have too many simultaneous users, upgrade the database server when you have complex or large amounts of data, and buy a bigger data pipe when you, well, need more bandwidth.
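To make that concrete, here's a toy Python sketch of the "which ceiling am I hitting" decision. The metric names and thresholds are invented for illustration, not taken from any real monitoring tool.

```python
# Illustrative only: the metrics and thresholds below are made up for this sketch.

def suggest_upgrade(concurrent_users, avg_query_ms, bandwidth_used_pct):
    """Return which of the three resources looks like the bottleneck."""
    suggestions = []
    if concurrent_users > 500:        # web server saturated with requests
        suggestions.append("add or upgrade web server capacity")
    if avg_query_ms > 200:            # database struggling with data volume or complexity
        suggestions.append("upgrade or split out the database server")
    if bandwidth_used_pct > 80:       # the pipe itself is nearly full
        suggestions.append("buy a bigger data pipe (or add a CDN)")
    return suggestions or ["no obvious bottleneck yet"]

print(suggest_upgrade(concurrent_users=800, avg_query_ms=50, bandwidth_used_pct=40))
```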
At the bottom end of the scale you're on a single computer on a single pipe, and that computer will be shared among many other websites.
The next step up is to have the whole computer to yourself. (This is "dedicated hosting".)
The next step up from that is (generally) to put the database and the front-end server on separate machines.
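In practice that step is often nothing more than pointing the application at a different host. A minimal sketch, assuming a PostgreSQL database and a made-up internal hostname:

```python
import os
import psycopg2  # assuming PostgreSQL; any database client works the same way here

# On a single box the database lives on the same machine ("localhost").
# Once it moves to its own server, only the connection target changes.
DB_HOST = os.environ.get("DB_HOST", "localhost")  # e.g. "db.internal.example.com" after the split

conn = psycopg2.connect(
    host=DB_HOST,
    dbname="myapp",
    user="myapp",
    password=os.environ.get("DB_PASSWORD", ""),
)
```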
And at some point you start having to think about optimizing whatever you're doing based on whichever ceiling you're bumping your head into: for example if your web server is getting overloaded you might start pre-rendering whatever you can so the server doesn't need to do as much processing on the fly. How and what you optimize (and the degree to which it can be usefully done) is really situation-specific.
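For example, "pre-rendering whatever you can" often just means caching the finished HTML so the expensive work happens once instead of on every request. A bare-bones sketch; render_page and its cost are stand-ins:

```python
import time

_page_cache = {}  # finished HTML keyed by URL path

def render_page(path):
    """Stand-in for the expensive part: templates, database queries, and so on."""
    time.sleep(0.5)  # pretend this is costly
    return f"<html><body>Content for {path}</body></html>"

def get_page(path, max_age_seconds=60):
    """Serve from cache when possible; re-render only when the cached copy is stale."""
    cached = _page_cache.get(path)
    now = time.time()
    if cached and now - cached["at"] < max_age_seconds:
        return cached["html"]
    html = render_page(path)
    _page_cache[path] = {"html": html, "at": now}
    return html

get_page("/front-page")  # slow: rendered on the fly
get_page("/front-page")  # fast: served straight from the cache
```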
This is the end of simple: from here on out it's SRS BZNS. The next step is what you referred to as "sharding" -- there are lots of different strategies and techniques, but it basically boils down to splitting your data across multiple database servers (each holding only a slice of it), replicating your code and content across multiple web servers, and dividing your traffic among all those servers so no individual one gets overloaded.
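At its simplest, sharding is just a deterministic rule that maps each record, say a user, to one of several database servers. A toy sketch with made-up shard hostnames; a real setup would also keep replicas of each shard for redundancy:

```python
import hashlib

# Made-up shard hosts. The only thing that matters is that the same user id
# always maps to the same shard, so that user's data lives in exactly one place.
SHARDS = ["db-shard-0.internal", "db-shard-1.internal", "db-shard-2.internal"]

def shard_for(user_id: str) -> str:
    """Pick a shard by hashing the user id."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("alice"))  # always the same shard for the same user
print(shard_for("bob"))    # quite possibly a different one
```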
As the number of separate servers grows you probably also start splitting them up geographically as well, to give quicker response to users in different locations and to keep your bandwidth costs under control (a dozen smaller pipes can be cheaper than one ginormous pipe.)
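Geographic splitting works on the same principle, just keyed on where the request comes from instead of which record it touches. Real setups usually do this with GeoDNS or a CDN rather than application code, but the mapping idea is the same; the hostnames below are invented:

```python
REGIONAL_SERVERS = {
    "NA": "na.servers.example.com",
    "EU": "eu.servers.example.com",
    "APAC": "apac.servers.example.com",
}

def server_for(region_code: str) -> str:
    """Send the user to the closest regional cluster, falling back to North America."""
    return REGIONAL_SERVERS.get(region_code, REGIONAL_SERVERS["NA"])

print(server_for("EU"))     # eu.servers.example.com
print(server_for("LATAM"))  # no dedicated cluster, so fall back to NA
```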
"Top 10 sites on the Internet" is well outside my area of experience, and their setup probably is very specific to those particular sites' needs anyway, so I can't really speak to how that works.
For everyone except the very top and (occasionally) the very very bottom of that spectrum, you're almost certainly not doing any of this stuff in-house; you're renting server space and bandwidth from companies that do that and only that.
As others have mentioned, the existence of AWS and services like it has really fundamentally changed how companies think about doing this. Stuff like keeping data synchronized across multiple servers, or spinning up a new machine automatically when you get a spike in usership, is complicated and takes expertise; AWS does a really good job of automating a lot of it and putting it within reach of much smaller websites than would have been able to cope with it five or ten years ago. (I'm currently working with a startup whose development staff consists of Front End Guy (me) and Back End Guy; there's no way on earth we'd be able to do what we're doing if Amazon weren't handling the nitty gritty.)
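The "spin up a new machine when you get a spike" part is, at heart, a feedback loop like the one below. This is a plain-Python caricature of what a managed service runs for you continuously; the helper functions are hypothetical stand-ins, not a real AWS API.

```python
import random

def average_cpu_percent() -> float:
    """Hypothetical stand-in for a real metrics source (your monitoring system)."""
    return random.uniform(10, 95)

def launch_server() -> None:
    """Hypothetical stand-in for provisioning another web server."""
    print("launching another web server...")

def retire_server() -> None:
    """Hypothetical stand-in for shutting one down when it's no longer needed."""
    print("retiring an idle web server...")

def autoscale_once(scale_up_at: float = 75.0, scale_down_at: float = 25.0) -> None:
    """One pass of the loop: add capacity under load, shed it when things are quiet."""
    cpu = average_cpu_percent()
    if cpu > scale_up_at:
        launch_server()
    elif cpu < scale_down_at:
        retire_server()

for _ in range(5):
    autoscale_once()
```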
posted by ook at 9:51 AM on March 13, 2014 [1 favorite]
Scale Up: Get a more powerful computer
Scale Out: Get many smaller computers working together (and then scale them up)
It's usually easier to scale up than out because it doesn't take any major redesign or conceptual changes, but there's only so far you can go with a more powerful computer (usually cost becomes an issue).
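Scaling out usually implies putting a load balancer in front of the group. The simplest possible strategy is round robin, sketched below with made-up server names; real load balancers like nginx or HAProxy do this at the network level, but the core idea is just "take turns."

```python
from itertools import cycle

BACKENDS = cycle(["web-1.internal", "web-2.internal", "web-3.internal"])  # made-up names

def route_request(path: str) -> str:
    """Hand each incoming request to the next backend in the rotation."""
    return f"forwarding {path} to {next(BACKENDS)}"

for p in ["/", "/about", "/contact", "/"]:
    print(route_request(p))
```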
posted by blue_beetle at 11:03 AM on March 13, 2014 [1 favorite]
This thread is closed to new comments.