How are large web applications like Facebook architected?
April 23, 2016 1:17 PM

I'm recently getting into front-end code for large applications. I understand how complex backend processes fit in (more or less), but I am a bit confused about how something like Facebook chat works, at a high level. I'm assuming that a large, separate team works in a separate codebase from the rest of the Facebook teams. How does it get injected into the DOM, then? How do they ensure no conflicts with other Facebook features? Is there a separate "master" server that pulls all this together before outputting everything?

Facebook is just an example, and I'm less concerned with scalability. What I'm concerned about is breaking up a giant, monolithic application into separate maintainable projects.

A simpler example: Google's top nav has a logo and links to various apps. I'm sure these apps don't just keep a copy of the HTML for the top nav in each separate application. How then is the DOM built out? Is there a "mock" for this, with higher environments calling the actual DOM?

To see where I'm coming from as a mainly backend programmer: if I had a large team or even a large project, in order to keep it from becoming ghastly, I'd create a service bus or something similar, then in development mock out sample data, or even run a small bus that delivers mock data. Then in production I would simply have a config change or something similar that says to look at the real data. That way you can keep nice, happy codebases separate.
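The config-switch idea described above might be sketched like this in JavaScript (all names here are hypothetical, and the "real" source is stubbed out):

```javascript
// Hypothetical sketch: pick a data-source implementation via config,
// so development code can run against mock data without a real bus.
function createDataSource(config) {
  if (config.env === "development") {
    // Mock source: returns canned data immediately.
    return {
      getUser(id) {
        return { id, name: "Test User " + id };
      },
    };
  }
  // The "real" source would call the actual service bus; stubbed here.
  return {
    getUser(id) {
      throw new Error("real service bus not wired up in this sketch");
    },
  };
}

const source = createDataSource({ env: "development" });
console.log(source.getUser(42).name); // "Test User 42"
```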
posted by geoff. to Computers & Internet (11 answers total) 11 users marked this as a favorite
I'm not hands on with front end code, but based on conversations I see it seems like at my company, this is handled (when it's handled well) by using a modular framework that encapsulates different parts of the functionality. So there's a master application that then imports a bunch of modules. I think conceptually it's like you have a view that a module higher up in the hierarchy can give a specific DOM element to own and then it operates within that context.
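The "module owns a DOM element" idea above can be sketched like this (all names are hypothetical; a plain object stands in for a real DOM element):

```javascript
// Sketch of "a module owns a DOM element": the parent app hands each
// module a container, and the module only renders inside that
// container, never touching the rest of the page.
function createChatModule() {
  return {
    mount(container) {
      container.innerHTML = "<div class='chat'>chat goes here</div>";
    },
  };
}

// The "master application" wires modules to the elements it controls.
function bootApp(modules, containers) {
  modules.forEach((mod, i) => mod.mount(containers[i]));
}

// In a browser this would be document.getElementById(...); a plain
// object with an innerHTML property works for illustration.
const fakeElement = { innerHTML: "" };
bootApp([createChatModule()], [fakeElement]);
console.log(fakeElement.innerHTML); // "<div class='chat'>chat goes here</div>"
```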

AFAIK, the modules are in separate repos and are combined and minified during a build process so you don't end up with a zillion individual files to load at runtime.

But TBH, until you hit truly Facebook/Google scale, I think most companies just slog through a single monolithic repo and deal with the occasional conflict through education, tribal knowledge, and (ideally) continuous integration testing.

Mocks are an option and I've seen them used in some situations, but DOM mocks in particular are kinda unwieldy and there is enough variation in actual browser implementations that you're just setting yourself up for testing frustration later in the integration process. I've only really seen them used for headless applications for testing or back-end HTML rendering.
posted by heresiarch at 1:42 PM on April 23, 2016

I don't do web stuff, but having a modular framework and having a monolithic repo are not mutually exclusive - Google, Facebook, and Twitter all have all of their code in one large monorepo. Here's a Wired article about Google's codebase, and a more technical piece on the advantages of a monolithic code repository.
posted by Itaxpica at 2:01 PM on April 23, 2016 [2 favorites]

(Credentials: I've been doing front-end for a major American retailer's web store for about a year now.)

Google's top nav has a logo and links to various apps. I'm sure these apps don't just keep a copy of the HTML for the top nav in each separate application. How then is the DOM built out? Is there a "mock" for this, with higher environments calling the actual DOM?

Things like this are commonly handled via edge-side includes. Basically, they let the Web cache or CDN worry about it to reduce the load on the core application servers.
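A page template using an edge-side include might look something like this (the `/shared/top-nav.html` path is made up; the `<esi:include>` tag is the standard ESI syntax):

```html
<!-- Hypothetical page template: the app server emits this, and the
     edge cache / CDN replaces the esi:include with the shared nav
     before the page reaches the browser. -->
<body>
  <esi:include src="/shared/top-nav.html" />
  <div id="app-content">
    <!-- this application's own markup -->
  </div>
</body>
```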

I'm assuming that a large, separate team works in a separate codebase from the rest of the Facebook teams. How does it get injected into the DOM, then? How do they ensure no conflicts with other Facebook features?

If you mean the Javascript code that runs it, it's pretty straightforward to keep the code separate. A particular project "owns" a particular DOM element in the core template, and is responsible for controlling it. Thanks to closures, Javascript makes it easy to run different copies/versions of the same libraries on the same page (assuming they don't go around polluting the global namespace).
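The closure-based isolation mentioned above is the classic IIFE module pattern; a minimal sketch (names are made up) of two "versions" of a library coexisting without touching the global namespace:

```javascript
// Two "versions" of a library on the same page, each wrapped in an
// immediately-invoked function expression (IIFE) so its internals
// stay private instead of leaking into the global namespace.
const widgetV1 = (function () {
  const version = "1.0"; // private to this closure
  return { greet: () => "widget " + version };
})();

const widgetV2 = (function () {
  const version = "2.0"; // a different private binding, no conflict
  return { greet: () => "widget " + version };
})();

console.log(widgetV1.greet()); // "widget 1.0"
console.log(widgetV2.greet()); // "widget 2.0"
```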

In fact, the tricky part is having multiple projects share their core libraries. At my work we try to solve this by having a "standard" set of libraries available globally (this version of jQuery, this version of Lodash, this version of React, etc.) and the whole company upgrades at once, but even with this approach it's hard to keep everything in sync.
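One common mechanism for the "standard set of libraries" approach is to declare them as externals in each app's build config, so every bundle expects the company-wide globals to already be on the page instead of bundling its own copy. A hypothetical webpack-style fragment:

```javascript
// Hypothetical build config fragment (webpack-style "externals"):
// each app's bundle leaves these imports unresolved and resolves them
// at runtime to globals the shared page template provides.
module.exports = {
  externals: {
    jquery: "jQuery", // import from "jquery" -> window.jQuery
    lodash: "_",      // -> window._
    react: "React",   // -> window.React
  },
};
```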

It's far easier to just have a separate build for each app with all its dependencies bundled up so you don't have to worry about it. However this balloons up your JS builds and is, in my opinion, the pattern largely responsible for the website obesity crisis.

What I'm concerned about is breaking up a giant, monolithic application into separate maintainable projects.

At the UI level at least, Facebook's React library helps a lot with this by letting you break an interface down into separate components which can then be shared between projects.
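The component idea can be sketched in plain JavaScript (this is not actual React, which adds state and efficient re-rendering; names here are hypothetical): components are roughly functions from data to markup, which makes them easy to share between projects.

```javascript
// Component-as-function sketch (plain JavaScript, not actual React).
// A shared Avatar component can be reused by any team's page.
function Avatar(user) {
  return "<img class='avatar' src='" + user.photoUrl + "' alt='" + user.name + "'>";
}

function ProfileCard(user) {
  // Composition: one component renders another.
  return "<div class='card'>" + Avatar(user) + "<span>" + user.name + "</span></div>";
}

const html = ProfileCard({ name: "Ada", photoUrl: "/ada.png" });
console.log(html.includes("avatar")); // true
```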
posted by neckro23 at 2:21 PM on April 23, 2016 [4 favorites]

You can tell a lot about how a company writes code (or how they think code should be written) by looking at the libraries they create. Check out angular (Google) or React (Facebook) for UI goodness.
posted by blue_beetle at 2:22 PM on April 23, 2016

You basically have the idea right. There is some central service that puts together the DOM that will be sent down and a load balancer in front of it. Usually some caching layer behind the LB for front end assets. It's not as sophisticated as you might think.
posted by deathpanels at 2:28 PM on April 23, 2016

(Credentials: I work on a major public-facing web app with hundreds of millions of users for a large company. This is also a bit rambly, as I haven't really tried to organize my thoughts.)

I think you'll find as many answers to this question as there are projects, as each app's architecture will tend to grow somewhat organically out of its roots. The only time this isn't the case is if you can't get those roots to scale out as it grows, in which case there's probably a painful round of rewriting and refactoring involved. In any case, what I'm describing is my experience, and I would definitely not claim that it's the only way (or even the best way) to do it. I'm also limited in exactly how much detail I can provide, so this will by necessity be in fairly broad strokes.

A big part of the reason for all the differences is that front end work is generally not about creating well-tested and understood APIs as is the case for backends, so I feel like there's a lot more ad-hoc, "do what works given the requirements" attitude when it comes to front-end work. You don't want to write bad code, and testing is still important, but there are probably fewer consequences to an architectural misstep at this level as you can go back and iterate on it without worrying about it breaking something downstream (generally).

In my case, we mostly worked in the same repository on the same code for the front-end, and tried to keep the code itself modular enough (to varying degrees of success) that it was unlikely that someone working on one feature would step on the toes of someone on a different feature, and we let the SCM do a lot of the heavy lifting as far as integrating the work from the various contributors goes.

As far as shared components like headers and footers and other UI widgets, most of these would go in a central "shared" repository that would be part of the front-end repository. Front-end work would draw on that shared work as needed. In the past, the shared components were in a separate repository and we would take build drops from that repository as the team in charge of shared components worked on a release, but that caused a lot of friction, so now everything is in the same repository. The front end servers will serve up all of the HTML (including shared components), but we would use edge-caches or edge-proxies (similar to CloudFlare) to keep things performant. Static content like images, JS and CSS would go onto a CDN like Akamai (for example).

Some shared components are developed by another part of the company that we'd either need to get build drops from, or they'd host their own servers and provide a JS library hosted on a CDN that our app could call into to have it do its thing. It's all very ad-hoc, and given that some portions of this system were originally conceived around two decades ago, there is an interesting mix of technologies and approaches. Since each team that provides a component we integrate with is forced to provide at least an API to write to, that's how we usually tend to manage dependencies between teams.

Mocking wasn't a practice that was in vogue when our app was written, so there is an "integration" backend environment that is separate from the production one, and most development would be done against that environment. This is slowly changing over to more mock-based testing, where heavy UI automation testing is increasingly being replaced with nimbler unit tests, but it's slow going, and even with them there will still be a need to test against a real environment, so I don't imagine the separate backend will go away any time soon.

All of that said, the one constant is change, and what works in the past, or for one team, may not work in the future, or for a different team, and adaptability is key. There's always a lot of new tech and approaches in the front-end space, and you ignore that at your peril. On the other hand, chasing after trends is a great way to not get work done, so don't let shiny new frameworks and tech make you lose sight of that, especially if you're working on an existing project rather than some new green-field project.
posted by Aleyn at 3:49 PM on April 23, 2016 [1 favorite]

Oh wow, this is great and really informative. Please keep them coming. I think my mental hurdle was that for a lot of complex back-end systems I've been a part of, there was no way to get the entire environment on a developer's machine. No one wants to spend 20 minutes spinning up 15+ services during each build, even if that were possible (which it isn't in cases where the service call is proprietary, like an SAP instance, and we don't have the licenses to go around). So there's a natural tendency to create services/applications that do one thing and do it well.

Now that I think of it, having it all in the same repo, separated into folders or otherwise segregated, is not a big deal. If you branch off the release/stable branch everything should build anyway, and UI builds are very quick compared to what I am used to.
posted by geoff. at 4:41 PM on April 23, 2016

Different repos can work fine if the things in question don't share code. The downside of the giant monorepo is that merging becomes nightmarish. I worked at a biggish company that had an embarrassingly convoluted system for getting code into the master branch which just slowed everything down, and it was all because developers could not possibly handle all the merge conflicts.
posted by deathpanels at 5:07 PM on April 23, 2016

Yeah, I work at Google and I should mention that developing in our monorepo is a dream, but what makes it work is an enormous investment in tooling and infrastructure to make development as seamless and pleasant as possible.
posted by Itaxpica at 9:48 PM on April 23, 2016

I know you said you aren't interested in scalability per se, but the design choices made to achieve scalability are going to drive how the front end gets organized. The High Scalability blog is a single-topic blog on web scalability that covers a lot of the waterfront, including front-end organization of apps. Your "for example" of Facebook has a number of blog entries there, including one or two just on the messaging service.

The other thing to look for are articles/blog posts on design patterns for large scale javascript applications. Start with the search string "large scale javascript" and you'll find a number of interesting blog posts and conference talks to get started.
posted by kovacs at 5:52 AM on April 24, 2016

I've worked at large scale web companies you've heard of, for a decade.

There are *lots* of different ways.

One common-ish way is that there are different teams that work on different parts of the problem, like one team for login, one team for news feed, one team for ads, one team for chat, etc., and those are all separate products that have their own code bases, their own hosts, their own deployment processes, etc., and they expose (hopefully) well-documented APIs for talking to each of them.

Then when you go to the site, you're hitting the product of the team that handles the frontend and client stuff, and their product talks to all those different other teams' products via their APIs to get the content to display.
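That frontend-tier composition could be sketched like this (service names and responses are hypothetical; in reality each would be an HTTP call to a different team's hosts):

```javascript
// Sketch: a frontend tier composing one page from separate teams'
// services. Each service is owned by a different team and is stubbed
// here with canned data standing in for a real API call.
const loginService = { getSession: (token) => ({ userId: 7 }) };
const feedService = { getFeed: (userId) => ["post A", "post B"] };
const chatService = { getUnreadCount: (userId) => 3 };

function renderHomePage(token) {
  const session = loginService.getSession(token);
  // The frontend tier fans out to each team's API, then assembles
  // the results into one response for the browser.
  return {
    feed: feedService.getFeed(session.userId),
    unread: chatService.getUnreadCount(session.userId),
  };
}

const page = renderHomePage("abc");
console.log(page.feed.length, page.unread); // 2 3
```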
posted by colin_l at 8:31 AM on April 24, 2016

This thread is closed to new comments.