What technologies are used to create sites such as myspace.com?
August 10, 2005 10:54 AM

I'm curious what code is used to create sites such as myspace.com. Obviously HTML for the design, but what about the rest of the site? It is essentially a database. I'm not a programmer, so I have no idea what type of coding is involved in searching and in allowing others to fill out forms to create a profile. I'm also wondering what type of equipment would be needed to host and maintain a site such as this. I'm basically looking for what is involved on the back end of these types of sites.
posted by ieatwords to Technology (23 answers total)
 
The scripting language behind Myspace.com is an interesting combination of ColdFusion and ASP.NET. You can tell what technology is being used by looking at the file extensions on a site's pages. Most of the pages have a .cfm extension, which means ColdFusion, while some are .aspx, which means ASP.NET. (More specifically, the part of the URL after the extension on the CF pages shows that they are actually using Fusebox, which is a ColdFusion development methodology.)
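The extension trick above can be sketched in a few lines. This is only a heuristic under stated assumptions: the lookup table is hypothetical and incomplete, the example URL is made up, and a server can map any extension (or none at all) to any engine, so the answer is a hint rather than proof.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical lookup table: extensions are a hint, not a guarantee,
# because a web server can be configured to map any extension to any engine.
EXTENSION_HINTS = {
    ".cfm": "ColdFusion",
    ".aspx": "ASP.NET",
    ".asp": "classic ASP",
    ".php": "PHP",
    ".jsp": "JSP (Java)",
}

def guess_engine(url: str) -> str:
    """Guess the server-side technology from a URL's file extension."""
    path = urlsplit(url).path                 # drops the query string after '?'
    suffix = PurePosixPath(path).suffix.lower()
    return EXTENSION_HINTS.get(suffix, "unknown")

# Made-up URL in the style of a Fusebox page.
print(guess_engine("http://example.com/index.cfm?fuseaction=user.viewprofile"))  # ColdFusion
```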

As for the DB ... there's no way to tell for sure, but the use of ASP.NET means that they are also using Windows servers, so I'd venture a guess that they are using something like Microsoft SQL Server for their DB. That's just a guess, though.

To follow up on Odinsdream, I have to disagree a bit. Whether or not MySQL will make sense to you will depend on whether or not you have any prior relational database knowledge. If not, Access will be a much easier entry point.

The same applies to PHP. Unless you have some programming knowledge (which you implied you don't), I don't think you'll find PHP terribly intuitive. Unfortunately, you probably won't find any of the other options, such as ColdFusion or ASP.NET, intuitive either. CF does tend to have a lower learning curve, and you can download a free "developer edition" from Macromedia to learn on, but it is more expensive to actually deploy than the others.

I would agree with Odinsdream that you should head up to the bookstore and look around for a good text as a starting point. Just remember to "buy where you shop."
posted by robhuddles at 11:17 AM on August 10, 2005


There are many, many different ways to do this, but basically there is a database on a server (Oracle, MySQL, MS SQL Server, etc.) that is accessed by a web application (PHP, JSP, ASP, ColdFusion), also running on a server (possibly the same one as the database), that uses the data to build HTML pages, which are what the user finally sees. That's a simplification, but there are a lot of books out there on how to do this. It's not too difficult; some programming skills would help, but it's not unheard of for people to teach themselves. I would look for something like a "beginner's guide to building database-driven web applications". There are a lot of such books, although I haven't looked at one in a long time, so I can't recommend a particular one. I've liked books published by O'Reilly in the past.

As far as servers go, any old server will do. You have to install some stuff (the database, the application server, the web server, etc.), but a lot of ISPs, such as DreamHost, offer PHP and MySQL with your hosting account.
posted by drobot at 11:21 AM on August 10, 2005


It's a combination of relational database design and interface design. The interface usually comprises the user front end, a model or network of functions behind that front end, and perhaps an administrative front end for the site admins, so they can do admin stuff without having to repetitively code it by hand. (Which is dangerous anyway, unless the site is very, very small and not at all complex.)

These days there's generally very little or no raw HTML coding involved, except for whatever small amount might be used for a wrapper around the front end. Usually most or all of the HTML portion that the user sees is generated by PHP or ColdFusion or other programmed interface.

There are numerous models of how this is handled, but the most basic one I can think of goes something like this:

The web user, or client, makes an HTTP (web) request by interacting with their browser, say, by clicking a link. The browser passes that request to the server. The server's web server software (say, Apache) is configured to hand the request on to an engine (say, ColdFusion or PHP). The engine takes the request and does the appropriate pre-programmed things with it (say, adding someone to your contact list). Because the link we're talking about was generated by the engine itself in the first place, the engine coded it for this function and knows that clicking it means "add this contact to this user's contact list". The engine then makes the appropriate changes to the relational database (say, MySQL), updating or writing to the relevant fields or tables, and records whatever changes need to be made to the user's profile. On the next page load, the engine reads those changes back and sends the updated page to the user.

So: Remote user interaction -> web server -> engine -> DB updates -> engine -> web server -> Remote client browser.

Repeat.
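The request loop above can be sketched as a tiny "engine". This is a hypothetical illustration, not how MySpace works: it assumes Python's WSGI interface as the hook between web server and engine, an in-memory SQLite database in place of MySQL, and made-up table names, URL path, and users.

```python
import html
import sqlite3
from urllib.parse import parse_qs

# In-memory SQLite stands in for "the DB"; tables and sample users are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE contacts (user_id INTEGER, contact_id INTEGER,
                           PRIMARY KEY (user_id, contact_id));
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
""")

def app(environ, start_response):
    """The 'engine': a WSGI callable the web server hands each request to."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    user = int(params.get("user", ["0"])[0])
    contact = params.get("contact")
    if environ.get("PATH_INFO") == "/add_contact" and contact:
        # DB update step: record the new contact (duplicates are ignored).
        db.execute("INSERT OR IGNORE INTO contacts VALUES (?, ?)",
                   (user, int(contact[0])))
        db.commit()
    # Engine step again: read back from the DB and build the HTML page.
    rows = db.execute(
        "SELECT u.name FROM contacts c JOIN users u ON u.user_id = c.contact_id "
        "WHERE c.user_id = ? ORDER BY u.name", (user,)).fetchall()
    items = "".join(f"<li>{html.escape(name)}</li>" for (name,) in rows)
    body = f"<html><body><ul>{items}</ul></body></html>".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it in a browser, one could pass `app` to `wsgiref.simple_server.make_server("localhost", 8000, app)`, call `serve_forever()` on the result, and visit a URL such as `/add_contact?user=1&contact=2`; the standard-library server plays the role Apache plays in the description above.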

You could do something like MySpace in ColdFusion and MySQL, but I wouldn't recommend it unless it was just for you and a few (or a few dozen) of your friends. (On preview: Heh, for some reason I never bothered to look at MySpace's served page extensions. That explains a lot, like how often it errors out and how flaky it often is.)

SQL stands for "Structured Query Language".

Also, relational database concepts can be totally non-intuitive to grasp. A "flat" database is basically just a spreadsheet-like construct: rows and columns (or fields) that store snippets of data so you can recall them later with various functions, such as those found in SQL. A relational database has a different structure, or topology, and it can take all kinds of different topologies. But what makes it a 'relational' database is that specific groups or individual fields of data can be 'related' to other specific groups or fields in a manner that updating one field will affect what's stored in another field or fields. Though, I probably didn't explain that very well.

Disclaimer: I am not a web application programmer or relational database programmer. I may have mixed up some terminology. I'm just a random information-soaking sponge of a nerd. Feel free to correct me if you know more.
posted by loquacious at 11:37 AM on August 10, 2005


But what makes it a 'relational' database is that specific groups or individual fields of data can be 'related' to other specific groups or fields in a manner that updating one field will affect what's stored in another field or fields.

Not quite. SQL will never change data unless you use an UPDATE command. Relationships can be expressed through indices, keys, constraints, and a few other fun things.
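As a sketch of a relationship expressed as a constraint, here is a foreign key in SQLite via Python's `sqlite3` module. SQLite is an assumption chosen only because it ships with Python; the table names follow the example just below, and the point is that the constraint refuses bad rows rather than silently changing other data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        -- The relationship: every order must point at an existing customer.
        CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID)
    );
    INSERT INTO Customers VALUES (1, 'Smith');
    INSERT INTO Orders VALUES (100, 1);
""")

def try_insert_order(order_id, customer_id):
    """Return True if the constraint accepts the row, False if it rejects it."""
    try:
        conn.execute("INSERT INTO Orders VALUES (?, ?)", (order_id, customer_id))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert_order(101, 1))   # True: customer 1 exists
print(try_insert_order(102, 99))  # False: there is no customer 99
```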

The basic gist is something like this:
You have a table Customers with fields CustomerID, FirstName, LastName, Age.

Then you have a table Orders, with fields OrderID, CustomerID, ProductID, OrderDate.

The relationship comes in when CustomerID in Orders is the same as a CustomerID in Customers. You can then do a join on the two tables, and make a "view." The view would be something like OrderID, FirstName, LastName, Age, ProductID, OrderDate.

You would have a similar setup for ProductID and a Products table... you can see how it goes.
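The Customers/Orders example above can be run as-is with SQLite through Python's `sqlite3` module. The sample rows are made up; the view is the join on `CustomerID` described in the text, and the Products table is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY,
                            FirstName TEXT, LastName TEXT, Age INTEGER);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER,
                         ProductID INTEGER, OrderDate TEXT);
    -- Sample rows are made up for illustration.
    INSERT INTO Customers VALUES (1, 'Ada', 'Lovelace', 36), (2, 'Alan', 'Turing', 41);
    INSERT INTO Orders VALUES (10, 1, 500, '2005-08-10'),
                              (11, 2, 501, '2005-08-11'),
                              (12, 1, 502, '2005-08-12');

    -- The view joins the two tables wherever CustomerID matches.
    CREATE VIEW CustomerOrders AS
        SELECT o.OrderID, c.FirstName, c.LastName, c.Age, o.ProductID, o.OrderDate
        FROM Orders AS o
        JOIN Customers AS c ON c.CustomerID = o.CustomerID;
""")

for row in conn.execute("SELECT * FROM CustomerOrders ORDER BY OrderID"):
    print(row)  # first row: (10, 'Ada', 'Lovelace', 36, 500, '2005-08-10')
```

Note that the customer's name is stored once, in Customers; the view just looks it up for each order, which is the payoff of relating the tables instead of copying data into every order row.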
posted by devilsbrigade at 11:45 AM on August 10, 2005


UPDATE query, not command. I'm not a DBA, just a programmer, so my DB language is a bit off.

If you want to talk more about this/ideas/whatever, feel free to email me.
posted by devilsbrigade at 11:47 AM on August 10, 2005


Everyone else has covered the languages. To recap, those sites are just very complex integrations of various building blocks in various languages. The easiest to learn is probably PHP with a MySQL backend.

To run something like that, well... there are various levels. We're currently hosting about 500 customers on a content management system on a single Dell PowerEdge 1850 with a 3 GHz Xeon processor. It's a rack-mountable server with dual everything: dual power supplies, dual processors, memory in dual configurations, dual hard drives mirrored to one another, etc. We're not anywhere near its limits. The database for that system runs on a Dell PowerEdge 2850, which is similar, just with bigger/more hard drives, since most of the data lives in the database.
The 1850 is responsible for parsing the PHP and delivering the websites to visitors via the Apache HTTP server. The 2850 is responsible for feeding data to the 1850 and storing everything in MySQL. They're connected by gigabit Ethernet on the back end and communicate with the rest of the world via a 10 Mb/s connection (which we never, ever max out).

What we have is called a two-tier hardware solution by the marketing people: an application layer and a data layer. Other sites, like amihotornot.com, will have a three-tier solution of some sort: a big, beefy database box at the bottom, then a bunch of application servers to process data in the middle, and a bunch of caching servers at the edge to keep the number of requests reaching the application servers low.
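The caching tier above works on a simple idea: serve a stored copy of a page until it goes stale, and only ask the application tier when it has. A minimal in-process sketch follows; real sites use dedicated caching servers or reverse proxies, and the class, the 60-second lifetime, and the page function here are all hypothetical.

```python
import time

class TTLCache:
    """Minimal cache-aside layer: reuse a stored value until it expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                  # cache hit: the back end is not touched
        value = compute()                  # cache miss: ask the app/DB tier
        self._store[key] = (now + self.ttl, value)
        return value

# Hypothetical back end that counts how often it is actually called.
backend_calls = 0

def render_profile_page():
    global backend_calls
    backend_calls += 1
    return "<html>profile</html>"

cache = TTLCache(ttl_seconds=60)
for _ in range(1000):
    cache.get_or_compute("/profile/42", render_profile_page)
print(backend_calls)  # 1: one render served all 1,000 requests within the TTL
```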

All this kind of stuff takes big money both to buy (the hosted website setup was a $20,000 outlay for servers) and to run. You can do it for less by buying older servers off eBay, but you still have the hosting costs to consider. Make sure you know how you're going to make your money back before you get started.
posted by SpecialK at 11:59 AM on August 10, 2005


I know of no instance of someone actually using Access for this purpose, while many many people use MySQL or another similar database. Moving from MySQL to MSSQL or another database is easier than coming from Access, and MySQL costs nothing but time to learn.

I have used Access for a website once (one that got several thousand hits per day), and eventually switched to Microsoft SQL Server, because it was free for me as a Comp Sci student. Otherwise I would have used Postgres.

Access's SQL interpreter is actually more advanced than MySQL's. You can do subselects, for example. Access databases are limited to two gigabytes, but if you need less than that, you're better off with Access. In fact, unlike MySQL, Access is a real RDBMS.

It also comes with a nifty, easy to use GUI, which is probably why robhuddles recommended it. Access was designed for desktop use.

MySQL blows goats. If you want to learn database management for free, go with Postgres.
posted by delmoi at 11:59 AM on August 10, 2005


That said, most cheap web hosting providers will give you a MySQL database to play around with. MySQL+PHP is the most portable solution, which will let you move between different hosts.
posted by delmoi at 12:00 PM on August 10, 2005


Also, the kind of hardware you're going to get determines how many users you can handle (at the same time). You can get MySQL+PHP hosting for about $5 a month that might be able to handle 5 or 10 thousand hits a day.
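To put "5 or 10 thousand hits a day" in perspective, it helps to convert it to requests per second. A small worked sketch follows; the peak factor of 10 is purely an assumed illustration of traffic bunching up at busy hours, not a measured figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def requests_per_second(hits_per_day, peak_factor=1.0):
    """Average request rate, optionally scaled by an assumed peak-to-average ratio."""
    return hits_per_day / SECONDS_PER_DAY * peak_factor

print(round(requests_per_second(10_000), 3))                  # 0.116
print(round(requests_per_second(10_000, peak_factor=10), 2))  # 1.16
```

Even with a generous peak allowance, that is only about one dynamic page per second, which is why cheap shared hosting can cope at that level.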
posted by delmoi at 12:02 PM on August 10, 2005


MySQL will now do subselects. The main issues with MySQL are scaling with many database writes (locking isn't the best), and the incredible lack of constraint checking.

For a personal site, or one where a very simple, fast database is needed, it's fine; i.e., a site where the primary focus isn't the database, but the site.

When you get to the point where the database is the primary focus, it's time to move to something else. Postgres is a good free choice. MS SQL, Oracle, and DB2 are all prominent choices for enterprise work.
posted by devilsbrigade at 12:07 PM on August 10, 2005


Derail: MySQL is the perfect solution for a website where you have an assload of selects and few updates, which describes most db-driven *websites*. It is tremendously fast at running many simple selects (as opposed to a few complex selects, which is the way they teach things in CS classes), even across very large database tables.
But just like anything else, it's a tool, and you don't use a screwdriver to drive a nail.
posted by SpecialK at 12:35 PM on August 10, 2005


Yes, yes. I didn't mean to come off as a zealot either, and I did point out that MySQL is practically ubiquitous on hosting providers.
posted by delmoi at 1:14 PM on August 10, 2005


You're giving away all our secrets!
posted by matildaben at 1:21 PM on August 10, 2005


ieatwords, if you feel there's a programmer deep within you fighting to get out then by all means start installing databases and scripting languages, but if you want to get a full-scale project going right away you'll need to work with someone who shares your enthusiasm for the idea and can bring in the kind of technical overview and planning skills you need.

Note that I didn't say programmer. Before you even think about coding you need to sort out the higher level stuff:
What technologies might be best?
What hardware might you need?
What interface and design issues need addressing?
What would be a sensible phased approach?
How long will it take?
What can be done to promote the service online?
What can be learned from similar services already out there?
Is the whole thing a waste of time?

Forget about coding for now; work with a 'web consultant' (yeah, I know, the C word) first (who may also do coding, of course, but the broad knowledge of technical and strategic matters is vital).
posted by malevolent at 3:40 PM on August 10, 2005


43things and 43places are powered by Ruby on Rails. Most good new developments will be powered by same.
posted by wackybrit at 4:18 PM on August 10, 2005


Friendster was re-coded in PHP (from Java, I think) about a year ago.
posted by sachinag at 5:25 PM on August 10, 2005


I know you didn't ask specifically to learn programming languages, but I'd like to respond to some of the posts here. There are many programming languages out there: some of them you'll find intuitive, others cumbersome and awkward. It's down to what works for you. Some people will tell you "this language is where the money's at!" but if you find it as dull as dishwater to use, then why bother? Syntax is surprisingly influential in which language people will go for. Be it ASP (VBScript/JavaScript), PHP, ColdFusion, or whatever, go with what you enjoy working in; then you'll enjoy your work and be good at it too.
posted by FieldingGoodney at 6:25 PM on August 10, 2005


Most good new developments will be powered by same.

Ruby on Rails isn't the end-all be-all for web programming. It's new, just like everything was at some point. 43things.com isn't an amazing technological achievement; the same thing could have easily been done with any language.

If anything, I think we'll see a shift from scripting languages towards application platforms, like Java, as people start realizing that hacking together a large site doesn't work too well.
posted by devilsbrigade at 6:32 PM on August 10, 2005


Lemme inject some of the operations mindset and answer the part of your question that has been left untouched: namely, what kind of equipment is needed to host a site like MySpace.

Like the people responding in this thread, I've written some simple-to-complex, 2- and 3-tier DB-driven sites in my work, using ASP.NET and SQL, for example: things like XML service feeds for collected perf and eventing data, gridded and form-based user input/"time clock" systems, graphical representations of datacenters based on DB data, etc. Replace those terms with other SQL engines and other server-side programming languages, and it's six of one, half a dozen of the other. These little projects were hacky, made-to-work code in service of my main jobs, which have been in operations for systems running someone else's far more professional and QA-tested code (systems as small as 150-200 servers and as large as about 8,000 servers for a single site).

Anyone can throw mysql and php on a simple server, spend a little time funking around, and be able to handle a few hits. The fact that I could do it with .NET and MS SQL shows how easy it can be in 2005 to whip up nice relational DBs and interfaces in a snap.

However, once you've passed the 'light traffic' level and have entered the arena of extremely heavily trafficked sites, with millions of dynamic page generations every day, the real hurdle is scaling out. At this point, you run into problems you never see or even consider when first prototyping your site on a single dev box.

Things like:
  • Failure is the normal state of things. Expect it, code for it, plan for it, build recovery into your system as if you were always failing individual servers or components. You will never solve this problem with beefier servers or better human process; solve it by assuming it will happen no matter what you do, and planning for that.
  • Teaching your developers how to better handle sessions and persistence for users, since the live site won't be running from the same dev box in their office but instead potentially bouncing any given user between a fleet of load-balanced servers, indiscriminately. You'd be surprised how many skilled developers get thrown for a loop by anomalies in user experience that occur when their perfect code in the lab gets tossed onto 100 internet-facing web servers.
  • Similarly, educating your developers on the notion that, 24/7, servers don't work reliably: their code should be written to expect that the server or resource it gets data from or gives data to in each directly connected tier is going to be flaky or error-prone. Building in retry algorithms, timeout values, etc., is a must, with real-time adjustable "battle knobs" that an operations team can tweak, either manually or automatically, in response to dramatic changes in traffic patterns or user behavior.
  • Architecting your systems so that disk-intensive components are cleanly separated in code from memory- and CPU-intensive, or bandwidth-intensive, components. The "one-box" approach never scales well: as sites grow in traffic, the bottlenecks do not grow equally, and being unable to grow each of those components independently proves costly and inefficient.
  • Implementing a variety of geo-location, DNS distribution, TCP aggregation, and page element or fragment caching technologies to ensure that the round-trip-time for initial page retrieval is kept low, and that the user experiences the fastest page load for your site with the least bandwidth, time, and direct hit on your actual servers. And again, informing your developers of these changes to better make them aware of how having different parts of the page be delivered in different ways can impact development and testing methodologies.
  • Building an infrastructure and code deployment methodology that ensures N servers can quickly grow to Y*N servers without great increases in human effort, and without an increased chance that "bad" or malfunctioning servers go unnoticed by you but noticed by your users. This means enforcing "simple" install/rebuild processes, and not allowing the duct-tape-and-baling-wire mentality of a lab to seep into your live site methodologies. This also means a whole system just for monitoring, alerting, repairing, analyzing, etc., your production systems, with all the challenges that entails.
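The retry-and-timeout point in the list above can be sketched as follows. Everything here is a hypothetical illustration: the knob names, the default values, the choice of exponential backoff with random "full jitter", and the flaky database call are assumptions, not a description of any particular site's code.

```python
import random
import time
from dataclasses import dataclass

@dataclass
class RetryKnobs:
    """Hypothetical operator-tunable settings ("battle knobs")."""
    max_attempts: int = 3
    base_delay: float = 0.1   # seconds of backoff before the first retry
    max_delay: float = 2.0    # cap on any single backoff sleep
    timeout: float = 1.0      # per-call timeout handed to the dependency

def call_with_retries(operation, knobs, sleep=time.sleep):
    """Call operation(timeout=...) and retry transient failures with jittered backoff."""
    last_error = None
    for attempt in range(knobs.max_attempts):
        try:
            return operation(timeout=knobs.timeout)
        except (TimeoutError, ConnectionError) as err:
            last_error = err
            if attempt + 1 < knobs.max_attempts:
                delay = min(knobs.max_delay, knobs.base_delay * 2 ** attempt)
                sleep(random.uniform(0, delay))  # jitter spreads retries from many servers
    raise last_error

# Hypothetical flaky dependency: fails twice, then answers.
attempts = []

def flaky_db_read(timeout):
    attempts.append(timeout)
    if len(attempts) < 3:
        raise ConnectionError("db tier unavailable")
    return "row data"

print(call_with_retries(flaky_db_read, RetryKnobs(), sleep=lambda s: None))  # row data
```

Keeping the knobs in one object is what lets an operations team change them at runtime (for example, by reloading them from a config service) without redeploying code.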
The list goes on and on, but the big thing is realizing that there are new technological problems, not in the core coding process, but in making that code work intelligibly in the kind of large-scale hosted environment that very popular sites like myspace.com require.
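The session problem from the list can be shown with a small simulation. It is a hypothetical sketch: a plain dict stands in for a shared session store (in practice a database or networked cache), and a round-robin iterator stands in for the load balancer. With per-server memory, a login on one box is invisible to the next box the user lands on.

```python
import itertools
import uuid

class WebServer:
    """One front-end box; session_store is either its own dict or a shared one."""

    def __init__(self, session_store):
        self.sessions = session_store

    def handle(self, session_id, action=None):
        state = self.sessions.setdefault(session_id, {"logged_in": False})
        if action == "login":
            state["logged_in"] = True
        return state["logged_in"]

def simulate(shared):
    shared_store = {}  # stands in for a DB- or cache-backed session store
    servers = [WebServer(shared_store if shared else {}) for _ in range(3)]
    balancer = itertools.cycle(servers)      # naive round-robin load balancer
    sid = str(uuid.uuid4())                  # the user's session cookie
    next(balancer).handle(sid, "login")      # the login request lands on server 1
    return next(balancer).handle(sid)        # the next click lands on server 2

print(simulate(shared=False))  # False: server 2 never saw the login
print(simulate(shared=True))   # True: every server reads the same store
```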
posted by hincandenza at 12:41 AM on August 11, 2005


Ruby on Rails isn't the end-all be-all for web programming. It's new, just like everything was at some point. 43things.com isn't an amazing technological achievement; the same thing could have easily been done with any language.

Sure, in five times the time.
posted by wackybrit at 6:27 AM on August 11, 2005


wackybrit, you've drunk the Kool-Aid. Ruby on Rails is excellent because it provides a strong framework for quickly creating apps, but assuming that you'll be five times faster is silly. There are a few rapid development toolkits popping up for other environments, some modeled on Rails (like Cake for PHP). I still think that Rails qualifies for best-thing-ever status, but I'm not sure how it would scale to a site like MySpace. Maybe perfectly; I just haven't seen it done, so I'm reserving judgement.
posted by mikeh at 7:02 AM on August 11, 2005


Let's remember that MySpace's solution isn't scaling too well either, but then again, few sites have seen the kind of growth they have.
posted by Good Brain at 10:45 PM on August 11, 2005


Don't know if you're still around, but try this old, but good Webmonkey link.
posted by BigBrownBear at 9:09 AM on January 12, 2006

