looking for some good articles about Amazon EC2
December 10, 2010 12:34 AM

The Amazon Elastic Compute Cloud (Amazon EC2) service is blowing my mind and melting my brain a bit. It seems like the stuff of science fiction, yet I know it is real. Can people here provide pointers to good articles about it, written more for the "intelligent layperson" instead of for sysadmins with multiple PhDs in BioInformatics and Algorithmic Analysis?

So, much the same way MAME would emulate older CPU types while running on a Pentium PC, the supercomputers in the Amazon EC2 cloud can emulate PCs, Linux machines, etc.; you send them 'jobs' and they either return the data/results, or they "crash" and fail.

If I had a computer-graphics rendering job with lots of ray-tracing and radiosity that takes an hour to render each 4000x3000 frame, I guess I could send all the frames to EC2 and have them all done at once --- in an hour. Can you buy "400x super-speed" and have them all done at once, in nine seconds?
posted by shipbreaker to Technology (10 answers total) 6 users marked this as a favorite
 
Your MAME comparison is off. There are no supercomputers in EC2, just higher-end PCs partitioned to appear as multiple virtual computers. For your "super-speed" example, you'd need an actual faster CPU, which probably doesn't exist. The Wikipedia article lists the available configurations, and, unsurprisingly, they are no faster than the actual CPUs they're running on, so the answer is "no". In general, parallelizing things is hard, so the benefits are limited.

EC2's main value-add is efficiency. For your rendering example, you'd otherwise have to buy a bunch of computers, set them up, and let them sit idle most of the time until you actually render your movie. Then you render it (you'd probably not get one computer per frame and run for an hour, but rather find some manageable compromise between cost and rendering time), and afterwards you're left with an unused datacenter. In the EC2 case, Amazon just runs other people's work on the computers beforehand and afterwards, and there's no need to shuffle physical stuff around.
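To put rough numbers on that compromise, here is a small sketch. It assumes (hypothetically) that every frame takes the same time, frames are independent, and each rented machine is billed per started hour, so a partial hour rounds up. The function name `render_plan` and all figures are illustrative only.

```python
import math


def render_plan(frames, machines, hours_per_frame):
    """Return (wall_clock_hours, billed_machine_hours) for a render job.

    Hypothetical model: frames are independent and take equal time, they are
    dealt out as evenly as possible, and each machine is billed per started
    hour (partial hours round up).
    """
    base, extra = divmod(frames, machines)
    billed = 0
    longest = 0.0
    for i in range(machines):
        n = base + (1 if i < extra else 0)  # frames assigned to machine i
        if n == 0:
            continue  # a machine with no frames is never launched or billed
        hours = n * hours_per_frame
        longest = max(longest, hours)  # the job ends when the slowest machine does
        billed += math.ceil(hours)
    return longest, billed


if __name__ == "__main__":
    for m in (1, 10, 100, 400):
        wall, billed = render_plan(400, m, 1.5)
        print(f"{m:>3} machines: {wall:>6.1f} h wall clock, {billed:>4} machine-hours billed")
```

With 1.5-hour frames, 1, 10 or 100 machines all cost 600 machine-hours, but 400 machines cost 800, because each one is billed for 2 hours to do 1.5 hours of work. More machines buy wall-clock time, though not always at the same price.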
posted by themel at 1:28 AM on December 10, 2010 [1 favorite]


Ah yes, and sorry for missing your first question. I couldn't find anything useful, but I guess it doesn't get written about much for popular consumption since it's not something that a non-IT person could play with.
posted by themel at 1:49 AM on December 10, 2010


Virtualization is one of those things that almost seems like magic and has really taken off in the last five years or so with programs like VMware and Virtual PC, but like many things in this field, it's a very old idea. IBM S/370 mainframes featured full virtualization capabilities as early as 1972. What is relatively new is the ability to virtualize hardware that was never designed for it, such as the x86, unlike those old mainframes. Nowadays, however, things have come full circle: modern 64-bit x86 processors from both Intel (VT-x) and AMD (AMD-V) include specific hardware support to speed up and simplify virtualization, so a lot of the old hacks are no longer needed.
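On Linux you can check whether a CPU advertises those extensions: Intel's VT-x appears as the `vmx` flag in /proc/cpuinfo and AMD-V as `svm`. A minimal sketch follows; the function name is made up for illustration. Treat the result as a hint only: the flag may be missing inside a VM, and a BIOS setting can disable the feature even on a CPU that has it.

```python
VIRT_FLAGS = {"vmx", "svm"}  # Intel VT-x, AMD-V


def virtualization_extensions(cpuinfo_text):
    """Return the hardware-virtualization flags found in /proc/cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Each logical CPU has a line such as "flags : fpu vme ... vmx ..."
        if key.strip() == "flags":
            found.update(flag for flag in value.split() if flag in VIRT_FLAGS)
    return found


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = virtualization_extensions(f.read())
    except OSError:
        print("/proc/cpuinfo not readable (probably not Linux)")
    else:
        print(sorted(flags) if flags else "no vmx/svm flag reported")
```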
posted by Rhomboid at 2:01 AM on December 10, 2010


You're asking the question using the wrong words (which raises the question of how one asks a question with the right words when one doesn't know the answer. This is almost like Meno's paradox. But now I'm wildly digressing).

Also, I disagree with themel on a key point: yes, parallelising computing tasks is hard, but no, that doesn't mean the benefits are limited. That's like saying maths is hard, hence calculus offers no help to engineers. Of course it's hard.
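One standard way to frame this disagreement (not mentioned in the thread) is Amdahl's law: if a fraction p of a fixed-size job can run in parallel, N workers give at most a speedup of 1 / ((1 - p) + p / N). A short sketch, with an illustrative function name:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Best-case speedup from Amdahl's law for a fixed-size job."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("need at least one worker")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    for p in (1.0, 0.99, 0.95, 0.5):
        print(f"p = {p:.2f}: {amdahl_speedup(p, 400):6.1f}x on 400 machines")
```

A batch of independent frames like the one in the question is close to p = 1, so 400 machines can finish the whole batch nearly 400 times faster (each individual frame still takes an hour). A job that is only 95% parallel tops out around 19x on the same 400 machines, so how much parallelism helps depends heavily on the job.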

The questions you're looking to answer are "what is virtualisation?" and, as a lay person or MBA would ask, "what is the cloud?". Don't let the techno-babble scare you off: this is easily the most important business trend of the coming decade, and well worth your time to understand. In a nutshell, "virtualisation" means big computers pretending to be a bunch of smaller computers; you get these behemoth 32- to 64-core machines to run 32 separate, distinct "pretend" computers inside them. As themel correctly stated, Amazon don't run supercomputers, but instead offer massive numbers of virtualised servers, rented by the hour. Beyond that, the "cloud" basically means providing software-as-a-service (SaaS) on virtualised servers.

The take-home message about Amazon EC2 is that it doesn't offer any baked-in functionality: you can rent hundreds of servers, but you'll need to figure out the hard part, getting the software to parallelise efficiently, by yourself. Fortunately they offer a wide range of very useful tools to help, such as high-end distributed, fault-tolerant data storage, high-end DNS servers, etc. Whereas before a movie studio, for example, would need to purchase hundreds of Mac Pros for a movie, it can now rent servers from Amazon EC2 instead. That's it. The software used to parallelise and perform the calculations? That's where the magic is.
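The easy half of that parallelisation, handing independent frames to a pool of workers and collecting the results, can be sketched with Python's standard `concurrent.futures`. Here `render_frame` is a hypothetical placeholder for "send this frame to a rented server and wait"; a real render farm also has to ship scene files around, retry failed nodes and so on, which is where most of the real work is.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def render_frame(frame_number):
    """Hypothetical stand-in for sending one frame to a rented server and waiting.

    A real version would upload the scene, start the renderer remotely and
    download the image; here we just sleep briefly and return a fake file name.
    """
    time.sleep(0.01)
    return f"frame_{frame_number:04d}.png"


def render_all(frame_numbers, workers):
    """Dispatch independent frames to `workers` concurrent jobs, results in frame order."""
    # Threads suit this because each job mostly waits on a remote machine.
    # To use local CPU cores instead, ProcessPoolExecutor has the same interface.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, even if frames finish out of order.
        return list(pool.map(render_frame, frame_numbers))


if __name__ == "__main__":
    start = time.perf_counter()
    images = render_all(range(1, 41), workers=8)
    print(len(images), "frames in", round(time.perf_counter() - start, 2), "s")
```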

Here are some articles that'll go over it:
  • Introduction to server virtualisation (Ars Technica):
    Virtualization is a method of running multiple independent virtual operating systems on a single physical computer. It is a way of maximizing physical resources to maximize the investment in hardware.
  • A survey of corporate IT (Economist):
    Now, this special report will argue, computing is taking on yet another new shape. It is becoming more centralised again as some of the activity moves into data centres. But more importantly, it is turning into what has come to be called a “cloud”, or collections of clouds. Computing power will become more and more disembodied and will be consumed where and when it is needed.
  • The Cloud: a short introduction (Ars Technica):
    Cloud computing is an approach to client-server in which the "server" is a dynamically scalable network of loosely coupled heterogeneous nodes that are owned by a single institution and that tends to be biased toward storage-intensive workloads, and the "clients" are a wide variety of individuals and institutions that use fractions of shared nodes to run jobs that are transient with respect to time, lightweight with respect to compute-intensity, and anywhere from lightweight to heavy with respect to storage-intensity.

posted by asymptotic at 5:47 AM on December 10, 2010 [6 favorites]


I think you'll find that EC2 and other hosting services (which is what they really are) aren't that different from what most companies are doing in house, or what you can do on your personal computer right now. Download VirtualBox or VMware and configure a virtual machine, and you'll have a different computer running in a window on your desktop.

I used to run VMware ESX, so I'd have a server (which is nothing more than a more powerful computer, typically with disk and power redundancy) running multiple virtual computers. It's really old news. Amazon just has a datacenter or three full of servers running virtualized instances, but at the end of the day they're just racks of cheap off-the-shelf servers. Things like supercomputers are really government and academic devices built for specialized tasks like simulating the weather, simulating nuclear testing, etc.

Also, it's fair to point out that every industry has fads. Some pan out, most don't. Not too long ago it was thin-client computing, or the Linux desktop, or OS X in the enterprise, or FreeBSD everywhere, etc. While hosted virtualization is useful, it may not be the earth-shattering thing the tech media makes it out to be. The tech media's primary concern is attracting advertisers and getting ad impressions. Accuracy and predicting the future aren't their forte, especially when big advertisers tell them to promote their products in their articles.
posted by damn dirty ape at 6:57 AM on December 10, 2010


"The cloud" is just using someone else's hardware. You can buy computers, you can lease computers or you could rent time on Amazon's computers.

As others have said, not a new idea, just a new way of selling it.

The benefit to Amazon is that virtualization not only makes their hardware more efficient on a machine-by-machine basis, letting them sell 10 virtual machines for every physical machine they own, but also lets virtual machines be moved to different physical machines pretty much on demand. If the "controller" software sees a virtual machine taking excessive resources, it can pause it, move it to a less utilized node, and start it back up. The visible pause can be as short as milliseconds. The end result is that they can perhaps sell 200 virtual machines for every 10 physical machines and maintain the same or better quality of service.
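A toy version of the controller gjc describes might look like the following. Everything here is hypothetical: loads are whole-number percentages of one host's CPU, hosts are identical, migration is treated as free, and the greedy policy (move the busiest host's smallest VM to the idlest host) is just one simple choice, not how Amazon or any particular product does it.

```python
def rebalance(hosts, threshold):
    """Greedily move VMs off overloaded hosts onto the least-loaded host.

    hosts: {host name: {vm name: CPU load as a % of one host}}; modified in place.
    threshold: the highest acceptable total load per host, in %.
    Returns a list of (vm, from_host, to_host) moves.
    """
    def load(host):
        return sum(hosts[host].values())

    moves = []
    while True:
        busiest = max(hosts, key=load)
        idlest = min(hosts, key=load)
        if busiest == idlest or load(busiest) <= threshold:
            return moves
        # The smallest VM on the busiest host is the cheapest one to move.
        vm, vm_load = min(hosts[busiest].items(), key=lambda item: item[1])
        if vm_load <= 0 or load(idlest) + vm_load > threshold:
            return moves  # moving it would not help or would overload the target
        del hosts[busiest][vm]
        hosts[idlest][vm] = vm_load
        moves.append((vm, busiest, idlest))


if __name__ == "__main__":
    cluster = {"host-a": {"vm1": 50, "vm2": 30, "vm3": 20}, "host-b": {"vm4": 10}}
    print(rebalance(cluster, threshold=80))
    print(cluster)
```

The function stops rather than overload another host, so it can leave a host over the threshold when no single VM fits anywhere else.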
posted by gjc at 7:26 AM on December 10, 2010


EC2 and other hosting services ... aren't that different than what most companies are doing in house

Exactly. We have a small cloud where I work. It takes less than half an hour to bring up a new server. Our current bottleneck is how much disk space we have; we can't bring up any new VMs until we expand our SAN.

The basic idea of virtualization is that many of your computer's resources are used only a small percentage of the time. A machine may have four CPUs, but they spend 99% of their time idle and very rarely are all CPUs pegged at 100%, and then only for a few seconds. Why not host five virtual machines on that computer? Now you have five computers for the price of one, and chances are that nobody using them will notice any degradation in performance.
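The "chances are" can be made concrete with a little probability. A sketch, assuming each VM needs one CPU when busy and is busy independently of the others (real workloads often peak together, so treat this as optimistic); the function name is illustrative:

```python
from math import comb


def prob_at_least_busy(vms, busy_fraction, k):
    """P(at least k of `vms` VMs are busy at once), binomial model.

    Assumes each VM is busy independently with probability busy_fraction.
    """
    return sum(
        comb(vms, i) * busy_fraction**i * (1 - busy_fraction) ** (vms - i)
        for i in range(k, vms + 1)
    )


if __name__ == "__main__":
    cpus, vms = 4, 5
    for busy in (0.01, 0.10):
        # Contention only when more VMs want a CPU than there are CPUs.
        print(busy, prob_at_least_busy(vms, busy, cpus + 1))
```

With five VMs on four CPUs, contention needs all five busy at once; at 1% busy each, that is 0.01^5, or one moment in ten billion under this model.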

Memory is another thing that is underutilized most of the time, so you might need only 10 GB of physical RAM to give five VMs 4 GB each (especially if the VM software is smart enough to share memory among VMs when they have all loaded the same files, for example the operating system).
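The RAM arithmetic works the same way. A sketch with made-up figures: each VM is configured with 4 GB but actually touches 1.5 GB of its own memory, plus 0.5 GB of operating-system pages that are identical across all five VMs and stored only once (VMware calls this transparent page sharing), plus an assumed 2 GB for the hypervisor itself.

```python
def physical_ram_needed(vm_count, private_gb, shared_gb, host_overhead_gb=0.0):
    """Estimate host RAM when VMs share identical pages.

    private_gb: memory each VM actually uses that is unique to it.
    shared_gb: identical pages (e.g. the same OS image) stored only once.
    host_overhead_gb: the hypervisor's own memory (hypothetical input).
    """
    return host_overhead_gb + shared_gb + vm_count * private_gb


if __name__ == "__main__":
    configured = 5 * 4  # five VMs, each told it has 4 GB
    needed = physical_ram_needed(5, private_gb=1.5, shared_gb=0.5, host_overhead_gb=2.0)
    print(f"configured {configured} GB, roughly {needed} GB actually needed")
```

If the VMs all did fill their 4 GB at once, the host would have to reclaim memory some other way, typically by swapping, which is slow; overcommitting is a bet on typical usage.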

Disks are the slowest part of a computer, so for a VM host you are going to want some really fast disks (usually a RAID). And since if a disk fails it could take down several VMs, you want really reliable storage (again, a RAID). This stuff is expensive, so it's not really five computers for the price of one. Still, you can cut your hardware costs significantly, and save the IT staff a lot of time (and therefore money), which explains why it is becoming so popular.
posted by kindall at 7:28 AM on December 10, 2010 [1 favorite]


EC2 in a nutshell:

- Instead of buying computers to do work, you're renting them; oh, and they don't physically exist.
posted by blue_beetle at 8:45 AM on December 10, 2010


I was surprised to flip through the thread and not see Xen, the open-source virtualization software that EC2 is heavily based on, mentioned by name. While it's an oversimplification, think of EC2 as just being a whole lot of computers in a handful of data centers running Xen to allow for all of the magic to happen.

The Wikipedia article on Xen explains the basics of how it works, and this presentation goes into even more detail -- at least for a sysadmin who is already familiar with virtualization.

If that's still a bit high-level: cloud computing is facilitated by machines that boot to something called a hypervisor as opposed to a full operating system. That hypervisor, in conjunction with features in modern processors, controls the operation of multiple operating systems running simultaneously on that same machine. When there are large numbers of these machines, it's possible to write software that allows a customer to dynamically request (and get access to, and configure) an instance of an operating system running on a hypervisor-based machine. asymptotic's links do a great job of taking it from there.
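As a rough illustration of that last step, here is a toy control plane that hands out instances on hypervisor hosts on request. The class and method names, the vCPU-slot model and the `i-0001`-style IDs are all hypothetical; it uses simple first-fit placement and is not how EC2 actually schedules instances.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical machine running a hypervisor, with a fixed number of vCPU slots."""
    name: str
    capacity: int
    instances: dict = field(default_factory=dict)  # instance id -> vCPUs

    def free(self):
        return self.capacity - sum(self.instances.values())


class Cloud:
    """Hypothetical control plane: hands out instances on hypervisor hosts on demand."""

    def __init__(self, hosts):
        self.hosts = hosts
        self._ids = itertools.count(1)

    def request_instance(self, vcpus):
        # First fit: place the instance on the first host with enough free slots.
        for host in self.hosts:
            if host.free() >= vcpus:
                instance_id = f"i-{next(self._ids):04d}"
                host.instances[instance_id] = vcpus
                return instance_id, host.name
        raise RuntimeError("no host has enough free vCPUs")

    def terminate(self, instance_id):
        # Free the slots so later requests can reuse this capacity.
        for host in self.hosts:
            if host.instances.pop(instance_id, None) is not None:
                return True
        return False


if __name__ == "__main__":
    cloud = Cloud([Host("h1", 4), Host("h2", 4)])
    print(cloud.request_instance(2), cloud.request_instance(4))
```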
posted by eschatfische at 2:45 PM on December 10, 2010 [1 favorite]


Regular PCs can be sliced into multiple virtual computers. EC2 is roughly 50,000 regular PCs in 4 sites around the world that you can rent a slice of by the hour. They also have disk space and a fast network connection that you can use for a fee. The computers physically exist and they're not supercomputers; there's nothing really new from a computer engineering perspective. The main innovation is the pricing model: you can get started without a huge inflexible contract with an old-fashioned hosting provider. It's very, very useful, and I couldn't get my work done without EC2 or something like it, but it's not magic.
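Since the pricing model is the main innovation, a quick rent-versus-buy comparison shows when it pays off. All prices below are hypothetical placeholders, not real EC2 rates, and the model ignores staff time, networking and spare capacity.

```python
HOURS_PER_MONTH = 730  # about 24 * 365 / 12


def breakeven_hours(purchase_price, monthly_running_cost, months_of_life, hourly_rate):
    """Rented hours per month above which owning a server is cheaper than renting."""
    monthly_cost_of_owning = purchase_price / months_of_life + monthly_running_cost
    return monthly_cost_of_owning / hourly_rate


if __name__ == "__main__":
    # Hypothetical figures: $3000 server, $100/month for power and space,
    # written off over 36 months, versus renting at $0.50 per hour.
    hours = breakeven_hours(3000, 100, 36, 0.50)
    print(f"owning pays off above {hours:.0f} busy hours/month "
          f"({hours / HOURS_PER_MONTH:.0%} of the month)")
```

With these made-up numbers, owning wins only if the machine would be busy about 367 of the roughly 730 hours in a month; for a one-off render that runs a few days a year, renting comes out well ahead.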
posted by miyabo at 3:46 PM on December 11, 2010

