Computers within computers
January 26, 2012 1:44 PM
Why is virtualization important in the context of cloud computing?
I know that virtualization is an important trend in computing. I have a decent understanding of how it works (I can run Windows in Linux or vice versa), but I do not understand how virtualization would work in cloud computing. Is it just that it allows people migrating to cloud architectures to simulate the environment they already have? Is there more to it?
posted by Good Brain at 1:50 PM on January 26, 2012
If you can split your computational problem (one computer making lots of widgets and sending them to lots of people, for example) into small chunks (lots of computers making just a few widgets for just a few people), it's easier to distribute the work to lots of virtualized images running a consistent, common environment.
posted by Blazecock Pileon at 1:52 PM on January 26, 2012
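A toy illustration of that splitting idea, in Python (everything here is hypothetical; in a real cloud the four workers would be identical VM images rather than local processes):

# One big widget job split into chunks that any number of identical
# workers can process; each worker runs the same code, mirroring VMs
# booted from a common image.
from multiprocessing import Pool

def make_widget(order_id):
    # Stand-in for the real per-chunk work (hypothetical).
    return "widget-%d" % order_id

if __name__ == "__main__":
    orders = range(100)  # the whole problem...
    with Pool(processes=4) as pool:  # ...spread across identical workers
        widgets = pool.map(make_widget, orders)
    print(len(widgets), "widgets made")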
Virtualization is precisely the point of something like Amazon EC2. Machines are defined as virtual configurations; activate them and you can have 1, 10, or 1,000 identical machines to serve as anything you need them to be. From Amazon:
Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of operating systems, load them with your custom application environment, manage your network's access permissions, and run your image using as many or few systems as you desire.
posted by artlung at 1:53 PM on January 26, 2012
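A sketch of that "as many or few systems as you desire" call, using the modern boto3 client (which postdates this thread; the underlying EC2 API worked the same way in 2012). The AMI ID and instance type are placeholders:

# Launch ten identical instances from one machine image; asking for
# 1 or 1,000 is the same call with different counts.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: your custom image
    InstanceType="t3.micro",          # placeholder instance type
    MinCount=10,
    MaxCount=10,
)
for inst in response["Instances"]:
    print(inst["InstanceId"], inst["State"]["Name"])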
I'm involved in an application virtualization project in a university setting. Students can now use software on their own computer, and from wherever they choose, that we couldn't otherwise distribute to them on an individual basis. Or, to put it another way, the computer lab comes to them.
posted by JohnFredra at 1:58 PM on January 26, 2012
Best answer: A whole bunch of reasons:
1) consistent virtual hardware running on highly heterogeneous bare metal hardware: you have one operating system image for every machine.
2) ability to migrate virtual machines between hosts as load increases and decreases: you can scale up and down the type of machine you need quickly.
3) one very powerful bare metal machine can run multiple virtual hosts: less hardware redundancy achieves the same level of uptime, so instead of having 20 power supplies that could go, you have two or three redundant power supplies on one beefy machine. Same for hard drives, network connections, the works.
And finally, although it's not a real benefit of the technology itself, it encourages fail-safe, shared-nothing application design. You have to build your application in ways in which you expect nodes to be added and removed at random. Netflix has a program called Chaos Monkey which actively breaks a random server occasionally in production. They do this so that they never get lazy with their application architecture: it must support random intermittent hardware failure.
posted by Freen at 1:59 PM on January 26, 2012 [6 favorites]
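A minimal sketch of the Chaos Monkey idea (Netflix's real tool is a full service with schedules and opt-outs; this only shows the gist, again with boto3):

# Pick one running instance at random and terminate it, forcing the
# application architecture to tolerate sudden node loss.
import random
import boto3

ec2 = boto3.client("ec2")

reservations = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]
instances = [i for r in reservations for i in r["Instances"]]

if instances:
    victim = random.choice(instances)["InstanceId"]
    print("Chaos Monkey strikes:", victim)
    ec2.terminate_instances(InstanceIds=[victim])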
It is especially important for workloads that are infrequent or have highly variable demands. In the case of infrequent use, that virtual machine might use only a fraction of the physical machine's capacity. The rest of that capacity can be devoted to other virtual machines on the same box that need the memory/CPU/disk.
In the case of a workload that is highly variable (think Amazon.com from Thanksgiving through Christmas), the ability to quickly ramp up and add capacity is a hallmark of virtualization. It is far easier to provision virtual machines than it is to rack and wire up physical servers.
posted by mmascolino at 2:03 PM on January 26, 2012
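A toy capacity planner for that kind of spiky workload (the thresholds are made up); the point is that acting on its output is an API call rather than a trip to the datacenter:

import math

def desired_instances(requests_per_sec, capacity_per_instance=100.0, minimum=2):
    # Enough instances to absorb current demand, never below a floor.
    return max(minimum, math.ceil(requests_per_sec / capacity_per_instance))

# Holiday traffic ramping up to a peak and back down again.
for load in (150, 900, 12000, 900, 150):
    print(load, "req/s ->", desired_instances(load), "instances")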
The biggest advantage is the hardware abstraction - it frees you up to do all kinds of things, including providing a path to overall lower capital costs for physical infrastructure.
"Cloud computing" is a buzzword; in the abstract it means any number of things. Generally, in today's market it means the ability to buy computing resources without a direct correlation to the underlying hardware. The value of virtualization comes from the flexibility inherited from the hardware abstraction layer.
posted by iamabot at 2:16 PM on January 26, 2012 [1 favorite]
Best answer: I disagree with most of the above answers, which aren't arguments for virtualization so much as for fault tolerant distributed computing. You can get the same benefits without VMs by letting customers run processes on shared mainframes.
So why virtualization?
1. Isolation: Protect client VMs from each other, protect the host OS from clients. OS-level isolation for Windows and Linux is not great, in part because there's a big attack surface (privileged system calls, enormous trusted computing base) that can be used to escalate privileges. Using a VMM gives a smaller trusted computing base.
2. Customers have total control over their VM's OS. This lets them set up the exact software environment they want.
posted by qxntpqbbbqxl at 2:22 PM on January 26, 2012 [2 favorites]
The main reason for us is the extra layer of security between virtual machines: depending on how individual customer accounts are spread out, an attack on one virtual machine is less likely to affect accounts on the other virtual machines. Yes, it may, for a time, affect the entire host as a whole until a monitoring system detects the problem, but it likely limits the damage (or provider liability) to just that single virtual machine.
posted by rwheindl at 2:24 PM on January 26, 2012
In terms of cloud computing, you probably have no idea how the provider decides to manage where your account resides, and how many other accounts might be on your same VM, so it's up to them to manage the risks. I don't know if many cloud computing VM providers advertise how they set up and distribute the load because that would then create another potential security risk for them.
posted by rwheindl at 2:27 PM on January 26, 2012
Best answer: So, the whole point of "the cloud" is that most resources that have traditionally been tied to physical objects are abstracted out into simple resource pools. You need storage? You don't need to go physically buy a hard drive, you just add some abstract blocks of storage to this task's pool.
This is most achievable in a virtualized environment because virtualization best divorces a running task and the hardware currently physically backing that task's resource use. Without virtualization, only applications specifically written to be "cloud-aware" can be placed on a cloudy network. For instance, hack against the EBS API and you can have "cloud" storage in a program running on just a single node. But, that doesn't make your application cloudy.
Functionally, virtualization allows you to homogenize computational resources across your entire network. Instead of trying to add a new task to your network by finding an appropriate server with the appropriate operating system and with enough spare resources to meet your (perhaps erroneously) projected requirements, you just check whether or not there is "space" on your network for another 2-core/2GB task, and make it the task's responsibility to define its software environment. And heaven forbid, but if you were over optimistic on your requirements, expanding its resource allocation is as simple as being like "No, wait, I mean a session-level load-balanced 32-core/64GB task."
There are alternatives to virtualized cloud computing, though. You can also do homogeneous clustering, where all nodes on the network are at least logically, but preferably physically, identical and cluster them with something like corosync. You can migrate tasks around between nodes, keep machines available for failover and/or load balancing, and generally manage shit at the task level. This is super if the kinds of tasks that you'll be processing are all cheaply available on the same operating system, if you expect high availability and resource requirements from all tasks, and if you have sufficiently smart shared storage.
But one day, you will need a super lightweight service running on your network for a small (but vital) internal project. Unfortunately, the only free implementation of that service only runs on BSD. Now you must build an entire BSD cluster in order to provide reliability and network services for that one trivial service.
Also, adding new hardware to expand available cloud resources gets to be annoying in a heterogeneous environment. So, someday, that particular model of machine of which you already own twelve thousand? It's gonna get end-of-lifed by Dell. And you're still going to need new capacity. Now you've got a real complicated issue regarding hardware compatibility and network boot images.
Whereas, with a virtualized environment, you'd have spun up your BSD instance and been running the service quicker than it took you to read this answer. And those 300 off-brand Liberian army surplus servers from the emergency shipment just need to be boot-flashed and racked to become contributing members of your servciety.
posted by Netzapper at 2:28 PM on January 26, 2012
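A sketch of that "is there space for another 2-core/2GB task?" check: first-fit placement of a VM request onto whichever host has capacity. The Host class and the inventory are hypothetical, not a real scheduler:

from dataclasses import dataclass

@dataclass
class Host:
    name: str
    free_cores: int
    free_ram_gb: int

def place(hosts, cores, ram_gb):
    # First-fit: claim resources on the first host with room.
    for host in hosts:
        if host.free_cores >= cores and host.free_ram_gb >= ram_gb:
            host.free_cores -= cores
            host.free_ram_gb -= ram_gb
            return host
    return None  # no space: time to rack more interchangeable hardware

pool = [Host("node-a", 16, 64), Host("node-b", 2, 4)]
print(place(pool, 2, 2))    # fits on node-a
print(place(pool, 32, 64))  # the "no, wait, I meant bigger" request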
Best answer: What is interesting in this discussion is the difference between cloud computing (where someone else owns the infrastructure and clients rent capacity) and virtual computing (where the people using the virtual infrastructure own and operate the bare metal infrastructure).
When people talk about scaling up or down, adding an abstract block device, etc. I feel like they are talking about cloud computing, as opposed to virtualizing your infrastructure. Cloud computing is the commoditized service of virtualization and is a result of the emergent benefits of virtual computing.
Let me put it this way: Amazon was using virtual machines to power its infrastructure long before it offered EC2 to consumers. For Amazon, growing disk capacity very much meant actually buying physical hard drives, plunking them down into physical computers, and powering them. From a certain perspective, they still had the same process. However, the ability to extract far more value at far lower cost, with a great deal more operational flexibility, is what virtualization gave them, and that enabled them to offer cloud computing as a product to consumers. It's a brilliant move: commoditize your components, first as an internal service and then to your own competitors.
The gist of what I'm trying to say is that even when you own the infrastructure yourself, and adding capacity means installing the hard drives and plugging in the servers rather than someone else doing it instantly for you, virtualization still makes a heck of a lot of sense.
posted by Freen at 3:17 PM on January 26, 2012
rwheindl: other than being able to rapidly reboot from a hard drive image, how are two virtual machines on one host more secure than two separate hosts? I'm interested, because I've never thought about it from a security perspective.
posted by Freen at 3:20 PM on January 26, 2012
Two virtual machines on one host are less secure than two separate physical hosts. I think rwheindl was talking about replacing one physical host with multiple virtual machines.
posted by qxntpqbbbqxl at 4:36 PM on January 26, 2012
Best answer: Three aspects I'm not sure anybody has mentioned yet:
1. It is quite easy to make a virtual copy of a physical machine. The virtualised copy will do what the original did, but it can be rapidly re-configured, taken offline, or copied. The physical machine can be thrown out or re-used. The best place to store such machines is often in the cloud, where they can be cheaply supported and made accessible from anywhere.
2. You can convince a virtual machine that it has certain resources without having to really provide them. For instance, my "100 GB" hard disk file will only consume 100 GB if it really contains that many files - and its "dedicated" DVD drive may correspond to a physical drive that is shared by many VMs. (A sketch of such a sparse disk file follows this answer.)
3. VMs can be good tools for simulating physical (or virtual) systems that are in development. We can simulate network traffic, for example.
posted by rongorongo at 4:40 PM on January 26, 2012
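A sketch of the thin-provisioning trick from point 2: on a filesystem that supports sparse files, a "100 GB" disk image consumes almost no real space until data is written (VM disk formats like qcow2 and VMDK grow on demand in the same spirit):

import os

path = "sparse-disk.img"  # hypothetical backing file for a VM disk
with open(path, "wb") as f:
    f.seek(100 * 1024**3 - 1)  # jump to the 100 GB mark
    f.write(b"\0")             # one real byte at the end

print("apparent size:", os.path.getsize(path))           # ~100 GB
print("blocks on disk:", os.stat(path).st_blocks * 512)  # tiny (Linux)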
other than being able to rapidly reboot from hard drive image, how are two virtual machines on one host more secure than two separate hosts? I'm interested, because I've never thought about it from a security perspective.
It's more that virtualization is more secure than shared hosting, i.e. one physical host and one server instance running multiple sites with multiple associated user accounts (which has performance as well as security implications).
posted by tallus at 8:32 PM on January 26, 2012
This thread is closed to new comments.