posted last year in Dev Platform category by Soyoung Jeong
Virtualization technology is now everywhere in daily computing. In this article I explain the basics of virtualization, focusing on the x86 architecture: the techniques used to virtualize the CPU, memory, and I/O, and the direction in which virtualization technology is developing.
Benefits of Server Virtualization
The widespread adoption of virtual machines has had many positive effects on computing environments and internet businesses. The most noticeable advantage of virtual machines is that they significantly reduce server costs; many internet startups, for example, benefit from Amazon EC2. Beyond cost savings, virtual machines also bring operational benefits: you can provision server infrastructure quickly and control resources, such as memory size, remotely.
Since 2008, NHN has used virtual machines (VM) when a server is required for development/test purposes or, in some cases, to provide actual services. Ncloud Dev. Team at NHN released Ncloud, an integrated cloud solution service, in 2011, and has provided virtual servers both internally and externally.
For developers, the benefit of cloud solutions like Ncloud is that requesting and using a server has become routine. With VMs, you can get a usable server within a few minutes, without a separate purchase.
Servers are not the only place where VMs are used. Many general users also run virtualization products such as VirtualBox and VMWare. VMs are that close to us.
So how is virtualization actually achieved?
Hypervisor, the First Step to an Understanding of Virtualization
To understand VMs, you should first learn about the hypervisor. Basically, it is a program that controls and manages the operating systems running in VMs. Although the hypervisor is the most widely used virtualization technique, it is not the only one. This article focuses mainly on the hypervisor, but I will also explain other methods.
Figure 1: Classification of Virtualization Techniques.
A hypervisor is a piece of software that enables you to run one or more VMs on a physical server (host). In server virtualization, each VM is called a guest system (in some cases, guest OS), and a physical server on which such guest system runs is called a host system (in some cases, host OS).
Xen, VirtualBox, and VMWare are examples of hypervisors. Hypervisors are divided into Type 1 and Type 2.
- Type 1: A hypervisor runs on the host system without an OS (i.e., without a host OS).
- Type 2: A host OS is installed on the host system and a hypervisor runs on it.
The hypervisor type that is familiar to general PC users is, of course, Type 2. VirtualBox and VMWare Workstation are both Type 2.
A VM solution does not always need a host OS, because its only purpose is to provide VMs. Hypervisors that run without a host OS are Type 1; examples include Xen and VMWare ESX.
A host has only one set of physical CPUs, but each VM running on the host needs its own CPU. This means the CPU itself must be virtualized, and this CPU virtualization is performed by the hypervisor. The hypervisor converts the CPU instructions issued by the OS of a VM (guest), delivers them to the physical CPU, receives the result, and returns it to the guest.
There are several methods for converting and delivering CPU instructions. To understand them, you first need to learn about the privilege architecture of x86, known as hierarchical protection domains or protection rings. Figure 2 below shows this architecture:
Figure 2: x86 Privileged Architecture.
An OS that runs on x86 is designed on the assumption that it has full authority to access and control the hardware. As shown in Figure 2, the x86 architecture uses four levels, Ring 0, 1, 2, and 3, to manage hardware access authority. User applications run at Ring 3, while the OS runs at Ring 0 because it needs direct access to memory and hardware.
The same applies to VMs. A VM also needs an OS (the guest OS), and that OS expects Ring 0 authority. Because the guest OS cannot hold Ring 0 authority directly, its Ring 0 operations must be provided in a more complex, indirect way. This is where the complexity of x86 virtualization comes from. Ultimately, the key question in CPU virtualization is at which level (that is, by whom) the privileged instructions requested by the guest OS are processed. Depending on the answer, CPU virtualization is divided into several methods, such as full virtualization and para virtualization, as shown in Figure 1 above.
In full virtualization, usually the hypervisor has Ring 0 authority and the guest OS has Ring 1 authority (see Figure 3 below):
Figure 3: Full Virtualization.
In full virtualization, the machine code of the guest OS is converted into machine code for the host through a binary translation process. When a privileged instruction is required, for example for device driver access, it traps to the hypervisor, which handles the event. As a result, a variety of operating systems, including MS Windows, can run on the hypervisor without any modification, but execution is somewhat slower because of the machine code conversion.
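The trap path described above can be illustrated with a toy model. This is only a sketch: the instruction names, the `Hypervisor` class, and the `run_guest` helper are hypothetical and exist purely for illustration. It also assumes that every privileged instruction traps cleanly, which is not true of all sensitive instructions on x86 without hardware assistance; that gap is one reason binary translation is needed.

```python
from dataclasses import dataclass, field

# Hypothetical instruction names, used only for illustration.
PRIVILEGED = {"write_cr3", "out_port", "halt"}


class PrivilegeTrap(Exception):
    """Raised when code outside Ring 0 issues a privileged instruction."""


@dataclass
class Hypervisor:
    """Runs at Ring 0 and emulates privileged instructions for guests."""
    log: list = field(default_factory=list)

    def emulate(self, guest_name, instruction):
        # A real hypervisor would update virtual CPU or device state here.
        self.log.append((guest_name, instruction))
        return f"emulated {instruction}"


def cpu_execute(instruction, ring):
    """Model of the CPU: privileged instructions are only allowed at Ring 0."""
    if instruction in PRIVILEGED and ring != 0:
        raise PrivilegeTrap(instruction)
    return f"executed {instruction}"


def run_guest(hypervisor, guest_name, instructions, guest_ring=1):
    """Run guest instructions directly; hand trapped ones to the hypervisor."""
    results = []
    for instruction in instructions:
        try:
            results.append(cpu_execute(instruction, guest_ring))
        except PrivilegeTrap:
            results.append(hypervisor.emulate(guest_name, instruction))
    return results
```

Ordinary instructions run directly, and only the privileged ones take the slower detour through the hypervisor, which is why the frequency of privileged operations largely determines the overhead.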
To improve this, CPU manufacturers, including Intel (Intel VT) and AMD (AMD-V), provide features that reduce virtualization overhead at the hardware level. A variety of technologies are being developed so that full virtualization can perform as well as para virtualization, which is generally regarded as the faster method.
Figure 4: HW-assisted Virtualization.
A CPU that supports hardware-assisted virtualization provides an additional Ring -1 level (not to be confused with Ring 1), as shown in Figure 4 above. The hypervisor runs at Ring -1 while the guest OS runs at Ring 0. Binary translation of privileged instructions is therefore basically unnecessary: instructions are executed on the hardware directly via the hypervisor, which makes this method much faster than software-only full virtualization. This has significantly reduced the performance gap between full virtualization and para virtualization (see Figure 5):
Figure 5: 'Intel VT-x'-based Full Virtualization.
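On Linux, one quick way to see whether a CPU advertises hardware-assisted virtualization is to look for the `vmx` (Intel VT-x) or `svm` (AMD-V) flag in `/proc/cpuinfo`. The sketch below parses the text of that file; the function names are my own, and it assumes the x86 layout in which the flags appear on a line starting with `flags`.

```python
def virtualization_flags(cpuinfo_text):
    """Return the hardware virtualization flags found in /proc/cpuinfo text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Keep only the two flags that indicate VT-x or AMD-V.
            found |= {"vmx", "svm"} & set(value.split())
    return found


def describe(flags):
    """Map the detected flags to a human-readable technology name."""
    if "vmx" in flags:
        return "Intel VT-x"
    if "svm" in flags:
        return "AMD-V"
    return "none reported"
```

To try it on a real machine, read `/proc/cpuinfo` and pass its contents to `virtualization_flags`. Note that a missing flag may also mean the feature is disabled in the firmware settings, not that the CPU lacks it.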
In the para virtualization method, when the guest OS needs to execute a privileged instruction, it does not execute the instruction itself. Instead, it sends the request to the hypervisor through a hypercall, which is a kind of system call. The hypervisor receives the hypercall, accesses the hardware, and returns the result (see Figure 6):
Figure 6: Xen's Para Virtualization.
Because the guest OS can control resources such as CPU and memory more directly, para virtualization is relatively faster than full virtualization. Its disadvantage is that the OS kernel must be modified so that the guest OS can issue hypercalls. Unlike full virtualization, therefore, para virtualization limits the guest OSs that the hypervisor can support (for Linux, approximately 20% of the kernel code has been modified for para virtualization).
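The hypercall idea can be sketched in a few lines. Everything here is hypothetical: the `update_page_table` hypercall name and both classes are invented for illustration and do not correspond to the actual Xen interface. The point is only that the modified guest kernel asks the hypervisor explicitly instead of issuing a privileged instruction and waiting for a trap.

```python
class Hypervisor:
    """Owns the (simulated) hardware state and exposes a hypercall table."""

    def __init__(self):
        self.page_tables = {}  # guest id -> {virtual page: physical page}
        self._handlers = {"update_page_table": self._update_page_table}

    def hypercall(self, guest_id, name, *args):
        """Entry point that a paravirtualized guest kernel calls explicitly."""
        handler = self._handlers.get(name)
        if handler is None:
            raise ValueError(f"unknown hypercall: {name}")
        return handler(guest_id, *args)

    def _update_page_table(self, guest_id, virtual_page, physical_page):
        self.page_tables.setdefault(guest_id, {})[virtual_page] = physical_page
        return 0  # 0 signals success in this sketch


class ParavirtualizedKernel:
    """Guest kernel modified to call the hypervisor instead of the hardware."""

    def __init__(self, guest_id, hypervisor):
        self.guest_id = guest_id
        self.hypervisor = hypervisor

    def map_page(self, virtual_page, physical_page):
        # An unmodified kernel would write the page table entry itself.
        return self.hypervisor.hypercall(
            self.guest_id, "update_page_table", virtual_page, physical_page
        )
```

The `map_page` method is the kind of code that has to change inside the guest kernel, which is why para virtualization cannot run an unmodified OS.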
"Does Xen perform better than VMWare, as VMWare uses full virtualization and Xen uses para virtualization?"
Some of my acquaintances have asked me this question. The premise is half right and half wrong. Early versions of VMWare used full virtualization, but over a long commercialization process the product was changed to support para virtualization as well. VMWare now also runs many guest OSs using para virtualization (see Figure 7 below), so the commercial VMWare product is regarded as having little difference in functionality and performance compared to Xen's hypervisor.
Figure 7: VMWare's Para Virtualization.
Host OS Virtualization
Host OS virtualization is a method in which an OS itself provides the hypervisor functionality (see Figure 8):
Figure 8: Host OS Virtualization.
This method might seem efficient, since the virtualization environment is supported by the host OS itself. In practice, however, it is hardly used in server environments because of its weaknesses in inter-VM resource management, performance, and security. For example, when a security issue occurs on the host OS where the hypervisor runs, the reliability of every guest OS is put in question. It works without any particular problem, though, when you simply want to run multiple OSs on your PC at the same time.
Container-based virtualization is a method in which, instead of a hypervisor, a Virtual Server Environment (VSE) is added on the host OS to host the guest environments, and only the guest OS is emulated (see Figure 9).
Figure 9: Container-based Virtualization.
Whereas hypervisor-based virtualization virtualizes the entire hardware, the container-based method virtualizes only the guest OS environment. For this reason, container-based virtualization is relatively lightweight and performs well. However, it is not widely used in server environments because, like host OS virtualization, it is weak in inter-VM resource management and security management.
Of the various memory management methods used for a guest OS, this article explains the most widely used one, the shadow page table.
Shadow Page Table
In general, an OS uses a page table to manage memory. Each process on an OS sees its own virtual memory address space (virtual addresses). A page table maps these virtual addresses to the memory addresses provided by the physical server (physical addresses). Frequently used address mappings are served quickly from a hardware cache called the Translation Lookaside Buffer (TLB). A guest OS must also have a page table, but the problem is that the guest OS cannot access the host's memory address space directly, only through the hypervisor. For this reason, the page table on the guest OS translates a virtual address into a physical address as seen by the guest OS. An additional step is then required to map the memory address space that the hypervisor gives each VM to the actual address space of the host (see Figure 10):
Figure 10: Configuration of Memory between Guest and Host.
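The two translation steps can be made concrete with a small sketch. The page size, the table contents, and the function names are all assumptions chosen for illustration, and real page tables are multi-level structures rather than flat dictionaries.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def translate(page_table, address):
    """Translate an address using a table that maps page numbers."""
    page, offset = divmod(address, PAGE_SIZE)
    if page not in page_table:
        raise KeyError(f"page fault at {address:#x}")
    return page_table[page] * PAGE_SIZE + offset


# Step 1, maintained by the guest OS: guest virtual page -> guest physical page.
guest_page_table = {0: 5, 1: 7}

# Step 2, maintained by the hypervisor: guest physical page -> host physical page.
host_mapping = {5: 42, 7: 13}


def guest_virtual_to_host_physical(address):
    """Apply both translation steps in sequence."""
    guest_physical = translate(guest_page_table, address)
    return translate(host_mapping, guest_physical)
```

Every memory access by the guest conceptually pays for both lookups, which is the overhead the techniques in the rest of this section try to remove.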
A shadow page table is a technique for reducing the overhead of this two-step address translation. It avoids the redundant translation by mapping virtual addresses on the guest OS directly to addresses on the host. To make this work, whenever the page table of the guest OS is updated, the same update must be applied to the shadow page table. The problem is that this synchronization can significantly degrade guest OS performance. To address this, CPU manufacturers such as Intel and AMD provide hardware features that improve memory-related performance, such as the Extended Page Table (EPT) (see Figure 11):
Figure 11: Intel EPT (Extended Page Table).
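Before looking at the hardware approach, it may help to see the software bookkeeping it is designed to avoid. The sketch below models the shadow page table described above; the class and its fields are hypothetical, and a real hypervisor intercepts guest page table writes through write protection rather than through a method call.

```python
class ShadowPagedGuest:
    """Toy model of one guest whose page table is shadowed by the hypervisor."""

    def __init__(self, host_mapping):
        self.host_mapping = dict(host_mapping)  # guest physical -> host physical
        self.guest_page_table = {}              # guest virtual -> guest physical
        self.shadow_page_table = {}             # guest virtual -> host physical
        self.sync_count = 0                     # how often the hypervisor stepped in

    def guest_update(self, virtual_page, guest_physical_page):
        """The guest edits its page table; the hypervisor mirrors the change."""
        self.guest_page_table[virtual_page] = guest_physical_page
        self.shadow_page_table[virtual_page] = self.host_mapping[guest_physical_page]
        self.sync_count += 1

    def lookup(self, virtual_page):
        """A single lookup replaces the two-step translation."""
        return self.shadow_page_table[virtual_page]
```

Lookups become a single step, but `sync_count` grows with every guest page table update, which is exactly the cost that makes this technique expensive for workloads that change mappings often.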
The methods provided by Intel and AMD have almost the same mechanism. Simply speaking, a kind of super TLB enables the virtual address of the guest OS to be translated directly into the physical address of the host.
Although the TLB caches these translations, it may be flushed whenever the CPU switches from one guest to another. To prevent this, each TLB entry carries a unique tag for its VM, so the hardware can tell which guest each entry belongs to and no flush is needed. The result is faster and more efficient memory management.
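The difference between a flushed and a tagged TLB can be sketched as follows. Both classes are toy models with invented names; on real hardware the tag is, to my knowledge, what Intel calls a VPID and AMD calls an ASID, and the TLB has a fixed capacity that this sketch ignores.

```python
class FlushingTLB:
    """Untagged TLB: entries must be discarded whenever another guest runs."""

    def __init__(self):
        self.entries = {}  # virtual page -> host physical page
        self.flushes = 0

    def switch_guest(self):
        self.entries.clear()
        self.flushes += 1

    def insert(self, virtual_page, host_page):
        self.entries[virtual_page] = host_page

    def lookup(self, virtual_page):
        return self.entries.get(virtual_page)  # None means a TLB miss


class TaggedTLB:
    """Tagged TLB: the VM tag is part of the key, so no flush is needed."""

    def __init__(self):
        self.entries = {}  # (vm tag, virtual page) -> host physical page

    def insert(self, vm_tag, virtual_page, host_page):
        self.entries[(vm_tag, virtual_page)] = host_page

    def lookup(self, vm_tag, virtual_page):
        return self.entries.get((vm_tag, virtual_page))  # None means a miss
```

With tags, two guests can cache a translation for the same virtual page at the same time, and each one still finds its own entry after the other has run.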
Network virtualization is implemented differently by each hypervisor. The most common method is to provide a virtual switch in the hypervisor and connect the network interface of each guest to the NIC of the host through it (see Figure 12):
Figure 12: Network Virtualization.
The disadvantage of this method is that, when multiple guests run on the host and network traffic is high, the virtual switch can become a bottleneck. To avoid this, you can install multiple NICs on the host and assign a virtual switch to each NIC so that the network bandwidth used by the guests is distributed.
However, this method is not perfect, either. It is not very efficient for tasks such as placing guests dynamically or moving a guest to another host by live migration.
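To show what distributing guests over several NICs means in practice, here is a small planning sketch. It is an assumption of mine rather than something any particular hypervisor does: a greedy heuristic that places the heaviest guests first on whichever NIC currently carries the least load. The guest names, NIC names, and bandwidth figures in the usage note are made up.

```python
import heapq


def assign_guests_to_nics(guest_bandwidth, nics):
    """Greedy plan: heaviest guest first, always onto the least-loaded NIC."""
    if not nics:
        raise ValueError("at least one NIC is required")
    heap = [(0, nic) for nic in nics]  # (current load, NIC name)
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending bandwidth, then by name to keep the result stable.
    ordered = sorted(guest_bandwidth.items(), key=lambda kv: (-kv[1], kv[0]))
    for guest, bandwidth in ordered:
        load, nic = heapq.heappop(heap)
        assignment[guest] = nic
        heapq.heappush(heap, (load + bandwidth, nic))
    return assignment
```

For example, calling it with `{"vm1": 600, "vm2": 300, "vm3": 300, "vm4": 200}` and `["eth0", "eth1"]` puts vm1 and vm4 on eth0 and vm2 and vm3 on eth1. The plan is static, which illustrates the limitation noted above: it has to be recomputed whenever guests are added, removed, or migrated.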
As a more fundamental solution, Intel provides features that improve network performance, collectively called VT-c. The main VT-c technologies are VMDq and SR-IOV.
VMDq (Virtual Machine Device Queues)
A hypervisor implements its virtual switch in software. The switch forwards and switches packets arriving from outside (see Figure 13 below). However, this software approach cannot give guest systems fast network performance.
To address this, Intel speeds up packet classification by letting the host NIC itself maintain a packet queue for each guest (see Figure 14).
Figure 13: When VMDq is not Used.
Figure 14: When VMDq is Used.
However, VMDq is supported only by some Intel NICs.
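The per-guest queue idea can be modeled in software to make the data flow clear. This is a toy sketch with invented names, and it assumes that packets are classified by destination MAC address; on a NIC that supports VMDq the sorting happens in hardware, so the hypervisor's virtual switch no longer has to inspect every packet.

```python
from collections import defaultdict, deque


class VmdqNic:
    """Toy NIC that sorts incoming packets into one queue per guest."""

    def __init__(self, mac_to_guest):
        self.mac_to_guest = dict(mac_to_guest)
        self.queues = defaultdict(deque)  # guest name -> pending packets
        self.default_queue = deque()      # packets that match no guest

    def receive(self, dst_mac, payload):
        """Classify a packet by destination MAC as it arrives."""
        guest = self.mac_to_guest.get(dst_mac)
        if guest is None:
            self.default_queue.append(payload)
        else:
            self.queues[guest].append(payload)

    def drain(self, guest):
        """Hand all pending packets for one guest to the hypervisor."""
        queue = self.queues[guest]
        packets = list(queue)
        queue.clear()
        return packets
```

The hypervisor only needs to call `drain` for each guest and copy the packets across, instead of classifying a single mixed stream itself.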
SR-IOV (Single-Root IO Virtualization)
The concept of SR-IOV is as follows: a PCI device on the host is virtualized so that it appears as multiple physical PCI devices, and these virtual devices are allocated directly to guests (see Figure 15 below). Strictly speaking, SR-IOV is not Intel's own technology. It is a standard defined by the PCI-SIG (PCI Special Interest Group), which Intel implements in its NICs. With SR-IOV, one physical I/O port is virtualized into multiple Virtual Functions (VFs), and each VF is allocated directly to a guest. The guest then achieves I/O performance similar to having a physical PCI device allocated directly, and the network performance loss caused by virtualization is greatly reduced.
Figure 15: SR-IOV.
However, SR-IOV is also supported only by some Intel NICs.
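On a Linux host, you can check whether a NIC exposes SR-IOV by reading the `sriov_totalvfs` and `sriov_numvfs` attributes of its PCI device in sysfs. The helper below is a minimal sketch: the function name and the returned dictionary are my own, and the `sysfs_root` parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path


def sriov_status(interface, sysfs_root="/sys/class/net"):
    """Return VF counts for a network interface, or None without SR-IOV."""
    device = Path(sysfs_root) / interface / "device"
    total_file = device / "sriov_totalvfs"
    num_file = device / "sriov_numvfs"
    if not total_file.exists():
        # The attribute is missing when the device or driver lacks SR-IOV.
        return None
    total = int(total_file.read_text().strip())
    enabled = int(num_file.read_text().strip()) if num_file.exists() else 0
    return {"total_vfs": total, "enabled_vfs": enabled}
```

A result such as `{"total_vfs": 8, "enabled_vfs": 0}` would mean that the device supports up to eight VFs but none are enabled yet. The interface name to pass, for example `eth0`, depends on your system.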
The ultimate goal of cloud solution developers is for VM users to notice little difference in functionality and performance compared to a physical server. It should be understood, however, that it is impossible to provide VMs with exactly the same functionality and performance as a physical server. Hypervisors and server hardware are progressing rapidly, so far more advanced VMs than today's will become available. The real value of VMs is fast provisioning and cost reduction, in which they far surpass physical servers, so using VMs for a variety of purposes is a way to strengthen your development competitiveness while also reducing costs.
By Soyoung Jeong, General Manager at Ncloud Development Team, NHN Corporation.