CSE221 - lec10: Virtualization: VM370 & Xen
date
Nov 6, 2024
tags
System
10 - Virtualization: VM370 & Xen
Approaches to virtualize hardware
- difference: where the virtual machine monitor (VMM) runs
  - hosted: at user level
  - bare metal: at privileged level
The Origin of the VM/370 Time-Sharing System
Why Virtual Machines?
- convenient OS development: developers can use the functionality of an existing OS while developing a new one
- sharing while maintaining better isolation
- emulate new hardware
- handle hardware evolution
- continuous operation: just migrate the VM
- full access to hardware
Approach to virtualization
- when the guest OS executes a privileged instruction, the hardware traps into the virtual machine monitor (VMM)
- the VMM emulates the instruction on behalf of the guest OS
- this is called “trap and emulate”
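The trap-and-emulate loop above can be sketched as a toy simulation. This is a minimal sketch, not how VM/370 was implemented: the opcode names, `VirtualCPU`, `cpu_execute`, and `vmm_emulate` are all hypothetical, and the "hardware trap" is modeled as a Python exception.

```python
class PrivilegedInstructionTrap(Exception):
    """Raised by the simulated CPU when deprivileged code runs a privileged opcode."""

    def __init__(self, opcode, operand):
        super().__init__(opcode)
        self.opcode = opcode
        self.operand = operand


# Hypothetical set of privileged opcodes for this toy machine.
PRIVILEGED = {"cli", "sti", "load_cr3"}


class VirtualCPU:
    """Per-guest virtual state that the VMM updates instead of the real hardware."""

    def __init__(self):
        self.interrupts_enabled = True
        self.cr3 = 0
        self.acc = 0


def cpu_execute(vcpu, opcode, operand=None):
    """Simulated hardware: unprivileged ops run directly, privileged ops trap."""
    if opcode in PRIVILEGED:
        raise PrivilegedInstructionTrap(opcode, operand)
    if opcode == "add":
        vcpu.acc += operand
    else:
        raise ValueError(f"unknown opcode {opcode!r}")


def vmm_emulate(vcpu, trap):
    """VMM trap handler: apply the instruction's effect to the guest's virtual state."""
    if trap.opcode == "cli":
        vcpu.interrupts_enabled = False
    elif trap.opcode == "sti":
        vcpu.interrupts_enabled = True
    elif trap.opcode == "load_cr3":
        vcpu.cr3 = trap.operand


def run_guest(vcpu, program):
    """Run the guest; every trap is handled by the VMM, then the guest resumes."""
    for opcode, operand in program:
        try:
            cpu_execute(vcpu, opcode, operand)
        except PrivilegedInstructionTrap as trap:
            vmm_emulate(vcpu, trap)
    return vcpu
```

The guest never touches real privileged state: `cli` only flips a flag in its own `VirtualCPU`, which is what lets several guests share one machine.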
Xen and the Art of Virtualization
Xen is a very influential paper for several reasons: (1) it was released as open source, (2) it was convenient for research experimentation, and (3) it was adopted in industry, notably by Amazon AWS.
Why VMs (in the 2000s)
- better isolation, support for many OSes, … as in VM/370
- datacenter settings
Requirements for datacenter VMs
- security & isolation
- support unmodified applications
- resource management & accounting
- high performance, low overhead
- scalability (support lots of VMs concurrently)
- reliability
Structure of Xen
Challenges of virtualizing x86
- hardware-managed TLB (x86, ARM)
- privileged instructions don’t always trap:
  - may behave differently at different privilege levels
  - may fail silently
- lack of tagged TLB
- I/O overheads
Xen’s Approach: paravirtualization (instead of full virtualization)
- expose a VM abstraction that differs from the underlying hardware; guest OSes may need to be modified
- replace privileged instructions with hypercalls into the hypervisor
- similar to L4’s approach of replacing syscalls into L4Linux with messages to the L4 kernel
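As a rough illustration of replacing a privileged instruction with a hypercall, here is a minimal sketch. The `Hypervisor` and `ParavirtGuest` classes and the `set_interrupts` hypercall name are hypothetical and do not correspond to Xen's actual hypercall interface.

```python
class Hypervisor:
    """Toy hypervisor exposing a small table of named hypercalls."""

    def __init__(self):
        self.virtual_if = {}  # domain id -> virtual interrupt-enable flag
        self._table = {"set_interrupts": self._set_interrupts}

    def hypercall(self, domain_id, name, *args):
        """Single entry point, analogous to a syscall trap but into the hypervisor."""
        handler = self._table.get(name)
        if handler is None:
            raise ValueError(f"unknown hypercall {name!r}")
        return handler(domain_id, *args)

    def _set_interrupts(self, domain_id, enabled):
        # Only the calling domain's virtual flag changes; real interrupts are untouched.
        self.virtual_if[domain_id] = bool(enabled)
        return 0


class ParavirtGuest:
    """Guest kernel whose source was modified to call the hypervisor explicitly."""

    def __init__(self, domain_id, hypervisor):
        self.domain_id = domain_id
        self.hv = hypervisor

    def disable_interrupts(self):
        # A native kernel would execute `cli` here; the ported kernel asks instead.
        return self.hv.hypercall(self.domain_id, "set_interrupts", False)
```

Because the call is explicit in the guest's source, nothing depends on the hardware trapping, which sidesteps the x86 instructions that fail silently.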
VMware’s Approach: binary rewriting
- rewrite privileged instructions at runtime so that they call into the hypervisor
- more overhead
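The idea of binary rewriting can be sketched on a toy instruction stream. This is an illustrative sketch only: the opcode set, `rewrite_block`, and `TranslationCache` are hypothetical names, and caching rewritten blocks is an assumption of this sketch rather than something stated in these notes.

```python
# Hypothetical set of privileged opcodes that must not run directly.
PRIVILEGED = {"cli", "sti", "popf"}


def rewrite_block(instructions):
    """Return a copy of the block with privileged ops replaced by hypervisor calls."""
    rewritten = []
    for opcode, operand in instructions:
        if opcode in PRIVILEGED:
            # The original instruction becomes the argument of a call into the hypervisor.
            rewritten.append(("call_hypervisor", (opcode, operand)))
        else:
            rewritten.append((opcode, operand))
    return rewritten


class TranslationCache:
    """Remembers rewritten blocks so each block is scanned only once (an assumption)."""

    def __init__(self):
        self._cache = {}
        self.misses = 0

    def get(self, block_addr, instructions):
        if block_addr not in self._cache:
            self.misses += 1  # the scan-and-rewrite cost is paid at runtime
            self._cache[block_addr] = rewrite_block(instructions)
        return self._cache[block_addr]
```

The guest kernel is left unmodified, but the scanning and rewriting happen while it runs, which is one source of the extra overhead noted above.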
Resources to virtualize
- CPU
  - protection: the guest OS must run at a lower privilege level than Xen
  - exceptions: the guest OS must register a descriptor table of exception handlers with Xen
  - syscalls: the guest OS may install a ‘fast’ handler for syscalls, allowing direct calls from an application into its guest OS without going through Xen
  - interrupts: hardware interrupts are replaced with a lightweight event system (a per-domain bitmap; Xen sets bits before invoking the guest’s callback handler)
  - time: each guest OS has a timer interface and is aware of both ‘real’ and ‘virtual’ time
- Memory
  - the guest OS has direct read access to hardware page tables, but updates are batched and validated by the hypervisor. A domain may be allocated discontiguous machine pages.
    - similar to Exokernel’s approach
  - VMware’s approach: shadow page tables, similar to L4’s approach
In a traditional OS, page tables map from virtual pages to physical pages (purple arrow below). With virtual machines, there is an additional level of indirection to map the "physical" pages seen by the GuestOS to the actual hardware or machine pages. In Xen, the way that page tables work is that the GuestOS's page tables map all the way from virtual pages to hardware pages. These are the page tables that are exposed to the hardware, to use to resolve TLB misses. GuestOSes modify these page tables by issuing hypercalls to Xen, so that Xen can validate changes.

But, how does a GuestOS know the hardware address for a given "physical" page and vice versa? Xen exposes data structures to help maintain this information, which can be updated by GuestOSes (with validation, as usual). Guest OSes do not maintain page tables to explicitly map from virtual pages to "physical" pages.

As an example of how this all fits together, consider what happens when a process in a VM tries to access a newly allocated region of memory. This will trigger a page fault. The GuestOS in the VM will select a free "physical" page to allocate to this process. Then the GuestOS will look up the corresponding hardware page number for that page and issue a hypercall to Xen to install the mapping from virtual page number to hardware page number in the page table. When the hypercall returns and the process runs again, it will retry its memory access, and the MMU will be able to retrieve the virtual to hardware mapping from the page table.
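The page-fault walkthrough above can be sketched as follows. This is a simplified model under stated assumptions: page tables are flat dictionaries, the `Xen` and `GuestOS` classes and the `mmu_update` signature are hypothetical, the only validation rule modeled is "the domain must own the machine frame", and rejecting a whole batch on one bad entry is a choice made for this sketch.

```python
class Xen:
    """Toy hypervisor that validates page-table updates before installing them."""

    def __init__(self, ownership):
        self.owner = dict(ownership)  # machine frame -> owning domain
        self.page_tables = {}         # (domain, virtual page) -> machine frame

    def mmu_update(self, domain, updates):
        """Hypercall: validate a batch of (virtual page, machine frame) pairs, then apply it."""
        for _vpn, mfn in updates:
            if self.owner.get(mfn) != domain:
                raise PermissionError(f"domain {domain} does not own frame {mfn}")
        for vpn, mfn in updates:
            self.page_tables[(domain, vpn)] = mfn


class GuestOS:
    """Guest kernel holding a "physical"-to-machine table instead of a V -> "P" page table."""

    def __init__(self, domain, xen, p2m):
        self.domain = domain
        self.xen = xen
        self.p2m = dict(p2m)              # "physical" frame -> machine frame
        self.free_pfns = sorted(self.p2m)

    def handle_page_fault(self, vpn):
        pfn = self.free_pfns.pop(0)       # 1. pick a free "physical" page
        mfn = self.p2m[pfn]               # 2. look up its machine frame
        self.xen.mmu_update(self.domain, [(vpn, mfn)])  # 3. ask Xen to install V -> machine
        return mfn
```

For example, a guest in domain 1 with `p2m = {0: 9, 1: 7}` maps its first faulting page to machine frame 9; the machine frames backing its "physical" pages need not be contiguous.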
- Device I/O
  - provide a virtual device abstraction
  - data is transferred using asynchronous I/O rings
  - an event mechanism replaces hardware interrupts for notifications (by using an event bitmap)
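A minimal sketch of the two mechanisms listed above, an asynchronous request ring and an event bitmap. The class and field names (`IORing`, `EventChannels`, `req_prod`, `req_cons`) are hypothetical, and the ring here carries only requests; it is not a model of Xen's actual shared-memory ring layout.

```python
class IORing:
    """Fixed-size ring: the guest produces requests, the backend consumes them."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0  # advanced only by the guest
        self.req_cons = 0  # advanced only by the backend

    def put_request(self, request):
        """Guest side: enqueue without blocking; returns False if the ring is full."""
        if self.req_prod - self.req_cons == self.size:
            return False
        self.slots[self.req_prod % self.size] = request
        self.req_prod += 1
        return True

    def take_requests(self):
        """Backend side: consume everything produced so far in one batch."""
        batch = []
        while self.req_cons < self.req_prod:
            batch.append(self.slots[self.req_cons % self.size])
            self.req_cons += 1
        return batch


class EventChannels:
    """Per-domain pending bitmap standing in for hardware interrupt lines."""

    def __init__(self):
        self.pending = 0

    def notify(self, port):
        self.pending |= 1 << port  # set the bit; the guest is called back later

    def drain(self):
        """Guest callback: return all pending ports in ascending order and clear them."""
        ports = [p for p in range(self.pending.bit_length()) if (self.pending >> p) & 1]
        self.pending = 0
        return ports
```

Since producer and consumer advance separate indices, the guest can queue several requests before the backend runs, and one notification can cover a whole batch.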
Summary
- the trap-and-emulate approach to virtualizing hardware
- x86 is hard to virtualize
  - approach: paravirtualization (Xen)
  - approach: binary rewriting (VMware)
  - approach: hardware support
Hardware Support for Virtualization
CPU
- an extra privilege level
  - ring 3: applications
  - ring 0: guest OS
  - root ring 0: hypervisor
Memory
- Intel’s extended page tables (EPT): the hardware translates virtual → guest “physical” → machine addresses (V → “P” → P)
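The V → “P” → P translation can be sketched as the composition of two lookups. This is a simplification: real page tables are multi-level structures walked by the MMU, whereas this sketch uses flat dictionaries, and the function and exception names are hypothetical.

```python
class GuestPageFault(Exception):
    """No V -> "P" mapping: the guest OS handles this itself."""


class EPTViolation(Exception):
    """No "P" -> P mapping: control transfers to the hypervisor."""


def translate(guest_page_table, extended_page_table, virtual_page):
    """Two-stage lookup: guest table first, then the hypervisor's extended table."""
    guest_physical = guest_page_table.get(virtual_page)
    if guest_physical is None:
        raise GuestPageFault(virtual_page)
    machine = extended_page_table.get(guest_physical)
    if machine is None:
        raise EPTViolation(guest_physical)
    return machine
```

With this split the guest OS can edit its own V → “P” table without involving the hypervisor, since the hypervisor's “P” → P table alone decides which machine pages the guest can reach.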