CSE221 - lec10: Virtualization: VM370 & Xen

date
Nov 6, 2024
tags
System

10 - Virtualization: VM370 & Xen

Approaches to virtualize hardware

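  • two approaches: hosted and bare metal; they differ in where the virtual machine monitor (VMM) runs
    • hosted: the VMM runs at user level, on top of a host OS
    • bare metal: the VMM runs at the privileged level, directly on the hardware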

The Origin of the VM/370 Time-Sharing System

Why Virtual Machine?

  • convenient OS development: a new OS can be developed inside a VM while using the functionality of an existing OS
  • sharing while maintaining better isolation
  • emulate new hardware
  • handle hardware evolution
  • continuous operation: just migrate the VM
  • full access to hardware: each VM sees what looks like a complete machine

Approach to virtualization

  • when the guest executes a privileged instruction, the hardware traps into the virtual machine monitor (VMM)
  • the VMM emulates the instruction on behalf of the guest OS
  • this is called “trap and emulate”
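
To make trap and emulate concrete, here is a minimal sketch in Python. It is a toy model, not VM/370 code: the instruction names, `ToyCPU`, and `ToyVMM` are all hypothetical. The guest always runs deprivileged, so every privileged instruction raises a trap, and the VMM applies its effect to per-guest virtual state instead of to the real machine.

```python
# Toy model of trap and emulate. All names here are hypothetical.

PRIVILEGED_OPS = {"set_interrupts", "load_page_table_base"}


class PrivilegedInstructionTrap(Exception):
    """Raised by the toy CPU when deprivileged code runs a privileged op."""

    def __init__(self, op, arg):
        super().__init__(op)
        self.op = op
        self.arg = arg


class ToyCPU:
    def __init__(self):
        self.acc = 0

    def execute(self, op, arg, privileged_mode):
        if op in PRIVILEGED_OPS:
            if not privileged_mode:
                raise PrivilegedInstructionTrap(op, arg)
            return  # real hardware state would change here
        if op == "add":
            self.acc += arg
        else:
            raise ValueError(f"unknown instruction: {op}")


class ToyVMM:
    def __init__(self):
        self.cpu = ToyCPU()
        self.virtual_state = {}  # the guest's view of privileged state
        self.traps = 0

    def run_guest(self, program):
        for op, arg in program:
            try:
                # The guest OS never runs in privileged mode.
                self.cpu.execute(op, arg, privileged_mode=False)
            except PrivilegedInstructionTrap as trap:
                self.traps += 1
                self.emulate(trap)

    def emulate(self, trap):
        # Apply the instruction to virtual state only, then resume the guest.
        self.virtual_state[trap.op] = trap.arg


vmm = ToyVMM()
vmm.run_guest([
    ("add", 2),
    ("set_interrupts", False),
    ("add", 3),
    ("load_page_table_base", 0x1000),
])
print(vmm.cpu.acc, vmm.traps, vmm.virtual_state)
```

Unprivileged instructions such as `add` run directly on the toy CPU; only the two privileged ones take the slow path through the VMM.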

Xen and the Art of Virtualization

Xen is a very influential paper for several reasons: (1) it was released as open source, (2) it is convenient for research experimentation, and (3) it was adopted in industry (Amazon AWS).

Why VMs (in the 2000s)

  • better isolation, support for many OSes, … as in VM/370
  • datacenter settings

Requirements for datacenter VMs

  • security & isolation
  • support unmodified applications
  • resource management & accounting
  • high performance, low overhead
  • scalability (support lots of VMs concurrently)
  • reliability

Structure of Xen

notion image

Challenges of virtualizing x86

  • hardware-managed TLB (x86, ARM)
  • privileged instructions don’t always trap when run without privilege:
    • they may behave differently at different privilege levels
    • they may fail silently
  • lack of a tagged TLB
  • I/O overheads

Xen’s Approach: paravirtualization (instead of full virtualization)

  • expose a VM abstraction that differs from the underlying hardware; guest OSes may need to be modified
  • replace privileged instructions with hypercalls into the hypervisor
  • similar to L4’s approach of replacing syscalls into L4Linux with messages to the L4 kernel

VMware’s Approach: binary rewriting

  • rewrite privileged instructions at runtime so that they call into the hypervisor
  • more overhead than paravirtualization
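
The idea of binary rewriting can be sketched as follows. This is a toy model under loud assumptions, not VMware's implementation: guest code is a list of `(op, arg)` tuples, the mnemonics in `PRIVILEGED_OPS` are used only as labels, and `rewrite_block`, `ToyHypervisor`, and the per-block cache are hypothetical. Before a block runs, each privileged instruction is replaced by an explicit call into the hypervisor, so the guest source stays unmodified and the work happens at runtime.

```python
# Toy model of runtime binary rewriting. All names here are hypothetical.

PRIVILEGED_OPS = {"cli", "sti", "write_cr3"}  # labels only, not real decoding


def rewrite_block(block):
    """Return a copy of block with privileged ops turned into hypervisor calls."""
    rewritten = []
    for op, arg in block:
        if op in PRIVILEGED_OPS:
            rewritten.append(("vmm_call", (op, arg)))
        else:
            rewritten.append((op, arg))
    return rewritten


class ToyHypervisor:
    def __init__(self):
        self.log = []      # privileged operations handled for the guest
        self.cache = {}    # block address -> rewritten block
        self.rewrites = 0

    def vmm_call(self, op, arg):
        self.log.append((op, arg))

    def run(self, addr, block):
        # Rewrite each block once and reuse it (an assumed optimization).
        if addr not in self.cache:
            self.cache[addr] = rewrite_block(block)
            self.rewrites += 1
        acc = 0
        for op, arg in self.cache[addr]:
            if op == "vmm_call":
                self.vmm_call(*arg)
            elif op == "add":
                acc += arg
            else:
                raise ValueError(f"unknown instruction: {op}")
        return acc


guest_block = [("cli", None), ("add", 3), ("write_cr3", 0x1000), ("add", 4)]
hv = ToyHypervisor()
first = hv.run(0x400, guest_block)
second = hv.run(0x400, guest_block)  # served from the rewrite cache
print(first, second, hv.rewrites, hv.log)
```

With paravirtualization the guest source would already contain the equivalent of `vmm_call`, so no rewriting pass is needed at runtime.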

Resources to virtualize

  • CPU
    • protection: the guest OS must run at a lower privilege level than Xen
    • exceptions: the guest OS must register a descriptor table of exception handlers with Xen
    • syscalls: the guest OS may install a ‘fast’ handler for syscalls, allowing direct calls from an application into the guest OS without going through Xen
    • interrupts: hardware interrupts are replaced with a lightweight event system (a per-domain bitmap; Xen sets bits before invoking callback handlers)
    • time: each guest OS has a timer interface and is aware of both ‘real’ and ‘virtual’ time
  • Memory
    • The guest OS has direct read access to hardware page tables, but updates are batched and validated by the hypervisor. A domain may be allocated discontiguous machine pages.
    • similar to Exokernel’s approach
    • VMware’s approach: shadow page tables, similar to L4’s approach
    • In a traditional OS, page tables map from virtual pages to physical pages. With virtual machines, there is an additional level of indirection that maps the "physical" pages seen by the guest OS to the actual hardware (machine) pages. In Xen, the guest OS's page tables map all the way from virtual pages to hardware pages. These are the page tables exposed to the hardware, which uses them to resolve TLB misses. Guest OSes modify these page tables by issuing hypercalls to Xen, so that Xen can validate the changes. But how does a guest OS know the hardware address for a given "physical" page, and vice versa? Xen exposes data structures that hold this information, and guest OSes can update them (with validation, as usual). Guest OSes do not maintain page tables that explicitly map from virtual pages to "physical" pages.
      As an example of how this all fits together, consider what happens when a process in a VM tries to access a newly allocated region of memory. This triggers a page fault. The guest OS in the VM selects a free "physical" page to allocate to the process. The guest OS then looks up the corresponding hardware page number for that page and issues a hypercall to Xen to install the mapping from virtual page number to hardware page number in the page table. When the hypercall returns and the process runs again, it retries its memory access, and the MMU can retrieve the virtual-to-hardware mapping from the page table.
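
The page-fault walkthrough above can be sketched as a toy model. Everything here is an assumption for illustration: `ToyXen`, `ToyGuestOS`, the `mmu_update_batch` method, and the flat dictionaries stand in for real page tables and for Xen's actual hypercall interface. Validation is reduced to a single rule: a domain may only map machine frames that it owns.

```python
# Toy model of validated page table updates. All names here are hypothetical.


class ToyXen:
    def __init__(self, frame_owner):
        self.frame_owner = dict(frame_owner)  # machine frame -> owning domain
        self.page_tables = {}                 # domain -> {virtual page: machine frame}

    def mmu_update_batch(self, dom, updates):
        """Hypercall: validate the whole batch, then apply it."""
        for vpn, mfn in updates:
            if self.frame_owner.get(mfn) != dom:
                raise PermissionError(f"domain {dom} does not own frame {mfn}")
        self.page_tables.setdefault(dom, {}).update(updates)


class ToyGuestOS:
    def __init__(self, dom, xen, p2m):
        self.dom = dom
        self.xen = xen
        self.p2m = dict(p2m)               # "physical" page -> machine frame
        self.free_physical = sorted(p2m)   # free "physical" pages

    def handle_page_fault(self, vpn):
        ppn = self.free_physical.pop(0)    # pick a free "physical" page
        mfn = self.p2m[ppn]                # look up its machine frame
        # Ask Xen to install virtual page -> machine frame directly.
        self.xen.mmu_update_batch(self.dom, [(vpn, mfn)])
        return mfn


# Machine frames 7 and 9 belong to domain 1, frame 8 to domain 2.
xen = ToyXen({7: 1, 9: 1, 8: 2})
guest = ToyGuestOS(dom=1, xen=xen, p2m={0: 9, 1: 7})
guest.handle_page_fault(vpn=0x40)
print(xen.page_tables[1])

try:
    xen.mmu_update_batch(1, [(0x41, 8)])  # frame 8 is owned by domain 2
    rejected = False
except PermissionError:
    rejected = True
print(rejected)
```

Note that the installed entry maps the virtual page straight to a machine frame; the "physical" page number only appears in the guest's own lookup table.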
  • Device I/O
    • provide a virtual device abstraction
    • data is transferred using async I/O rings
    • an event mechanism replaces hardware interrupts for notifications (using an event bitmap)
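
A rough sketch of how an async I/O ring and an event bitmap can work together is shown below. It is a simplified model, not Xen's actual ring layout: `ToyIORing`, `ToyEventBitmap`, the four index names, and `BLOCK_PORT` are assumptions. The guest produces requests, the backend overwrites each request slot with its response, and a bit in the bitmap tells the guest that responses are waiting.

```python
# Toy model of an async I/O ring plus event bitmap. All names are hypothetical.


class ToyIORing:
    def __init__(self, size=4):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0  # advanced by the guest
        self.req_cons = 0  # advanced by the backend
        self.rsp_prod = 0  # advanced by the backend
        self.rsp_cons = 0  # advanced by the guest

    def put_request(self, request):
        # A slot stays busy until its response has been consumed.
        if self.req_prod - self.rsp_cons >= self.size:
            raise BufferError("ring full")
        self.slots[self.req_prod % self.size] = request
        self.req_prod += 1

    def service_requests(self, handler):
        while self.req_cons < self.req_prod:
            index = self.req_cons % self.size
            self.slots[index] = handler(self.slots[index])  # response reuses the slot
            self.req_cons += 1
            self.rsp_prod += 1

    def get_responses(self):
        responses = []
        while self.rsp_cons < self.rsp_prod:
            responses.append(self.slots[self.rsp_cons % self.size])
            self.rsp_cons += 1
        return responses


class ToyEventBitmap:
    def __init__(self):
        self.pending = 0  # one bit per event port

    def notify(self, port):
        self.pending |= 1 << port

    def take_pending(self):
        ports = [p for p in range(self.pending.bit_length())
                 if (self.pending >> p) & 1]
        self.pending = 0
        return ports


BLOCK_PORT = 3  # assumed port number for the virtual block device
ring = ToyIORing(size=4)
events = ToyEventBitmap()

ring.put_request({"op": "read", "sector": 10})
ring.put_request({"op": "read", "sector": 11})
ring.service_requests(lambda req: {"sector": req["sector"], "status": "ok"})
events.notify(BLOCK_PORT)  # instead of raising a hardware interrupt

ports = events.take_pending()
responses = ring.get_responses() if BLOCK_PORT in ports else []
print(ports, responses)
```

Because the guest can queue several requests before the backend runs, and the backend can post several responses before notifying, both sides can batch work.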

Summary

  • trap and emulate approach to virtualize hardware
  • x86 is hard to virtualize
    • approach: paravirtualization
    • approach: binary rewriting
    • approach: hardware support

Hardware Support for Virtualization

CPU

  • an extra privilege level
    • ring 3: applications
    • ring 0: guest OS
    • root ring 0: hypervisor

Memory

  • Intel’s extended page tables (EPT): V → “P” → P, i.e. the hardware translates virtual → “physical” → machine addresses
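
The two-stage translation V → “P” → P can be sketched as a composition of two lookups. This is a simplification under stated assumptions: real hardware walks multi-level tables for both stages, while the flat dictionaries, `PAGE_SIZE`, and the `translate` function here are hypothetical.

```python
# Toy model of two-stage address translation. All names here are hypothetical.

PAGE_SIZE = 4096


def translate(addr, guest_page_table, extended_page_table):
    """Translate a guest virtual address to a machine address."""
    vpn, offset = divmod(addr, PAGE_SIZE)
    ppn = guest_page_table[vpn]       # V -> "P", maintained by the guest OS
    mfn = extended_page_table[ppn]    # "P" -> P, maintained by the hypervisor
    return mfn * PAGE_SIZE + offset


guest_page_table = {2: 5}       # virtual page 2 -> "physical" page 5
extended_page_table = {5: 42}   # "physical" page 5 -> machine frame 42

machine_addr = translate(2 * PAGE_SIZE + 0x10, guest_page_table, extended_page_table)
print(hex(machine_addr))
```

Unlike the paravirtualized design, the guest OS here keeps ordinary virtual-to-“physical” page tables and does not need a hypercall to update them.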

© Lifan Sun 2023 - 2024