public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* VMI Interface Proposal Documentation for I386, Part 1
@ 2006-03-13 19:50 Zachary Amsden
  0 siblings, 0 replies; only message in thread
From: Zachary Amsden @ 2006-03-13 19:50 UTC (permalink / raw)
  To: Linux Kernel Mailing List

This still didn't seem to make it through vger, but gives a very good 
overview of what the interface is designed to do.


    Paravirtualization API Version 2.0

    Zachary Amsden, Daniel Arai, Daniel Hecht, Pratap Subrahmanyam
    Copyright (C) 2005, 2006, VMware, Inc.
    All rights reserved

Revision history:
         1.0: Initial version
         1.1: arai 2005-11-15
              Added SMP-related sections: AP startup and Local APIC support
         1.2: dhecht 2006-02-23
              Added Time Interface section and Time related VMI calls

Contents

1) Motivations
2) Overview
    Initialization
    Privilege model
    Memory management
    Segmentation
    Interrupt and I/O subsystem
    IDT management
    Transparent Paravirtualization
    3rd Party Extensions
    AP Startup
    State Synchronization in SMP systems
    Local APIC Support
    Time Interface
3) Architectural Differences from Native Hardware
4) ROM Implementation
    Detection
    Data layout
    Call convention
    PCI implementation

Appendix A - VMI ROM low level ABI
Appendix B - VMI C prototypes
Appendix C - Sensitive x86 instructions


1) Motivations

   There are several high level goals which must be balanced in designing
   an API for paravirtualization.  The most general concerns are:

   Portability      - it should be easy to port a guest OS to use the API
   High performance - the API must not obstruct a high performance
                  hypervisor implementation
   Maintainability  - it should be easy to maintain and upgrade the guest
                      OS
   Extensibility    - it should be possible for future expansion of the
                      API

   Portability.

     The general approach to paravirtualization rather than full
     virtualization is to modify the guest operating system.  This means
     there is implicitly some code cost to port a guest OS to run in a
     paravirtual environment.  The closer the API resembles a native
     platform which the OS supports, the lower the cost of porting.
     Rather than provide an alternative, high level interface for this
     API, the approach is to provide a low level interface which
     encapsulates the sensitive and performance critical parts of the
     system.  Thus, we have direct parallels to most privileged
     instructions, and the process of converting a guest OS to use these
     instructions is in many cases a simple replacement of one function
     for another. Although this is sufficient for CPU virtualization,
     performance concerns have forced us to add additional calls for
     memory management, and notifications about updates to certain CPU
     data structures. Support for this in the Linux operating system has
     proved to be very minimal in cost because of the already somewhat
     portable and modular design of the memory management layer.

  High Performance.

     Providing a low level API that closely resembles hardware does not
     provide any support for compound operations; indeed, typical
     compound operations on hardware can be updating of many page table
     entries, flushing system TLBs, or providing floating point safety.
     Since these operations may require several privileged or sensitive
     operations, it becomes important to defer some of these operations
     until explicit flushes are issued, or to provide higher level
     operations around some of these functions.  In order to keep with
     the goal of portability, this has been done only when deemed
     necessary for performance reasons, and we have tried to package
     these compound operations into methods that are typically used in
     guest operating systems.  In the future, we envision that additional
     higher level abstractions will be added as an adjunct to the
     low-level API.  These higher level abstractions will target large
     bulk operations such as creation, and destruction of address spaces,
     context switches, thread creation and control.

  Maintainability.

     In the course of development with a virtualized environment, it is
     not uncommon for support of new features or higher performance to
     require radical changes to the operation of the system.  If these
     changes are visible to the guest OS in a paravirtualized system,
     this will require updates to the guest kernel, which presents a
     maintenance problem.  In the Linux world, the rapid pace of
     development on the kernel means new kernel versions are produced
     every few months.  This rapid pace is not always appropriate for end
     users, so it is not uncommon to have dozens of different versions of
     the Linux kernel in use that must be actively supported.  To keep
     this many versions in sync with potentially radical changes in the
     paravirtualized system is not a scalable solution.  To reduce the
     maintenance burden as much as possible, while still allowing the
     implementation to accommodate changes, the design provides a stable
     ABI with semantic invariants.  The underlying implementation of the
     ABI and details of what data or how it communicates with the
     hypervisor are not visible to the guest OS.  As a result, in most
     cases, the guest OS need not even be recompiled to work with a newer
     hypervisor.  This allows performance optimizations, bug fixes,
     debugging, or statistical instrumentation to be added to the API
     implementation without any impact on the guest kernel.  This is
     achieved by publishing a block of code from the hypervisor in the
     form of a ROM.  The guest OS makes calls into this ROM to perform
     privileged or sensitive actions in the system.

  Extensibility.

     In order to provide a vehicle for new features, new device support,
     and general evolution, the API uses feature compartmentalization
     with controlled versioning.  The API is split into sections, with
     each section having independent versions.  Each section has a top
     level version which is incremented for each major revision, with a
     minor version indicating incremental level.  Version compatibility
     is based on matching the major version field, and changes of the
     major version are assumed to break compatibility.  This allows
     accurate matching of compatibility.  In the event of incompatible
     API changes, multiple APIs may be advertised by the hypervisor if it
     wishes to support older versions of guest kernels.  This provides
     the most general forward / backward compatibility possible.
     Currently, the API has a core section for CPU / MMU virtualization
     support, with additional sections provided for each supported device
     class.


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2006-03-13 19:50 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-03-13 19:50 VMI Interface Proposal Documentation for I386, Part 1 Zachary Amsden

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox