* [RESEND] Documentation Patches - 2/4
From: Robb Romans @ 2005-09-19 16:22 UTC
To: xen-devel
[-- Attachment #1: Type: text/plain, Size: 646 bytes --]
Resending.
---------- Forwarded Message ----------
Subject: [PATCH] Documentation Patches - 2/4
Date: Wednesday 14 September 2005 02:13 pm
From: Robb Romans <3r@us.ibm.com>
To: "xen-devel" <xen-devel@lists.xensource.com>
This patch creates a separate file for hypercalls documentation.
Signed-Off-By: Robb Romans <3r@us.ibm.com>
--
Robb Romans (512) 838-0419
Linux Commando T/L 678-0419
ARS NA5TT
.-- - ..-. ..--..
-------------------------------------------------------
[-- Attachment #2: 6818-doc-hypercalls.diff --]
[-- Type: text/x-diff, Size: 39185 bytes --]
# HG changeset patch
# User Robb Romans <3r@us.ibm.com>
# Node ID 1e255eacf158c51d5d4efdd7b17c4d1ce2a6be62
# Parent f619a10fdb762bfc9f061622e6aea1bd6c5e5fb3
Separate hypercalls information into separate file.
Depends on 6817-fix-makefile.diff
Signed-Off-By: Robb Romans <3r@us.ibm.com>
diff -r f619a10fdb76 -r 1e255eacf158 docs/src/interface.tex
--- a/docs/src/interface.tex Wed Sep 14 18:08:36 2005
+++ b/docs/src/interface.tex Wed Sep 14 18:22:54 2005
@@ -622,549 +622,10 @@
-
\appendix
-%\newcommand{\hypercall}[1]{\vspace{5mm}{\large\sf #1}}
-
-
-
-
-
-\newcommand{\hypercall}[1]{\vspace{2mm}{\sf #1}}
-
-
-
-
-
-
-\chapter{Xen Hypercalls}
-\label{a:hypercalls}
-
-Hypercalls represent the procedural interface to Xen; this appendix
-categorizes and describes the current set of hypercalls.
-
-\section{Invoking Hypercalls}
-
-Hypercalls are invoked in a manner analogous to system calls in a
-conventional operating system; a software interrupt is issued which
-vectors to an entry point within Xen. On x86\_32 machines the
-instruction required is {\tt int \$82}; the (real) IDT is setup so
-that this may only be issued from within ring 1. The particular
-hypercall to be invoked is contained in {\tt EAX} --- a list
-mapping these values to symbolic hypercall names can be found
-in {\tt xen/include/public/xen.h}.
-
-On some occasions a set of hypercalls will be required to carry
-out a higher-level function; a good example is when a guest
-operating wishes to context switch to a new process which
-requires updating various privileged CPU state. As an optimization
-for these cases, there is a generic mechanism to issue a set of
-hypercalls as a batch:
-
-\begin{quote}
-\hypercall{multicall(void *call\_list, int nr\_calls)}
-
-Execute a series of hypervisor calls; {\tt nr\_calls} is the length of
-the array of {\tt multicall\_entry\_t} structures pointed to be {\tt
-call\_list}. Each entry contains the hypercall operation code followed
-by up to 7 word-sized arguments.
-\end{quote}
-
-Note that multicalls are provided purely as an optimization; there is
-no requirement to use them when first porting a guest operating
-system.
-
-
-\section{Virtual CPU Setup}
-
-At start of day, a guest operating system needs to setup the virtual
-CPU it is executing on. This includes installing vectors for the
-virtual IDT so that the guest OS can handle interrupts, page faults,
-etc. However the very first thing a guest OS must setup is a pair
-of hypervisor callbacks: these are the entry points which Xen will
-use when it wishes to notify the guest OS of an occurrence.
-
-\begin{quote}
-\hypercall{set\_callbacks(unsigned long event\_selector, unsigned long
- event\_address, unsigned long failsafe\_selector, unsigned long
- failsafe\_address) }
-
-Register the normal (``event'') and failsafe callbacks for
-event processing. In each case the code segment selector and
-address within that segment are provided. The selectors must
-have RPL 1; in XenLinux we simply use the kernel's CS for both
-{\tt event\_selector} and {\tt failsafe\_selector}.
-
-The value {\tt event\_address} specifies the address of the guest OSes
-event handling and dispatch routine; the {\tt failsafe\_address}
-specifies a separate entry point which is used only if a fault occurs
-when Xen attempts to use the normal callback.
-\end{quote}
-
-
-After installing the hypervisor callbacks, the guest OS can
-install a `virtual IDT' by using the following hypercall:
-
-\begin{quote}
-\hypercall{set\_trap\_table(trap\_info\_t *table)}
-
-Install one or more entries into the per-domain
-trap handler table (essentially a software version of the IDT).
-Each entry in the array pointed to by {\tt table} includes the
-exception vector number with the corresponding segment selector
-and entry point. Most guest OSes can use the same handlers on
-Xen as when running on the real hardware; an exception is the
-page fault handler (exception vector 14) where a modified
-stack-frame layout is used.
-
-
-\end{quote}
-
-
-
-\section{Scheduling and Timer}
-
-Domains are preemptively scheduled by Xen according to the
-parameters installed by domain 0 (see Section~\ref{s:dom0ops}).
-In addition, however, a domain may choose to explicitly
-control certain behavior with the following hypercall:
-
-\begin{quote}
-\hypercall{sched\_op(unsigned long op)}
-
-Request scheduling operation from hypervisor. The options are: {\it
-yield}, {\it block}, and {\it shutdown}. {\it yield} keeps the
-calling domain runnable but may cause a reschedule if other domains
-are runnable. {\it block} removes the calling domain from the run
-queue and cause is to sleeps until an event is delivered to it. {\it
-shutdown} is used to end the domain's execution; the caller can
-additionally specify whether the domain should reboot, halt or
-suspend.
-\end{quote}
-
-To aid the implementation of a process scheduler within a guest OS,
-Xen provides a virtual programmable timer:
-
-\begin{quote}
-\hypercall{set\_timer\_op(uint64\_t timeout)}
-
-Request a timer event to be sent at the specified system time (time
-in nanoseconds since system boot). The hypercall actually passes the
-64-bit timeout value as a pair of 32-bit values.
-
-\end{quote}
-
-Note that calling {\tt set\_timer\_op()} prior to {\tt sched\_op}
-allows block-with-timeout semantics.
-
-
-\section{Page Table Management}
-
-Since guest operating systems have read-only access to their page
-tables, Xen must be involved when making any changes. The following
-multi-purpose hypercall can be used to modify page-table entries,
-update the machine-to-physical mapping table, flush the TLB, install
-a new page-table base pointer, and more.
-
-\begin{quote}
-\hypercall{mmu\_update(mmu\_update\_t *req, int count, int *success\_count)}
-
-Update the page table for the domain; a set of {\tt count} updates are
-submitted for processing in a batch, with {\tt success\_count} being
-updated to report the number of successful updates.
-
-Each element of {\tt req[]} contains a pointer (address) and value;
-the least significant 2-bits of the pointer are used to distinguish
-the type of update requested as follows:
-\begin{description}
-
-\item[\it MMU\_NORMAL\_PT\_UPDATE:] update a page directory entry or
-page table entry to the associated value; Xen will check that the
-update is safe, as described in Chapter~\ref{c:memory}.
-
-\item[\it MMU\_MACHPHYS\_UPDATE:] update an entry in the
- machine-to-physical table. The calling domain must own the machine
- page in question (or be privileged).
-
-\item[\it MMU\_EXTENDED\_COMMAND:] perform additional MMU operations.
-The set of additional MMU operations is considerable, and includes
-updating {\tt cr3} (or just re-installing it for a TLB flush),
-flushing the cache, installing a new LDT, or pinning \& unpinning
-page-table pages (to ensure their reference count doesn't drop to zero
-which would require a revalidation of all entries).
-
-Further extended commands are used to deal with granting and
-acquiring page ownership; see Section~\ref{s:idc}.
-
-
-\end{description}
-
-More details on the precise format of all commands can be
-found in {\tt xen/include/public/xen.h}.
-
-
-\end{quote}
-
-Explicitly updating batches of page table entries is extremely
-efficient, but can require a number of alterations to the guest
-OS. Using the writable page table mode (Chapter~\ref{c:memory}) is
-recommended for new OS ports.
-
-Regardless of which page table update mode is being used, however,
-there are some occasions (notably handling a demand page fault) where
-a guest OS will wish to modify exactly one PTE rather than a
-batch. This is catered for by the following:
-
-\begin{quote}
-\hypercall{update\_va\_mapping(unsigned long page\_nr, unsigned long
-val, \\ unsigned long flags)}
-
-Update the currently installed PTE for the page {\tt page\_nr} to
-{\tt val}. As with {\tt mmu\_update()}, Xen checks the modification
-is safe before applying it. The {\tt flags} determine which kind
-of TLB flush, if any, should follow the update.
-
-\end{quote}
-
-Finally, sufficiently privileged domains may occasionally wish to manipulate
-the pages of others:
-\begin{quote}
-
-\hypercall{update\_va\_mapping\_otherdomain(unsigned long page\_nr,
-unsigned long val, unsigned long flags, uint16\_t domid)}
-
-Identical to {\tt update\_va\_mapping()} save that the pages being
-mapped must belong to the domain {\tt domid}.
-
-\end{quote}
-
-This privileged operation is currently used by backend virtual device
-drivers to safely map pages containing I/O data.
-
-
-
-\section{Segmentation Support}
-
-Xen allows guest OSes to install a custom GDT if they require it;
-this is context switched transparently whenever a domain is
-[de]scheduled. The following hypercall is effectively a
-`safe' version of {\tt lgdt}:
-
-\begin{quote}
-\hypercall{set\_gdt(unsigned long *frame\_list, int entries)}
-
-Install a global descriptor table for a domain; {\tt frame\_list} is
-an array of up to 16 machine page frames within which the GDT resides,
-with {\tt entries} being the actual number of descriptor-entry
-slots. All page frames must be mapped read-only within the guest's
-address space, and the table must be large enough to contain Xen's
-reserved entries (see {\tt xen/include/public/arch-x86\_32.h}).
-
-\end{quote}
-
-Many guest OSes will also wish to install LDTs; this is achieved by
-using {\tt mmu\_update()} with an extended command, passing the
-linear address of the LDT base along with the number of entries. No
-special safety checks are required; Xen needs to perform this task
-simply since {\tt lldt} requires CPL 0.
-
-
-Xen also allows guest operating systems to update just an
-individual segment descriptor in the GDT or LDT:
-
-\begin{quote}
-\hypercall{update\_descriptor(unsigned long ma, unsigned long word1,
-unsigned long word2)}
-
-Update the GDT/LDT entry at machine address {\tt ma}; the new
-8-byte descriptor is stored in {\tt word1} and {\tt word2}.
-Xen performs a number of checks to ensure the descriptor is
-valid.
-
-\end{quote}
-
-Guest OSes can use the above in place of context switching entire
-LDTs (or the GDT) when the number of changing descriptors is small.
-
-\section{Context Switching}
-
-When a guest OS wishes to context switch between two processes,
-it can use the page table and segmentation hypercalls described
-above to perform the the bulk of the privileged work. In addition,
-however, it will need to invoke Xen to switch the kernel (ring 1)
-stack pointer:
-
-\begin{quote}
-\hypercall{stack\_switch(unsigned long ss, unsigned long esp)}
-
-Request kernel stack switch from hypervisor; {\tt ss} is the new
-stack segment, which {\tt esp} is the new stack pointer.
-
-\end{quote}
-
-A final useful hypercall for context switching allows ``lazy''
-save and restore of floating point state:
-
-\begin{quote}
-\hypercall{fpu\_taskswitch(void)}
-
-This call instructs Xen to set the {\tt TS} bit in the {\tt cr0}
-control register; this means that the next attempt to use floating
-point will cause a trap which the guest OS can trap. Typically it will
-then save/restore the FP state, and clear the {\tt TS} bit.
-\end{quote}
-
-This is provided as an optimization only; guest OSes can also choose
-to save and restore FP state on all context switches for simplicity.
-
-
-\section{Physical Memory Management}
-
-As mentioned previously, each domain has a maximum and current
-memory allocation. The maximum allocation, set at domain creation
-time, cannot be modified. However a domain can choose to reduce
-and subsequently grow its current allocation by using the
-following call:
-
-\begin{quote}
-\hypercall{dom\_mem\_op(unsigned int op, unsigned long *extent\_list,
- unsigned long nr\_extents, unsigned int extent\_order)}
-
-Increase or decrease current memory allocation (as determined by
-the value of {\tt op}). Each invocation provides a list of
-extents each of which is $2^s$ pages in size,
-where $s$ is the value of {\tt extent\_order}.
-
-\end{quote}
-
-In addition to simply reducing or increasing the current memory
-allocation via a `balloon driver', this call is also useful for
-obtaining contiguous regions of machine memory when required (e.g.
-for certain PCI devices, or if using superpages).
-
-
-\section{Inter-Domain Communication}
-\label{s:idc}
-
-Xen provides a simple asynchronous notification mechanism via
-\emph{event channels}. Each domain has a set of end-points (or
-\emph{ports}) which may be bound to an event source (e.g. a physical
-IRQ, a virtual IRQ, or an port in another domain). When a pair of
-end-points in two different domains are bound together, then a `send'
-operation on one will cause an event to be received by the destination
-domain.
-
-The control and use of event channels involves the following hypercall:
-
-\begin{quote}
-\hypercall{event\_channel\_op(evtchn\_op\_t *op)}
-
-Inter-domain event-channel management; {\tt op} is a discriminated
-union which allows the following 7 operations:
-
-\begin{description}
-
-\item[\it alloc\_unbound:] allocate a free (unbound) local
- port and prepare for connection from a specified domain.
-\item[\it bind\_virq:] bind a local port to a virtual
-IRQ; any particular VIRQ can be bound to at most one port per domain.
-\item[\it bind\_pirq:] bind a local port to a physical IRQ;
-once more, a given pIRQ can be bound to at most one port per
-domain. Furthermore the calling domain must be sufficiently
-privileged.
-\item[\it bind\_interdomain:] construct an interdomain event
-channel; in general, the target domain must have previously allocated
-an unbound port for this channel, although this can be bypassed by
-privileged domains during domain setup.
-\item[\it close:] close an interdomain event channel.
-\item[\it send:] send an event to the remote end of a
-interdomain event channel.
-\item[\it status:] determine the current status of a local port.
-\end{description}
-
-For more details see
-{\tt xen/include/public/event\_channel.h}.
-
-\end{quote}
-
-Event channels are the fundamental communication primitive between
-Xen domains and seamlessly support SMP. However they provide little
-bandwidth for communication {\sl per se}, and hence are typically
-married with a piece of shared memory to produce effective and
-high-performance inter-domain communication.
-
-Safe sharing of memory pages between guest OSes is carried out by
-granting access on a per page basis to individual domains. This is
-achieved by using the {\tt grant\_table\_op()} hypercall.
-
-\begin{quote}
-\hypercall{grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)}
-
-Grant or remove access to a particular page to a particular domain.
-
-\end{quote}
-
-This is not currently widely in use by guest operating systems, but
-we intend to integrate support more fully in the near future.
-
-\section{PCI Configuration}
-
-Domains with physical device access (i.e.\ driver domains) receive
-limited access to certain PCI devices (bus address space and
-interrupts). However many guest operating systems attempt to
-determine the PCI configuration by directly access the PCI BIOS,
-which cannot be allowed for safety.
-
-Instead, Xen provides the following hypercall:
-
-\begin{quote}
-\hypercall{physdev\_op(void *physdev\_op)}
-
-Perform a PCI configuration option; depending on the value
-of {\tt physdev\_op} this can be a PCI config read, a PCI config
-write, or a small number of other queries.
-
-\end{quote}
-
-
-For examples of using {\tt physdev\_op()}, see the
-Xen-specific PCI code in the linux sparse tree.
-
-\section{Administrative Operations}
-\label{s:dom0ops}
-
-A large number of control operations are available to a sufficiently
-privileged domain (typically domain 0). These allow the creation and
-management of new domains, for example. A complete list is given
-below: for more details on any or all of these, please see
-{\tt xen/include/public/dom0\_ops.h}
-
-
-\begin{quote}
-\hypercall{dom0\_op(dom0\_op\_t *op)}
-
-Administrative domain operations for domain management. The options are:
-
-\begin{description}
-\item [\it DOM0\_CREATEDOMAIN:] create a new domain
-
-\item [\it DOM0\_PAUSEDOMAIN:] remove a domain from the scheduler run
-queue.
-
-\item [\it DOM0\_UNPAUSEDOMAIN:] mark a paused domain as schedulable
- once again.
-
-\item [\it DOM0\_DESTROYDOMAIN:] deallocate all resources associated
-with a domain
-
-\item [\it DOM0\_GETMEMLIST:] get list of pages used by the domain
-
-\item [\it DOM0\_SCHEDCTL:]
-
-\item [\it DOM0\_ADJUSTDOM:] adjust scheduling priorities for domain
-
-\item [\it DOM0\_BUILDDOMAIN:] do final guest OS setup for domain
-
-\item [\it DOM0\_GETDOMAINFO:] get statistics about the domain
-
-\item [\it DOM0\_GETPAGEFRAMEINFO:]
-
-\item [\it DOM0\_GETPAGEFRAMEINFO2:]
-
-\item [\it DOM0\_IOPL:] set I/O privilege level
-
-\item [\it DOM0\_MSR:] read or write model specific registers
-
-\item [\it DOM0\_DEBUG:] interactively invoke the debugger
-
-\item [\it DOM0\_SETTIME:] set system time
-
-\item [\it DOM0\_READCONSOLE:] read console content from hypervisor buffer ring
-
-\item [\it DOM0\_PINCPUDOMAIN:] pin domain to a particular CPU
-
-\item [\it DOM0\_GETTBUFS:] get information about the size and location of
- the trace buffers (only on trace-buffer enabled builds)
-
-\item [\it DOM0\_PHYSINFO:] get information about the host machine
-
-\item [\it DOM0\_PCIDEV\_ACCESS:] modify PCI device access permissions
-
-\item [\it DOM0\_SCHED\_ID:] get the ID of the current Xen scheduler
-
-\item [\it DOM0\_SHADOW\_CONTROL:] switch between shadow page-table modes
-
-\item [\it DOM0\_SETDOMAININITIALMEM:] set initial memory allocation of a domain
-
-\item [\it DOM0\_SETDOMAINMAXMEM:] set maximum memory allocation of a domain
-
-\item [\it DOM0\_SETDOMAINVMASSIST:] set domain VM assist options
-\end{description}
-\end{quote}
-
-Most of the above are best understood by looking at the code
-implementing them (in {\tt xen/common/dom0\_ops.c}) and in
-the user-space tools that use them (mostly in {\tt tools/libxc}).
-
-\section{Debugging Hypercalls}
-
-A few additional hypercalls are mainly useful for debugging:
-
-\begin{quote}
-\hypercall{console\_io(int cmd, int count, char *str)}
-
-Use Xen to interact with the console; operations are:
-
-{\it CONSOLEIO\_write}: Output count characters from buffer str.
-
-{\it CONSOLEIO\_read}: Input at most count characters into buffer str.
-\end{quote}
-
-A pair of hypercalls allows access to the underlying debug registers:
-\begin{quote}
-\hypercall{set\_debugreg(int reg, unsigned long value)}
-
-Set debug register {\tt reg} to {\tt value}
-
-\hypercall{get\_debugreg(int reg)}
-
-Return the contents of the debug register {\tt reg}
-\end{quote}
-
-And finally:
-\begin{quote}
-\hypercall{xen\_version(int cmd)}
-
-Request Xen version number.
-\end{quote}
-
-This is useful to ensure that user-space tools are in sync
-with the underlying hypervisor.
-
-\section{Deprecated Hypercalls}
-
-Xen is under constant development and refinement; as such there
-are plans to improve the way in which various pieces of functionality
-are exposed to guest OSes.
-
-\begin{quote}
-\hypercall{vm\_assist(unsigned int cmd, unsigned int type)}
-
-Toggle various memory management modes (in particular wrritable page
-tables and superpage support).
-
-\end{quote}
-
-This is likely to be replaced with mode values in the shared
-information page since this is more resilient for resumption
-after migration or checkpoint.
-
-
-
-
-
-
+\include{src/interface/hypercalls}
+%% hypercalls moved to hypercalls.tex
%%
diff -r f619a10fdb76 -r 1e255eacf158 docs/src/interface/hypercalls.tex
--- /dev/null Wed Sep 14 18:08:36 2005
+++ b/docs/src/interface/hypercalls.tex Wed Sep 14 18:22:54 2005
@@ -0,0 +1,524 @@
+
+\newcommand{\hypercall}[1]{\vspace{2mm}{\sf #1}}
+
+\chapter{Xen Hypercalls}
+\label{a:hypercalls}
+
+Hypercalls represent the procedural interface to Xen; this appendix
+categorizes and describes the current set of hypercalls.
+
+\section{Invoking Hypercalls}
+
+Hypercalls are invoked in a manner analogous to system calls in a
+conventional operating system; a software interrupt is issued which
+vectors to an entry point within Xen. On x86\_32 machines the
+instruction required is {\tt int \$0x82}; the (real) IDT is set up so
+that this may only be issued from within ring 1. The particular
+hypercall to be invoked is contained in {\tt EAX} --- a list
+mapping these values to symbolic hypercall names can be found
+in {\tt xen/include/public/xen.h}.
+
+On some occasions a set of hypercalls will be required to carry
+out a higher-level function; a good example is when a guest
+operating system wishes to context switch to a new process, which
+requires updating various privileged CPU state. As an optimization
+for these cases, there is a generic mechanism to issue a set of
+hypercalls as a batch:
+
+\begin{quote}
+\hypercall{multicall(void *call\_list, int nr\_calls)}
+
+Execute a series of hypervisor calls; {\tt nr\_calls} is the length of
+the array of {\tt multicall\_entry\_t} structures pointed to by {\tt
+call\_list}. Each entry contains the hypercall operation code followed
+by up to 7 word-sized arguments.
+\end{quote}
+
+Note that multicalls are provided purely as an optimization; there is
+no requirement to use them when first porting a guest operating
+system.
+
+
+\section{Virtual CPU Setup}
+
+At start of day, a guest operating system needs to set up the virtual
+CPU it is executing on. This includes installing vectors for the
+virtual IDT so that the guest OS can handle interrupts, page faults,
+etc. However, the very first thing a guest OS must set up is a pair
+of hypervisor callbacks: these are the entry points which Xen will
+use when it wishes to notify the guest OS of an occurrence.
+
+\begin{quote}
+\hypercall{set\_callbacks(unsigned long event\_selector, unsigned long
+ event\_address, unsigned long failsafe\_selector, unsigned long
+ failsafe\_address) }
+
+Register the normal (``event'') and failsafe callbacks for
+event processing. In each case the code segment selector and
+address within that segment are provided. The selectors must
+have RPL 1; in XenLinux we simply use the kernel's CS for both
+{\tt event\_selector} and {\tt failsafe\_selector}.
+
+The value {\tt event\_address} specifies the address of the guest OS's
+event handling and dispatch routine; the {\tt failsafe\_address}
+specifies a separate entry point which is used only if a fault occurs
+when Xen attempts to use the normal callback.
+\end{quote}
+
+
+After installing the hypervisor callbacks, the guest OS can
+install a `virtual IDT' by using the following hypercall:
+
+\begin{quote}
+\hypercall{set\_trap\_table(trap\_info\_t *table)}
+
+Install one or more entries into the per-domain
+trap handler table (essentially a software version of the IDT).
+Each entry in the array pointed to by {\tt table} includes the
+exception vector number with the corresponding segment selector
+and entry point. Most guest OSes can use the same handlers on
+Xen as when running on the real hardware; an exception is the
+page fault handler (exception vector 14) where a modified
+stack-frame layout is used.
+
+
+\end{quote}
+
+
+
+\section{Scheduling and Timer}
+
+Domains are preemptively scheduled by Xen according to the
+parameters installed by domain 0 (see Section~\ref{s:dom0ops}).
+In addition, however, a domain may choose to explicitly
+control certain behavior with the following hypercall:
+
+\begin{quote}
+\hypercall{sched\_op(unsigned long op)}
+
+Request scheduling operation from hypervisor. The options are: {\it
+yield}, {\it block}, and {\it shutdown}. {\it yield} keeps the
+calling domain runnable but may cause a reschedule if other domains
+are runnable. {\it block} removes the calling domain from the run
+queue and causes it to sleep until an event is delivered to it. {\it
+shutdown} is used to end the domain's execution; the caller can
+additionally specify whether the domain should reboot, halt or
+suspend.
+\end{quote}
+
+To aid the implementation of a process scheduler within a guest OS,
+Xen provides a virtual programmable timer:
+
+\begin{quote}
+\hypercall{set\_timer\_op(uint64\_t timeout)}
+
+Request a timer event to be sent at the specified system time (time
+in nanoseconds since system boot). The hypercall actually passes the
+64-bit timeout value as a pair of 32-bit values.
+
+\end{quote}
+
+Note that calling {\tt set\_timer\_op()} prior to {\tt sched\_op}
+allows block-with-timeout semantics.
+
+
+\section{Page Table Management}
+
+Since guest operating systems have read-only access to their page
+tables, Xen must be involved when making any changes. The following
+multi-purpose hypercall can be used to modify page-table entries,
+update the machine-to-physical mapping table, flush the TLB, install
+a new page-table base pointer, and more.
+
+\begin{quote}
+\hypercall{mmu\_update(mmu\_update\_t *req, int count, int *success\_count)}
+
+Update the page table for the domain; a set of {\tt count} updates are
+submitted for processing in a batch, with {\tt success\_count} being
+updated to report the number of successful updates.
+
+Each element of {\tt req[]} contains a pointer (address) and value;
+the least significant 2 bits of the pointer are used to distinguish
+the type of update requested as follows:
+\begin{description}
+
+\item[\it MMU\_NORMAL\_PT\_UPDATE:] update a page directory entry or
+page table entry to the associated value; Xen will check that the
+update is safe, as described in Chapter~\ref{c:memory}.
+
+\item[\it MMU\_MACHPHYS\_UPDATE:] update an entry in the
+ machine-to-physical table. The calling domain must own the machine
+ page in question (or be privileged).
+
+\item[\it MMU\_EXTENDED\_COMMAND:] perform additional MMU operations.
+The set of additional MMU operations is considerable, and includes
+updating {\tt cr3} (or just re-installing it for a TLB flush),
+flushing the cache, installing a new LDT, or pinning \& unpinning
+page-table pages (to ensure their reference count doesn't drop to zero
+which would require a revalidation of all entries).
+
+Further extended commands are used to deal with granting and
+acquiring page ownership; see Section~\ref{s:idc}.
+
+
+\end{description}
+
+More details on the precise format of all commands can be
+found in {\tt xen/include/public/xen.h}.
+
+
+\end{quote}
+
+Explicitly updating batches of page table entries is extremely
+efficient, but can require a number of alterations to the guest
+OS. Using the writable page table mode (Chapter~\ref{c:memory}) is
+recommended for new OS ports.
+
+Regardless of which page table update mode is being used, however,
+there are some occasions (notably handling a demand page fault) where
+a guest OS will wish to modify exactly one PTE rather than a
+batch. This is catered for by the following:
+
+\begin{quote}
+\hypercall{update\_va\_mapping(unsigned long page\_nr, unsigned long
+val, \\ unsigned long flags)}
+
+Update the currently installed PTE for the page {\tt page\_nr} to
+{\tt val}. As with {\tt mmu\_update()}, Xen checks the modification
+is safe before applying it. The {\tt flags} determine which kind
+of TLB flush, if any, should follow the update.
+
+\end{quote}
+
+Finally, sufficiently privileged domains may occasionally wish to manipulate
+the pages of others:
+\begin{quote}
+
+\hypercall{update\_va\_mapping\_otherdomain(unsigned long page\_nr,
+unsigned long val, unsigned long flags, uint16\_t domid)}
+
+Identical to {\tt update\_va\_mapping()} save that the pages being
+mapped must belong to the domain {\tt domid}.
+
+\end{quote}
+
+This privileged operation is currently used by backend virtual device
+drivers to safely map pages containing I/O data.
+
+
+
+\section{Segmentation Support}
+
+Xen allows guest OSes to install a custom GDT if they require it;
+this is context switched transparently whenever a domain is
+[de]scheduled. The following hypercall is effectively a
+`safe' version of {\tt lgdt}:
+
+\begin{quote}
+\hypercall{set\_gdt(unsigned long *frame\_list, int entries)}
+
+Install a global descriptor table for a domain; {\tt frame\_list} is
+an array of up to 16 machine page frames within which the GDT resides,
+with {\tt entries} being the actual number of descriptor-entry
+slots. All page frames must be mapped read-only within the guest's
+address space, and the table must be large enough to contain Xen's
+reserved entries (see {\tt xen/include/public/arch-x86\_32.h}).
+
+\end{quote}
+
+Many guest OSes will also wish to install LDTs; this is achieved by
+using {\tt mmu\_update()} with an extended command, passing the
+linear address of the LDT base along with the number of entries. No
+special safety checks are required; Xen needs to perform this task
+simply since {\tt lldt} requires CPL 0.
+
+
+Xen also allows guest operating systems to update just an
+individual segment descriptor in the GDT or LDT:
+
+\begin{quote}
+\hypercall{update\_descriptor(unsigned long ma, unsigned long word1,
+unsigned long word2)}
+
+Update the GDT/LDT entry at machine address {\tt ma}; the new
+8-byte descriptor is stored in {\tt word1} and {\tt word2}.
+Xen performs a number of checks to ensure the descriptor is
+valid.
+
+\end{quote}
+
+Guest OSes can use the above in place of context switching entire
+LDTs (or the GDT) when the number of changing descriptors is small.
+
+\section{Context Switching}
+
+When a guest OS wishes to context switch between two processes,
+it can use the page table and segmentation hypercalls described
+above to perform the bulk of the privileged work. In addition,
+however, it will need to invoke Xen to switch the kernel (ring 1)
+stack pointer:
+
+\begin{quote}
+\hypercall{stack\_switch(unsigned long ss, unsigned long esp)}
+
+Request kernel stack switch from hypervisor; {\tt ss} is the new
+stack segment, while {\tt esp} is the new stack pointer.
+
+\end{quote}
+
+A final useful hypercall for context switching allows ``lazy''
+save and restore of floating point state:
+
+\begin{quote}
+\hypercall{fpu\_taskswitch(void)}
+
+This call instructs Xen to set the {\tt TS} bit in the {\tt cr0}
+control register; this means that the next attempt to use floating
+point will cause a trap which the guest OS can handle. Typically it will
+then save/restore the FP state, and clear the {\tt TS} bit.
+\end{quote}
+
+This is provided as an optimization only; guest OSes can also choose
+to save and restore FP state on all context switches for simplicity.
+
+
+\section{Physical Memory Management}
+
+As mentioned previously, each domain has a maximum and current
+memory allocation. The maximum allocation, set at domain creation
+time, cannot be modified. However a domain can choose to reduce
+and subsequently grow its current allocation by using the
+following call:
+
+\begin{quote}
+\hypercall{dom\_mem\_op(unsigned int op, unsigned long *extent\_list,
+ unsigned long nr\_extents, unsigned int extent\_order)}
+
+Increase or decrease current memory allocation (as determined by
+the value of {\tt op}). Each invocation provides a list of
+extents each of which is $2^s$ pages in size,
+where $s$ is the value of {\tt extent\_order}.
+
+\end{quote}
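+
+As a worked example of the extent size (a sketch assuming 4\,KB pages,
+as on x86; the page size is an assumption, not something the call
+itself specifies), an extent of order $s$ covers
+\[
+  2^{s} \times 4\,\mathrm{KB}
+\]
+of memory. Under that assumption {\tt extent\_order} $= 0$ describes
+single 4\,KB pages, while {\tt extent\_order} $= 9$ describes extents
+of $2^{9} = 512$ pages, i.e.\ 2\,MB each.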
+
+In addition to simply reducing or increasing the current memory
+allocation via a `balloon driver', this call is also useful for
+obtaining contiguous regions of machine memory when required (e.g.
+for certain PCI devices, or if using superpages).
+
+
+\section{Inter-Domain Communication}
+\label{s:idc}
+
+Xen provides a simple asynchronous notification mechanism via
+\emph{event channels}. Each domain has a set of end-points (or
+\emph{ports}) which may be bound to an event source (e.g. a physical
+IRQ, a virtual IRQ, or a port in another domain). When a pair of
+end-points in two different domains are bound together, then a `send'
+operation on one will cause an event to be received by the destination
+domain.
+
+The control and use of event channels involves the following hypercall:
+
+\begin{quote}
+\hypercall{event\_channel\_op(evtchn\_op\_t *op)}
+
+Inter-domain event-channel management; {\tt op} is a discriminated
+union which allows the following 7 operations:
+
+\begin{description}
+
+\item[\it alloc\_unbound:] allocate a free (unbound) local
+ port and prepare for connection from a specified domain.
+\item[\it bind\_virq:] bind a local port to a virtual
+IRQ; any particular VIRQ can be bound to at most one port per domain.
+\item[\it bind\_pirq:] bind a local port to a physical IRQ;
+once more, a given pIRQ can be bound to at most one port per
+domain. Furthermore the calling domain must be sufficiently
+privileged.
+\item[\it bind\_interdomain:] construct an interdomain event
+channel; in general, the target domain must have previously allocated
+an unbound port for this channel, although this can be bypassed by
+privileged domains during domain setup.
+\item[\it close:] close an interdomain event channel.
+\item[\it send:] send an event to the remote end of an
+interdomain event channel.
+\item[\it status:] determine the current status of a local port.
+\end{description}
+
+For more details see
+{\tt xen/include/public/event\_channel.h}.
+
+\end{quote}
+
+Event channels are the fundamental communication primitive between
+Xen domains and seamlessly support SMP. However they provide little
+bandwidth for communication {\sl per se}, and hence are typically
+married with a piece of shared memory to produce effective and
+high-performance inter-domain communication.
+
+Safe sharing of memory pages between guest OSes is carried out by
+granting access on a per page basis to individual domains. This is
+achieved by using the {\tt grant\_table\_op()} hypercall.
+
+\begin{quote}
+\hypercall{grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)}
+
+Grant or remove access to a particular page to a particular domain.
+
+\end{quote}
+
+This is not currently widely in use by guest operating systems, but
+we intend to integrate support more fully in the near future.
+
+\section{PCI Configuration}
+
+Domains with physical device access (i.e.\ driver domains) receive
+limited access to certain PCI devices (bus address space and
+interrupts). However many guest operating systems attempt to
+determine the PCI configuration by directly accessing the PCI BIOS,
+which cannot be allowed for safety.
+
+Instead, Xen provides the following hypercall:
+
+\begin{quote}
+\hypercall{physdev\_op(void *physdev\_op)}
+
+Perform a PCI configuration operation; depending on the value
+of {\tt physdev\_op} this can be a PCI config read, a PCI config
+write, or a small number of other queries.
+
+\end{quote}
+
+
+For examples of using {\tt physdev\_op()}, see the
+Xen-specific PCI code in the Linux sparse tree.
+
+\section{Administrative Operations}
+\label{s:dom0ops}
+
+A large number of control operations are available to a sufficiently
+privileged domain (typically domain 0). These allow the creation and
+management of new domains, for example. A complete list is given
+below; for more details on any or all of these, please see
+{\tt xen/include/public/dom0\_ops.h}.
+
+
+\begin{quote}
+\hypercall{dom0\_op(dom0\_op\_t *op)}
+
+Administrative domain operations for domain management. The options are:
+
+\begin{description}
+\item [\it DOM0\_CREATEDOMAIN:] create a new domain
+
+\item [\it DOM0\_PAUSEDOMAIN:] remove a domain from the scheduler run
+queue.
+
+\item [\it DOM0\_UNPAUSEDOMAIN:] mark a paused domain as schedulable
+ once again.
+
+\item [\it DOM0\_DESTROYDOMAIN:] deallocate all resources associated
+with a domain
+
+\item [\it DOM0\_GETMEMLIST:] get list of pages used by the domain
+
+\item [\it DOM0\_SCHEDCTL:]
+
+\item [\it DOM0\_ADJUSTDOM:] adjust scheduling priorities for domain
+
+\item [\it DOM0\_BUILDDOMAIN:] do final guest OS setup for domain
+
+\item [\it DOM0\_GETDOMAINFO:] get statistics about the domain
+
+\item [\it DOM0\_GETPAGEFRAMEINFO:]
+
+\item [\it DOM0\_GETPAGEFRAMEINFO2:]
+
+\item [\it DOM0\_IOPL:] set I/O privilege level
+
+\item [\it DOM0\_MSR:] read or write model specific registers
+
+\item [\it DOM0\_DEBUG:] interactively invoke the debugger
+
+\item [\it DOM0\_SETTIME:] set system time
+
+\item [\it DOM0\_READCONSOLE:] read console content from hypervisor buffer ring
+
+\item [\it DOM0\_PINCPUDOMAIN:] pin domain to a particular CPU
+
+\item [\it DOM0\_GETTBUFS:] get information about the size and location of
+ the trace buffers (only on trace-buffer enabled builds)
+
+\item [\it DOM0\_PHYSINFO:] get information about the host machine
+
+\item [\it DOM0\_PCIDEV\_ACCESS:] modify PCI device access permissions
+
+\item [\it DOM0\_SCHED\_ID:] get the ID of the current Xen scheduler
+
+\item [\it DOM0\_SHADOW\_CONTROL:] switch between shadow page-table modes
+
+\item [\it DOM0\_SETDOMAININITIALMEM:] set initial memory allocation of a domain
+
+\item [\it DOM0\_SETDOMAINMAXMEM:] set maximum memory allocation of a domain
+
+\item [\it DOM0\_SETDOMAINVMASSIST:] set domain VM assist options
+\end{description}
+\end{quote}
+
+Most of the above are best understood by looking at the code
+implementing them (in {\tt xen/common/dom0\_ops.c}) and in
+the user-space tools that use them (mostly in {\tt tools/libxc}).
+
+\section{Debugging Hypercalls}
+
+A few additional hypercalls are mainly useful for debugging:
+
+\begin{quote}
+\hypercall{console\_io(int cmd, int count, char *str)}
+
+Use Xen to interact with the console; operations are:
+
+{\it CONSOLEIO\_write}: Output count characters from buffer str.
+
+{\it CONSOLEIO\_read}: Input at most count characters into buffer str.
+\end{quote}
+
+A pair of hypercalls allows access to the underlying debug registers:
+\begin{quote}
+\hypercall{set\_debugreg(int reg, unsigned long value)}
+
+Set debug register {\tt reg} to {\tt value}.
+
+\hypercall{get\_debugreg(int reg)}
+
+Return the contents of the debug register {\tt reg}.
+\end{quote}
+
+And finally:
+\begin{quote}
+\hypercall{xen\_version(int cmd)}
+
+Request Xen version number.
+\end{quote}
+
+This is useful to ensure that user-space tools are in sync
+with the underlying hypervisor.
+
+\section{Deprecated Hypercalls}
+
+Xen is under constant development and refinement; as such there
+are plans to improve the way in which various pieces of functionality
+are exposed to guest OSes.
+
+\begin{quote}
+\hypercall{vm\_assist(unsigned int cmd, unsigned int type)}
+
+Toggle various memory management modes (in particular writable page
+tables and superpage support).
+
+\end{quote}
+
+This is likely to be replaced with mode values in the shared
+information page since this is more resilient for resumption
+after migration or checkpoint.