From: Robb Romans <3r@us.ibm.com>
To: xen-devel <xen-devel@lists.xensource.com>
Subject: [RESEND] Documentation Patches - 2/4
Date: Mon, 19 Sep 2005 11:22:40 -0500
Message-ID: <200509191122.40532.3r@us.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 646 bytes --]

Resending.

----------  Forwarded Message  ----------

Subject: [PATCH] Documentation Patches - 2/4
Date: Wednesday 14 September 2005 02:13 pm
From: Robb Romans <3r@us.ibm.com>
To: "xen-devel" <xen-devel@lists.xensource.com>

This patch creates a separate file for hypercalls documentation.

Signed-Off-By: Robb Romans <3r@us.ibm.com>


--
Robb Romans                     (512) 838-0419
Linux Commando                  T/L   678-0419
ARS NA5TT
.-- - ..-. ..--..

-------------------------------------------------------


[-- Attachment #2: 6818-doc-hypercalls.diff --]
[-- Type: text/x-diff, Size: 39185 bytes --]

# HG changeset patch
# User Robb Romans <3r@us.ibm.com>
# Node ID 1e255eacf158c51d5d4efdd7b17c4d1ce2a6be62
# Parent  f619a10fdb762bfc9f061622e6aea1bd6c5e5fb3
Separate hypercalls information into separate file.

Depends on 6817-fix-makefile.diff
Signed-Off-By: Robb Romans <3r@us.ibm.com>

diff -r f619a10fdb76 -r 1e255eacf158 docs/src/interface.tex
--- a/docs/src/interface.tex	Wed Sep 14 18:08:36 2005
+++ b/docs/src/interface.tex	Wed Sep 14 18:22:54 2005
@@ -622,549 +622,10 @@
 
 
 
-
 \appendix
 
-%\newcommand{\hypercall}[1]{\vspace{5mm}{\large\sf #1}}
-
-
-
-
-
-\newcommand{\hypercall}[1]{\vspace{2mm}{\sf #1}}
-
-
-
-
-
-
-\chapter{Xen Hypercalls}
-\label{a:hypercalls}
-
-Hypercalls represent the procedural interface to Xen; this appendix 
-categorizes and describes the current set of hypercalls. 
-
-\section{Invoking Hypercalls} 
-
-Hypercalls are invoked in a manner analogous to system calls in a
-conventional operating system; a software interrupt is issued which
-vectors to an entry point within Xen. On x86\_32 machines the
-instruction required is {\tt int \$82}; the (real) IDT is setup so
-that this may only be issued from within ring 1. The particular 
-hypercall to be invoked is contained in {\tt EAX} --- a list 
-mapping these values to symbolic hypercall names can be found 
-in {\tt xen/include/public/xen.h}. 
-
-On some occasions a set of hypercalls will be required to carry
-out a higher-level function; a good example is when a guest 
-operating wishes to context switch to a new process which 
-requires updating various privileged CPU state. As an optimization
-for these cases, there is a generic mechanism to issue a set of 
-hypercalls as a batch: 
-
-\begin{quote}
-\hypercall{multicall(void *call\_list, int nr\_calls)}
-
-Execute a series of hypervisor calls; {\tt nr\_calls} is the length of
-the array of {\tt multicall\_entry\_t} structures pointed to be {\tt
-call\_list}. Each entry contains the hypercall operation code followed
-by up to 7 word-sized arguments.
-\end{quote}
-
-Note that multicalls are provided purely as an optimization; there is
-no requirement to use them when first porting a guest operating
-system.
-
-
-\section{Virtual CPU Setup} 
-
-At start of day, a guest operating system needs to setup the virtual
-CPU it is executing on. This includes installing vectors for the
-virtual IDT so that the guest OS can handle interrupts, page faults,
-etc. However the very first thing a guest OS must setup is a pair 
-of hypervisor callbacks: these are the entry points which Xen will
-use when it wishes to notify the guest OS of an occurrence. 
-
-\begin{quote}
-\hypercall{set\_callbacks(unsigned long event\_selector, unsigned long
-  event\_address, unsigned long failsafe\_selector, unsigned long
-  failsafe\_address) }
-
-Register the normal (``event'') and failsafe callbacks for 
-event processing. In each case the code segment selector and 
-address within that segment are provided. The selectors must
-have RPL 1; in XenLinux we simply use the kernel's CS for both 
-{\tt event\_selector} and {\tt failsafe\_selector}.
-
-The value {\tt event\_address} specifies the address of the guest OSes
-event handling and dispatch routine; the {\tt failsafe\_address}
-specifies a separate entry point which is used only if a fault occurs
-when Xen attempts to use the normal callback. 
-\end{quote} 
-
-
-After installing the hypervisor callbacks, the guest OS can 
-install a `virtual IDT' by using the following hypercall: 
-
-\begin{quote} 
-\hypercall{set\_trap\_table(trap\_info\_t *table)} 
-
-Install one or more entries into the per-domain 
-trap handler table (essentially a software version of the IDT). 
-Each entry in the array pointed to by {\tt table} includes the 
-exception vector number with the corresponding segment selector 
-and entry point. Most guest OSes can use the same handlers on 
-Xen as when running on the real hardware; an exception is the 
-page fault handler (exception vector 14) where a modified 
-stack-frame layout is used. 
-
-
-\end{quote} 
-
-
-
-\section{Scheduling and Timer}
-
-Domains are preemptively scheduled by Xen according to the 
-parameters installed by domain 0 (see Section~\ref{s:dom0ops}). 
-In addition, however, a domain may choose to explicitly 
-control certain behavior with the following hypercall: 
-
-\begin{quote} 
-\hypercall{sched\_op(unsigned long op)} 
-
-Request scheduling operation from hypervisor. The options are: {\it
-yield}, {\it block}, and {\it shutdown}.  {\it yield} keeps the
-calling domain runnable but may cause a reschedule if other domains
-are runnable.  {\it block} removes the calling domain from the run
-queue and cause is to sleeps until an event is delivered to it.  {\it
-shutdown} is used to end the domain's execution; the caller can
-additionally specify whether the domain should reboot, halt or
-suspend.
-\end{quote} 
-
-To aid the implementation of a process scheduler within a guest OS,
-Xen provides a virtual programmable timer:
-
-\begin{quote}
-\hypercall{set\_timer\_op(uint64\_t timeout)} 
-
-Request a timer event to be sent at the specified system time (time 
-in nanoseconds since system boot). The hypercall actually passes the 
-64-bit timeout value as a pair of 32-bit values. 
-
-\end{quote} 
-
-Note that calling {\tt set\_timer\_op()} prior to {\tt sched\_op} 
-allows block-with-timeout semantics. 
-
-
-\section{Page Table Management} 
-
-Since guest operating systems have read-only access to their page 
-tables, Xen must be involved when making any changes. The following
-multi-purpose hypercall can be used to modify page-table entries, 
-update the machine-to-physical mapping table, flush the TLB, install 
-a new page-table base pointer, and more.
-
-\begin{quote} 
-\hypercall{mmu\_update(mmu\_update\_t *req, int count, int *success\_count)} 
-
-Update the page table for the domain; a set of {\tt count} updates are
-submitted for processing in a batch, with {\tt success\_count} being 
-updated to report the number of successful updates.  
-
-Each element of {\tt req[]} contains a pointer (address) and value; 
-the least significant 2-bits of the pointer are used to distinguish 
-the type of update requested as follows:
-\begin{description} 
-
-\item[\it MMU\_NORMAL\_PT\_UPDATE:] update a page directory entry or
-page table entry to the associated value; Xen will check that the
-update is safe, as described in Chapter~\ref{c:memory}.
-
-\item[\it MMU\_MACHPHYS\_UPDATE:] update an entry in the
-  machine-to-physical table. The calling domain must own the machine
-  page in question (or be privileged).
-
-\item[\it MMU\_EXTENDED\_COMMAND:] perform additional MMU operations.
-The set of additional MMU operations is considerable, and includes
-updating {\tt cr3} (or just re-installing it for a TLB flush),
-flushing the cache, installing a new LDT, or pinning \& unpinning
-page-table pages (to ensure their reference count doesn't drop to zero
-which would require a revalidation of all entries).
-
-Further extended commands are used to deal with granting and 
-acquiring page ownership; see Section~\ref{s:idc}. 
-
-
-\end{description}
-
-More details on the precise format of all commands can be 
-found in {\tt xen/include/public/xen.h}. 
-
-
-\end{quote}
-
-Explicitly updating batches of page table entries is extremely
-efficient, but can require a number of alterations to the guest
-OS. Using the writable page table mode (Chapter~\ref{c:memory}) is
-recommended for new OS ports.
-
-Regardless of which page table update mode is being used, however,
-there are some occasions (notably handling a demand page fault) where
-a guest OS will wish to modify exactly one PTE rather than a
-batch. This is catered for by the following:
-
-\begin{quote} 
-\hypercall{update\_va\_mapping(unsigned long page\_nr, unsigned long
-val, \\ unsigned long flags)}
-
-Update the currently installed PTE for the page {\tt page\_nr} to 
-{\tt val}. As with {\tt mmu\_update()}, Xen checks the modification 
-is safe before applying it. The {\tt flags} determine which kind
-of TLB flush, if any, should follow the update. 
-
-\end{quote} 
-
-Finally, sufficiently privileged domains may occasionally wish to manipulate 
-the pages of others: 
-\begin{quote}
-
-\hypercall{update\_va\_mapping\_otherdomain(unsigned long page\_nr,
-unsigned long val, unsigned long flags, uint16\_t domid)}
-
-Identical to {\tt update\_va\_mapping()} save that the pages being
-mapped must belong to the domain {\tt domid}. 
-
-\end{quote}
-
-This privileged operation is currently used by backend virtual device
-drivers to safely map pages containing I/O data. 
-
-
-
-\section{Segmentation Support}
-
-Xen allows guest OSes to install a custom GDT if they require it; 
-this is context switched transparently whenever a domain is 
-[de]scheduled.  The following hypercall is effectively a 
-`safe' version of {\tt lgdt}: 
-
-\begin{quote}
-\hypercall{set\_gdt(unsigned long *frame\_list, int entries)} 
-
-Install a global descriptor table for a domain; {\tt frame\_list} is
-an array of up to 16 machine page frames within which the GDT resides,
-with {\tt entries} being the actual number of descriptor-entry
-slots. All page frames must be mapped read-only within the guest's
-address space, and the table must be large enough to contain Xen's
-reserved entries (see {\tt xen/include/public/arch-x86\_32.h}).
-
-\end{quote}
-
-Many guest OSes will also wish to install LDTs; this is achieved by
-using {\tt mmu\_update()} with an extended command, passing the
-linear address of the LDT base along with the number of entries. No
-special safety checks are required; Xen needs to perform this task
-simply since {\tt lldt} requires CPL 0.
-
-
-Xen also allows guest operating systems to update just an 
-individual segment descriptor in the GDT or LDT:  
-
-\begin{quote}
-\hypercall{update\_descriptor(unsigned long ma, unsigned long word1,
-unsigned long word2)}
-
-Update the GDT/LDT entry at machine address {\tt ma}; the new
-8-byte descriptor is stored in {\tt word1} and {\tt word2}.
-Xen performs a number of checks to ensure the descriptor is 
-valid. 
-
-\end{quote}
-
-Guest OSes can use the above in place of context switching entire 
-LDTs (or the GDT) when the number of changing descriptors is small. 
-
-\section{Context Switching} 
-
-When a guest OS wishes to context switch between two processes, 
-it can use the page table and segmentation hypercalls described
-above to perform the the bulk of the privileged work. In addition, 
-however, it will need to invoke Xen to switch the kernel (ring 1) 
-stack pointer: 
-
-\begin{quote} 
-\hypercall{stack\_switch(unsigned long ss, unsigned long esp)} 
-
-Request kernel stack switch from hypervisor; {\tt ss} is the new 
-stack segment, which {\tt esp} is the new stack pointer. 
-
-\end{quote} 
-
-A final useful hypercall for context switching allows ``lazy'' 
-save and restore of floating point state: 
-
-\begin{quote}
-\hypercall{fpu\_taskswitch(void)} 
-
-This call instructs Xen to set the {\tt TS} bit in the {\tt cr0}
-control register; this means that the next attempt to use floating
-point will cause a trap which the guest OS can trap. Typically it will
-then save/restore the FP state, and clear the {\tt TS} bit. 
-\end{quote} 
-
-This is provided as an optimization only; guest OSes can also choose
-to save and restore FP state on all context switches for simplicity. 
-
-
-\section{Physical Memory Management}
-
-As mentioned previously, each domain has a maximum and current 
-memory allocation. The maximum allocation, set at domain creation 
-time, cannot be modified. However a domain can choose to reduce 
-and subsequently grow its current allocation by using the
-following call: 
-
-\begin{quote} 
-\hypercall{dom\_mem\_op(unsigned int op, unsigned long *extent\_list,
-  unsigned long nr\_extents, unsigned int extent\_order)}
-
-Increase or decrease current memory allocation (as determined by 
-the value of {\tt op}). Each invocation provides a list of 
-extents each of which is $2^s$ pages in size, 
-where $s$ is the value of {\tt extent\_order}. 
-
-\end{quote} 
-
-In addition to simply reducing or increasing the current memory
-allocation via a `balloon driver', this call is also useful for 
-obtaining contiguous regions of machine memory when required (e.g. 
-for certain PCI devices, or if using superpages).  
-
-
-\section{Inter-Domain Communication}
-\label{s:idc} 
-
-Xen provides a simple asynchronous notification mechanism via
-\emph{event channels}. Each domain has a set of end-points (or
-\emph{ports}) which may be bound to an event source (e.g. a physical
-IRQ, a virtual IRQ, or an port in another domain). When a pair of
-end-points in two different domains are bound together, then a `send'
-operation on one will cause an event to be received by the destination
-domain.
-
-The control and use of event channels involves the following hypercall: 
-
-\begin{quote}
-\hypercall{event\_channel\_op(evtchn\_op\_t *op)} 
-
-Inter-domain event-channel management; {\tt op} is a discriminated 
-union which allows the following 7 operations: 
-
-\begin{description} 
-
-\item[\it alloc\_unbound:] allocate a free (unbound) local
-  port and prepare for connection from a specified domain. 
-\item[\it bind\_virq:] bind a local port to a virtual 
-IRQ; any particular VIRQ can be bound to at most one port per domain. 
-\item[\it bind\_pirq:] bind a local port to a physical IRQ;
-once more, a given pIRQ can be bound to at most one port per
-domain. Furthermore the calling domain must be sufficiently
-privileged.
-\item[\it bind\_interdomain:] construct an interdomain event 
-channel; in general, the target domain must have previously allocated 
-an unbound port for this channel, although this can be bypassed by 
-privileged domains during domain setup. 
-\item[\it close:] close an interdomain event channel. 
-\item[\it send:] send an event to the remote end of a 
-interdomain event channel. 
-\item[\it status:] determine the current status of a local port. 
-\end{description} 
-
-For more details see
-{\tt xen/include/public/event\_channel.h}. 
-
-\end{quote} 
-
-Event channels are the fundamental communication primitive between 
-Xen domains and seamlessly support SMP. However they provide little
-bandwidth for communication {\sl per se}, and hence are typically 
-married with a piece of shared memory to produce effective and 
-high-performance inter-domain communication. 
-
-Safe sharing of memory pages between guest OSes is carried out by
-granting access on a per page basis to individual domains. This is
-achieved by using the {\tt grant\_table\_op()} hypercall.
-
-\begin{quote}
-\hypercall{grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)}
-
-Grant or remove access to a particular page to a particular domain. 
-
-\end{quote} 
-
-This is not currently widely in use by guest operating systems, but 
-we intend to integrate support more fully in the near future. 
-
-\section{PCI Configuration} 
-
-Domains with physical device access (i.e.\ driver domains) receive
-limited access to certain PCI devices (bus address space and
-interrupts). However many guest operating systems attempt to 
-determine the PCI configuration by directly access the PCI BIOS, 
-which cannot be allowed for safety. 
-
-Instead, Xen provides the following hypercall: 
-
-\begin{quote}
-\hypercall{physdev\_op(void *physdev\_op)}
-
-Perform a PCI configuration option; depending on the value 
-of {\tt physdev\_op} this can be a PCI config read, a PCI config 
-write, or a small number of other queries. 
-
-\end{quote} 
-
-
-For examples of using {\tt physdev\_op()}, see the 
-Xen-specific PCI code in the linux sparse tree. 
-
-\section{Administrative Operations}
-\label{s:dom0ops}
-
-A large number of control operations are available to a sufficiently
-privileged domain (typically domain 0). These allow the creation and
-management of new domains, for example. A complete list is given 
-below: for more details on any or all of these, please see 
-{\tt xen/include/public/dom0\_ops.h} 
-
-
-\begin{quote}
-\hypercall{dom0\_op(dom0\_op\_t *op)} 
-
-Administrative domain operations for domain management. The options are:
-
-\begin{description} 
-\item [\it DOM0\_CREATEDOMAIN:] create a new domain
-
-\item [\it DOM0\_PAUSEDOMAIN:] remove a domain from the scheduler run 
-queue. 
-
-\item [\it DOM0\_UNPAUSEDOMAIN:] mark a paused domain as schedulable
-  once again. 
-
-\item [\it DOM0\_DESTROYDOMAIN:] deallocate all resources associated
-with a domain
-
-\item [\it DOM0\_GETMEMLIST:] get list of pages used by the domain
-
-\item [\it DOM0\_SCHEDCTL:]
-
-\item [\it DOM0\_ADJUSTDOM:] adjust scheduling priorities for domain
-
-\item [\it DOM0\_BUILDDOMAIN:] do final guest OS setup for domain
-
-\item [\it DOM0\_GETDOMAINFO:] get statistics about the domain
-
-\item [\it DOM0\_GETPAGEFRAMEINFO:] 
-
-\item [\it DOM0\_GETPAGEFRAMEINFO2:]
-
-\item [\it DOM0\_IOPL:] set I/O privilege level
-
-\item [\it DOM0\_MSR:] read or write model specific registers
-
-\item [\it DOM0\_DEBUG:] interactively invoke the debugger
-
-\item [\it DOM0\_SETTIME:] set system time
-
-\item [\it DOM0\_READCONSOLE:] read console content from hypervisor buffer ring
-
-\item [\it DOM0\_PINCPUDOMAIN:] pin domain to a particular CPU
-
-\item [\it DOM0\_GETTBUFS:] get information about the size and location of
-                      the trace buffers (only on trace-buffer enabled builds)
-
-\item [\it DOM0\_PHYSINFO:] get information about the host machine
-
-\item [\it DOM0\_PCIDEV\_ACCESS:] modify PCI device access permissions
-
-\item [\it DOM0\_SCHED\_ID:] get the ID of the current Xen scheduler
-
-\item [\it DOM0\_SHADOW\_CONTROL:] switch between shadow page-table modes
-
-\item [\it DOM0\_SETDOMAININITIALMEM:] set initial memory allocation of a domain
-
-\item [\it DOM0\_SETDOMAINMAXMEM:] set maximum memory allocation of a domain
-
-\item [\it DOM0\_SETDOMAINVMASSIST:] set domain VM assist options
-\end{description} 
-\end{quote} 
-
-Most of the above are best understood by looking at the code 
-implementing them (in {\tt xen/common/dom0\_ops.c}) and in 
-the user-space tools that use them (mostly in {\tt tools/libxc}). 
-
-\section{Debugging Hypercalls} 
-
-A few additional hypercalls are mainly useful for debugging: 
-
-\begin{quote} 
-\hypercall{console\_io(int cmd, int count, char *str)}
-
-Use Xen to interact with the console; operations are:
-
-{\it CONSOLEIO\_write}: Output count characters from buffer str.
-
-{\it CONSOLEIO\_read}: Input at most count characters into buffer str.
-\end{quote} 
-
-A pair of hypercalls allows access to the underlying debug registers: 
-\begin{quote}
-\hypercall{set\_debugreg(int reg, unsigned long value)}
-
-Set debug register {\tt reg} to {\tt value} 
-
-\hypercall{get\_debugreg(int reg)}
-
-Return the contents of the debug register {\tt reg}
-\end{quote}
-
-And finally: 
-\begin{quote}
-\hypercall{xen\_version(int cmd)}
-
-Request Xen version number.
-\end{quote} 
-
-This is useful to ensure that user-space tools are in sync 
-with the underlying hypervisor. 
-
-\section{Deprecated Hypercalls}
-
-Xen is under constant development and refinement; as such there 
-are plans to improve the way in which various pieces of functionality 
-are exposed to guest OSes. 
-
-\begin{quote} 
-\hypercall{vm\_assist(unsigned int cmd, unsigned int type)}
-
-Toggle various memory management modes (in particular wrritable page
-tables and superpage support). 
-
-\end{quote} 
-
-This is likely to be replaced with mode values in the shared 
-information page since this is more resilient for resumption 
-after migration or checkpoint. 
-
-
-
-
-
-
+\include{src/interface/hypercalls}
+%% hypercalls moved to hypercalls.tex
 
 
 %% 
diff -r f619a10fdb76 -r 1e255eacf158 docs/src/interface/hypercalls.tex
--- /dev/null	Wed Sep 14 18:08:36 2005
+++ b/docs/src/interface/hypercalls.tex	Wed Sep 14 18:22:54 2005
@@ -0,0 +1,524 @@
+
+\newcommand{\hypercall}[1]{\vspace{2mm}{\sf #1}}
+
+\chapter{Xen Hypercalls}
+\label{a:hypercalls}
+
+Hypercalls represent the procedural interface to Xen; this appendix 
+categorizes and describes the current set of hypercalls. 
+
+\section{Invoking Hypercalls} 
+
+Hypercalls are invoked in a manner analogous to system calls in a
+conventional operating system; a software interrupt is issued which
+vectors to an entry point within Xen. On x86\_32 machines the
+instruction required is {\tt int \$0x82}; the (real) IDT is set up so
+that this may only be issued from within ring 1. The particular 
+hypercall to be invoked is contained in {\tt EAX} --- a list 
+mapping these values to symbolic hypercall names can be found 
+in {\tt xen/include/public/xen.h}. 
+
+On some occasions a set of hypercalls will be required to carry
+out a higher-level function; a good example is when a guest 
+operating system wishes to context switch to a new process which 
+requires updating various privileged CPU state. As an optimization
+for these cases, there is a generic mechanism to issue a set of 
+hypercalls as a batch: 
+
+\begin{quote}
+\hypercall{multicall(void *call\_list, int nr\_calls)}
+
+Execute a series of hypervisor calls; {\tt nr\_calls} is the length of
+the array of {\tt multicall\_entry\_t} structures pointed to by {\tt
+call\_list}. Each entry contains the hypercall operation code followed
+by up to 7 word-sized arguments.
+\end{quote}
+
+Note that multicalls are provided purely as an optimization; there is
+no requirement to use them when first porting a guest operating
+system.
+
+
+\section{Virtual CPU Setup} 
+
+At start of day, a guest operating system needs to set up the virtual
+CPU it is executing on. This includes installing vectors for the
+virtual IDT so that the guest OS can handle interrupts, page faults,
+etc. However, the very first thing a guest OS must set up is a pair 
+of hypervisor callbacks: these are the entry points which Xen will
+use when it wishes to notify the guest OS of an occurrence. 
+
+\begin{quote}
+\hypercall{set\_callbacks(unsigned long event\_selector, unsigned long
+  event\_address, unsigned long failsafe\_selector, unsigned long
+  failsafe\_address) }
+
+Register the normal (``event'') and failsafe callbacks for 
+event processing. In each case the code segment selector and 
+address within that segment are provided. The selectors must
+have RPL 1; in XenLinux we simply use the kernel's CS for both 
+{\tt event\_selector} and {\tt failsafe\_selector}.
+
+The value {\tt event\_address} specifies the address of the guest OS's
+event handling and dispatch routine; the {\tt failsafe\_address}
+specifies a separate entry point which is used only if a fault occurs
+when Xen attempts to use the normal callback. 
+\end{quote} 
+
+
+After installing the hypervisor callbacks, the guest OS can 
+install a `virtual IDT' by using the following hypercall: 
+
+\begin{quote} 
+\hypercall{set\_trap\_table(trap\_info\_t *table)} 
+
+Install one or more entries into the per-domain 
+trap handler table (essentially a software version of the IDT). 
+Each entry in the array pointed to by {\tt table} includes the 
+exception vector number with the corresponding segment selector 
+and entry point. Most guest OSes can use the same handlers on 
+Xen as when running on the real hardware; an exception is the 
+page fault handler (exception vector 14) where a modified 
+stack-frame layout is used. 
+
+
+\end{quote} 
+
+
+
+\section{Scheduling and Timer}
+
+Domains are preemptively scheduled by Xen according to the 
+parameters installed by domain 0 (see Section~\ref{s:dom0ops}). 
+In addition, however, a domain may choose to explicitly 
+control certain behavior with the following hypercall: 
+
+\begin{quote} 
+\hypercall{sched\_op(unsigned long op)} 
+
+Request scheduling operation from hypervisor. The options are: {\it
+yield}, {\it block}, and {\it shutdown}.  {\it yield} keeps the
+calling domain runnable but may cause a reschedule if other domains
+are runnable.  {\it block} removes the calling domain from the run
+queue and causes it to sleep until an event is delivered to it.  {\it
+shutdown} is used to end the domain's execution; the caller can
+additionally specify whether the domain should reboot, halt or
+suspend.
+\end{quote} 
+
+To aid the implementation of a process scheduler within a guest OS,
+Xen provides a virtual programmable timer:
+
+\begin{quote}
+\hypercall{set\_timer\_op(uint64\_t timeout)} 
+
+Request a timer event to be sent at the specified system time (time 
+in nanoseconds since system boot). The hypercall actually passes the 
+64-bit timeout value as a pair of 32-bit values. 
+
+\end{quote} 
+
+Note that calling {\tt set\_timer\_op()} prior to {\tt sched\_op} 
+allows block-with-timeout semantics. 
+
+
+\section{Page Table Management} 
+
+Since guest operating systems have read-only access to their page 
+tables, Xen must be involved when making any changes. The following
+multi-purpose hypercall can be used to modify page-table entries, 
+update the machine-to-physical mapping table, flush the TLB, install 
+a new page-table base pointer, and more.
+
+\begin{quote} 
+\hypercall{mmu\_update(mmu\_update\_t *req, int count, int *success\_count)} 
+
+Update the page table for the domain; a set of {\tt count} updates are
+submitted for processing in a batch, with {\tt success\_count} being 
+updated to report the number of successful updates.  
+
+Each element of {\tt req[]} contains a pointer (address) and value; 
+the least significant 2 bits of the pointer are used to distinguish 
+the type of update requested as follows:
+\begin{description} 
+
+\item[\it MMU\_NORMAL\_PT\_UPDATE:] update a page directory entry or
+page table entry to the associated value; Xen will check that the
+update is safe, as described in Chapter~\ref{c:memory}.
+
+\item[\it MMU\_MACHPHYS\_UPDATE:] update an entry in the
+  machine-to-physical table. The calling domain must own the machine
+  page in question (or be privileged).
+
+\item[\it MMU\_EXTENDED\_COMMAND:] perform additional MMU operations.
+The set of additional MMU operations is considerable, and includes
+updating {\tt cr3} (or just re-installing it for a TLB flush),
+flushing the cache, installing a new LDT, or pinning \& unpinning
+page-table pages (to ensure their reference count doesn't drop to zero
+which would require a revalidation of all entries).
+
+Further extended commands are used to deal with granting and 
+acquiring page ownership; see Section~\ref{s:idc}. 
+
+
+\end{description}
+
+More details on the precise format of all commands can be 
+found in {\tt xen/include/public/xen.h}. 
+
+
+\end{quote}
+
+Explicitly updating batches of page table entries is extremely
+efficient, but can require a number of alterations to the guest
+OS. Using the writable page table mode (Chapter~\ref{c:memory}) is
+recommended for new OS ports.
+
+Regardless of which page table update mode is being used, however,
+there are some occasions (notably handling a demand page fault) where
+a guest OS will wish to modify exactly one PTE rather than a
+batch. This is catered for by the following:
+
+\begin{quote} 
+\hypercall{update\_va\_mapping(unsigned long page\_nr, unsigned long
+val, \\ unsigned long flags)}
+
+Update the currently installed PTE for the page {\tt page\_nr} to 
+{\tt val}. As with {\tt mmu\_update()}, Xen checks the modification 
+is safe before applying it. The {\tt flags} determine which kind
+of TLB flush, if any, should follow the update. 
+
+\end{quote} 
+
+Finally, sufficiently privileged domains may occasionally wish to manipulate 
+the pages of others: 
+\begin{quote}
+
+\hypercall{update\_va\_mapping\_otherdomain(unsigned long page\_nr,
+unsigned long val, unsigned long flags, uint16\_t domid)}
+
+Identical to {\tt update\_va\_mapping()} save that the pages being
+mapped must belong to the domain {\tt domid}. 
+
+\end{quote}
+
+This privileged operation is currently used by backend virtual device
+drivers to safely map pages containing I/O data. 
+
+
+
+\section{Segmentation Support}
+
+Xen allows guest OSes to install a custom GDT if they require it; 
+this is context switched transparently whenever a domain is 
+[de]scheduled.  The following hypercall is effectively a 
+`safe' version of {\tt lgdt}: 
+
+\begin{quote}
+\hypercall{set\_gdt(unsigned long *frame\_list, int entries)} 
+
+Install a global descriptor table for a domain; {\tt frame\_list} is
+an array of up to 16 machine page frames within which the GDT resides,
+with {\tt entries} being the actual number of descriptor-entry
+slots. All page frames must be mapped read-only within the guest's
+address space, and the table must be large enough to contain Xen's
+reserved entries (see {\tt xen/include/public/arch-x86\_32.h}).
+
+\end{quote}
+
+Many guest OSes will also wish to install LDTs; this is achieved by
+using {\tt mmu\_update()} with an extended command, passing the
+linear address of the LDT base along with the number of entries. No
+special safety checks are required; Xen needs to perform this task
+simply since {\tt lldt} requires CPL 0.
+
+
+Xen also allows guest operating systems to update just an 
+individual segment descriptor in the GDT or LDT:  
+
+\begin{quote}
+\hypercall{update\_descriptor(unsigned long ma, unsigned long word1,
+unsigned long word2)}
+
+Update the GDT/LDT entry at machine address {\tt ma}; the new
+8-byte descriptor is stored in {\tt word1} and {\tt word2}.
+Xen performs a number of checks to ensure the descriptor is 
+valid. 
+
+\end{quote}
+
+Guest OSes can use the above in place of context switching entire 
+LDTs (or the GDT) when the number of changing descriptors is small. 
+
+\section{Context Switching} 
+
+When a guest OS wishes to context switch between two processes, 
+it can use the page table and segmentation hypercalls described
+above to perform the bulk of the privileged work. In addition, 
+however, it will need to invoke Xen to switch the kernel (ring 1) 
+stack pointer: 
+
+\begin{quote} 
+\hypercall{stack\_switch(unsigned long ss, unsigned long esp)} 
+
+Request kernel stack switch from hypervisor; {\tt ss} is the new 
+stack segment, while {\tt esp} is the new stack pointer. 
+
+\end{quote} 
+
+A final useful hypercall for context switching allows ``lazy'' 
+save and restore of floating point state: 
+
+\begin{quote}
+\hypercall{fpu\_taskswitch(void)} 
+
+This call instructs Xen to set the {\tt TS} bit in the {\tt cr0}
+control register; this means that the next attempt to use floating
+point will cause a trap which the guest OS can handle. Typically it will
+then save/restore the FP state, and clear the {\tt TS} bit. 
+\end{quote} 
+
+This is provided as an optimization only; guest OSes can also choose
+to save and restore FP state on all context switches for simplicity. 
+
+
+\section{Physical Memory Management}
+
+As mentioned previously, each domain has a maximum and current 
+memory allocation. The maximum allocation, set at domain creation 
+time, cannot be modified. However, a domain can choose to reduce 
+and subsequently grow its current allocation by using the
+following call: 
+
+\begin{quote} 
+\hypercall{dom\_mem\_op(unsigned int op, unsigned long *extent\_list,
+  unsigned long nr\_extents, unsigned int extent\_order)}
+
+Increase or decrease current memory allocation (as determined by 
+the value of {\tt op}). Each invocation provides a list of 
+extents each of which is $2^s$ pages in size, 
+where $s$ is the value of {\tt extent\_order}. 
+
+\end{quote} 
+
+In addition to simply reducing or increasing the current memory
+allocation via a `balloon driver', this call is also useful for 
+obtaining contiguous regions of machine memory when required (e.g. 
+for certain PCI devices, or if using superpages).  
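+
+As a worked example, a single invocation covers
+\[
+  N \times 2^{s} \mbox{ pages},
+\]
+where $N$ is the value of {\tt nr\_extents} and $s$ is the value of
+{\tt extent\_order}. With the illustrative values $N = 8$ and $s = 2$
+this gives $8 \times 2^{2} = 32$ pages, or 128~KB assuming 4~KB pages.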
+
+
+\section{Inter-Domain Communication}
+\label{s:idc} 
+
+Xen provides a simple asynchronous notification mechanism via
+\emph{event channels}. Each domain has a set of end-points (or
+\emph{ports}) which may be bound to an event source (e.g. a physical
+IRQ, a virtual IRQ, or a port in another domain). When a pair of
+end-points in two different domains are bound together, then a `send'
+operation on one will cause an event to be received by the destination
+domain.
+
+The control and use of event channels involves the following hypercall: 
+
+\begin{quote}
+\hypercall{event\_channel\_op(evtchn\_op\_t *op)} 
+
+Inter-domain event-channel management; {\tt op} is a discriminated 
+union which allows the following 7 operations: 
+
+\begin{description} 
+
+\item[\it alloc\_unbound:] allocate a free (unbound) local
+  port and prepare for connection from a specified domain. 
+\item[\it bind\_virq:] bind a local port to a virtual 
+IRQ; any particular VIRQ can be bound to at most one port per domain. 
+\item[\it bind\_pirq:] bind a local port to a physical IRQ;
+once more, a given pIRQ can be bound to at most one port per
+domain. Furthermore the calling domain must be sufficiently
+privileged.
+\item[\it bind\_interdomain:] construct an interdomain event 
+channel; in general, the target domain must have previously allocated 
+an unbound port for this channel, although this can be bypassed by 
+privileged domains during domain setup. 
+\item[\it close:] close an interdomain event channel. 
+\item[\it send:] send an event to the remote end of an 
+interdomain event channel. 
+\item[\it status:] determine the current status of a local port. 
+\end{description} 
+
+For more details see
+{\tt xen/include/public/event\_channel.h}. 
+
+\end{quote} 
+
+Event channels are the fundamental communication primitive between 
+Xen domains and seamlessly support SMP. However, they provide little
+bandwidth for communication {\sl per se}, and hence are typically 
+married with a piece of shared memory to produce effective and 
+high-performance inter-domain communication. 
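+
+The pairing of shared memory with an event channel can be sketched as
+a single-producer ring. The layout and all names here
+({\tt struct shared\_ring}, {\tt RING\_SIZE}, {\tt ring\_put()},
+{\tt notify\_remote()}) are hypothetical; {\tt notify\_remote()}
+stands in for a `send' operation on the bound port.
+
+\begin{verbatim}
+#define RING_SIZE 256                 /* hypothetical; power of two */
+
+struct shared_ring {                  /* lives in the shared page */
+    volatile unsigned int prod;       /* advanced by the producer */
+    volatile unsigned int cons;       /* advanced by the consumer */
+    char data[RING_SIZE];
+};
+
+extern void notify_remote(void);      /* wraps event_channel_op send */
+
+/* Returns 0 on success, -1 if the ring is full. */
+int ring_put(struct shared_ring *r, char c)
+{
+    if (r->prod - r->cons == RING_SIZE)
+        return -1;
+    r->data[r->prod % RING_SIZE] = c;
+    /* A real implementation needs a write barrier here so that the
+     * data is visible before the index update. */
+    r->prod++;
+    notify_remote();                  /* the event only signals that
+                                         new data is in the ring */
+    return 0;
+}
+\end{verbatim}
+
+The payload travels through the shared page; the event channel
+carries only the notification.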
+
+Safe sharing of memory pages between guest OSes is carried out by
+granting access on a per page basis to individual domains. This is
+achieved by using the {\tt grant\_table\_op()} hypercall.
+
+\begin{quote}
+\hypercall{grant\_table\_op(unsigned int cmd, void *uop, unsigned int count)}
+
+Grant or remove access to a particular page to a particular domain. 
+
+\end{quote} 
+
+This is not yet widely used by guest operating systems, but 
+we intend to integrate support more fully in the near future. 
+
+\section{PCI Configuration} 
+
+Domains with physical device access (i.e.\ driver domains) receive
+limited access to certain PCI devices (bus address space and
+interrupts). However, many guest operating systems attempt to 
+determine the PCI configuration by directly accessing the PCI BIOS, 
+which cannot be allowed for safety. 
+
+Instead, Xen provides the following hypercall: 
+
+\begin{quote}
+\hypercall{physdev\_op(void *physdev\_op)}
+
+Perform a PCI configuration operation; depending on the value 
+of {\tt physdev\_op} this can be a PCI config read, a PCI config 
+write, or a small number of other queries. 
+
+\end{quote} 
+
+
+For examples of using {\tt physdev\_op()}, see the 
+Xen-specific PCI code in the Linux sparse tree. 
+
+\section{Administrative Operations}
+\label{s:dom0ops}
+
+A large number of control operations are available to a sufficiently
+privileged domain (typically domain 0). These allow the creation and
+management of new domains, for example. A complete list is given 
+below; for more details on any or all of these, please see 
+{\tt xen/include/public/dom0\_ops.h}.
+
+
+\begin{quote}
+\hypercall{dom0\_op(dom0\_op\_t *op)} 
+
+Administrative domain operations for domain management. The options are:
+
+\begin{description} 
+\item [\it DOM0\_CREATEDOMAIN:] create a new domain
+
+\item [\it DOM0\_PAUSEDOMAIN:] remove a domain from the scheduler run 
+queue. 
+
+\item [\it DOM0\_UNPAUSEDOMAIN:] mark a paused domain as schedulable
+  once again. 
+
+\item [\it DOM0\_DESTROYDOMAIN:] deallocate all resources associated
+with a domain
+
+\item [\it DOM0\_GETMEMLIST:] get list of pages used by the domain
+
+\item [\it DOM0\_SCHEDCTL:]
+
+\item [\it DOM0\_ADJUSTDOM:] adjust scheduling priorities for domain
+
+\item [\it DOM0\_BUILDDOMAIN:] do final guest OS setup for domain
+
+\item [\it DOM0\_GETDOMAINFO:] get statistics about the domain
+
+\item [\it DOM0\_GETPAGEFRAMEINFO:] 
+
+\item [\it DOM0\_GETPAGEFRAMEINFO2:]
+
+\item [\it DOM0\_IOPL:] set I/O privilege level
+
+\item [\it DOM0\_MSR:] read or write model specific registers
+
+\item [\it DOM0\_DEBUG:] interactively invoke the debugger
+
+\item [\it DOM0\_SETTIME:] set system time
+
+\item [\it DOM0\_READCONSOLE:] read console content from hypervisor buffer ring
+
+\item [\it DOM0\_PINCPUDOMAIN:] pin domain to a particular CPU
+
+\item [\it DOM0\_GETTBUFS:] get information about the size and location of
+                      the trace buffers (only on trace-buffer enabled builds)
+
+\item [\it DOM0\_PHYSINFO:] get information about the host machine
+
+\item [\it DOM0\_PCIDEV\_ACCESS:] modify PCI device access permissions
+
+\item [\it DOM0\_SCHED\_ID:] get the ID of the current Xen scheduler
+
+\item [\it DOM0\_SHADOW\_CONTROL:] switch between shadow page-table modes
+
+\item [\it DOM0\_SETDOMAININITIALMEM:] set initial memory allocation of a domain
+
+\item [\it DOM0\_SETDOMAINMAXMEM:] set maximum memory allocation of a domain
+
+\item [\it DOM0\_SETDOMAINVMASSIST:] set domain VM assist options
+\end{description} 
+\end{quote} 
+
+Most of the above are best understood by looking at the code 
+implementing them (in {\tt xen/common/dom0\_ops.c}) and at 
+the user-space tools that use them (mostly in {\tt tools/libxc}). 
+
+\section{Debugging Hypercalls} 
+
+A few additional hypercalls are mainly useful for debugging: 
+
+\begin{quote} 
+\hypercall{console\_io(int cmd, int count, char *str)}
+
+Use Xen to interact with the console; operations are:
+
+{\it CONSOLEIO\_write}: Output count characters from buffer str.
+
+{\it CONSOLEIO\_read}: Input at most count characters into buffer str.
+\end{quote} 
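+
+For example, a guest might print a short message as follows. This is
+a sketch written against the signature above; {\tt say\_hello()} is a
+hypothetical function, and the name of the wrapper that actually
+issues the hypercall in a given guest OS may differ.
+
+\begin{verbatim}
+void say_hello(void)
+{
+    static char msg[] = "hello from the guest\n";
+
+    /* sizeof(msg) - 1 excludes the terminating NUL. */
+    console_io(CONSOLEIO_write, (int)(sizeof(msg) - 1), msg);
+}
+\end{verbatim}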
+
+A pair of hypercalls allows access to the underlying debug registers: 
+\begin{quote}
+\hypercall{set\_debugreg(int reg, unsigned long value)}
+
+Set debug register {\tt reg} to {\tt value}.
+
+\hypercall{get\_debugreg(int reg)}
+
+Return the contents of the debug register {\tt reg}.
+\end{quote}
+
+And finally: 
+\begin{quote}
+\hypercall{xen\_version(int cmd)}
+
+Request Xen version number.
+\end{quote} 
+
+This is useful to ensure that user-space tools are in sync 
+with the underlying hypervisor. 
+
+\section{Deprecated Hypercalls}
+
+Xen is under constant development and refinement; as such there 
+are plans to improve the way in which various pieces of functionality 
+are exposed to guest OSes. 
+
+\begin{quote} 
+\hypercall{vm\_assist(unsigned int cmd, unsigned int type)}
+
+Toggle various memory management modes (in particular writable page
+tables and superpage support). 
+
+\end{quote} 
+
+This is likely to be replaced with mode values in the shared 
+information page since this is more resilient for resumption 
+after migration or checkpoint. 

