* [PATCH 0 of 5] docs: x86 PV MMU related functions
@ 2012-11-02 11:18 Ian Campbell
2012-11-02 11:18 ` [PATCH 1 of 5] docs: document HYPERVISOR_update_va_mapping(_other_domain) Ian Campbell
` (6 more replies)
0 siblings, 7 replies; 17+ messages in thread
From: Ian Campbell @ 2012-11-02 11:18 UTC (permalink / raw)
To: xen-devel; +Cc: Ian Campbell
The following series adds some documentation for the PV MMU related
hypercalls. For the most part this is just a case of adding a suitable
prototype and marking things up so they show up in the generated docs.
I also have a draft of a wiki article on the subject which references
the information in the public headers which I hope to post soon.
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1 of 5] docs: document HYPERVISOR_update_va_mapping(_other_domain)
2012-11-02 11:18 [PATCH 0 of 5] docs: x86 PV MMU related functions Ian Campbell
@ 2012-11-02 11:18 ` Ian Campbell
2012-11-02 11:33 ` David Vrabel
2012-11-02 11:18 ` [PATCH 2 of 5] docs: Document HYPERVISOR_mmuext_op Ian Campbell
` (5 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Ian Campbell @ 2012-11-02 11:18 UTC (permalink / raw)
To: xen-devel; +Cc: Ian Campbell
# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1351854956 -3600
# Node ID e8e1191aef208fbe2f4de61aad3e4fd789333646
# Parent 37a8946eeb9db8b5eafc1c75aded006ad5322af8
docs: document HYPERVISOR_update_va_mapping(_other_domain)
Mark-up for inclusion of generated docs.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
diff -r 37a8946eeb9d -r e8e1191aef20 xen/include/public/xen.h
--- a/xen/include/public/xen.h Fri Oct 26 16:09:29 2012 +0100
+++ b/xen/include/public/xen.h Fri Nov 02 12:15:56 2012 +0100
@@ -423,9 +423,25 @@ typedef struct mmuext_op mmuext_op_t;
DEFINE_XEN_GUEST_HANDLE(mmuext_op_t);
#endif
+/*
+ * ` enum neg_errnoval
+ * ` HYPERVISOR_update_va_mapping(unsigned long va, u64 val,
+ * ` enum uvm_flags flags)
+ * `
+ * ` enum neg_errnoval
+ * ` HYPERVISOR_update_va_mapping_otherdomain(unsigned long va, u64 val,
+ * ` enum uvm_flags flags,
+ * ` domid_t domid)
+ * `
+ * ` @va: The virtual address whose mapping we want to change
+ * ` @val: The new page table entry
+ * ` @flags: Control TLB flushes
+ */
+*/
/* These are passed as 'flags' to update_va_mapping. They can be ORed. */
/* When specifying UVMF_MULTI, also OR in a pointer to a CPU bitmap. */
/* UVMF_LOCAL is merely UVMF_MULTI with a NULL bitmap pointer. */
+/* ` enum uvm_flags { */
#define UVMF_NONE (0UL<<0) /* No flushing at all. */
#define UVMF_TLB_FLUSH (1UL<<0) /* Flush entire TLB(s). */
#define UVMF_INVLPG (2UL<<0) /* Flush only one entry. */
@@ -433,6 +449,7 @@ DEFINE_XEN_GUEST_HANDLE(mmuext_op_t);
#define UVMF_MULTI (0UL<<2) /* Flush subset of TLBs. */
#define UVMF_LOCAL (0UL<<2) /* Flush local TLB. */
#define UVMF_ALL (1UL<<2) /* Flush all TLBs. */
+/* ` } */
/*
* Commands to HYPERVISOR_console_io().
* [PATCH 2 of 5] docs: Document HYPERVISOR_mmuext_op
2012-11-02 11:18 [PATCH 0 of 5] docs: x86 PV MMU related functions Ian Campbell
2012-11-02 11:18 ` [PATCH 1 of 5] docs: document HYPERVISOR_update_va_mapping(_other_domain) Ian Campbell
@ 2012-11-02 11:18 ` Ian Campbell
2012-11-19 11:31 ` Ian Jackson
2012-11-02 11:18 ` [PATCH 3 of 5] docs: Add ToC entry for start of day memory layout Ian Campbell
` (4 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Ian Campbell @ 2012-11-02 11:18 UTC (permalink / raw)
To: xen-devel; +Cc: Ian Campbell
# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1351854990 -3600
# Node ID e6880358eba346c386b9faaa30a5489df73a06a0
# Parent e8e1191aef208fbe2f4de61aad3e4fd789333646
docs: Document HYPERVISOR_mmuext_op
Mark-up for inclusion of generated docs.
Remove some trailing whitespace.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
diff -r e8e1191aef20 -r e6880358eba3 xen/include/public/xen.h
--- a/xen/include/public/xen.h Fri Nov 02 12:15:56 2012 +0100
+++ b/xen/include/public/xen.h Fri Nov 02 12:16:30 2012 +0100
@@ -319,48 +319,54 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
/*
* MMU EXTENDED OPERATIONS
- *
- * HYPERVISOR_mmuext_op() accepts a list of mmuext_op structures.
+ *
+ * ` enum neg_errnoval
+ * ` HYPERVISOR_mmuext_op(mmuext_op_t uops[],
+ * ` unsigned int count,
+ * ` unsigned int *pdone,
+ * ` unsigned int foreigndom)
+ */
+/* HYPERVISOR_mmuext_op() accepts a list of mmuext_op structures.
* A foreigndom (FD) can be specified (or DOMID_SELF for none).
* Where the FD has some effect, it is described below.
- *
+ *
* cmd: MMUEXT_(UN)PIN_*_TABLE
* mfn: Machine frame number to be (un)pinned as a p.t. page.
* The frame must belong to the FD, if one is specified.
- *
+ *
* cmd: MMUEXT_NEW_BASEPTR
* mfn: Machine frame number of new page-table base to install in MMU.
- *
+ *
* cmd: MMUEXT_NEW_USER_BASEPTR [x86/64 only]
* mfn: Machine frame number of new page-table base to install in MMU
* when in user space.
- *
+ *
* cmd: MMUEXT_TLB_FLUSH_LOCAL
* No additional arguments. Flushes local TLB.
- *
+ *
* cmd: MMUEXT_INVLPG_LOCAL
* linear_addr: Linear address to be flushed from the local TLB.
- *
+ *
* cmd: MMUEXT_TLB_FLUSH_MULTI
* vcpumask: Pointer to bitmap of VCPUs to be flushed.
- *
+ *
* cmd: MMUEXT_INVLPG_MULTI
* linear_addr: Linear address to be flushed.
* vcpumask: Pointer to bitmap of VCPUs to be flushed.
- *
+ *
* cmd: MMUEXT_TLB_FLUSH_ALL
* No additional arguments. Flushes all VCPUs' TLBs.
- *
+ *
* cmd: MMUEXT_INVLPG_ALL
* linear_addr: Linear address to be flushed from all VCPUs' TLBs.
- *
+ *
* cmd: MMUEXT_FLUSH_CACHE
* No additional arguments. Writes back and flushes cache contents.
*
* cmd: MMUEXT_FLUSH_CACHE_GLOBAL
* No additional arguments. Writes back and flushes cache contents
* on all CPUs in the system.
- *
+ *
* cmd: MMUEXT_SET_LDT
* linear_addr: Linear address of LDT base (NB. must be page-aligned).
* nr_ents: Number of entries in LDT.
@@ -375,6 +381,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
* cmd: MMUEXT_[UN]MARK_SUPER
* mfn: Machine frame number of head of superpage to be [un]marked.
*/
+/* ` enum mmuext_cmd { */
#define MMUEXT_PIN_L1_TABLE 0
#define MMUEXT_PIN_L2_TABLE 1
#define MMUEXT_PIN_L3_TABLE 2
@@ -395,10 +402,11 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
#define MMUEXT_FLUSH_CACHE_GLOBAL 18
#define MMUEXT_MARK_SUPER 19
#define MMUEXT_UNMARK_SUPER 20
+/* ` } */
#ifndef __ASSEMBLY__
struct mmuext_op {
- unsigned int cmd;
+ unsigned int cmd; /* => enum mmuext_cmd */
union {
/* [UN]PIN_TABLE, NEW_BASEPTR, NEW_USER_BASEPTR
* CLEAR_PAGE, COPY_PAGE, [UN]MARK_SUPER */
* [PATCH 3 of 5] docs: Add ToC entry for start of day memory layout
2012-11-02 11:18 [PATCH 0 of 5] docs: x86 PV MMU related functions Ian Campbell
2012-11-02 11:18 ` [PATCH 1 of 5] docs: document HYPERVISOR_update_va_mapping(_other_domain) Ian Campbell
2012-11-02 11:18 ` [PATCH 2 of 5] docs: Document HYPERVISOR_mmuext_op Ian Campbell
@ 2012-11-02 11:18 ` Ian Campbell
2012-11-19 11:32 ` Ian Jackson
2012-11-02 11:18 ` [PATCH 4 of 5] docs: Document HYPERVISOR_update_descriptor Ian Campbell
` (3 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Ian Campbell @ 2012-11-02 11:18 UTC (permalink / raw)
To: xen-devel; +Cc: Ian Campbell
# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1351855004 -3600
# Node ID 433d5d988e30e66a9c1f53a9b2c027692f9de11c
# Parent e6880358eba346c386b9faaa30a5489df73a06a0
docs: Add ToC entry for start of day memory layout.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
diff -r e6880358eba3 -r 433d5d988e30 xen/include/public/xen.h
--- a/xen/include/public/xen.h Fri Nov 02 12:16:30 2012 +0100
+++ b/xen/include/public/xen.h Fri Nov 02 12:16:44 2012 +0100
@@ -680,7 +680,8 @@ typedef struct shared_info shared_info_t
#endif
/*
- * Start-of-day memory layout:
+ * `incontents 200 startofday Start-of-day memory layout
+ *
* 1. The domain is started within contiguous virtual-memory region.
* 2. The contiguous region ends on an aligned 4MB boundary.
* 3. This the order of bootstrap elements in the initial virtual region:
* [PATCH 4 of 5] docs: Document HYPERVISOR_update_descriptor
2012-11-02 11:18 [PATCH 0 of 5] docs: x86 PV MMU related functions Ian Campbell
` (2 preceding siblings ...)
2012-11-02 11:18 ` [PATCH 3 of 5] docs: Add ToC entry for start of day memory layout Ian Campbell
@ 2012-11-02 11:18 ` Ian Campbell
2012-11-19 11:31 ` Ian Jackson
2012-11-02 11:18 ` [PATCH 5 of 5] docs: Include prototype for HYPERVISOR_multicall Ian Campbell
` (2 subsequent siblings)
6 siblings, 1 reply; 17+ messages in thread
From: Ian Campbell @ 2012-11-02 11:18 UTC (permalink / raw)
To: xen-devel; +Cc: Ian Campbell
# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1351855019 -3600
# Node ID bb43f655bcc863d681fe30cea477f12333dd0cc6
# Parent 433d5d988e30e66a9c1f53a9b2c027692f9de11c
docs: Document HYPERVISOR_update_descriptor
Mark-up for inclusion of generated docs.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
diff -r 433d5d988e30 -r bb43f655bcc8 xen/include/public/arch-x86/xen.h
--- a/xen/include/public/arch-x86/xen.h Fri Nov 02 12:16:44 2012 +0100
+++ b/xen/include/public/arch-x86/xen.h Fri Nov 02 12:16:59 2012 +0100
@@ -71,7 +71,7 @@ typedef unsigned long xen_pfn_t;
#endif
/*
- * SEGMENT DESCRIPTOR TABLES
+ * `incontents 200 segdesc Segment Descriptor Tables
*/
/*
* ` enum neg_errnoval
@@ -83,11 +83,24 @@ typedef unsigned long xen_pfn_t;
* start of the GDT because some stupid OSes export hard-coded selector values
* in their ABI. These hard-coded values are always near the start of the GDT,
* so Xen places itself out of the way, at the far end of the GDT.
+ *
+ * NB The LDT is set using the MMUEXT_SET_LDT op of HYPERVISOR_mmuext_op
*/
#define FIRST_RESERVED_GDT_PAGE 14
#define FIRST_RESERVED_GDT_BYTE (FIRST_RESERVED_GDT_PAGE * 4096)
#define FIRST_RESERVED_GDT_ENTRY (FIRST_RESERVED_GDT_BYTE / 8)
+
+/*
+ * ` enum neg_errnoval
+ * ` HYPERVISOR_update_descriptor(u64 pa, u64 desc);
+ * `
+ * ` @pa The machine physical address of the descriptor to
+ * ` update. Must be either a descriptor page or writable.
+ * ` @desc The descriptor value to update, in the same format as a
+ * ` native descriptor table entry.
+ */
+
/* Maximum number of virtual CPUs in legacy multi-processor guests. */
#define XEN_LEGACY_MAX_VCPUS 32
* [PATCH 5 of 5] docs: Include prototype for HYPERVISOR_multicall
2012-11-02 11:18 [PATCH 0 of 5] docs: x86 PV MMU related functions Ian Campbell
` (3 preceding siblings ...)
2012-11-02 11:18 ` [PATCH 4 of 5] docs: Document HYPERVISOR_update_descriptor Ian Campbell
@ 2012-11-02 11:18 ` Ian Campbell
2012-11-19 11:31 ` Ian Jackson
2012-11-12 10:04 ` [PATCH 0 of 5] docs: x86 PV MMU related functions Ian Campbell
2012-11-16 16:55 ` Ian Campbell
6 siblings, 1 reply; 17+ messages in thread
From: Ian Campbell @ 2012-11-02 11:18 UTC (permalink / raw)
To: xen-devel; +Cc: Ian Campbell
# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1351855028 -3600
# Node ID afb8de49d9a8f91cf1b0248842e56be8e4b48995
# Parent bb43f655bcc863d681fe30cea477f12333dd0cc6
docs: Include prototype for HYPERVISOR_multicall
Mark-up for inclusion of generated docs.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
diff -r bb43f655bcc8 -r afb8de49d9a8 xen/include/public/xen.h
--- a/xen/include/public/xen.h Fri Nov 02 12:16:59 2012 +0100
+++ b/xen/include/public/xen.h Fri Nov 02 12:17:08 2012 +0100
@@ -540,7 +540,10 @@ typedef struct mmu_update mmu_update_t;
DEFINE_XEN_GUEST_HANDLE(mmu_update_t);
/*
- * Send an array of these to HYPERVISOR_multicall().
+ * ` enum neg_errnoval
+ * ` HYPERVISOR_multicall(multicall_entry_t call_list[],
+ * ` unsigned int nr_calls);
+ *
* NB. The fields are natural register size for this architecture.
*/
struct multicall_entry {
* Re: [PATCH 1 of 5] docs: document HYPERVISOR_update_va_mapping(_other_domain)
2012-11-02 11:18 ` [PATCH 1 of 5] docs: document HYPERVISOR_update_va_mapping(_other_domain) Ian Campbell
@ 2012-11-02 11:33 ` David Vrabel
2012-11-02 13:07 ` Ian Campbell
0 siblings, 1 reply; 17+ messages in thread
From: David Vrabel @ 2012-11-02 11:33 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel
On 02/11/12 11:18, Ian Campbell wrote:
> docs: document HYPERVISOR_update_va_mapping(_other_domain)
>
> Mark-up for inclusion of generated docs.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
>
> diff -r 37a8946eeb9d -r e8e1191aef20 xen/include/public/xen.h
> --- a/xen/include/public/xen.h Fri Oct 26 16:09:29 2012 +0100
> +++ b/xen/include/public/xen.h Fri Nov 02 12:15:56 2012 +0100
> @@ -423,9 +423,25 @@ typedef struct mmuext_op mmuext_op_t;
> DEFINE_XEN_GUEST_HANDLE(mmuext_op_t);
> #endif
>
> +/*
> + * ` enum neg_errnoval
> + * ` HYPERVISOR_update_va_mapping(unsigned long va, u64 val,
> + * ` enum uvm_flags flags)
> + * `
> + * ` enum neg_errnoval
> + * ` HYPERVISOR_update_va_mapping_otherdomain(unsigned long va, u64 val,
> + * ` enum uvm_flags flags,
> + * ` domid_t domid)
> + * `
> + * ` @va: The virtual address whose mapping we want to change
> + * ` @val: The new page table entry
Suggest mentioning that this PTE requires the MFN not PFN.
David
> + * ` @flags: Control TLB flushes
> + */
> +*/
> /* These are passed as 'flags' to update_va_mapping. They can be ORed. */
> /* When specifying UVMF_MULTI, also OR in a pointer to a CPU bitmap. */
> /* UVMF_LOCAL is merely UVMF_MULTI with a NULL bitmap pointer. */
> +/* ` enum uvm_flags { */
> #define UVMF_NONE (0UL<<0) /* No flushing at all. */
> #define UVMF_TLB_FLUSH (1UL<<0) /* Flush entire TLB(s). */
> #define UVMF_INVLPG (2UL<<0) /* Flush only one entry. */
> @@ -433,6 +449,7 @@ DEFINE_XEN_GUEST_HANDLE(mmuext_op_t);
> #define UVMF_MULTI (0UL<<2) /* Flush subset of TLBs. */
> #define UVMF_LOCAL (0UL<<2) /* Flush local TLB. */
> #define UVMF_ALL (1UL<<2) /* Flush all TLBs. */
> +/* ` } */
>
> /*
> * Commands to HYPERVISOR_console_io().
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
* Re: [PATCH 1 of 5] docs: document HYPERVISOR_update_va_mapping(_other_domain)
2012-11-02 11:33 ` David Vrabel
@ 2012-11-02 13:07 ` Ian Campbell
2012-11-19 11:31 ` Ian Jackson
0 siblings, 1 reply; 17+ messages in thread
From: Ian Campbell @ 2012-11-02 13:07 UTC (permalink / raw)
To: David Vrabel; +Cc: xen-devel@lists.xen.org
On Fri, 2012-11-02 at 11:33 +0000, David Vrabel wrote:
> > + * ` @va: The virtual address whose mapping we want to change
> > + * ` @val: The new page table entry
>
> Suggest mentioning that this PTE requires the MFN not PFN.
Good idea.
8<---------------------------------------------
# HG changeset patch
# User Ian Campbell <ian.campbell@citrix.com>
# Date 1351861598 0
# Node ID 3753cf4617500ee0ac443eeba4f6a12257d77253
# Parent 37a8946eeb9db8b5eafc1c75aded006ad5322af8
docs: document HYPERVISOR_update_va_mapping(_other_domain)
Mark-up for inclusion of generated docs.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
diff -r 37a8946eeb9d -r 3753cf461750 xen/include/public/xen.h
--- a/xen/include/public/xen.h Fri Oct 26 16:09:29 2012 +0100
+++ b/xen/include/public/xen.h Fri Nov 02 13:06:38 2012 +0000
@@ -423,9 +423,25 @@ typedef struct mmuext_op mmuext_op_t;
DEFINE_XEN_GUEST_HANDLE(mmuext_op_t);
#endif
+/*
+ * ` enum neg_errnoval
+ * ` HYPERVISOR_update_va_mapping(unsigned long va, u64 val,
+ * ` enum uvm_flags flags)
+ * `
+ * ` enum neg_errnoval
+ * ` HYPERVISOR_update_va_mapping_otherdomain(unsigned long va, u64 val,
+ * ` enum uvm_flags flags,
+ * ` domid_t domid)
+ * `
+ * ` @va: The virtual address whose mapping we want to change
+ * ` @val: The new page table entry, must contain a machine address
+ * ` @flags: Control TLB flushes
+ */
+*/
/* These are passed as 'flags' to update_va_mapping. They can be ORed. */
/* When specifying UVMF_MULTI, also OR in a pointer to a CPU bitmap. */
/* UVMF_LOCAL is merely UVMF_MULTI with a NULL bitmap pointer. */
+/* ` enum uvm_flags { */
#define UVMF_NONE (0UL<<0) /* No flushing at all. */
#define UVMF_TLB_FLUSH (1UL<<0) /* Flush entire TLB(s). */
#define UVMF_INVLPG (2UL<<0) /* Flush only one entry. */
@@ -433,6 +449,7 @@ DEFINE_XEN_GUEST_HANDLE(mmuext_op_t);
#define UVMF_MULTI (0UL<<2) /* Flush subset of TLBs. */
#define UVMF_LOCAL (0UL<<2) /* Flush local TLB. */
#define UVMF_ALL (1UL<<2) /* Flush all TLBs. */
+/* ` } */
/*
* Commands to HYPERVISOR_console_io().
* Re: [PATCH 0 of 5] docs: x86 PV MMU related functions
2012-11-02 11:18 [PATCH 0 of 5] docs: x86 PV MMU related functions Ian Campbell
` (4 preceding siblings ...)
2012-11-02 11:18 ` [PATCH 5 of 5] docs: Include prototype for HYPERVISOR_multicall Ian Campbell
@ 2012-11-12 10:04 ` Ian Campbell
2012-11-16 16:55 ` Ian Campbell
6 siblings, 0 replies; 17+ messages in thread
From: Ian Campbell @ 2012-11-12 10:04 UTC (permalink / raw)
To: xen-devel@lists.xen.org; +Cc: Keir Fraser, Ian Jackson
ping?
On Fri, 2012-11-02 at 11:18 +0000, Ian Campbell wrote:
> The following series adds some documentation for the PV MMU related
> hypercalls. For the most part this is just a case of adding a suitable
> prototype and marking things up so they show up in the generated docs.
>
> I also have a draft of a wiki article on the subject which references
> the information in the public headers which I hope to post soon.
* Re: [PATCH 0 of 5] docs: x86 PV MMU related functions
2012-11-02 11:18 [PATCH 0 of 5] docs: x86 PV MMU related functions Ian Campbell
` (5 preceding siblings ...)
2012-11-12 10:04 ` [PATCH 0 of 5] docs: x86 PV MMU related functions Ian Campbell
@ 2012-11-16 16:55 ` Ian Campbell
2012-11-18 21:02 ` Pasi Kärkkäinen
6 siblings, 1 reply; 17+ messages in thread
From: Ian Campbell @ 2012-11-16 16:55 UTC (permalink / raw)
To: xen-devel@lists.xen.org
On Fri, 2012-11-02 at 11:18 +0000, Ian Campbell wrote:
>
> I also have a draft of a wiki article on the subject which references
> the information in the public headers which I hope to post soon.
I realised I forgot to do this...
It needs some polish, but the majority of the XXXs are placeholders for
links to the result of applying this series.
8<------------------------------------
Paravirtualised X86 Memory Management
= Intro =
One of the original innovations of the Xen hypervisor was the
paravirtualisation of the memory management unit (MMU). This allowed
for fast and efficient virtualisation of Operating Systems which used
paging compared to contemporary techniques.
In this article we will describe the functionality of the PV MMU for
X86 Xen guests. A familiarity with X86 paging and related concepts
will be assumed.
Other guest types, such as HVM or PVH guests on X86 or guests on ARM,
achieve virtualisation of the MMU using other techniques, such as the
use of hardware assisted or shadow paging.
= Direct Paging =
In order to virtualise the memory subsystem all hypervisors introduce
an additional level of abstraction between what the guest sees as
physical memory (pseudo-physical) and the underlying memory of the
machine (called machine addresses in Xen). This is usually done
through the introduction of a physical to machine (P2M)
mapping. Typically this would be maintained within the hypervisor and
hidden from the guest Operating System through techniques such as
Shadow Paging.
The Xen paravirtualised MMU model instead requires that the guest be
aware of the P2M mapping and be modified such that instead of writing
page table entries mapping virtual addresses to the physical address
space it would instead write entries mapping virtual addresses
directly to the machine address space by mapping from pseudo physical
to machine addresses using the P2M as it writes its page tables. This
technique is known as direct paging.
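As a rough illustration of direct paging, the sketch below builds a page table entry from a pseudo-physical address by looking the frame up in a P2M array. The array contents, the `make_pte` helper and the simplified 12-bit flags mask are all assumptions made for this example and are not taken from any Xen or guest source.

```c
#include <stdint.h>

#define PAGE_SHIFT     12
#define PTE_FLAGS_MASK 0xfffULL  /* simplified: only the low 12 flag bits */

/* Hypothetical guest-maintained P2M: index is a pseudo-physical frame
 * number, value is the machine frame number backing it. */
static const uint64_t p2m[] = { 0x1a2b, 0x0042, 0x7f00, 0x0003 };

/* Build a PTE holding a machine address, as direct paging requires.
 * No bounds checking is done on the P2M lookup in this sketch. */
static uint64_t make_pte(uint64_t pseudo_phys, uint64_t flags)
{
    uint64_t pfn = pseudo_phys >> PAGE_SHIFT;
    uint64_t mfn = p2m[pfn];

    return (mfn << PAGE_SHIFT) | (flags & PTE_FLAGS_MASK);
}
```

The resulting value is what the guest would hand to Xen as the new entry; it would not write it into a live page table itself.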
= Page Types and Invariants =
In order to ensure that the guest cannot subvert the system Xen
requires that certain invariants are met and therefore that all
updates to the page tables are performed by Xen through the use
of hypercalls.
To this end Xen defines a number of page types and ensures that any
given page has exactly one type at any given time. The type of a page
is reference counted and can only be changed when the "type count" is
zero.
The basic types are:
* None: No special uses.
* Page table page: Pages used as page tables (there are separate types
for each of the 4 levels on 64 bit and 3 levels on 32 bit PAE
guests).
* Segment descriptor page: Page is used as part of the Global or Local
Descriptor table (GDT/LDT).
* Writable: Page is writable.
Xen enforces the invariant that only pages with the writable type have
a writable mapping in the page tables. Likewise it ensures that no
writable mapping exists of a page with any other type. It also
enforces other invariants such as requiring that no page table page
can make a non-privileged mapping of the hypervisor's virtual address
space etc. By doing this it can ensure that the guest OS is not able
to directly modify any critical data structures and therefore subvert
the safety of the system, for example to map machine addresses which
do not belong to it.
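The "exactly one type, changeable only at type count zero" rule can be modelled in a few lines. This is a toy model only: the enum, the struct and the `take_type_ref`/`drop_type_ref` names are invented for illustration, and the validation Xen performs when a page first acquires a type is reduced to a comment.

```c
#include <stdbool.h>

/* Illustrative page types, loosely following the list above. */
enum page_type {
    PT_NONE, PT_L1, PT_L2, PT_L3, PT_L4, PT_SEG_DESC, PT_WRITABLE
};

struct page {
    enum page_type type;
    unsigned int   type_count;
};

/* Take a reference of the given type. Fails if the page currently holds
 * references of a different type. */
static bool take_type_ref(struct page *pg, enum page_type type)
{
    if (pg->type_count == 0)
        pg->type = type;        /* the real hypervisor would validate here */
    else if (pg->type != type)
        return false;

    pg->type_count++;
    return true;
}

/* Drop a reference; once the count reaches zero the type may change. */
static void drop_type_ref(struct page *pg)
{
    if (pg->type_count > 0)
        pg->type_count--;
}
```

In this model a page with a writable reference cannot simultaneously become an L1 page table, which is the essence of the invariant described above.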
Whenever a set of page-tables is loaded into the hardware page-table
base register ('cr3') the hypervisor must take an appropriate type
reference with the root page-table type (that is, an L4 reference on
64-bit or an L3 reference on 32-bit). If the page is not already of
the required type then in order to take the initial reference it must
first have a type count of zero (remember, a page's type can only be
changed while the type count is zero) and must be validated to ensure
that it respects the invariants. This in turn means that the pages
referenced by the root page-table must be validated as having the
correct type (i.e. L3 or L2 on 64- or 32-bit respectively), and so on
down to the data pages at the leaves of the page-table, thereby
ensuring that the page table as a whole is safe to load into 'cr3'.
XXX link to appropriate header.
In order to maintain the necessary invariants Xen must be involved in
all updates to the page tables, as well as various other privileged
operations. These are covered in the following sections.
In order to prevent guest operating systems from subverting these
mechanisms it is also necessary for guest kernels to run without the
normal privileges associated with running in processor ring-0. For this
reason Xen PV guest kernels usually run in either ring-1 (32-bit
guests) or ring-3 (64-bit guests).
= Updating Page Tables =
Since the page tables are not writable by the guest Xen provides
several mechanisms by which the guest can update a page table entry.
== mmu_update hypercall ==
The first mechanism provided by Xen is the HYPERVISOR_mmu_update
hypercall [XXX link]. This hypercall has the prototype:
    struct mmu_update {
        uint64_t ptr;       /* Machine address of PTE. */
        uint64_t val;       /* New contents of PTE.    */
    };
    long HYPERVISOR_mmu_update(const struct mmu_update reqs[],
                               unsigned count, unsigned *done_out,
                               unsigned foreigndom)
The operation takes an array of 'count' requests 'reqs'. The
'done_out' parameter returns an indication of the number of successful
operations. 'foreigndom' can be used by a suitably privileged domain
to access memory belonging to other domains (this usage is not covered
here).
Each request is a ('ptr','val') pair. The 'ptr' field is further
divided into 'ptr[1:0]', indicating the type of update to perform, and
'ptr[:2]', which indicates the address to update.
The valid values for 'ptr[1:0]' are:
* MMU_NORMAL_PT_UPDATE: A normal page table update. 'ptr[:2]' contains
the machine address of the entry to update while 'val' is the Page
Table Entry to write. This effectively implements '*ptr = val' with
checks to ensure that the required invariants are preserved.
* MMU_MACHPHYS_UPDATE: Update the machine to physical address
mapping. This is covered below, see [XXX link]
* MMU_PT_UPDATE_PRESERVE_AD: As per MMU_NORMAL_PT_UPDATE but
preserving the Accessed and Dirty bits in the page table entry. The
'val' here is almost a standard Page Table Entry but with some
special handling. See the [XXX link hypercall documentation] for more
information.
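Since page table entries are 8-byte aligned, the two low bits of 'ptr' are free to carry the update type. The sketch below shows one way a request could be packed and unpacked. The helper names are invented for this example, and the numeric command values are given as I understand them from the public header, so please check them against xen.h before relying on them.

```c
#include <stdint.h>

/* Update types carried in ptr[1:0]; values assumed to match xen.h. */
#define MMU_NORMAL_PT_UPDATE      0
#define MMU_MACHPHYS_UPDATE       1
#define MMU_PT_UPDATE_PRESERVE_AD 2

struct mmu_update {
    uint64_t ptr;   /* Machine address of PTE, plus type in bits 1:0. */
    uint64_t val;   /* New contents of PTE. */
};

/* Hypothetical helper: pack a request from its three components. */
static struct mmu_update make_req(uint64_t pte_maddr, unsigned int type,
                                  uint64_t val)
{
    struct mmu_update req;

    req.ptr = (pte_maddr & ~3ULL) | (type & 3U);
    req.val = val;
    return req;
}

/* Hypothetical helpers: split 'ptr' back into type and address. */
static unsigned int req_type(const struct mmu_update *req)
{
    return (unsigned int)(req->ptr & 3);
}

static uint64_t req_addr(const struct mmu_update *req)
{
    return req->ptr & ~3ULL;
}
```

An array of such requests is what would be passed as 'reqs' to the hypercall, with 'count' set to the number of entries.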
== update_va_mapping hypercall ==
The second mechanism provided by Xen is the
HYPERVISOR_update_va_mapping hypercall [XXX link]. This hypercall has
the prototype:
    long
    HYPERVISOR_update_va_mapping(unsigned long va, u64 val,
                                 enum update_va_mapping_flags flags)
This operation simply updates the leaf PTE entry (called an L1 in
Xen) which maps the virtual address 'va' with the given value
'val', while of course performing the expected checks to ensure that
the invariants are maintained. This can be thought of as updating the
PTE using a [XXX link linear mapping].
The flags parameter can be used to request that Xen flush the TLB
entries associated with the update. See the [XXX link hypercall
documentation for more].
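The flag values are ORed together from two groups: what to flush (bits 1:0) and on which CPUs (bit 2 upwards). The sketch below repeats the UVMF_* definitions from the public header and adds two small decoding helpers. The helpers and the `FLUSH_TYPE_MASK` name are assumptions made for this example.

```c
/* Flag definitions as they appear in xen/include/public/xen.h. */
#define UVMF_NONE      (0UL<<0) /* No flushing at all.   */
#define UVMF_TLB_FLUSH (1UL<<0) /* Flush entire TLB(s).  */
#define UVMF_INVLPG    (2UL<<0) /* Flush only one entry. */
#define UVMF_MULTI     (0UL<<2) /* Flush subset of TLBs. */
#define UVMF_LOCAL     (0UL<<2) /* Flush local TLB.      */
#define UVMF_ALL       (1UL<<2) /* Flush all TLBs.       */

/* Hypothetical mask covering the "what to flush" bits. */
#define FLUSH_TYPE_MASK 3UL

/* Which kind of flush was requested: NONE, TLB_FLUSH or INVLPG. */
static unsigned long flush_type(unsigned long flags)
{
    return flags & FLUSH_TYPE_MASK;
}

/* Non-zero if the flush should be applied on all CPUs. */
static int flush_all_cpus(unsigned long flags)
{
    return (flags & UVMF_ALL) != 0;
}
```

For example, `UVMF_INVLPG | UVMF_ALL` would ask for the single entry for 'va' to be invalidated everywhere, while `UVMF_TLB_FLUSH | UVMF_LOCAL` would flush only the local TLB.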
== Trap and emulate of page table writes ==
As well as the above Xen can also trap and emulate updates to leaf
page table entries (L1) only. This trapping and emulating is
relatively expensive and is best avoided but for little used code
paths can provide a reasonable trade off vs. the requirement to modify
the callsite in the guest OS.
= Other privileged operations =
As well as moderating page table updates in order to maintain the
necessary invariants Xen must also be involved in certain other
privileged operations, such as setting a new page table base
('cr3'). Because the guest kernel no longer runs in ring-0 certain
other privileged operations must also be done by the hypervisor, such
as flushing the TLB.
These operations are performed via the HYPERVISOR_mmuext_op hypercall
[XXX link]. This hypercall has the following prototype:
    struct mmuext_op {
        unsigned int cmd; /* => enum mmuext_cmd */
        union {
            /* [UN]PIN_TABLE, NEW_BASEPTR, NEW_USER_BASEPTR
             * CLEAR_PAGE, COPY_PAGE, [UN]MARK_SUPER */
            xen_pfn_t mfn;
            /* INVLPG_LOCAL, INVLPG_ALL, SET_LDT */
            unsigned long linear_addr;
        } arg1;
        union {
            /* SET_LDT */
            unsigned int nr_ents;
            /* TLB_FLUSH_MULTI, INVLPG_MULTI */
            const void *vcpumask;
            /* COPY_PAGE */
            xen_pfn_t src_mfn;
        } arg2;
    };
    long
    HYPERVISOR_mmuext_op(struct mmuext_op uops[],
                         unsigned int count,
                         unsigned int *pdone,
                         unsigned int foreigndom)
The hypercall takes an array of 'count' operations each specified by
the 'mmuext_op' struct. This hypercall allows access to various
operations which must be performed via the hypervisor either because
the guest kernel is no longer privileged or because the hypervisor
must be involved in order to maintain safety. In general each available
command corresponds to a low-level processor function. These include
NEW_BASEPTR (write cr3), various types of TLB and cache flush, and
setting the LDT address (see below). For more information on the
available operations please see [XXX link the hypercall
documentation].
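As an illustration of how several operations can be batched into one call, the sketch below fills a two-entry array which pins a root page table and then installs it as the new base pointer. The `build_switch_batch` helper is hypothetical, `xen_pfn_t` is assumed to be `unsigned long`, and the two command numbers are given as I understand them from xen.h. The hypercall itself is deliberately not invoked here.

```c
#include <string.h>

typedef unsigned long xen_pfn_t;    /* assumption for this sketch */

/* Command numbers assumed to match xen/include/public/xen.h. */
#define MMUEXT_PIN_L4_TABLE 3
#define MMUEXT_NEW_BASEPTR  5

struct mmuext_op {
    unsigned int cmd;
    union {
        xen_pfn_t     mfn;
        unsigned long linear_addr;
    } arg1;
    union {
        unsigned int  nr_ents;
        const void   *vcpumask;
        xen_pfn_t     src_mfn;
    } arg2;
};

/* Hypothetical helper: fill ops[0..1] and return the number of entries,
 * suitable for passing as 'count' to HYPERVISOR_mmuext_op(). */
static unsigned int build_switch_batch(struct mmuext_op ops[],
                                       xen_pfn_t root_mfn)
{
    memset(ops, 0, 2 * sizeof(ops[0]));

    ops[0].cmd      = MMUEXT_PIN_L4_TABLE;  /* validate and pin the root */
    ops[0].arg1.mfn = root_mfn;

    ops[1].cmd      = MMUEXT_NEW_BASEPTR;   /* equivalent of writing cr3 */
    ops[1].arg1.mfn = root_mfn;

    return 2;
}
```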
= Pinning Page Tables =
As discussed above Xen ensures that various invariants are met
concerning whether certain pages are mapped writable or not. This
in turn means that Xen needs to validate the page tables whenever they
are loaded into 'cr3'. However this is a potentially expensive
operation since Xen needs to walk the complete set of page-tables and
validate each one recursively.
In order to avoid this expense every time 'cr3' changes (i.e. on every
context switch), Xen allows a page to be explicitly ''pinned'' to a
given type. This effectively means taking an extra reference of the
relevant page table type, thereby forcing Xen to validate the
page-table up front and to maintain the invariants for as long as the
pin remains in place. By doing this the guest ensures that when a new
'cr3' is loaded the referenced page already has the appropriate type
(L4 or L3) and therefore the type count can simply be incremented
without the need to validate.
For maximum performance a guest OS kernel will usually want to perform
a pin operation as late as possible during the setup of a new set of
page tables, so as to be able to construct them using normal writable
mappings before blessing them as a set of page tables. Likewise on
page-table teardown a guest OS will usually want to unpin the pages as
soon as possible such that it can teardown the page tables without the
use of hypercalls. These operations are usually referred to as 'late
pin' and 'early unpin'.
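The benefit of pinning can be seen in a toy model which counts how often the expensive validation walk would run. Everything here is invented for illustration (the struct, the helper names and the assumption that validation happens exactly when the type count rises from zero); it is not how the hypervisor is actually structured.

```c
struct root_table {
    unsigned int type_count;   /* outstanding root page-table references */
    unsigned int validations;  /* times the expensive walk has run       */
};

/* Taking the first reference triggers validation; later ones are cheap. */
static void take_type_ref(struct root_table *t)
{
    if (t->type_count == 0)
        t->validations++;
    t->type_count++;
}

static void drop_type_ref(struct root_table *t)
{
    if (t->type_count > 0)
        t->type_count--;
}

/* Model n context switches onto and away from this set of page tables. */
static void context_switches(struct root_table *t, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++) {
        take_type_ref(t);   /* loaded into cr3 */
        drop_type_ref(t);   /* switched away   */
    }
}
```

In this model an unpinned table is revalidated on every switch, while a table holding one extra (pin) reference is validated once, when the pin is taken.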
= The Physical-to-machine and machine-to-physical mapping tables =
As discussed above direct paging requires that the guest Operating
System be aware of the mapping between pseudo-physical and machine
addresses (the P2M table). In addition, in order to be able to read PTE
entries (which contain machine addresses) and convert them back into
pseudo-physical addresses, a translation from machine to
pseudo-physical addresses is required; this is done using the M2P
table.
Each table is a simple array of frame numbers, indexed by either
physical or machine frames and looking up the other.
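A minimal sketch of the two lookups, using small made-up tables, might look like the following. The table contents, the `INVALID_FRAME` marker and the helper names are assumptions for this example; a real M2P covers every machine frame in the host.

```c
#include <stdint.h>

#define PAGE_SHIFT    12
#define INVALID_FRAME UINT64_MAX   /* hypothetical "not ours" marker */

/* Toy tables: the guest owns machine frames 5, 2, 7 and 1. */
static const uint64_t p2m[4] = { 5, 2, 7, 1 };
static const uint64_t m2p[8] = {
    INVALID_FRAME, 3, 1, INVALID_FRAME,
    INVALID_FRAME, 0, INVALID_FRAME, 2
};

static uint64_t pfn_to_mfn(uint64_t pfn) { return p2m[pfn]; }
static uint64_t mfn_to_pfn(uint64_t mfn) { return m2p[mfn]; }

/* Convert a PTE read back from a page table into the pseudo-physical
 * address of the page it maps (high flag bits are ignored here). */
static uint64_t pte_to_pseudo_phys(uint64_t pte)
{
    uint64_t mfn = pte >> PAGE_SHIFT;

    return mfn_to_pfn(mfn) << PAGE_SHIFT;
}
```

For every frame the guest owns, the two tables are expected to be inverses of each other.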
Since the P2M is sized according to the guest's pseudo-physical
address it is left entirely up to the guest to provide and maintain in
its own pages.
However the M2P must be sized according to the total amount of RAM in
the host and therefore could be of considerable size compared to the
amount of RAM available to the guest, not to mention sparse from the
guest's point of view since the majority of machine pages will not
belong to it.
For this reason Xen exposes a read-only M2P of the entire host to the
guest and allows guests to update this table using the
MMU_MACHPHYS_UPDATE sub-op of the HYPERVISOR_mmu_update hypercall [XXX
link].
= Descriptor Tables =
As well as protecting page tables from being writable by the guest Xen
also requires that various descriptor tables must be made unavailable
to the guest.
== Interrupt Descriptor Table ==
A Xen guest cannot access the IDT directly. Instead Xen maintains its
own IDT and allows guests to write entries using the
HYPERVISOR_set_trap_table hypercall. This has the following prototype:
XXX link.
    struct trap_info {
        uint8_t       vector;  /* exception vector                             */
        uint8_t       flags;   /* 0-3: privilege level; 4: clear event enable? */
        uint16_t      cs;      /* code selector                                */
        unsigned long address; /* code offset                                  */
    };
    long HYPERVISOR_set_trap_table(const struct trap_info traps[]);
The entries of the ''trap_info'' struct correspond to the fields of a
native IDT entry and each will be validated by Xen before it is
used. The hypercall takes an array of traps terminated by an entry
where ''address'' is zero.
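The zero-terminated array convention can be sketched as follows. The selector value 0x10 and the handler addresses are placeholders invented for this example, and `count_traps` is a hypothetical helper which walks the array in the way the terminator rule implies.

```c
#include <stddef.h>
#include <stdint.h>

struct trap_info {
    uint8_t       vector;  /* exception vector */
    uint8_t       flags;   /* 0-3: privilege level; 4: clear event enable? */
    uint16_t      cs;      /* code selector */
    unsigned long address; /* code offset */
};

/* Example table; selector and addresses are placeholders only. */
static const struct trap_info example_traps[] = {
    {  0, 0, 0x10, 0x100000UL },   /* vector 0: divide error */
    { 14, 0, 0x10, 0x100040UL },   /* vector 14: page fault  */
    {  0, 0, 0,    0          },   /* terminator: address == 0 */
};

/* Count the entries before the terminating entry. */
static size_t count_traps(const struct trap_info traps[])
{
    size_t n = 0;

    while (traps[n].address != 0)
        n++;
    return n;
}
```

Note that because the terminator is identified by 'address' alone, a real handler cannot live at address zero.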
== Global/Local Descriptor Tables ==
A Xen guest is not able to access the Global or Local descriptor
tables directly. Pages which are in use as part of either table are
given their own distinct type and must therefore be mapped as
read-only in the guest.
The guest is also not privileged to update the descriptor base
registers and must therefore do so using a hypercall. The hypercall to
update the GDT is:
    long HYPERVISOR_set_gdt(const xen_pfn_t frames[], unsigned int entries);
This takes an array of machine frame numbers which are validated and
loaded into the virtual GDTR. Note that unlike native X86 these are
machine frames and not virtual addresses. These frames will be mapped
by Xen into the virtual address which it reserves for this purpose.
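Since each descriptor is 8 bytes and each frame is 4096 bytes, the number of frames to pass follows directly from the number of entries. The sketch below also uses the FIRST_RESERVED_GDT_* definitions from arch-x86/xen.h to compute the first entry which Xen reserves for itself. The helper names are invented, and treating that entry number as the upper limit on 'entries' is my assumption rather than something stated above.

```c
#define PAGE_SIZE 4096u
#define DESC_SIZE 8u

/* As defined in xen/include/public/arch-x86/xen.h. */
#define FIRST_RESERVED_GDT_PAGE  14
#define FIRST_RESERVED_GDT_BYTE  (FIRST_RESERVED_GDT_PAGE * 4096)
#define FIRST_RESERVED_GDT_ENTRY (FIRST_RESERVED_GDT_BYTE / 8)

/* Hypothetical helper: frames needed to hold 'entries' descriptors. */
static unsigned int gdt_frames_needed(unsigned int entries)
{
    return (entries * DESC_SIZE + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Hypothetical helper: does the guest GDT stay clear of Xen's entries? */
static int gdt_entries_ok(unsigned int entries)
{
    return entries <= (unsigned int)FIRST_RESERVED_GDT_ENTRY;
}
```

With these numbers a guest GDT can span at most 14 frames, i.e. 7168 descriptors.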
The LDT is set using the MMUEXT_SET_LDT sub-op of the
HYPERVISOR_mmuext_op hypercall. [XXX link.] XXX a single page?
Finally since the pages cannot be mapped as writable by the guest the
HYPERVISOR_update_descriptor hypercall is provided:
long HYPERVISOR_update_descriptor(u64 pa, u64 desc);
It takes a machine physical address of a descriptor entry to update
and the requested contents of the descriptor itself, in the same
format as the native descriptors.
= Start Of Day =
The initial boot time environment of a Xen PV guest is somewhat
different to the normal initial mode of an X86 processor. Rather than
starting out in 16-bit mode with paging disabled, a PV guest is
started in either 32- or 64-bit mode with paging enabled, running on
an initial set of page tables provided by the hypervisor. These pages
will be set up so as to meet the required invariants and will be loaded
into the 'cr3' register but will not be explicitly pinned (in other
words their type count is effectively one).
The initial virtual and pseudo-physical layout of a new guest is
described in XXX
file:///home/ijc/devel/xen-unstable.hg/docs/html/hypercall/include,public,xen.h.html#incontents_startofday
= Virtual Address Space =
Xen enforces certain restrictions on the virtual addresses which are
available to PV guests. These are enforced as part of the machinery for
typing and writing page tables.
Xen uses this to reserve certain addresses for its own use. Certain
areas are also read-only for guests and contain shared data structures
such as the machine-to-physical address lookup table.
For a 64-bit guest the virtual address space is set out as follows:
    0x0000000000000000-0x00007fffffffffff Fully available to guests
    0x0000800000000000-0xffff7fffffffffff Inaccessible (addresses are 48-bit sign extended)
    0xffff800000000000-0xffff807fffffffff Read only to guests
    0xffff808000000000-0xffff87ffffffffff Reserved for Xen use
    0xffff880000000000-0xffffffffffffffff Fully available to guests
For 32-bit guests running on a 64-bit hypervisor the virtual address
space under 4G (which is all such guests can access) is:
    0x00000000-0xf57fffff Fully available to guests
    0xf5800000-0xffffffff Read only to guests
For more information see "Memory Layout" under [XXX link
xen/include/asm-x86/config.h]
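The 64-bit layout listed above can be expressed as a simple range check. The following is an illustrative sketch only: the enum and function names are hypothetical, and the boundaries are copied from the list in this section rather than from the Xen headers.

```c
#include <stdint.h>

enum va_class {
    VA_GUEST,         /* fully available to guests */
    VA_INACCESSIBLE,  /* non-canonical hole */
    VA_READ_ONLY,     /* read only to guests */
    VA_XEN,           /* reserved for Xen use */
};

/* Classify a 64-bit guest virtual address according to the layout
 * given in the text. */
static enum va_class classify_va64(uint64_t va)
{
    if (va <= 0x00007fffffffffffULL)
        return VA_GUEST;
    if (va <= 0xffff7fffffffffffULL)
        return VA_INACCESSIBLE;
    if (va <= 0xffff807fffffffffULL)
        return VA_READ_ONLY;
    if (va <= 0xffff87ffffffffffULL)
        return VA_XEN;
    return VA_GUEST;
}
```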
= Batching =
For some memory management operations the overhead of making many
hypercalls can become prohibitively expensive. For this reason many of
the hypercalls described above take a list of operations to
perform. In addition Xen provides the concept of a multicall which
allows several different hypercalls to be batched
together. HYPERVISOR_multicall has this prototype:
    struct multicall_entry {
        unsigned long op, result;
        unsigned long args[6];
    };
    long HYPERVISOR_multicall(multicall_entry_t call_list[],
                              unsigned int nr_calls);
Each entry represents a hypercall and its associated arguments in the
(hopefully) obvious way.
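To make the "obvious way" a little more concrete, here is a sketch of how a guest might queue calls before issuing a single multicall. Everything other than the multicall_entry layout is an assumption made for illustration: the batch structure, batch_add, the stub standing in for HYPERVISOR_multicall and the EXAMPLE_OP_* numbers are hypothetical and are not the real hypercall numbers.

```c
#include <stddef.h>

struct multicall_entry {
    unsigned long op, result;
    unsigned long args[6];
};
typedef struct multicall_entry multicall_entry_t;

/* Placeholder operation numbers, for illustration only. */
#define EXAMPLE_OP_FIRST   1UL
#define EXAMPLE_OP_SECOND  2UL

#define BATCH_MAX 8

struct batch {
    multicall_entry_t calls[BATCH_MAX];
    unsigned int nr;
};

/* Queue one hypercall and its arguments. Unused argument slots are
 * zeroed. Returns 0 on success, or -1 if the batch is full or more
 * than six arguments were supplied. */
static int batch_add(struct batch *b, unsigned long op,
                     const unsigned long *args, unsigned int nr_args)
{
    multicall_entry_t *e;
    unsigned int i;

    if (b->nr == BATCH_MAX || nr_args > 6)
        return -1;

    e = &b->calls[b->nr++];
    e->op = op;
    e->result = 0;
    for (i = 0; i < 6; i++)
        e->args[i] = (i < nr_args) ? args[i] : 0;
    return 0;
}

/* Hypothetical stand-in for HYPERVISOR_multicall. It pretends that
 * every queued call succeeded by storing 0 in each result field. */
static long stub_multicall(multicall_entry_t call_list[],
                           unsigned int nr_calls)
{
    unsigned int i;

    for (i = 0; i < nr_calls; i++)
        call_list[i].result = 0;
    return 0;
}
```

After the multicall returns the caller would inspect each entry's ''result'' field, since the individual calls report their status there rather than through the return value of the multicall itself.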
= Guest Specific Details =
== Linux paravirt_ops ==
=== General PV MMU operation ===
The Linux ''paravirt_ops'' infrastructure provides a mechanism by
which the low-level MMU operations are abstracted into function
pointers, allowing the native operations to be replaced where necessary.
From the point of view of MMU operations the main entry point is
''struct pv_mmu_ops''. This contains entry points for low level
operations such as:
* Allocating/freeing page table entries. These allow the kernel to
mark the pages read-only and read-write as the pages are reused.
* Creating, writing and reading PTE entries. These allow the kernel
to make the necessary translations between pseudo-physical and
machine addressing as well as using hypercalls instead of direct
writes.
* Reading and writing of control registers, e.g. cr3, to allow
hypercalls to be inserted.
* Various TLB flush operations, again to allow their replacement by
hypercalls.
As well as these the interface includes some higher-level operations
which allow for more efficient batching of compound operations such as
duplicating (forking) a memory map. This is achieved by using the
''lazy_mmu_ops'' hooks to buffer operations and to flush them either
once a large enough batch has built up or upon completion.
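The general shape of this can be sketched as follows. This is not the Linux code: struct example_mmu_ops, its members and the batched backend are hypothetical names invented for illustration, and the batched hypercall is simulated by applying the queued writes directly and counting the flushes.

```c
#include <stdint.h>

typedef uint64_t pte_t;

/* Hypothetical, much reduced analogue of a table of MMU hooks. */
struct example_mmu_ops {
    void (*set_pte)(pte_t *ptep, pte_t val);
    void (*lazy_enter)(void);
    void (*lazy_leave)(void);
};

/* Native backend: write the entry directly, nothing to batch. */
static void native_set_pte(pte_t *ptep, pte_t val) { *ptep = val; }
static void native_nop(void) { }

/* Batched backend: queue updates while in lazy mode. */
#define QUEUE_MAX 4
static struct { pte_t *ptep; pte_t val; } queue[QUEUE_MAX];
static unsigned int queued;
static int lazy_mode;
static unsigned int flushes; /* number of simulated batched hypercalls */

static void batched_flush(void)
{
    unsigned int i;

    if (queued == 0)
        return;
    /* Stands in for a single hypercall carrying `queued` updates. */
    for (i = 0; i < queued; i++)
        *queue[i].ptep = queue[i].val;
    queued = 0;
    flushes++;
}

static void batched_set_pte(pte_t *ptep, pte_t val)
{
    queue[queued].ptep = ptep;
    queue[queued].val = val;
    queued++;
    /* Outside lazy mode every update is issued immediately. */
    if (!lazy_mode || queued == QUEUE_MAX)
        batched_flush();
}

static void batched_lazy_enter(void) { lazy_mode = 1; }
static void batched_lazy_leave(void) { batched_flush(); lazy_mode = 0; }

static const struct example_mmu_ops native_ops = {
    native_set_pte, native_nop, native_nop,
};
static const struct example_mmu_ops batched_ops = {
    batched_set_pte, batched_lazy_enter, batched_lazy_leave,
};
```

With a queue of four, six updates made between lazy_enter and lazy_leave result in two flushes rather than six separate calls: one when the queue fills and one on leaving lazy mode.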
The Xen paravirt_ops backend uses an additional page flag,
''PG_pinned'', in order to track whether a page has been pinned or not
and to implement the late-pin early-unpin scheme described above.
=== Start of Day issues ===
XXX get someone to describe these...
= References =
[XXX Xen and the art of virtualisation.]
[XXX The hypercall interface documentation.]
[XXX others? Chisnall Book?]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0 of 5] docs: x86 PV MMU related functions
2012-11-16 16:55 ` Ian Campbell
@ 2012-11-18 21:02 ` Pasi Kärkkäinen
2012-11-19 10:29 ` Ian Campbell
0 siblings, 1 reply; 17+ messages in thread
From: Pasi Kärkkäinen @ 2012-11-18 21:02 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel@lists.xen.org
On Fri, Nov 16, 2012 at 04:55:13PM +0000, Ian Campbell wrote:
> On Fri, 2012-11-02 at 11:18 +0000, Ian Campbell wrote:
> >
> > I also have a draft of a wiki article on the subject which references
> > the information in the public headers which I hope to post soon.
>
> I realised I forgot to do this...
>
> It needs some polish but the majority of the XXX's are placeholder for
> links to the result of this applying this series.
>
Hello,
Comments about some small typos..
> 8<------------------------------------
>
> Paravirtualised X86 Memory Management
>
> = Intro =
>
> One of the original innovations of the Xen hypervisor was the a
^^^^^
"was the a paravirtualisation". Extra "a" ?
> paravirtualisation of the memory management unit (MMU). This allowed
> for fas and efficient virtualisation of Operating Systems which used
^^
"fast".
> paging compared to contemporary techniques.
>
> In this article we will describe the functionality of the PV MMU for
> X86 Xen guests. A familiarity with X86 paging and related concepts
> will be assumed.
>
> Other guest types, such as HVM or PVH guests on X86 or guest on ARM
> achieve virtualisation of the MMU usaing other techniques, such as the
^^^^^^
"using".
> use of hardware assisted or shadow paging.
>
> = Direct Paging =
>
> In order to virtualised the memory subsystem all hypervisors introduce
^^^
"to virtualise" ?
-- Pasi
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0 of 5] docs: x86 PV MMU related functions
2012-11-18 21:02 ` Pasi Kärkkäinen
@ 2012-11-19 10:29 ` Ian Campbell
0 siblings, 0 replies; 17+ messages in thread
From: Ian Campbell @ 2012-11-19 10:29 UTC (permalink / raw)
To: Pasi Kärkkäinen; +Cc: xen-devel@lists.xen.org
On Sun, 2012-11-18 at 21:02 +0000, Pasi Kärkkäinen wrote:
> On Fri, Nov 16, 2012 at 04:55:13PM +0000, Ian Campbell wrote:
> > On Fri, 2012-11-02 at 11:18 +0000, Ian Campbell wrote:
> > >
> > > I also have a draft of a wiki article on the subject which references
> > > the information in the public headers which I hope to post soon.
> >
> > I realised I forgot to do this...
> >
> > It needs some polish but the majority of the XXX's are placeholder for
> > links to the result of this applying this series.
> >
>
> Hello,
>
> Comments about some small typos.
Thanks, I've incorporated these into my local copy.
(Obviously I didn't proof-read or even spellcheck this yet ;-))
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 4 of 5] docs: Document HYPERVISOR_update_descriptor
2012-11-02 11:18 ` [PATCH 4 of 5] docs: Document HYPERVISOR_update_descriptor Ian Campbell
@ 2012-11-19 11:31 ` Ian Jackson
0 siblings, 0 replies; 17+ messages in thread
From: Ian Jackson @ 2012-11-19 11:31 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel
Ian Campbell writes ("[Xen-devel] [PATCH 4 of 5] docs: Document HYPERVISOR_update_descriptor"):
> docs: Document HYPERVISOR_update_descriptor
I have verified that this is a pure documentation patch, but not
checked it for accuracy. That seems like the right level of review,
so:
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 5 of 5] docs: Include prototype for HYPERVISOR_multicall
2012-11-02 11:18 ` [PATCH 5 of 5] docs: Include prototype for HYPERVISOR_multicall Ian Campbell
@ 2012-11-19 11:31 ` Ian Jackson
0 siblings, 0 replies; 17+ messages in thread
From: Ian Jackson @ 2012-11-19 11:31 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel
Ian Campbell writes ("[Xen-devel] [PATCH 5 of 5] docs: Include prototype for HYPERVISOR_multicall"):
> docs: Include prototype for HYPERVISOR_multicall
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1 of 5] docs: document HYPERVISOR_update_va_mapping(_other_domain)
2012-11-02 13:07 ` Ian Campbell
@ 2012-11-19 11:31 ` Ian Jackson
0 siblings, 0 replies; 17+ messages in thread
From: Ian Jackson @ 2012-11-19 11:31 UTC (permalink / raw)
To: Ian Campbell; +Cc: David Vrabel, xen-devel@lists.xen.org
Ian Campbell writes ("Re: [Xen-devel] [PATCH 1 of 5] docs: document HYPERVISOR_update_va_mapping(_other_domain)"):
> docs: document HYPERVISOR_update_va_mapping(_other_domain)
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2 of 5] docs: Document HYPERVISOR_mmuext_op
2012-11-02 11:18 ` [PATCH 2 of 5] docs: Document HYPERVISOR_mmuext_op Ian Campbell
@ 2012-11-19 11:31 ` Ian Jackson
0 siblings, 0 replies; 17+ messages in thread
From: Ian Jackson @ 2012-11-19 11:31 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel
Ian Campbell writes ("[Xen-devel] [PATCH 2 of 5] docs: Document HYPERVISOR_mmuext_op"):
> docs: Document HYPERVISOR_mmuext_op
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 3 of 5] docs: Add ToC entry for start of day memory layout
2012-11-02 11:18 ` [PATCH 3 of 5] docs: Add ToC entry for start of day memory layout Ian Campbell
@ 2012-11-19 11:32 ` Ian Jackson
0 siblings, 0 replies; 17+ messages in thread
From: Ian Jackson @ 2012-11-19 11:32 UTC (permalink / raw)
To: Ian Campbell; +Cc: xen-devel
Ian Campbell writes ("[Xen-devel] [PATCH 3 of 5] docs: Add ToC entry for start of day memory layout"):
> docs: Add ToC entry for start of day memory layout.
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
^ permalink raw reply [flat|nested] 17+ messages in thread