[PATCH 0/4] support guest virtual mapped p2m list

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/4] support guest virtual mapped p2m list
@ 2014-11-14  9:37 Juergen Gross
  2014-11-14  9:37 ` [PATCH 1/4] expand x86 arch_shared_info to support linear " Juergen Gross
                   ` (3 more replies)
  0 siblings, 4 replies; 25+ messages in thread
From: Juergen Gross @ 2014-11-14  9:37 UTC (permalink / raw)
  To: xen-devel, jbeulich, konrad.wilk, david.vrabel; +Cc: Juergen Gross

The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
currently contains the mfn of the top level page frame of the 3 level
p2m tree, which is used by the Xen tools during saving and restoring
(and live migration) of pv domains and for crash dump analysis. With
three levels of the p2m tree it is possible to support up to 512 GB of
RAM for a 64 bit pv domain.

A 32 bit pv domain can support more, as each memory page can hold 1024
instead of 512 entries, leading to a limit of 4 TB.

To be able to support more RAM on x86-64 switch to a virtual mapped
p2m list.

Juergen Gross (4):
  expand x86 arch_shared_info to support linear p2m list
  introduce arch_get_features()
  introduce boot parameter for setting XENFEAT_virtual_p2m
  document new boot parameter virt_p2m

 docs/misc/xen-command-line.markdown | 22 ++++++++++
 xen/arch/arm/domain.c               |  5 +++
 xen/arch/x86/domain.c               | 80 +++++++++++++++++++++++++++++++++++++
 xen/common/kernel.c                 | 22 +---------
 xen/include/public/arch-x86/xen.h   |  7 +++-
 xen/include/public/features.h       |  3 ++
 xen/include/xen/domain.h            |  2 +
 7 files changed, 120 insertions(+), 21 deletions(-)

-- 
2.1.2

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-14  9:37 [PATCH 0/4] support guest virtual mapped p2m list Juergen Gross
@ 2014-11-14  9:37 ` Juergen Gross
  2014-11-14 11:41   ` Andrew Cooper
  2014-11-21 12:23   ` Jan Beulich
  2014-11-14  9:37 ` [PATCH 2/4] introduce arch_get_features() Juergen Gross
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 25+ messages in thread
From: Juergen Gross @ 2014-11-14  9:37 UTC (permalink / raw)
  To: xen-devel, jbeulich, konrad.wilk, david.vrabel; +Cc: Juergen Gross

The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
currently contains the mfn of the top level page frame of the 3 level
p2m tree, which is used by the Xen tools during saving and restoring
(and live migration) of pv domains and for crash dump analysis. With
three levels of the p2m tree it is possible to support up to 512 GB of
RAM for a 64 bit pv domain.

A 32 bit pv domain can support more, as each memory page can hold 1024
instead of 512 entries, leading to a limit of 4 TB.

To be able to support more RAM on x86-64 switch to a virtual mapped
p2m list.

This patch expands struct arch_shared_info with a new p2m list virtual
address and the mfn of the page table root. The new information is
indicated by the domain to be valid by storing ~0UL into
pfn_to_mfn_frame_list_list. The hypervisor indicates usability of this
feature by a new flag XENFEAT_virtual_p2m.
---
 xen/include/public/arch-x86/xen.h | 7 ++++++-
 xen/include/public/features.h     | 3 +++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index f35804b..b0f85a9 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -224,7 +224,12 @@ struct arch_shared_info {
     /* Frame containing list of mfns containing list of mfns containing p2m. */
     xen_pfn_t     pfn_to_mfn_frame_list_list;
     unsigned long nmi_reason;
-    uint64_t pad[32];
+    /*
+     * Following two fields are valid if pfn_to_mfn_frame_list_list contains
+     * ~0UL.
+     */
+    unsigned long p2m_vaddr;    /* virtual address of the p2m list */
+    unsigned long p2m_as_root;  /* mfn of the top level page table */
 };
 typedef struct arch_shared_info arch_shared_info_t;
 
diff --git a/xen/include/public/features.h b/xen/include/public/features.h
index 16d92aa..ff0b82d 100644
--- a/xen/include/public/features.h
+++ b/xen/include/public/features.h
@@ -99,6 +99,9 @@
 #define XENFEAT_grant_map_identity        12
  */
 
+/* x86: guest may specify virtual address of p2m list */
+#define XENFEAT_virtual_p2m               13
+
 #define XENFEAT_NR_SUBMAPS 1
 
 #endif /* __XEN_PUBLIC_FEATURES_H__ */
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/4] introduce arch_get_features()
  2014-11-14  9:37 [PATCH 0/4] support guest virtual mapped p2m list Juergen Gross
  2014-11-14  9:37 ` [PATCH 1/4] expand x86 arch_shared_info to support linear " Juergen Gross
@ 2014-11-14  9:37 ` Juergen Gross
  2014-11-21 12:26   ` Jan Beulich
  2014-11-21 13:21   ` Julien Grall
  2014-11-14  9:37 ` [PATCH 3/4] introduce boot parameter for setting XENFEAT_virtual_p2m Juergen Gross
  2014-11-14  9:37 ` [PATCH 4/4] document new boot parameter virt_p2m Juergen Gross
  3 siblings, 2 replies; 25+ messages in thread
From: Juergen Gross @ 2014-11-14  9:37 UTC (permalink / raw)
  To: xen-devel, jbeulich, konrad.wilk, david.vrabel; +Cc: Juergen Gross

The XENVER_get_features sub command of the xen_version hypercall is
handled completely in common/kernel.c despite of some architecture
dependant parts.

Move the architecture dependant parts in an own function in
arch/*/domain.c

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/arm/domain.c    |  5 +++++
 xen/arch/x86/domain.c    | 30 ++++++++++++++++++++++++++++++
 xen/common/kernel.c      | 22 ++--------------------
 xen/include/xen/domain.h |  2 ++
 4 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 7221bc8..dc5a3fb 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -823,6 +823,11 @@ void vcpu_block_unless_event_pending(struct vcpu *v)
         vcpu_unblock(current);
 }
 
+uint32_t arch_get_features(struct domain *d, unsigned int submap_idx)
+{
+    return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index ae0a344..d98aabd 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2166,6 +2166,36 @@ static int __init init_vcpu_kick_softirq(void)
 }
 __initcall(init_vcpu_kick_softirq);
 
+uint32_t arch_get_features(struct domain *d, unsigned int submap_idx)
+{
+    uint32_t submap = 0;
+
+    switch ( submap_idx )
+    {
+    case 0:
+        switch ( d->guest_type )
+        {
+        case guest_type_pv:
+            submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
+                      (1U << XENFEAT_highmem_assist) |
+                      (1U << XENFEAT_gnttab_map_avail_bits);
+            break;
+        case guest_type_pvh:
+            submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                      (1U << XENFEAT_supervisor_mode_kernel) |
+                      (1U << XENFEAT_hvm_callback_vector);
+            break;
+        case guest_type_hvm:
+            submap |= (1U << XENFEAT_hvm_safe_pvclock) |
+                      (1U << XENFEAT_hvm_callback_vector) |
+                      (1U << XENFEAT_hvm_pirqs);
+            break;
+        }
+        break;
+    }
+
+    return submap;
+}
 
 /*
  * Local variables:
diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index d23c422..d22a860 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -312,31 +312,13 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
                 fi.submap |= 1U << XENFEAT_supervisor_mode_kernel;
             if ( is_hardware_domain(current->domain) )
                 fi.submap |= 1U << XENFEAT_dom0;
-#ifdef CONFIG_X86
-            switch ( d->guest_type )
-            {
-            case guest_type_pv:
-                fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
-                             (1U << XENFEAT_highmem_assist) |
-                             (1U << XENFEAT_gnttab_map_avail_bits);
-                break;
-            case guest_type_pvh:
-                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-                             (1U << XENFEAT_supervisor_mode_kernel) |
-                             (1U << XENFEAT_hvm_callback_vector);
-                break;
-            case guest_type_hvm:
-                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-                             (1U << XENFEAT_hvm_callback_vector) |
-                             (1U << XENFEAT_hvm_pirqs);
-                break;
-            }
-#endif
             break;
         default:
             return -EINVAL;
         }
 
+        fi.submap |= arch_get_features(d, fi.submap_idx);
+
         if ( copy_to_guest(arg, &fi, 1) )
             return -EFAULT;
         return 0;
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 9215b0e..0d12dc0 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -80,6 +80,8 @@ extern spinlock_t vcpu_alloc_lock;
 bool_t domctl_lock_acquire(void);
 void domctl_lock_release(void);
 
+uint32_t arch_get_features(struct domain *d, unsigned int submap_idx);
+
 /*
  * Continue the current hypercall via func(data) on specified cpu.
  * If this function returns 0 then the function is guaranteed to run at some
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/4] introduce boot parameter for setting XENFEAT_virtual_p2m
  2014-11-14  9:37 [PATCH 0/4] support guest virtual mapped p2m list Juergen Gross
  2014-11-14  9:37 ` [PATCH 1/4] expand x86 arch_shared_info to support linear " Juergen Gross
  2014-11-14  9:37 ` [PATCH 2/4] introduce arch_get_features() Juergen Gross
@ 2014-11-14  9:37 ` Juergen Gross
  2014-11-19 21:04   ` Konrad Rzeszutek Wilk
  2014-11-14  9:37 ` [PATCH 4/4] document new boot parameter virt_p2m Juergen Gross
  3 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2014-11-14  9:37 UTC (permalink / raw)
  To: xen-devel, jbeulich, konrad.wilk, david.vrabel; +Cc: Juergen Gross

Introduce a new boot parameter "virt_p2m" to be able to set
XENFEAT_virtual_p2m for a pv domain.

As long as Xen tools and kdump don't support this new feature it is
turned off by default.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/domain.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index d98aabd..ccb54f6 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2166,8 +2166,44 @@ static int __init init_vcpu_kick_softirq(void)
 }
 __initcall(init_vcpu_kick_softirq);
 
+#define VIRT_P2M_DOM0        0x01
+#define VIRT_P2M_DOM0_LARGE  0x02
+#define VIRT_P2M_DOMU        0x04
+#define VIRT_P2M_DOMU_LARGE  0x08
+static unsigned virt_p2m = 0;
+
+static void __init parse_virt_p2m(const char *s)
+{
+    char *ss;
+    int b;
+
+    do {
+        ss = strchr(s, ',');
+        if ( ss )
+            *ss = '\0';
+
+        b = parse_bool(s);
+        if ( b == 0 )
+            virt_p2m = 0;
+        else if ( b == 1 )
+            virt_p2m = VIRT_P2M_DOM0 | VIRT_P2M_DOMU;
+        else if ( !strcmp(s, "dom0") )
+            virt_p2m |= VIRT_P2M_DOM0;
+        else if ( !strcmp(s, "dom0_large") )
+            virt_p2m |= VIRT_P2M_DOM0_LARGE;
+        else if ( !strcmp(s, "domu") )
+            virt_p2m |= VIRT_P2M_DOMU;
+        else if ( !strcmp(s, "domu_large") )
+            virt_p2m |= VIRT_P2M_DOMU_LARGE;
+
+        s = ss + 1;
+    } while ( ss );
+}
+custom_param("virt_p2m", parse_virt_p2m);
+
 uint32_t arch_get_features(struct domain *d, unsigned int submap_idx)
 {
+#define DOM_IS_LARGE(d) ((d)->max_pages > 1U << 27)
     uint32_t submap = 0;
 
     switch ( submap_idx )
@@ -2179,6 +2215,20 @@ uint32_t arch_get_features(struct domain *d, unsigned int submap_idx)
             submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
                       (1U << XENFEAT_highmem_assist) |
                       (1U << XENFEAT_gnttab_map_avail_bits);
+            if ( is_hardware_domain(d) )
+            {
+                if ( virt_p2m & VIRT_P2M_DOM0 )
+                    submap |= 1U << XENFEAT_virtual_p2m;
+                if ( DOM_IS_LARGE(d) && virt_p2m & VIRT_P2M_DOM0_LARGE )
+                    submap |= 1U << XENFEAT_virtual_p2m;
+            }
+            else
+            {
+                if ( virt_p2m & VIRT_P2M_DOMU )
+                    submap |= 1U << XENFEAT_virtual_p2m;
+                if ( DOM_IS_LARGE(d) && virt_p2m & VIRT_P2M_DOMU_LARGE )
+                    submap |= 1U << XENFEAT_virtual_p2m;
+            }
             break;
         case guest_type_pvh:
             submap |= (1U << XENFEAT_hvm_safe_pvclock) |
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/4] document new boot parameter virt_p2m
  2014-11-14  9:37 [PATCH 0/4] support guest virtual mapped p2m list Juergen Gross
                   ` (2 preceding siblings ...)
  2014-11-14  9:37 ` [PATCH 3/4] introduce boot parameter for setting XENFEAT_virtual_p2m Juergen Gross
@ 2014-11-14  9:37 ` Juergen Gross
  3 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2014-11-14  9:37 UTC (permalink / raw)
  To: xen-devel, jbeulich, konrad.wilk, david.vrabel; +Cc: Juergen Gross

Add documentation for the new boot parameter "virt_p2m".

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 docs/misc/xen-command-line.markdown | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 0830e5f..c56273d 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1272,6 +1272,28 @@ The optional `keep` parameter causes Xen to continue using the vga
 console even after dom0 has been started.  The default behaviour is to
 relinquish control to dom0.
 
+### virt\_p2m
+> `= List of [ <boolean> | dom0 | dom0\_large | domu | domu\_large ]`
+
+> Default: `false`
+
+Allow pv-domains to specify a virtual address for the domain's p2m list which
+is used by the Xen tools during domain save and restore and by kdump.
+
+`dom0` enables this feature for Dom0.
+
+`dom0\_large` enables this feature for Dom0 with more than 512 GiB of RAM
+(the traditional 3 level p2m tree can't map more than that).
+
+`domu` enables this feature for all pv domains but Dom0.
+
+`domu\_large` enables this feature for all pv domains with more than 512 GiB
+of RAM but Dom0.
+
+`true` enables this feature for all pv domains.
+
+`false` disables this feature for all domains.
+
 ### vpid (Intel)
 > `= <boolean>`
 
-- 
2.1.2

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-14  9:37 ` [PATCH 1/4] expand x86 arch_shared_info to support linear " Juergen Gross
@ 2014-11-14 11:41   ` Andrew Cooper
  2014-11-14 12:53     ` Juergen Gross
  2014-11-21 12:23   ` Jan Beulich
  1 sibling, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-11-14 11:41 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, jbeulich, konrad.wilk, david.vrabel

On 14/11/14 09:37, Juergen Gross wrote:
> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
> currently contains the mfn of the top level page frame of the 3 level
> p2m tree, which is used by the Xen tools during saving and restoring
> (and live migration) of pv domains and for crash dump analysis. With
> three levels of the p2m tree it is possible to support up to 512 GB of
> RAM for a 64 bit pv domain.
>
> A 32 bit pv domain can support more, as each memory page can hold 1024
> instead of 512 entries, leading to a limit of 4 TB.
>
> To be able to support more RAM on x86-64 switch to a virtual mapped
> p2m list.
>
> This patch expands struct arch_shared_info with a new p2m list virtual
> address and the mfn of the page table root. The new information is
> indicated by the domain to be valid by storing ~0UL into
> pfn_to_mfn_frame_list_list. The hypervisor indicates usability of this
> feature by a new flag XENFEAT_virtual_p2m.

How do you envisage this being used?  Are you expecting the tools to do
manual pagetable walks using xc_map_foreign_xxx() ?

~Andrew

> ---
>  xen/include/public/arch-x86/xen.h | 7 ++++++-
>  xen/include/public/features.h     | 3 +++
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
> index f35804b..b0f85a9 100644
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -224,7 +224,12 @@ struct arch_shared_info {
>      /* Frame containing list of mfns containing list of mfns containing p2m. */
>      xen_pfn_t     pfn_to_mfn_frame_list_list;
>      unsigned long nmi_reason;
> -    uint64_t pad[32];
> +    /*
> +     * Following two fields are valid if pfn_to_mfn_frame_list_list contains
> +     * ~0UL.
> +     */
> +    unsigned long p2m_vaddr;    /* virtual address of the p2m list */
> +    unsigned long p2m_as_root;  /* mfn of the top level page table */
>  };
>  typedef struct arch_shared_info arch_shared_info_t;
>  
> diff --git a/xen/include/public/features.h b/xen/include/public/features.h
> index 16d92aa..ff0b82d 100644
> --- a/xen/include/public/features.h
> +++ b/xen/include/public/features.h
> @@ -99,6 +99,9 @@
>  #define XENFEAT_grant_map_identity        12
>   */
>  
> +/* x86: guest may specify virtual address of p2m list */
> +#define XENFEAT_virtual_p2m               13
> +
>  #define XENFEAT_NR_SUBMAPS 1
>  
>  #endif /* __XEN_PUBLIC_FEATURES_H__ */

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-14 11:41   ` Andrew Cooper
@ 2014-11-14 12:53     ` Juergen Gross
  2014-11-14 13:56       ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2014-11-14 12:53 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel, jbeulich, konrad.wilk, david.vrabel

On 11/14/2014 12:41 PM, Andrew Cooper wrote:
> On 14/11/14 09:37, Juergen Gross wrote:
>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>> currently contains the mfn of the top level page frame of the 3 level
>> p2m tree, which is used by the Xen tools during saving and restoring
>> (and live migration) of pv domains and for crash dump analysis. With
>> three levels of the p2m tree it is possible to support up to 512 GB of
>> RAM for a 64 bit pv domain.
>>
>> A 32 bit pv domain can support more, as each memory page can hold 1024
>> instead of 512 entries, leading to a limit of 4 TB.
>>
>> To be able to support more RAM on x86-64 switch to a virtual mapped
>> p2m list.
>>
>> This patch expands struct arch_shared_info with a new p2m list virtual
>> address and the mfn of the page table root. The new information is
>> indicated by the domain to be valid by storing ~0UL into
>> pfn_to_mfn_frame_list_list. The hypervisor indicates usability of this
>> feature by a new flag XENFEAT_virtual_p2m.
>
> How do you envisage this being used?  Are you expecting the tools to do
> manual pagetable walks using xc_map_foreign_xxx() ?

Yes. Not very different compared to today's mapping via the 3 level
p2m tree. Just another entry format, 4 instead of 3 levels and starting
at an offset.


Juergen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-14 12:53     ` Juergen Gross
@ 2014-11-14 13:56       ` Andrew Cooper
  2014-11-14 14:14         ` Jürgen Groß
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-11-14 13:56 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, jbeulich, konrad.wilk, david.vrabel

On 14/11/14 12:53, Juergen Gross wrote:
> On 11/14/2014 12:41 PM, Andrew Cooper wrote:
>> On 14/11/14 09:37, Juergen Gross wrote:
>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>> currently contains the mfn of the top level page frame of the 3 level
>>> p2m tree, which is used by the Xen tools during saving and restoring
>>> (and live migration) of pv domains and for crash dump analysis. With
>>> three levels of the p2m tree it is possible to support up to 512 GB of
>>> RAM for a 64 bit pv domain.
>>>
>>> A 32 bit pv domain can support more, as each memory page can hold 1024
>>> instead of 512 entries, leading to a limit of 4 TB.
>>>
>>> To be able to support more RAM on x86-64 switch to a virtual mapped
>>> p2m list.
>>>
>>> This patch expands struct arch_shared_info with a new p2m list virtual
>>> address and the mfn of the page table root. The new information is
>>> indicated by the domain to be valid by storing ~0UL into
>>> pfn_to_mfn_frame_list_list. The hypervisor indicates usability of this
>>> feature by a new flag XENFEAT_virtual_p2m.
>>
>> How do you envisage this being used?  Are you expecting the tools to do
>> manual pagetable walks using xc_map_foreign_xxx() ?
>
> Yes. Not very different compared to today's mapping via the 3 level
> p2m tree. Just another entry format, 4 instead of 3 levels and starting
> at an offset.

Yes - David and I were discussing this over lunch, and it is not
actually very different.

In reality, how likely is it that the pages backing this virtual linear
array change?

One issue currently is that, during the live part of migration, the
toolstack has no way of working out whether the structure of the p2m has
changed (intermediate leaves rearranged, or the length increasing).

In the case that the VM does change the structure of the p2m under the
feet of the toolstack, migration will either blow up in a non-subtle way
with a p2m/m2p mismatch, or in a subtle way with the receiving side
copying the new p2m over the wrong part of the new domain.

I am wondering whether, with this new p2m method, we can take sufficient
steps to be able to guarantee mishaps like this can't occur.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-14 13:56       ` Andrew Cooper
@ 2014-11-14 14:14         ` Jürgen Groß
  2014-11-14 14:59           ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Jürgen Groß @ 2014-11-14 14:14 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel, jbeulich, konrad.wilk, david.vrabel

On 11/14/2014 02:56 PM, Andrew Cooper wrote:
> On 14/11/14 12:53, Juergen Gross wrote:
>> On 11/14/2014 12:41 PM, Andrew Cooper wrote:
>>> On 14/11/14 09:37, Juergen Gross wrote:
>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>> currently contains the mfn of the top level page frame of the 3 level
>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>> (and live migration) of pv domains and for crash dump analysis. With
>>>> three levels of the p2m tree it is possible to support up to 512 GB of
>>>> RAM for a 64 bit pv domain.
>>>>
>>>> A 32 bit pv domain can support more, as each memory page can hold 1024
>>>> instead of 512 entries, leading to a limit of 4 TB.
>>>>
>>>> To be able to support more RAM on x86-64 switch to a virtual mapped
>>>> p2m list.
>>>>
>>>> This patch expands struct arch_shared_info with a new p2m list virtual
>>>> address and the mfn of the page table root. The new information is
>>>> indicated by the domain to be valid by storing ~0UL into
>>>> pfn_to_mfn_frame_list_list. The hypervisor indicates usability of this
>>>> feature by a new flag XENFEAT_virtual_p2m.
>>>
>>> How do you envisage this being used?  Are you expecting the tools to do
>>> manual pagetable walks using xc_map_foreign_xxx() ?
>>
>> Yes. Not very different compared to today's mapping via the 3 level
>> p2m tree. Just another entry format, 4 instead of 3 levels and starting
>> at an offset.
>
> Yes - David and I were discussing this over lunch, and it is not
> actually very different.
>
> In reality, how likely is it that the pages backing this virtual linear
> array change?

Very unlikely, I think. But not impossible.

> One issue currently is that, during the live part of migration, the
> toolstack has no way of working out whether the structure of the p2m has
> changed (intermediate leaves rearranged, or the length increasing).
>
> In the case that the VM does change the structure of the p2m under the
> feet of the toolstack, migration will either blow up in a non-subtle way
> with a p2m/m2p mismatch, or in a subtle way with the receiving side
> copying the new p2m over the wrong part of the new domain.
>
> I am wondering whether, with this new p2m method, we can take sufficient
> steps to be able to guarantee mishaps like this can't occur.

This should be easy: I could add a counter in arch_shared_info which is
incremented whenever a p2m mapping is being changed. The toolstack could
compare the counter values before start and at end of migration and redo
the migration (or fail) if they are different. In order to avoid races
I would have to increment the counter before and after changing the
mapping.


Juergen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-14 14:14         ` Jürgen Groß
@ 2014-11-14 14:59           ` Andrew Cooper
  2014-11-14 15:32             ` Juergen Gross
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-11-14 14:59 UTC (permalink / raw)
  To: Jürgen Groß, xen-devel, jbeulich, konrad.wilk,
	david.vrabel

On 14/11/14 14:14, Jürgen Groß wrote:
> On 11/14/2014 02:56 PM, Andrew Cooper wrote:
>> On 14/11/14 12:53, Juergen Gross wrote:
>>> On 11/14/2014 12:41 PM, Andrew Cooper wrote:
>>>> On 14/11/14 09:37, Juergen Gross wrote:
>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>> currently contains the mfn of the top level page frame of the 3 level
>>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>>> (and live migration) of pv domains and for crash dump analysis. With
>>>>> three levels of the p2m tree it is possible to support up to 512
>>>>> GB of
>>>>> RAM for a 64 bit pv domain.
>>>>>
>>>>> A 32 bit pv domain can support more, as each memory page can hold
>>>>> 1024
>>>>> instead of 512 entries, leading to a limit of 4 TB.
>>>>>
>>>>> To be able to support more RAM on x86-64 switch to a virtual mapped
>>>>> p2m list.
>>>>>
>>>>> This patch expands struct arch_shared_info with a new p2m list
>>>>> virtual
>>>>> address and the mfn of the page table root. The new information is
>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>> pfn_to_mfn_frame_list_list. The hypervisor indicates usability of
>>>>> this
>>>>> feature by a new flag XENFEAT_virtual_p2m.
>>>>
>>>> How do you envisage this being used?  Are you expecting the tools
>>>> to do
>>>> manual pagetable walks using xc_map_foreign_xxx() ?
>>>
>>> Yes. Not very different compared to today's mapping via the 3 level
>>> p2m tree. Just another entry format, 4 instead of 3 levels and starting
>>> at an offset.
>>
>> Yes - David and I were discussing this over lunch, and it is not
>> actually very different.
>>
>> In reality, how likely is it that the pages backing this virtual linear
>> array change?
>
> Very unlikely, I think. But not impossible.
>
>> One issue currently is that, during the live part of migration, the
>> toolstack has no way of working out whether the structure of the p2m has
>> changed (intermediate leaves rearranged, or the length increasing).
>>
>> In the case that the VM does change the structure of the p2m under the
>> feet of the toolstack, migration will either blow up in a non-subtle way
>> with a p2m/m2p mismatch, or in a subtle way with the receiving side
>> copying the new p2m over the wrong part of the new domain.
>>
>> I am wondering whether, with this new p2m method, we can take sufficient
>> steps to be able to guarantee mishaps like this can't occur.
>
> This should be easy: I could add a counter in arch_shared_info which is
> incremented whenever a p2m mapping is being changed. The toolstack could
> compare the counter values before start and at end of migration and redo
> the migration (or fail) if they are different. In order to avoid races
> I would have to increment the counter before and after changing the
> mapping.
>

That is insufficient I believe.

Consider:

* Toolstack walks pagetables and maps the frames containing the linear p2m
* Live migration starts
* VM remaps a frame in the middle of the linear p2m
* Live migration continues, but the toolstack has a stale frame in the
middle of its view of the p2m.

As the p2m is almost never expected to change, I think it might be
better to have a flag the toolstack can set to say "The toolstack is
peeking at your p2m behind your back - you must not change its structure."

Having just thought this through, I think there is also a race condition
between a VM changing an entry in the p2m, and the toolstack doing
verifications of frames being sent.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-14 14:59           ` Andrew Cooper
@ 2014-11-14 15:32             ` Juergen Gross
  2014-11-14 16:08               ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2014-11-14 15:32 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel, jbeulich, konrad.wilk, david.vrabel

On 11/14/2014 03:59 PM, Andrew Cooper wrote:
> On 14/11/14 14:14, Jürgen Groß wrote:
>> On 11/14/2014 02:56 PM, Andrew Cooper wrote:
>>> On 14/11/14 12:53, Juergen Gross wrote:
>>>> On 11/14/2014 12:41 PM, Andrew Cooper wrote:
>>>>> On 14/11/14 09:37, Juergen Gross wrote:
>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>> currently contains the mfn of the top level page frame of the 3 level
>>>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>>>> (and live migration) of pv domains and for crash dump analysis. With
>>>>>> three levels of the p2m tree it is possible to support up to 512
>>>>>> GB of
>>>>>> RAM for a 64 bit pv domain.
>>>>>>
>>>>>> A 32 bit pv domain can support more, as each memory page can hold
>>>>>> 1024
>>>>>> instead of 512 entries, leading to a limit of 4 TB.
>>>>>>
>>>>>> To be able to support more RAM on x86-64 switch to a virtual mapped
>>>>>> p2m list.
>>>>>>
>>>>>> This patch expands struct arch_shared_info with a new p2m list
>>>>>> virtual
>>>>>> address and the mfn of the page table root. The new information is
>>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>>> pfn_to_mfn_frame_list_list. The hypervisor indicates usability of
>>>>>> this
>>>>>> feature by a new flag XENFEAT_virtual_p2m.
>>>>>
>>>>> How do you envisage this being used?  Are you expecting the tools
>>>>> to do
>>>>> manual pagetable walks using xc_map_foreign_xxx() ?
>>>>
>>>> Yes. Not very different compared to today's mapping via the 3 level
>>>> p2m tree. Just another entry format, 4 instead of 3 levels and starting
>>>> at an offset.
>>>
>>> Yes - David and I were discussing this over lunch, and it is not
>>> actually very different.
>>>
>>> In reality, how likely is it that the pages backing this virtual linear
>>> array change?
>>
>> Very unlikely, I think. But not impossible.
>>
>>> One issue currently is that, during the live part of migration, the
>>> toolstack has no way of working out whether the structure of the p2m has
>>> changed (intermediate leaves rearranged, or the length increasing).
>>>
>>> In the case that the VM does change the structure of the p2m under the
>>> feet of the toolstack, migration will either blow up in a non-subtle way
>>> with a p2m/m2p mismatch, or in a subtle way with the receiving side
>>> copying the new p2m over the wrong part of the new domain.
>>>
>>> I am wondering whether, with this new p2m method, we can take sufficient
>>> steps to be able to guarantee mishaps like this can't occur.
>>
>> This should be easy: I could add a counter in arch_shared_info which is
>> incremented whenever a p2m mapping is being changed. The toolstack could
>> compare the counter values before start and at end of migration and redo
>> the migration (or fail) if they are different. In order to avoid races
>> I would have to increment the counter before and after changing the
>> mapping.
>>
>
> That is insufficient I believe.
>
> Consider:
>
> * Toolstack walks pagetables and maps the frames containing the linear p2m
> * Live migration starts
> * VM remaps a frame in the middle of the linear p2m
> * Live migration continues, but the toolstack has a stale frame in the
> middle of its view of the p2m.

This would be covered by my suggestion. At the end of the memory
transfer (with some bogus contents) the toolstack would discover the
change of the p2m structure and either fail the migration or start it
from the beginning and thus overwriting the bogus frames.

> As the p2m is almost never expected to change, I think it might be
> better to have a flag the toolstack can set to say "The toolstack is
> peeking at your p2m behind your back - you must not change its structure."

Be careful here: changes of the structure can be due to two scenarios:
- ballooning (invalid entries being populated): this is no problem, as
   we can stop the ballooning during live migration.
- mapping of grant pages e.g. in a stub domain (first map in an area
   former marked as invalid): you can't stop this, as the stub domain
   has to do some work. Here a restart of the migration should work, as
   the p2m structure change can only happen once for each affected p2m
   page.

> Having just thought this through, I think there is also a race condition
> between a VM changing an entry in the p2m, and the toolstack doing
> verifications of frames being sent.

Okay, so the flag you mentioned should just prohibit changes in the
p2m list related to memory frames of the affected domain: ballooning
up or down, or rearranging the memory layout (does this happen today?).
Mapping and unmapping of grant pages should be still allowed.


Juergen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-14 15:32             ` Juergen Gross
@ 2014-11-14 16:08               ` Andrew Cooper
  2014-11-18  5:33                 ` Juergen Gross
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-11-14 16:08 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, jbeulich, konrad.wilk, david.vrabel

On 14/11/14 15:32, Juergen Gross wrote:
> On 11/14/2014 03:59 PM, Andrew Cooper wrote:
>> On 14/11/14 14:14, Jürgen Groß wrote:
>>> On 11/14/2014 02:56 PM, Andrew Cooper wrote:
>>>> On 14/11/14 12:53, Juergen Gross wrote:
>>>>> On 11/14/2014 12:41 PM, Andrew Cooper wrote:
>>>>>> On 14/11/14 09:37, Juergen Gross wrote:
>>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>>> currently contains the mfn of the top level page frame of the 3
>>>>>>> level
>>>>>>> p2m tree, which is used by the Xen tools during saving and
>>>>>>> restoring
>>>>>>> (and live migration) of pv domains and for crash dump analysis.
>>>>>>> With
>>>>>>> three levels of the p2m tree it is possible to support up to 512
>>>>>>> GB of
>>>>>>> RAM for a 64 bit pv domain.
>>>>>>>
>>>>>>> A 32 bit pv domain can support more, as each memory page can hold
>>>>>>> 1024
>>>>>>> instead of 512 entries, leading to a limit of 4 TB.
>>>>>>>
>>>>>>> To be able to support more RAM on x86-64 switch to a virtual mapped
>>>>>>> p2m list.
>>>>>>>
>>>>>>> This patch expands struct arch_shared_info with a new p2m list
>>>>>>> virtual
>>>>>>> address and the mfn of the page table root. The new information is
>>>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>>>> pfn_to_mfn_frame_list_list. The hypervisor indicates usability of
>>>>>>> this
>>>>>>> feature by a new flag XENFEAT_virtual_p2m.
>>>>>>
>>>>>> How do you envisage this being used?  Are you expecting the tools
>>>>>> to do
>>>>>> manual pagetable walks using xc_map_foreign_xxx() ?
>>>>>
>>>>> Yes. Not very different compared to today's mapping via the 3 level
>>>>> p2m tree. Just another entry format, 4 instead of 3 levels and
>>>>> starting
>>>>> at an offset.
>>>>
>>>> Yes - David and I were discussing this over lunch, and it is not
>>>> actually very different.
>>>>
>>>> In reality, how likely is it that the pages backing this virtual
>>>> linear
>>>> array change?
>>>
>>> Very unlikely, I think. But not impossible.
>>>
>>>> One issue currently is that, during the live part of migration, the
>>>> toolstack has no way of working out whether the structure of the
>>>> p2m has
>>>> changed (intermediate leaves rearranged, or the length increasing).
>>>>
>>>> In the case that the VM does change the structure of the p2m under the
>>>> feet of the toolstack, migration will either blow up in a
>>>> non-subtle way
>>>> with a p2m/m2p mismatch, or in a subtle way with the receiving side
>>>> copying the new p2m over the wrong part of the new domain.
>>>>
>>>> I am wondering whether, with this new p2m method, we can take
>>>> sufficient
>>>> steps to be able to guarantee mishaps like this can't occur.
>>>
>>> This should be easy: I could add a counter in arch_shared_info which is
>>> incremented whenever a p2m mapping is being changed. The toolstack
>>> could
>>> compare the counter values before start and at end of migration and
>>> redo
>>> the migration (or fail) if they are different. In order to avoid races
>>> I would have to increment the counter before and after changing the
>>> mapping.
>>>
>>
>> That is insufficient I believe.
>>
>> Consider:
>>
>> * Toolstack walks pagetables and maps the frames containing the
>> linear p2m
>> * Live migration starts
>> * VM remaps a frame in the middle of the linear p2m
>> * Live migration continues, but the toolstack has a stale frame in the
>> middle of its view of the p2m.
>
> This would be covered by my suggestion. At the end of the memory
> transfer (with some bogus contents) the toolstack would discover the
> change of the p2m structure and either fail the migration or start it
> from the beginning and thus overwriting the bogus frames.

Checking after pause is too late.  The content of the p2m is used verify
each frame being sent on the wire, so is in active use for the entire
duration of live migration.

If the toolstack starts verifying frames being sent using information
from a stale p2m, the best that can be hoped for is that the toolstack
declares that the p2m and m2p are inconsistent and abort the migrate.

>
>> As the p2m is almost never expected to change, I think it might be
>> better to have a flag the toolstack can set to say "The toolstack is
>> peeking at your p2m behind your back - you must not change its
>> structure."
>
> Be careful here: changes of the structure can be due to two scenarios:
> - ballooning (invalid entries being populated): this is no problem, as
>   we can stop the ballooning during live migration.
> - mapping of grant pages e.g. in a stub domain (first map in an area
>   former marked as invalid): you can't stop this, as the stub domain
>   has to do some work. Here a restart of the migration should work, as
>   the p2m structure change can only happen once for each affected p2m
>   page.

Migration is not at all possible with a domain referencing foreign frames.

The live part can cope with foreign frames referenced in the ptes.  As
part of the pause handling in the VM, the frontends must unmap any
grants they have.  After pause, any remaining foreign frames cause a
migration failure.

>
>> Having just thought this through, I think there is also a race condition
>> between a VM changing an entry in the p2m, and the toolstack doing
>> verifications of frames being sent.
>
> Okay, so the flag you mentioned should just prohibit changes in the
> p2m list related to memory frames of the affected domain: ballooning
> up or down, or rearranging the memory layout (does this happen today?).
> Mapping and unmapping of grant pages should be still allowed.

HVM guests doesn't have any of their p2m updates represented in the
logdirty bitmap, so ballooning an HVM guest during migrate leads to
unexpected holes or lack of holes on the resuming side, leading to a
very confused balloon driver.

At the time I had not found a problem with PV guests, but it is now
clear that there is a period of time when a guest is altering its p2m
where the p2m and m2p are out of sync, which will cause a migration
failure if the toolstack observes this artefact.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-14 16:08               ` Andrew Cooper
@ 2014-11-18  5:33                 ` Juergen Gross
  2014-11-18 10:51                   ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2014-11-18  5:33 UTC (permalink / raw)
  To: Andrew Cooper, xen-devel, jbeulich, konrad.wilk, david.vrabel

On 11/14/2014 05:08 PM, Andrew Cooper wrote:
> On 14/11/14 15:32, Juergen Gross wrote:
>> On 11/14/2014 03:59 PM, Andrew Cooper wrote:
>>> On 14/11/14 14:14, Jürgen Groß wrote:
>>>> On 11/14/2014 02:56 PM, Andrew Cooper wrote:
>>>>> On 14/11/14 12:53, Juergen Gross wrote:
>>>>>> On 11/14/2014 12:41 PM, Andrew Cooper wrote:
>>>>>>> On 14/11/14 09:37, Juergen Gross wrote:
>>>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>>>> currently contains the mfn of the top level page frame of the 3
>>>>>>>> level
>>>>>>>> p2m tree, which is used by the Xen tools during saving and
>>>>>>>> restoring
>>>>>>>> (and live migration) of pv domains and for crash dump analysis.
>>>>>>>> With
>>>>>>>> three levels of the p2m tree it is possible to support up to 512
>>>>>>>> GB of
>>>>>>>> RAM for a 64 bit pv domain.
>>>>>>>>
>>>>>>>> A 32 bit pv domain can support more, as each memory page can hold
>>>>>>>> 1024
>>>>>>>> instead of 512 entries, leading to a limit of 4 TB.
>>>>>>>>
>>>>>>>> To be able to support more RAM on x86-64 switch to a virtual mapped
>>>>>>>> p2m list.
>>>>>>>>
>>>>>>>> This patch expands struct arch_shared_info with a new p2m list
>>>>>>>> virtual
>>>>>>>> address and the mfn of the page table root. The new information is
>>>>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>>>>> pfn_to_mfn_frame_list_list. The hypervisor indicates usability of
>>>>>>>> this
>>>>>>>> feature by a new flag XENFEAT_virtual_p2m.
>>>>>>>
>>>>>>> How do you envisage this being used?  Are you expecting the tools
>>>>>>> to do
>>>>>>> manual pagetable walks using xc_map_foreign_xxx() ?
>>>>>>
>>>>>> Yes. Not very different compared to today's mapping via the 3 level
>>>>>> p2m tree. Just another entry format, 4 instead of 3 levels and
>>>>>> starting
>>>>>> at an offset.
>>>>>
>>>>> Yes - David and I were discussing this over lunch, and it is not
>>>>> actually very different.
>>>>>
>>>>> In reality, how likely is it that the pages backing this virtual
>>>>> linear
>>>>> array change?
>>>>
>>>> Very unlikely, I think. But not impossible.
>>>>
>>>>> One issue currently is that, during the live part of migration, the
>>>>> toolstack has no way of working out whether the structure of the
>>>>> p2m has
>>>>> changed (intermediate leaves rearranged, or the length increasing).
>>>>>
>>>>> In the case that the VM does change the structure of the p2m under the
>>>>> feet of the toolstack, migration will either blow up in a
>>>>> non-subtle way
>>>>> with a p2m/m2p mismatch, or in a subtle way with the receiving side
>>>>> copying the new p2m over the wrong part of the new domain.
>>>>>
>>>>> I am wondering whether, with this new p2m method, we can take
>>>>> sufficient
>>>>> steps to be able to guarantee mishaps like this can't occur.
>>>>
>>>> This should be easy: I could add a counter in arch_shared_info which is
>>>> incremented whenever a p2m mapping is being changed. The toolstack
>>>> could
>>>> compare the counter values before start and at end of migration and
>>>> redo
>>>> the migration (or fail) if they are different. In order to avoid races
>>>> I would have to increment the counter before and after changing the
>>>> mapping.
>>>>
>>>
>>> That is insufficient I believe.
>>>
>>> Consider:
>>>
>>> * Toolstack walks pagetables and maps the frames containing the
>>> linear p2m
>>> * Live migration starts
>>> * VM remaps a frame in the middle of the linear p2m
>>> * Live migration continues, but the toolstack has a stale frame in the
>>> middle of its view of the p2m.
>>
>> This would be covered by my suggestion. At the end of the memory
>> transfer (with some bogus contents) the toolstack would discover the
>> change of the p2m structure and either fail the migration or start it
>> from the beginning and thus overwriting the bogus frames.
>
> Checking after pause is too late.  The content of the p2m is used verify
> each frame being sent on the wire, so is in active use for the entire
> duration of live migration.
>
> If the toolstack starts verifying frames being sent using information
> from a stale p2m, the best that can be hoped for is that the toolstack
> declares that the p2m and m2p are inconsistent and abort the migrate.
>
>>
>>> As the p2m is almost never expected to change, I think it might be
>>> better to have a flag the toolstack can set to say "The toolstack is
>>> peeking at your p2m behind your back - you must not change its
>>> structure."
>>
>> Be careful here: changes of the structure can be due to two scenarios:
>> - ballooning (invalid entries being populated): this is no problem, as
>>    we can stop the ballooning during live migration.
>> - mapping of grant pages e.g. in a stub domain (first map in an area
>>    former marked as invalid): you can't stop this, as the stub domain
>>    has to do some work. Here a restart of the migration should work, as
>>    the p2m structure change can only happen once for each affected p2m
>>    page.
>
> Migration is not at all possible with a domain referencing foreign frames.
>
> The live part can cope with foreign frames referenced in the ptes.  As
> part of the pause handling in the VM, the frontends must unmap any
> grants they have.  After pause, any remaining foreign frames cause a
> migration failure.
>
>>
>>> Having just thought this through, I think there is also a race condition
>>> between a VM changing an entry in the p2m, and the toolstack doing
>>> verifications of frames being sent.
>>
>> Okay, so the flag you mentioned should just prohibit changes in the
>> p2m list related to memory frames of the affected domain: ballooning
>> up or down, or rearranging the memory layout (does this happen today?).
>> Mapping and unmapping of grant pages should be still allowed.
>
> HVM guests doesn't have any of their p2m updates represented in the
> logdirty bitmap, so ballooning an HVM guest during migrate leads to
> unexpected holes or lack of holes on the resuming side, leading to a
> very confused balloon driver.
>
> At the time I had not found a problem with PV guests, but it is now
> clear that there is a period of time when a guest is altering its p2m
> where the p2m and m2p are out of sync, which will cause a migration
> failure if the toolstack observes this artefact.

So ballooning should be disabled during migration. I think this should
be handled via callbacks triggered by xenstore: one at start of
migration to stop ballooning and one at end to restart it. I wouldn't
want to tie this functionality to the p2m list structure, as it is
not related to it.

Juergen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-18  5:33                 ` Juergen Gross
@ 2014-11-18 10:51                   ` Andrew Cooper
  2014-11-18 10:56                     ` David Vrabel
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-11-18 10:51 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, jbeulich, konrad.wilk, david.vrabel

On 18/11/14 05:33, Juergen Gross wrote:
> On 11/14/2014 05:08 PM, Andrew Cooper wrote:
>> On 14/11/14 15:32, Juergen Gross wrote:
>>> On 11/14/2014 03:59 PM, Andrew Cooper wrote:
>>>> On 14/11/14 14:14, Jürgen Groß wrote:
>>>>> On 11/14/2014 02:56 PM, Andrew Cooper wrote:
>>>>>> On 14/11/14 12:53, Juergen Gross wrote:
>>>>>>> On 11/14/2014 12:41 PM, Andrew Cooper wrote:
>>>>>>>> On 14/11/14 09:37, Juergen Gross wrote:
>>>>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>>>>> currently contains the mfn of the top level page frame of the 3
>>>>>>>>> level
>>>>>>>>> p2m tree, which is used by the Xen tools during saving and
>>>>>>>>> restoring
>>>>>>>>> (and live migration) of pv domains and for crash dump analysis.
>>>>>>>>> With
>>>>>>>>> three levels of the p2m tree it is possible to support up to 512
>>>>>>>>> GB of
>>>>>>>>> RAM for a 64 bit pv domain.
>>>>>>>>>
>>>>>>>>> A 32 bit pv domain can support more, as each memory page can hold
>>>>>>>>> 1024
>>>>>>>>> instead of 512 entries, leading to a limit of 4 TB.
>>>>>>>>>
>>>>>>>>> To be able to support more RAM on x86-64 switch to a virtual
>>>>>>>>> mapped
>>>>>>>>> p2m list.
>>>>>>>>>
>>>>>>>>> This patch expands struct arch_shared_info with a new p2m list
>>>>>>>>> virtual
>>>>>>>>> address and the mfn of the page table root. The new
>>>>>>>>> information is
>>>>>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>>>>>> pfn_to_mfn_frame_list_list. The hypervisor indicates usability of
>>>>>>>>> this
>>>>>>>>> feature by a new flag XENFEAT_virtual_p2m.
>>>>>>>>
>>>>>>>> How do you envisage this being used?  Are you expecting the tools
>>>>>>>> to do
>>>>>>>> manual pagetable walks using xc_map_foreign_xxx() ?
>>>>>>>
>>>>>>> Yes. Not very different compared to today's mapping via the 3 level
>>>>>>> p2m tree. Just another entry format, 4 instead of 3 levels and
>>>>>>> starting
>>>>>>> at an offset.
>>>>>>
>>>>>> Yes - David and I were discussing this over lunch, and it is not
>>>>>> actually very different.
>>>>>>
>>>>>> In reality, how likely is it that the pages backing this virtual
>>>>>> linear
>>>>>> array change?
>>>>>
>>>>> Very unlikely, I think. But not impossible.
>>>>>
>>>>>> One issue currently is that, during the live part of migration, the
>>>>>> toolstack has no way of working out whether the structure of the
>>>>>> p2m has
>>>>>> changed (intermediate leaves rearranged, or the length increasing).
>>>>>>
>>>>>> In the case that the VM does change the structure of the p2m
>>>>>> under the
>>>>>> feet of the toolstack, migration will either blow up in a
>>>>>> non-subtle way
>>>>>> with a p2m/m2p mismatch, or in a subtle way with the receiving side
>>>>>> copying the new p2m over the wrong part of the new domain.
>>>>>>
>>>>>> I am wondering whether, with this new p2m method, we can take
>>>>>> sufficient
>>>>>> steps to be able to guarantee mishaps like this can't occur.
>>>>>
>>>>> This should be easy: I could add a counter in arch_shared_info
>>>>> which is
>>>>> incremented whenever a p2m mapping is being changed. The toolstack
>>>>> could
>>>>> compare the counter values before start and at end of migration and
>>>>> redo
>>>>> the migration (or fail) if they are different. In order to avoid
>>>>> races
>>>>> I would have to increment the counter before and after changing the
>>>>> mapping.
>>>>>
>>>>
>>>> That is insufficient I believe.
>>>>
>>>> Consider:
>>>>
>>>> * Toolstack walks pagetables and maps the frames containing the
>>>> linear p2m
>>>> * Live migration starts
>>>> * VM remaps a frame in the middle of the linear p2m
>>>> * Live migration continues, but the toolstack has a stale frame in the
>>>> middle of its view of the p2m.
>>>
>>> This would be covered by my suggestion. At the end of the memory
>>> transfer (with some bogus contents) the toolstack would discover the
>>> change of the p2m structure and either fail the migration or start it
>>> from the beginning and thus overwriting the bogus frames.
>>
>> Checking after pause is too late.  The content of the p2m is used verify
>> each frame being sent on the wire, so is in active use for the entire
>> duration of live migration.
>>
>> If the toolstack starts verifying frames being sent using information
>> from a stale p2m, the best that can be hoped for is that the toolstack
>> declares that the p2m and m2p are inconsistent and abort the migrate.
>>
>>>
>>>> As the p2m is almost never expected to change, I think it might be
>>>> better to have a flag the toolstack can set to say "The toolstack is
>>>> peeking at your p2m behind your back - you must not change its
>>>> structure."
>>>
>>> Be careful here: changes of the structure can be due to two scenarios:
>>> - ballooning (invalid entries being populated): this is no problem, as
>>>    we can stop the ballooning during live migration.
>>> - mapping of grant pages e.g. in a stub domain (first map in an area
>>>    former marked as invalid): you can't stop this, as the stub domain
>>>    has to do some work. Here a restart of the migration should work, as
>>>    the p2m structure change can only happen once for each affected p2m
>>>    page.
>>
>> Migration is not at all possible with a domain referencing foreign
>> frames.
>>
>> The live part can cope with foreign frames referenced in the ptes.  As
>> part of the pause handling in the VM, the frontends must unmap any
>> grants they have.  After pause, any remaining foreign frames cause a
>> migration failure.
>>
>>>
>>>> Having just thought this through, I think there is also a race
>>>> condition
>>>> between a VM changing an entry in the p2m, and the toolstack doing
>>>> verifications of frames being sent.
>>>
>>> Okay, so the flag you mentioned should just prohibit changes in the
>>> p2m list related to memory frames of the affected domain: ballooning
>>> up or down, or rearranging the memory layout (does this happen today?).
>>> Mapping and unmapping of grant pages should be still allowed.
>>
>> HVM guests doesn't have any of their p2m updates represented in the
>> logdirty bitmap, so ballooning an HVM guest during migrate leads to
>> unexpected holes or lack of holes on the resuming side, leading to a
>> very confused balloon driver.
>>
>> At the time I had not found a problem with PV guests, but it is now
>> clear that there is a period of time when a guest is altering its p2m
>> where the p2m and m2p are out of sync, which will cause a migration
>> failure if the toolstack observes this artefact.
>
> So ballooning should be disabled during migration. I think this should
> be handled via callbacks triggered by xenstore: one at start of
> migration to stop ballooning and one at end to restart it. I wouldn't
> want to tie this functionality to the p2m list structure, as it is
> not related to it.

It is not just ballooning.  It is any change to the p2m whatsoever. 
This includes mapping/unmapping grants, XENMEM_exchange, and the guest
simply changing the p2m layout.

I suspect that the only reason this has not been encountered in practice
is that noone has attempted migrating a domain which makes use of
foreign mappings.  It is typically only the backend drivers which map
frontend memory, and dom0 doesn't migrate.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-18 10:51                   ` Andrew Cooper
@ 2014-11-18 10:56                     ` David Vrabel
  0 siblings, 0 replies; 25+ messages in thread
From: David Vrabel @ 2014-11-18 10:56 UTC (permalink / raw)
  To: Andrew Cooper, Juergen Gross, xen-devel, jbeulich, konrad.wilk

On 18/11/14 10:51, Andrew Cooper wrote:
>>
> It is not just ballooning.  It is any change to the p2m whatsoever. 
> This includes mapping/unmapping grants, XENMEM_exchange, and the guest
> simply changing the p2m layout.

Grants don't matter because they're never saved.

David

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] introduce boot parameter for setting XENFEAT_virtual_p2m
  2014-11-14  9:37 ` [PATCH 3/4] introduce boot parameter for setting XENFEAT_virtual_p2m Juergen Gross
@ 2014-11-19 21:04   ` Konrad Rzeszutek Wilk
  2014-11-20  4:46     ` Juergen Gross
  0 siblings, 1 reply; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-19 21:04 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, david.vrabel, jbeulich

On Fri, Nov 14, 2014 at 10:37:25AM +0100, Juergen Gross wrote:
> Introduce a new boot parameter "virt_p2m" to be able to set
> XENFEAT_virtual_p2m for a pv domain.
> 
> As long as Xen tools and kdump don't support this new feature it is
> turned off by default.

Couldn't the dom0_large and dom0 be detected automatically? That is
the dom0 could advertise it can do large-dom0 support and Xen would
automatically switch to the right mode?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] introduce boot parameter for setting XENFEAT_virtual_p2m
  2014-11-19 21:04   ` Konrad Rzeszutek Wilk
@ 2014-11-20  4:46     ` Juergen Gross
  0 siblings, 0 replies; 25+ messages in thread
From: Juergen Gross @ 2014-11-20  4:46 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel, david.vrabel, jbeulich

On 11/19/2014 10:04 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 14, 2014 at 10:37:25AM +0100, Juergen Gross wrote:
>> Introduce a new boot parameter "virt_p2m" to be able to set
>> XENFEAT_virtual_p2m for a pv domain.
>>
>> As long as Xen tools and kdump don't support this new feature it is
>> turned off by default.
>
> Couldn't the dom0_large and dom0 be detected automatically? That is
> the dom0 could advertise it can do large-dom0 support and Xen would
> automatically switch to the right mode?

No, that's not the problem. Xen has to indicate it is capable to handle
the new mode. At dom0 construction time the dom0 kernel can't know about
the capability of kdump to handle the new mode.

In case the new interface is accepted I'll set up some kdump patches to
handle it. We can switch to dom0/dom0_large set on default if they are
accepted on time (e.g. at the time the kernel support for the new
interface is put in place).

Juergen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-14  9:37 ` [PATCH 1/4] expand x86 arch_shared_info to support linear " Juergen Gross
  2014-11-14 11:41   ` Andrew Cooper
@ 2014-11-21 12:23   ` Jan Beulich
  2014-11-21 12:57     ` Juergen Gross
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2014-11-21 12:23 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, david.vrabel

>>> On 14.11.14 at 10:37, <"jgross@suse.com".non-mime.internet> wrote:
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -224,7 +224,12 @@ struct arch_shared_info {
>      /* Frame containing list of mfns containing list of mfns containing p2m. */
>      xen_pfn_t     pfn_to_mfn_frame_list_list;
>      unsigned long nmi_reason;
> -    uint64_t pad[32];
> +    /*
> +     * Following two fields are valid if pfn_to_mfn_frame_list_list contains
> +     * ~0UL.
> +     */
> +    unsigned long p2m_vaddr;    /* virtual address of the p2m list */
> +    unsigned long p2m_as_root;  /* mfn of the top level page table */

xen_pfn_t please. And what does the "as" in the name stand for?
It's also kind of unclear in the description what "the page table root"
is, as I don't think there are many OSes which use just a single set
of page tables (i.e. just a single address space). Not having followed
the discussion closely - what is this needed for anyway?

> --- a/xen/include/public/features.h
> +++ b/xen/include/public/features.h
> @@ -99,6 +99,9 @@
>  #define XENFEAT_grant_map_identity        12
>   */
>  
> +/* x86: guest may specify virtual address of p2m list */
> +#define XENFEAT_virtual_p2m               13

The name to me suggests something that's not real. Perhaps better
XENFEAT_virtually_mapped_p2m or XENFEAT_p2m_va{,ddr}?

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] introduce arch_get_features()
  2014-11-14  9:37 ` [PATCH 2/4] introduce arch_get_features() Juergen Gross
@ 2014-11-21 12:26   ` Jan Beulich
  2014-11-21 13:21   ` Julien Grall
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2014-11-21 12:26 UTC (permalink / raw)
  To: Juergen Gross; +Cc: xen-devel, david.vrabel

>>> On 14.11.14 at 10:37, <"jgross@suse.com".non-mime.internet> wrote:
> The XENVER_get_features sub command of the xen_version hypercall is
> handled completely in common/kernel.c despite of some architecture
> dependant parts.
> 
> Move the architecture dependant parts in an own function in
> arch/*/domain.c
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Acked-by: Jan Beulich <jbeulich@suse.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-21 12:23   ` Jan Beulich
@ 2014-11-21 12:57     ` Juergen Gross
  2014-11-21 13:26       ` Andrew Cooper
  0 siblings, 1 reply; 25+ messages in thread
From: Juergen Gross @ 2014-11-21 12:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: keir@xen.org, Ian.Campbell@citrix.com, Ian Jackson, Tim Deegan,
	david.vrabel, xen-devel

On 11/21/2014 01:23 PM, Jan Beulich wrote:
>>>> On 14.11.14 at 10:37, <"jgross@suse.com".non-mime.internet> wrote:
>> --- a/xen/include/public/arch-x86/xen.h
>> +++ b/xen/include/public/arch-x86/xen.h
>> @@ -224,7 +224,12 @@ struct arch_shared_info {
>>       /* Frame containing list of mfns containing list of mfns containing p2m. */
>>       xen_pfn_t     pfn_to_mfn_frame_list_list;
>>       unsigned long nmi_reason;
>> -    uint64_t pad[32];
>> +    /*
>> +     * Following two fields are valid if pfn_to_mfn_frame_list_list contains
>> +     * ~0UL.
>> +     */
>> +    unsigned long p2m_vaddr;    /* virtual address of the p2m list */
>> +    unsigned long p2m_as_root;  /* mfn of the top level page table */
>
> xen_pfn_t please. And what does the "as" in the name stand for?

"as" is address space. I can rename it to e.g. "p2m_pgd_mfn".

> It's also kind of unclear in the description what "the page table root"
> is, as I don't think there are many OSes which use just a single set
> of page tables (i.e. just a single address space). Not having followed
> the discussion closely - what is this needed for anyway?

It's a replacement of the pfn_to_mfn_frame_list_list using the same
page table as the kernel for accessing the p2m list. We need the root
of the page table and the virtual address of the p2m list.

>
>> --- a/xen/include/public/features.h
>> +++ b/xen/include/public/features.h
>> @@ -99,6 +99,9 @@
>>   #define XENFEAT_grant_map_identity        12
>>    */
>>
>> +/* x86: guest may specify virtual address of p2m list */
>> +#define XENFEAT_virtual_p2m               13
>
> The name to me suggests something that's not real. Perhaps better
> XENFEAT_virtually_mapped_p2m or XENFEAT_p2m_va{,ddr}?

Yeah, that's better. I'll use XENFEAT_p2m_vaddr.


Juergen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/4] introduce arch_get_features()
  2014-11-14  9:37 ` [PATCH 2/4] introduce arch_get_features() Juergen Gross
  2014-11-21 12:26   ` Jan Beulich
@ 2014-11-21 13:21   ` Julien Grall
  1 sibling, 0 replies; 25+ messages in thread
From: Julien Grall @ 2014-11-21 13:21 UTC (permalink / raw)
  To: Juergen Gross, xen-devel, jbeulich, konrad.wilk, david.vrabel

Hi Juergen,

On 11/14/2014 09:37 AM, Juergen Gross wrote:
> The XENVER_get_features sub command of the xen_version hypercall is
> handled completely in common/kernel.c despite of some architecture
> dependant parts.
> 
> Move the architecture dependant parts in an own function in
> arch/*/domain.c
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

For the ARM part:

Reviewed-by: Julien Grall <julien.grall@linaro.org>

Regards,

> ---
>  xen/arch/arm/domain.c    |  5 +++++
>  xen/arch/x86/domain.c    | 30 ++++++++++++++++++++++++++++++
>  xen/common/kernel.c      | 22 ++--------------------
>  xen/include/xen/domain.h |  2 ++
>  4 files changed, 39 insertions(+), 20 deletions(-)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 7221bc8..dc5a3fb 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -823,6 +823,11 @@ void vcpu_block_unless_event_pending(struct vcpu *v)
>          vcpu_unblock(current);
>  }
>  
> +uint32_t arch_get_features(struct domain *d, unsigned int submap_idx)
> +{
> +    return 0;
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index ae0a344..d98aabd 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -2166,6 +2166,36 @@ static int __init init_vcpu_kick_softirq(void)
>  }
>  __initcall(init_vcpu_kick_softirq);
>  
> +uint32_t arch_get_features(struct domain *d, unsigned int submap_idx)
> +{
> +    uint32_t submap = 0;
> +
> +    switch ( submap_idx )
> +    {
> +    case 0:
> +        switch ( d->guest_type )
> +        {
> +        case guest_type_pv:
> +            submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
> +                      (1U << XENFEAT_highmem_assist) |
> +                      (1U << XENFEAT_gnttab_map_avail_bits);
> +            break;
> +        case guest_type_pvh:
> +            submap |= (1U << XENFEAT_hvm_safe_pvclock) |
> +                      (1U << XENFEAT_supervisor_mode_kernel) |
> +                      (1U << XENFEAT_hvm_callback_vector);
> +            break;
> +        case guest_type_hvm:
> +            submap |= (1U << XENFEAT_hvm_safe_pvclock) |
> +                      (1U << XENFEAT_hvm_callback_vector) |
> +                      (1U << XENFEAT_hvm_pirqs);
> +            break;
> +        }
> +        break;
> +    }
> +
> +    return submap;
> +}
>  
>  /*
>   * Local variables:
> diff --git a/xen/common/kernel.c b/xen/common/kernel.c
> index d23c422..d22a860 100644
> --- a/xen/common/kernel.c
> +++ b/xen/common/kernel.c
> @@ -312,31 +312,13 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>                  fi.submap |= 1U << XENFEAT_supervisor_mode_kernel;
>              if ( is_hardware_domain(current->domain) )
>                  fi.submap |= 1U << XENFEAT_dom0;
> -#ifdef CONFIG_X86
> -            switch ( d->guest_type )
> -            {
> -            case guest_type_pv:
> -                fi.submap |= (1U << XENFEAT_mmu_pt_update_preserve_ad) |
> -                             (1U << XENFEAT_highmem_assist) |
> -                             (1U << XENFEAT_gnttab_map_avail_bits);
> -                break;
> -            case guest_type_pvh:
> -                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
> -                             (1U << XENFEAT_supervisor_mode_kernel) |
> -                             (1U << XENFEAT_hvm_callback_vector);
> -                break;
> -            case guest_type_hvm:
> -                fi.submap |= (1U << XENFEAT_hvm_safe_pvclock) |
> -                             (1U << XENFEAT_hvm_callback_vector) |
> -                             (1U << XENFEAT_hvm_pirqs);
> -                break;
> -            }
> -#endif
>              break;
>          default:
>              return -EINVAL;
>          }
>  
> +        fi.submap |= arch_get_features(d, fi.submap_idx);
> +
>          if ( copy_to_guest(arg, &fi, 1) )
>              return -EFAULT;
>          return 0;
> diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
> index 9215b0e..0d12dc0 100644
> --- a/xen/include/xen/domain.h
> +++ b/xen/include/xen/domain.h
> @@ -80,6 +80,8 @@ extern spinlock_t vcpu_alloc_lock;
>  bool_t domctl_lock_acquire(void);
>  void domctl_lock_release(void);
>  
> +uint32_t arch_get_features(struct domain *d, unsigned int submap_idx);
> +
>  /*
>   * Continue the current hypercall via func(data) on specified cpu.
>   * If this function returns 0 then the function is guaranteed to run at some
> 


-- 
Julien Grall

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-21 12:57     ` Juergen Gross
@ 2014-11-21 13:26       ` Andrew Cooper
  2014-11-21 13:37         ` Jürgen Groß
  0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-11-21 13:26 UTC (permalink / raw)
  To: Juergen Gross, Jan Beulich
  Cc: keir@xen.org, Ian.Campbell@citrix.com, Tim Deegan, Ian Jackson,
	david.vrabel, xen-devel

On 21/11/14 12:57, Juergen Gross wrote:
> On 11/21/2014 01:23 PM, Jan Beulich wrote:
>>>>> On 14.11.14 at 10:37, <"jgross@suse.com".non-mime.internet> wrote:
>>> --- a/xen/include/public/arch-x86/xen.h
>>> +++ b/xen/include/public/arch-x86/xen.h
>>> @@ -224,7 +224,12 @@ struct arch_shared_info {
>>>       /* Frame containing list of mfns containing list of mfns
>>> containing p2m. */
>>>       xen_pfn_t     pfn_to_mfn_frame_list_list;
>>>       unsigned long nmi_reason;
>>> -    uint64_t pad[32];
>>> +    /*
>>> +     * Following two fields are valid if pfn_to_mfn_frame_list_list
>>> contains
>>> +     * ~0UL.
>>> +     */
>>> +    unsigned long p2m_vaddr;    /* virtual address of the p2m list */
>>> +    unsigned long p2m_as_root;  /* mfn of the top level page table */
>>
>> xen_pfn_t please. And what does the "as" in the name stand for?
>
> "as" is address space. I can rename it to e.g. "p2m_pgd_mfn".

That is a linuxism in naming, which is also not accurate.

>From my understanding, this frame must be an L4 table for 64bit guests,
and an L3 table for 32bit guests.  I.e. it is effectively a cr3 with
which to use the p2m_vaddr field above?

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-21 13:26       ` Andrew Cooper
@ 2014-11-21 13:37         ` Jürgen Groß
  2014-11-21 14:04           ` Andrew Cooper
  2014-11-21 14:07           ` Jan Beulich
  0 siblings, 2 replies; 25+ messages in thread
From: Jürgen Groß @ 2014-11-21 13:37 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: keir@xen.org, Ian.Campbell@citrix.com, Tim Deegan, Ian Jackson,
	david.vrabel, xen-devel

On 11/21/2014 02:26 PM, Andrew Cooper wrote:
> On 21/11/14 12:57, Juergen Gross wrote:
>> On 11/21/2014 01:23 PM, Jan Beulich wrote:
>>>>>> On 14.11.14 at 10:37, <"jgross@suse.com".non-mime.internet> wrote:
>>>> --- a/xen/include/public/arch-x86/xen.h
>>>> +++ b/xen/include/public/arch-x86/xen.h
>>>> @@ -224,7 +224,12 @@ struct arch_shared_info {
>>>>        /* Frame containing list of mfns containing list of mfns
>>>> containing p2m. */
>>>>        xen_pfn_t     pfn_to_mfn_frame_list_list;
>>>>        unsigned long nmi_reason;
>>>> -    uint64_t pad[32];
>>>> +    /*
>>>> +     * Following two fields are valid if pfn_to_mfn_frame_list_list
>>>> contains
>>>> +     * ~0UL.
>>>> +     */
>>>> +    unsigned long p2m_vaddr;    /* virtual address of the p2m list */
>>>> +    unsigned long p2m_as_root;  /* mfn of the top level page table */
>>>
>>> xen_pfn_t please. And what does the "as" in the name stand for?
>>
>> "as" is address space. I can rename it to e.g. "p2m_pgd_mfn".
>
> That is a linuxism in naming, which is also not accurate.
>
>  From my understanding, this frame must be an L4 table for 64bit guests,
> and an L3 table for 32bit guests.  I.e. it is effectively a cr3 with
> which to use the p2m_vaddr field above?

Okay, so p2m_cr3 then?


Juergen

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-21 13:37         ` Jürgen Groß
@ 2014-11-21 14:04           ` Andrew Cooper
  2014-11-21 14:07           ` Jan Beulich
  1 sibling, 0 replies; 25+ messages in thread
From: Andrew Cooper @ 2014-11-21 14:04 UTC (permalink / raw)
  To: Jürgen Groß, Jan Beulich
  Cc: keir@xen.org, Ian.Campbell@citrix.com, Tim Deegan, Ian Jackson,
	david.vrabel, xen-devel

On 21/11/14 13:37, Jürgen Groß wrote:
> On 11/21/2014 02:26 PM, Andrew Cooper wrote:
>> On 21/11/14 12:57, Juergen Gross wrote:
>>> On 11/21/2014 01:23 PM, Jan Beulich wrote:
>>>>>>> On 14.11.14 at 10:37, <"jgross@suse.com".non-mime.internet> wrote:
>>>>> --- a/xen/include/public/arch-x86/xen.h
>>>>> +++ b/xen/include/public/arch-x86/xen.h
>>>>> @@ -224,7 +224,12 @@ struct arch_shared_info {
>>>>>        /* Frame containing list of mfns containing list of mfns
>>>>> containing p2m. */
>>>>>        xen_pfn_t     pfn_to_mfn_frame_list_list;
>>>>>        unsigned long nmi_reason;
>>>>> -    uint64_t pad[32];
>>>>> +    /*
>>>>> +     * Following two fields are valid if pfn_to_mfn_frame_list_list
>>>>> contains
>>>>> +     * ~0UL.
>>>>> +     */
>>>>> +    unsigned long p2m_vaddr;    /* virtual address of the p2m
>>>>> list */
>>>>> +    unsigned long p2m_as_root;  /* mfn of the top level page
>>>>> table */
>>>>
>>>> xen_pfn_t please. And what does the "as" in the name stand for?
>>>
>>> "as" is address space. I can rename it to e.g. "p2m_pgd_mfn".
>>
>> That is a linuxism in naming, which is also not accurate.
>>
>>  From my understanding, this frame must be an L4 table for 64bit guests,
>> and an L3 table for 32bit guests.  I.e. it is effectively a cr3 with
>> which to use the p2m_vaddr field above?
>
> Okay, so p2m_cr3 then?
>
>

I think that is reasonable, along with the comment explaining that it
must reference an L4 or L3 frame, depending on the width of the guest. 
In the case of a 32bit guest, it should be a magic folded pfn, in the
same representation that cr3 has in the vcpu register state.

~Andrew

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] expand x86 arch_shared_info to support linear p2m list
  2014-11-21 13:37         ` Jürgen Groß
  2014-11-21 14:04           ` Andrew Cooper
@ 2014-11-21 14:07           ` Jan Beulich
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2014-11-21 14:07 UTC (permalink / raw)
  To: Juergen Gross
  Cc: keir@xen.org, Ian.Campbell@citrix.com, Andrew Cooper, Ian Jackson,
	Tim Deegan, david.vrabel, xen-devel

>>> On 21.11.14 at 14:37, <JGross@suse.com> wrote:
> On 11/21/2014 02:26 PM, Andrew Cooper wrote:
>> On 21/11/14 12:57, Juergen Gross wrote:
>>> On 11/21/2014 01:23 PM, Jan Beulich wrote:
>>>>>>> On 14.11.14 at 10:37, <"jgross@suse.com".non-mime.internet> wrote:
>>>>> --- a/xen/include/public/arch-x86/xen.h
>>>>> +++ b/xen/include/public/arch-x86/xen.h
>>>>> @@ -224,7 +224,12 @@ struct arch_shared_info {
>>>>>        /* Frame containing list of mfns containing list of mfns
>>>>> containing p2m. */
>>>>>        xen_pfn_t     pfn_to_mfn_frame_list_list;
>>>>>        unsigned long nmi_reason;
>>>>> -    uint64_t pad[32];
>>>>> +    /*
>>>>> +     * Following two fields are valid if pfn_to_mfn_frame_list_list
>>>>> contains
>>>>> +     * ~0UL.
>>>>> +     */
>>>>> +    unsigned long p2m_vaddr;    /* virtual address of the p2m list */
>>>>> +    unsigned long p2m_as_root;  /* mfn of the top level page table */
>>>>
>>>> xen_pfn_t please. And what does the "as" in the name stand for?
>>>
>>> "as" is address space. I can rename it to e.g. "p2m_pgd_mfn".
>>
>> That is a linuxism in naming, which is also not accurate.
>>
>>  From my understanding, this frame must be an L4 table for 64bit guests,
>> and an L3 table for 32bit guests.  I.e. it is effectively a cr3 with
>> which to use the p2m_vaddr field above?
> 
> Okay, so p2m_cr3 then?

I'm okay with that, but would think p2m_pt_root would be more
generic. Considering PAE, wouldn't this need to be a 64-bit value
then even in the 32-bit interface? Or alternatively be represented
as MFN instead?

Jan

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2014-11-21 14:07 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-14  9:37 [PATCH 0/4] support guest virtual mapped p2m list Juergen Gross
2014-11-14  9:37 ` [PATCH 1/4] expand x86 arch_shared_info to support linear " Juergen Gross
2014-11-14 11:41   ` Andrew Cooper
2014-11-14 12:53     ` Juergen Gross
2014-11-14 13:56       ` Andrew Cooper
2014-11-14 14:14         ` Jürgen Groß
2014-11-14 14:59           ` Andrew Cooper
2014-11-14 15:32             ` Juergen Gross
2014-11-14 16:08               ` Andrew Cooper
2014-11-18  5:33                 ` Juergen Gross
2014-11-18 10:51                   ` Andrew Cooper
2014-11-18 10:56                     ` David Vrabel
2014-11-21 12:23   ` Jan Beulich
2014-11-21 12:57     ` Juergen Gross
2014-11-21 13:26       ` Andrew Cooper
2014-11-21 13:37         ` Jürgen Groß
2014-11-21 14:04           ` Andrew Cooper
2014-11-21 14:07           ` Jan Beulich
2014-11-14  9:37 ` [PATCH 2/4] introduce arch_get_features() Juergen Gross
2014-11-21 12:26   ` Jan Beulich
2014-11-21 13:21   ` Julien Grall
2014-11-14  9:37 ` [PATCH 3/4] introduce boot parameter for setting XENFEAT_virtual_p2m Juergen Gross
2014-11-19 21:04   ` Konrad Rzeszutek Wilk
2014-11-20  4:46     ` Juergen Gross
2014-11-14  9:37 ` [PATCH 4/4] document new boot parameter virt_p2m Juergen Gross

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.