[PATCH V3 0/1] support >3 level p2m tree

xen-devel.lists.xenproject.org archive mirror

* [PATCH V3 0/1] support >3 level p2m tree
@ 2014-09-09  9:58 Juergen Gross
  2014-09-09  9:58 ` [PATCH V3 1/1] expand x86 arch_shared_info to " Juergen Gross
  0 siblings, 1 reply; 23+ messages in thread
From: Juergen Gross @ 2014-09-09  9:58 UTC (permalink / raw)
  To: ian.campbell, ian.jackson, jbeulich, keir, tim, xen-devel; +Cc: Juergen Gross

PV domains write the mfn of the top level frame of a 3 level p2m tree
to the arch_shared_info structure. Consumers of this information are
the domain save/restore functions of the Xen tools.

Because the tree is defined as having 3 levels, the maximum supported
size of a 64 bit domain is 512 GB. The following patch expands the
arch_shared_info structure to support more levels.

The Xen tools are not covered in this patch as the patch series
"Libxl migration v2 support":
http://lists.xen.org/archives/html/xen-devel/2014-09/msg00427.html
should be applied first.

Juergen Gross (1):
  expand x86 arch_shared_info to support >3 level p2m tree

 xen/include/public/arch-x86/xen.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-- 
1.8.4.5


* [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-09  9:58 [PATCH V3 0/1] support >3 level p2m tree Juergen Gross
@ 2014-09-09  9:58 ` Juergen Gross
  2014-09-09 10:27   ` Andrew Cooper
                     ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Juergen Gross @ 2014-09-09  9:58 UTC (permalink / raw)
  To: ian.campbell, ian.jackson, jbeulich, keir, tim, xen-devel; +Cc: Juergen Gross

The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
currently contains the mfn of the top level page frame of the 3 level
p2m tree, which is used by the Xen tools during saving and restoring
(and live migration) of pv domains. With three levels of the p2m tree
it is possible to support up to 512 GB of RAM for a 64 bit pv domain.
A 32 bit pv domain can support more, as each memory page can hold 1024
instead of 512 entries, leading to a limit of 4 TB. To be able to
support more RAM on x86-64 an additional level is to be added.
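
The limits quoted above follow from the tree geometry. As a minimal
sketch (not part of the patch), assuming 4 KiB pages, one mapped page
per leaf entry, and an entry size of 8 bytes for 64 bit or 4 bytes for
32 bit guests, the arithmetic can be checked with a small helper. The
name p2m_max_bytes() is hypothetical and exists only for illustration:

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL  /* assumed x86 page size */

/*
 * Maximum guest RAM (in bytes) that a p2m tree with the given number of
 * levels can describe, when every page of the tree holds
 * PAGE_SIZE_BYTES / entry_size entries and every leaf entry maps one page.
 * Illustrative only: the result overflows 64 bits for very deep trees.
 */
uint64_t p2m_max_bytes(unsigned int levels, unsigned int entry_size)
{
    uint64_t entries_per_page = PAGE_SIZE_BYTES / entry_size;
    uint64_t pages = 1;

    for (unsigned int i = 0; i < levels; i++)
        pages *= entries_per_page;

    return pages * PAGE_SIZE_BYTES;
}
```

With these assumptions p2m_max_bytes(3, 8) gives 512 GB and
p2m_max_bytes(3, 4) gives 4 TB, matching the figures above, while a
fourth level on x86-64, p2m_max_bytes(4, 8), would give 256 TB.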

This patch expands struct arch_shared_info with a new p2m tree root
and the number of levels of the p2m tree. The domain indicates that
the new fields are valid by storing ~0UL into
pfn_to_mfn_frame_list_list (this should be done only if more than
three levels are needed, of course).
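
For illustration only, a consumer of the shared info page might
interpret the fields roughly as sketched here. The tools side is not
covered by this patch, so the struct below is a cut-down stand-in for
struct arch_shared_info, and p2m_locate() and both macros are
hypothetical names, not existing Xen or libxc interfaces:

```c
typedef unsigned long xen_pfn_t;   /* stand-in for the Xen typedef */

/* Cut-down mock-up holding only the fields discussed in this patch. */
struct arch_shared_info_sketch {
    xen_pfn_t     pfn_to_mfn_frame_list_list;
    unsigned long p2m_levels;   /* number of levels of p2m tree */
    xen_pfn_t     p2m_root;     /* p2m tree top level mfn */
};

#define P2M_LEGACY_LEVELS   3UL     /* depth implied by the old field */
#define P2M_EXTENDED_MARKER (~0UL)  /* marker value described above */

/*
 * Hypothetical helper: report the root mfn and depth of the p2m tree,
 * whichever of the two layouts the guest has published.
 */
void p2m_locate(const struct arch_shared_info_sketch *info,
                xen_pfn_t *root, unsigned long *levels)
{
    if (info->pfn_to_mfn_frame_list_list == P2M_EXTENDED_MARKER) {
        /* New layout: both additional fields are valid. */
        *root = info->p2m_root;
        *levels = info->p2m_levels;
    } else {
        /* Old layout: the field itself is the root of a 3 level tree. */
        *root = info->pfn_to_mfn_frame_list_list;
        *levels = P2M_LEGACY_LEVELS;
    }
}
```

A consumer that does not know about the marker would treat ~0UL as an
mfn and fail to find the p2m.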

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/include/public/arch-x86/xen.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index f35804b..2ca996c 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -224,7 +224,12 @@ struct arch_shared_info {
     /* Frame containing list of mfns containing list of mfns containing p2m. */
     xen_pfn_t     pfn_to_mfn_frame_list_list;
     unsigned long nmi_reason;
-    uint64_t pad[32];
+    /*
+     * Following two fields are valid if pfn_to_mfn_frame_list_list contains
+     * ~0UL.
+     */
+    unsigned long p2m_levels;   /* number of levels of p2m tree */
+    xen_pfn_t     p2m_root;     /* p2m tree top level mfn */
 };
 typedef struct arch_shared_info arch_shared_info_t;

-- 
1.8.4.5


* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-09  9:58 ` [PATCH V3 1/1] expand x86 arch_shared_info to " Juergen Gross
@ 2014-09-09 10:27   ` Andrew Cooper
  2014-09-09 10:49     ` Juergen Gross
  2014-09-30  8:53   ` Jan Beulich
       [not found]   ` <542A8B93020000780003AE7B@suse.com>
  2 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2014-09-09 10:27 UTC (permalink / raw)
  To: Juergen Gross, ian.campbell, ian.jackson, jbeulich, keir, tim,
	xen-devel

On 09/09/14 10:58, Juergen Gross wrote:
> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
> currently contains the mfn of the top level page frame of the 3 level
> p2m tree, which is used by the Xen tools during saving and restoring
> (and live migration) of pv domains. With three levels of the p2m tree
> it is possible to support up to 512 GB of RAM for a 64 bit pv domain.
> A 32 bit pv domain can support more, as each memory page can hold 1024
> instead of 512 entries, leading to a limit of 4 TB. To be able to
> support more RAM on x86-64 an additional level is to be added.
>
> This patch expands struct arch_shared_info with a new p2m tree root
> and the number of levels of the p2m tree. The new information is
> indicated by the domain to be valid by storing ~0UL into
> pfn_to_mfn_frame_list_list (this should be done only if more than
> three levels are needed, of course).

A small domain feeling a little tight on space could easily opt for a 2
or even 1 level p2m.  (After all, one advantage of virt is to cram many
small VMs into a server).

How is Xen and toolstack support for n-level p2ms going to be advertised
to guests?  Simply assuming the toolstack is capable of dealing with
this new scheme won't work with a new PV guest running on an older Xen.

~Andrew



* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-09 10:27   ` Andrew Cooper
@ 2014-09-09 10:49     ` Juergen Gross
  2014-09-12 10:31       ` Juergen Gross
  0 siblings, 1 reply; 23+ messages in thread
From: Juergen Gross @ 2014-09-09 10:49 UTC (permalink / raw)
  To: Andrew Cooper, ian.campbell, ian.jackson, jbeulich, keir, tim,
	xen-devel

On 09/09/2014 12:27 PM, Andrew Cooper wrote:
> On 09/09/14 10:58, Juergen Gross wrote:
>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>> currently contains the mfn of the top level page frame of the 3 level
>> p2m tree, which is used by the Xen tools during saving and restoring
>> (and live migration) of pv domains. With three levels of the p2m tree
>> it is possible to support up to 512 GB of RAM for a 64 bit pv domain.
>> A 32 bit pv domain can support more, as each memory page can hold 1024
>> instead of 512 entries, leading to a limit of 4 TB. To be able to
>> support more RAM on x86-64 an additional level is to be added.
>>
>> This patch expands struct arch_shared_info with a new p2m tree root
>> and the number of levels of the p2m tree. The new information is
>> indicated by the domain to be valid by storing ~0UL into
>> pfn_to_mfn_frame_list_list (this should be done only if more than
>> three levels are needed, of course).
>
> A small domain feeling a little tight on space could easily opt for a 2
> or even 1 level p2m.  (After all, one advantage of virt is to cram many
> small VMs into a server).
>
> How is xen and toolstack support for n-level p2ms going to be advertised
> to guests?  Simply assuming the toolstack is capable of dealing with
> this new scheme wont work with a new pv guest running on an older Xen.

Is it really worth doing such an optimization? It would save only a
few pages.

If you think it should be done we can add another SIF_* flag to
start_info->flags. In this case a domain using this feature could not be
migrated to a server with old tools, however. So we would probably end
up needing a way to suppress that flag on a per-domain basis.

Juergen



* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-09 10:49     ` Juergen Gross
@ 2014-09-12 10:31       ` Juergen Gross
  2014-09-15  8:29         ` Andrew Cooper
  0 siblings, 1 reply; 23+ messages in thread
From: Juergen Gross @ 2014-09-12 10:31 UTC (permalink / raw)
  To: Andrew Cooper, ian.campbell, ian.jackson, jbeulich, keir, tim,
	xen-devel

On 09/09/2014 12:49 PM, Juergen Gross wrote:
> On 09/09/2014 12:27 PM, Andrew Cooper wrote:
>> On 09/09/14 10:58, Juergen Gross wrote:
>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>> currently contains the mfn of the top level page frame of the 3 level
>>> p2m tree, which is used by the Xen tools during saving and restoring
>>> (and live migration) of pv domains. With three levels of the p2m tree
>>> it is possible to support up to 512 GB of RAM for a 64 bit pv domain.
>>> A 32 bit pv domain can support more, as each memory page can hold 1024
>>> instead of 512 entries, leading to a limit of 4 TB. To be able to
>>> support more RAM on x86-64 an additional level is to be added.
>>>
>>> This patch expands struct arch_shared_info with a new p2m tree root
>>> and the number of levels of the p2m tree. The new information is
>>> indicated by the domain to be valid by storing ~0UL into
>>> pfn_to_mfn_frame_list_list (this should be done only if more than
>>> three levels are needed, of course).
>>
>> A small domain feeling a little tight on space could easily opt for a 2
>> or even 1 level p2m.  (After all, one advantage of virt is to cram many
>> small VMs into a server).
>>
>> How is xen and toolstack support for n-level p2ms going to be advertised
>> to guests?  Simply assuming the toolstack is capable of dealing with
>> this new scheme wont work with a new pv guest running on an older Xen.
>
> Is it really worth doing such an optimization? This would save only very
> few pages.
>
> If you think it should be done we can add another SIF_* flag to
> start_info->flags. In this case a domain using this feature could not be
> migrated to a server with old tools, however. So we would probably end
> with the need to be able to suppress that flag on a per-domain base.

Any further comments?

Which way should I go?


Juergen


* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-12 10:31       ` Juergen Gross
@ 2014-09-15  8:29         ` Andrew Cooper
  2014-09-15  8:52           ` Juergen Gross
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2014-09-15  8:29 UTC (permalink / raw)
  To: Juergen Gross, ian.campbell, ian.jackson, jbeulich, keir, tim,
	xen-devel


On 12/09/2014 11:31, Juergen Gross wrote:
> On 09/09/2014 12:49 PM, Juergen Gross wrote:
>> On 09/09/2014 12:27 PM, Andrew Cooper wrote:
>>> On 09/09/14 10:58, Juergen Gross wrote:
>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>> currently contains the mfn of the top level page frame of the 3 level
>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>> (and live migration) of pv domains. With three levels of the p2m tree
>>>> it is possible to support up to 512 GB of RAM for a 64 bit pv domain.
>>>> A 32 bit pv domain can support more, as each memory page can hold 1024
>>>> instead of 512 entries, leading to a limit of 4 TB. To be able to
>>>> support more RAM on x86-64 an additional level is to be added.
>>>>
>>>> This patch expands struct arch_shared_info with a new p2m tree root
>>>> and the number of levels of the p2m tree. The new information is
>>>> indicated by the domain to be valid by storing ~0UL into
>>>> pfn_to_mfn_frame_list_list (this should be done only if more than
>>>> three levels are needed, of course).
>>>
>>> A small domain feeling a little tight on space could easily opt for a 2
>>> or even 1 level p2m.  (After all, one advantage of virt is to cram many
>>> small VMs into a server).
>>>
>>> How is xen and toolstack support for n-level p2ms going to be 
>>> advertised
>>> to guests?  Simply assuming the toolstack is capable of dealing with
>>> this new scheme wont work with a new pv guest running on an older Xen.
>>
>> Is it really worth doing such an optimization? This would save only very
>> few pages.
>>
>> If you think it should be done we can add another SIF_* flag to
>> start_info->flags. In this case a domain using this feature could not be
>> migrated to a server with old tools, however. So we would probably end
>> with the need to be able to suppress that flag on a per-domain base.
>
> Any further comments?
>
> Which way should I go?
>

There are two approaches, with different upsides and downsides:

1) Continue to use the old method, and use the new method only when
absolutely required.  This will function, but domains on old toolstacks
will suffer migration/suspend failures when the toolstack fails to find
the p2m.

2) Provide a Xen feature flag indicating the presence of N-level p2m 
support.  Guests which can see this flag are free to use N-level, and 
guests which can't are not.

Ultimately, giving more than 512GB to a current 64bit PV domain is not 
going to work, and the choice above depends on which failure mode you 
wish a new/old mix to have.
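
To make the difference in failure modes concrete, here is a rough
guest-side sketch of the two approaches. Everything in it is
hypothetical: the enum, the function names, and the n_level_advertised
parameter (standing for whatever feature flag approach 2 would
introduce) are illustrative and not existing Xen interfaces.

```c
#include <stdbool.h>

enum p2m_scheme {
    P2M_LEGACY_3_LEVEL,  /* publish pfn_to_mfn_frame_list_list as today */
    P2M_N_LEVEL,         /* publish the new root and level count */
    P2M_UNSUPPORTED      /* the guest cannot publish a usable p2m */
};

/*
 * Approach 1: switch to the new layout only when 3 levels are not
 * enough. An old toolstack then fails later, at migration/suspend time.
 */
enum p2m_scheme choose_fallback(unsigned int levels_needed)
{
    return levels_needed > 3 ? P2M_N_LEVEL : P2M_LEGACY_3_LEVEL;
}

/*
 * Approach 2: use the new layout only when support is advertised.
 * Without the flag, a guest needing more than 3 levels fails up front.
 */
enum p2m_scheme choose_with_feature_flag(unsigned int levels_needed,
                                         bool n_level_advertised)
{
    if (n_level_advertised)
        return P2M_N_LEVEL;
    return levels_needed > 3 ? P2M_UNSUPPORTED : P2M_LEGACY_3_LEVEL;
}
```

In this sketch a guest needing 4 levels on an old Xen runs with an
unreadable p2m under approach 1, and is refused the larger p2m under
approach 2.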

~Andrew


* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-15  8:29         ` Andrew Cooper
@ 2014-09-15  8:52           ` Juergen Gross
  2014-09-15  9:42             ` Jan Beulich
  2014-09-15  9:44             ` David Vrabel
  0 siblings, 2 replies; 23+ messages in thread
From: Juergen Gross @ 2014-09-15  8:52 UTC (permalink / raw)
  To: Andrew Cooper, ian.campbell, ian.jackson, jbeulich, keir, tim,
	xen-devel

On 09/15/2014 10:29 AM, Andrew Cooper wrote:
>
> On 12/09/2014 11:31, Juergen Gross wrote:
>> On 09/09/2014 12:49 PM, Juergen Gross wrote:
>>> On 09/09/2014 12:27 PM, Andrew Cooper wrote:
>>>> On 09/09/14 10:58, Juergen Gross wrote:
>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>> currently contains the mfn of the top level page frame of the 3 level
>>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>>> (and live migration) of pv domains. With three levels of the p2m tree
>>>>> it is possible to support up to 512 GB of RAM for a 64 bit pv domain.
>>>>> A 32 bit pv domain can support more, as each memory page can hold 1024
>>>>> instead of 512 entries, leading to a limit of 4 TB. To be able to
>>>>> support more RAM on x86-64 an additional level is to be added.
>>>>>
>>>>> This patch expands struct arch_shared_info with a new p2m tree root
>>>>> and the number of levels of the p2m tree. The new information is
>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>> pfn_to_mfn_frame_list_list (this should be done only if more than
>>>>> three levels are needed, of course).
>>>>
>>>> A small domain feeling a little tight on space could easily opt for a 2
>>>> or even 1 level p2m.  (After all, one advantage of virt is to cram many
>>>> small VMs into a server).
>>>>
>>>> How is xen and toolstack support for n-level p2ms going to be
>>>> advertised
>>>> to guests?  Simply assuming the toolstack is capable of dealing with
>>>> this new scheme wont work with a new pv guest running on an older Xen.
>>>
>>> Is it really worth doing such an optimization? This would save only very
>>> few pages.
>>>
>>> If you think it should be done we can add another SIF_* flag to
>>> start_info->flags. In this case a domain using this feature could not be
>>> migrated to a server with old tools, however. So we would probably end
>>> with the need to be able to suppress that flag on a per-domain base.
>>
>> Any further comments?
>>
>> Which way should I go?
>>
>
> There are two approaches, with different up/downsides
>
> 1) continue to use the old method, and use the new method only when
> absolutely required.  This will function, but on old toolstacks, suffer
> migration/suspend failures when the toolstack fails to find the p2m.
>
> 2) Provide a Xen feature flag indicating the presence of N-level p2m
> support.  Guests which can see this flag are free to use N-level, and
> guests which can't are not.
>
> Ultimately, giving more than 512GB to a current 64bit PV domain is not
> going to work, and the choice above depends on which failure mode you
> wish a new/old mix to have.

I'd prefer solution 1), as it will enable Dom0 with more than 512 GB
without requiring a change of any Xen component. Additionally large
domains can be started by users who don't care about migrating or
suspending them.

So I'd rather keep my patch as posted.


Juergen


* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-15  8:52           ` Juergen Gross
@ 2014-09-15  9:42             ` Jan Beulich
  2014-09-15  9:48               ` Juergen Gross
  2014-09-15  9:44             ` David Vrabel
  1 sibling, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2014-09-15  9:42 UTC (permalink / raw)
  To: Andrew Cooper, Juergen Gross
  Cc: keir, tim, ian.jackson, ian.campbell, xen-devel

>>> On 15.09.14 at 10:52, <JGross@suse.com> wrote:
> On 09/15/2014 10:29 AM, Andrew Cooper wrote:
>>
>> On 12/09/2014 11:31, Juergen Gross wrote:
>>> On 09/09/2014 12:49 PM, Juergen Gross wrote:
>>>> On 09/09/2014 12:27 PM, Andrew Cooper wrote:
>>>>> On 09/09/14 10:58, Juergen Gross wrote:
>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>> currently contains the mfn of the top level page frame of the 3 level
>>>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>>>> (and live migration) of pv domains. With three levels of the p2m tree
>>>>>> it is possible to support up to 512 GB of RAM for a 64 bit pv domain.
>>>>>> A 32 bit pv domain can support more, as each memory page can hold 1024
>>>>>> instead of 512 entries, leading to a limit of 4 TB. To be able to
>>>>>> support more RAM on x86-64 an additional level is to be added.
>>>>>>
>>>>>> This patch expands struct arch_shared_info with a new p2m tree root
>>>>>> and the number of levels of the p2m tree. The new information is
>>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>>> pfn_to_mfn_frame_list_list (this should be done only if more than
>>>>>> three levels are needed, of course).
>>>>>
>>>>> A small domain feeling a little tight on space could easily opt for a 2
>>>>> or even 1 level p2m.  (After all, one advantage of virt is to cram many
>>>>> small VMs into a server).
>>>>>
>>>>> How is xen and toolstack support for n-level p2ms going to be
>>>>> advertised
>>>>> to guests?  Simply assuming the toolstack is capable of dealing with
>>>>> this new scheme wont work with a new pv guest running on an older Xen.
>>>>
>>>> Is it really worth doing such an optimization? This would save only very
>>>> few pages.
>>>>
>>>> If you think it should be done we can add another SIF_* flag to
>>>> start_info->flags. In this case a domain using this feature could not be
>>>> migrated to a server with old tools, however. So we would probably end
>>>> with the need to be able to suppress that flag on a per-domain base.
>>>
>>> Any further comments?
>>>
>>> Which way should I go?
>>>
>>
>> There are two approaches, with different up/downsides
>>
>> 1) continue to use the old method, and use the new method only when
>> absolutely required.  This will function, but on old toolstacks, suffer
>> migration/suspend failures when the toolstack fails to find the p2m.
>>
>> 2) Provide a Xen feature flag indicating the presence of N-level p2m
>> support.  Guests which can see this flag are free to use N-level, and
>> guests which can't are not.
>>
>> Ultimately, giving more than 512GB to a current 64bit PV domain is not
>> going to work, and the choice above depends on which failure mode you
>> wish a new/old mix to have.
> 
> I'd prefer solution 1), as it will enable Dom0 with more than 512 GB
> without requiring a change of any Xen component. Additionally large
> domains can be started by users who don't care for migrating or
> suspending them.

With the hopefully well understood limitation of kexec not working
there (as it, just like migration for DomU, uses this table for Dom0
in at least machine_crash_shutdown()).

Jan


* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-15  9:42             ` Jan Beulich
@ 2014-09-15  9:48               ` Juergen Gross
  0 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2014-09-15  9:48 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: tim, xen-devel, keir, ian.jackson, ian.campbell

On 09/15/2014 11:42 AM, Jan Beulich wrote:
>>>> On 15.09.14 at 10:52, <JGross@suse.com> wrote:
>> On 09/15/2014 10:29 AM, Andrew Cooper wrote:
>>>
>>> On 12/09/2014 11:31, Juergen Gross wrote:
>>>> On 09/09/2014 12:49 PM, Juergen Gross wrote:
>>>>> On 09/09/2014 12:27 PM, Andrew Cooper wrote:
>>>>>> On 09/09/14 10:58, Juergen Gross wrote:
>>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>>> currently contains the mfn of the top level page frame of the 3 level
>>>>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>>>>> (and live migration) of pv domains. With three levels of the p2m tree
>>>>>>> it is possible to support up to 512 GB of RAM for a 64 bit pv domain.
>>>>>>> A 32 bit pv domain can support more, as each memory page can hold 1024
>>>>>>> instead of 512 entries, leading to a limit of 4 TB. To be able to
>>>>>>> support more RAM on x86-64 an additional level is to be added.
>>>>>>>
>>>>>>> This patch expands struct arch_shared_info with a new p2m tree root
>>>>>>> and the number of levels of the p2m tree. The new information is
>>>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>>>> pfn_to_mfn_frame_list_list (this should be done only if more than
>>>>>>> three levels are needed, of course).
>>>>>>
>>>>>> A small domain feeling a little tight on space could easily opt for a 2
>>>>>> or even 1 level p2m.  (After all, one advantage of virt is to cram many
>>>>>> small VMs into a server).
>>>>>>
>>>>>> How is xen and toolstack support for n-level p2ms going to be
>>>>>> advertised
>>>>>> to guests?  Simply assuming the toolstack is capable of dealing with
>>>>>> this new scheme wont work with a new pv guest running on an older Xen.
>>>>>
>>>>> Is it really worth doing such an optimization? This would save only very
>>>>> few pages.
>>>>>
>>>>> If you think it should be done we can add another SIF_* flag to
>>>>> start_info->flags. In this case a domain using this feature could not be
>>>>> migrated to a server with old tools, however. So we would probably end
>>>>> with the need to be able to suppress that flag on a per-domain base.
>>>>
>>>> Any further comments?
>>>>
>>>> Which way should I go?
>>>>
>>>
>>> There are two approaches, with different up/downsides
>>>
>>> 1) continue to use the old method, and use the new method only when
>>> absolutely required.  This will function, but on old toolstacks, suffer
>>> migration/suspend failures when the toolstack fails to find the p2m.
>>>
>>> 2) Provide a Xen feature flag indicating the presence of N-level p2m
>>> support.  Guests which can see this flag are free to use N-level, and
>>> guests which can't are not.
>>>
>>> Ultimately, giving more than 512GB to a current 64bit PV domain is not
>>> going to work, and the choice above depends on which failure mode you
>>> wish a new/old mix to have.
>>
>> I'd prefer solution 1), as it will enable Dom0 with more than 512 GB
>> without requiring a change of any Xen component. Additionally large
>> domains can be started by users who don't care for migrating or
>> suspending them.
>
> With the hopefully well understood limitation of kexec not working
> there (as it, just like migration for DomU, uses this table for Dom0
> in at least machine_crash_shutdown()).

Sure. That's another issue to be addressed.

Juergen


* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-15  8:52           ` Juergen Gross
  2014-09-15  9:42             ` Jan Beulich
@ 2014-09-15  9:44             ` David Vrabel
  2014-09-15  9:52               ` Juergen Gross
  1 sibling, 1 reply; 23+ messages in thread
From: David Vrabel @ 2014-09-15  9:44 UTC (permalink / raw)
  To: Juergen Gross, Andrew Cooper, ian.campbell, ian.jackson, jbeulich,
	keir, tim, xen-devel

On 15/09/14 09:52, Juergen Gross wrote:
> On 09/15/2014 10:29 AM, Andrew Cooper wrote:
>>
>> On 12/09/2014 11:31, Juergen Gross wrote:
>>> On 09/09/2014 12:49 PM, Juergen Gross wrote:
>>>> On 09/09/2014 12:27 PM, Andrew Cooper wrote:
>>>>> On 09/09/14 10:58, Juergen Gross wrote:
>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>> currently contains the mfn of the top level page frame of the 3 level
>>>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>>>> (and live migration) of pv domains. With three levels of the p2m tree
>>>>>> it is possible to support up to 512 GB of RAM for a 64 bit pv domain.
>>>>>> A 32 bit pv domain can support more, as each memory page can hold
>>>>>> 1024
>>>>>> instead of 512 entries, leading to a limit of 4 TB. To be able to
>>>>>> support more RAM on x86-64 an additional level is to be added.
>>>>>>
>>>>>> This patch expands struct arch_shared_info with a new p2m tree root
>>>>>> and the number of levels of the p2m tree. The new information is
>>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>>> pfn_to_mfn_frame_list_list (this should be done only if more than
>>>>>> three levels are needed, of course).
>>>>>
>>>>> A small domain feeling a little tight on space could easily opt for
>>>>> a 2
>>>>> or even 1 level p2m.  (After all, one advantage of virt is to cram
>>>>> many
>>>>> small VMs into a server).
>>>>>
>>>>> How is xen and toolstack support for n-level p2ms going to be
>>>>> advertised
>>>>> to guests?  Simply assuming the toolstack is capable of dealing with
>>>>> this new scheme wont work with a new pv guest running on an older Xen.
>>>>
>>>> Is it really worth doing such an optimization? This would save only
>>>> very
>>>> few pages.
>>>>
>>>> If you think it should be done we can add another SIF_* flag to
>>>> start_info->flags. In this case a domain using this feature could
>>>> not be
>>>> migrated to a server with old tools, however. So we would probably end
>>>> with the need to be able to suppress that flag on a per-domain base.
>>>
>>> Any further comments?
>>>
>>> Which way should I go?
>>>
>>
>> There are two approaches, with different up/downsides
>>
>> 1) continue to use the old method, and use the new method only when
>> absolutely required.  This will function, but on old toolstacks, suffer
>> migration/suspend failures when the toolstack fails to find the p2m.
>>
>> 2) Provide a Xen feature flag indicating the presence of N-level p2m
>> support.  Guests which can see this flag are free to use N-level, and
>> guests which can't are not.
>>
>> Ultimately, giving more than 512GB to a current 64bit PV domain is not
>> going to work, and the choice above depends on which failure mode you
>> wish a new/old mix to have.
> 
> I'd prefer solution 1), as it will enable Dom0 with more than 512 GB
> without requiring a change of any Xen component. Additionally large
> domains can be started by users who don't care for migrating or
> suspending them.
> 
> So I'd rather keep my patch as posted.

PV guests can have extra memory added, beyond their initial limit.
Supporting this would require option 2.

David


* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-15  9:44             ` David Vrabel
@ 2014-09-15  9:52               ` Juergen Gross
  2014-09-15 10:30                 ` David Vrabel
  0 siblings, 1 reply; 23+ messages in thread
From: Juergen Gross @ 2014-09-15  9:52 UTC (permalink / raw)
  To: David Vrabel, Andrew Cooper, ian.campbell, ian.jackson, jbeulich,
	keir, tim, xen-devel

On 09/15/2014 11:44 AM, David Vrabel wrote:
> On 15/09/14 09:52, Juergen Gross wrote:
>> On 09/15/2014 10:29 AM, Andrew Cooper wrote:
>>>
>>> On 12/09/2014 11:31, Juergen Gross wrote:
>>>> On 09/09/2014 12:49 PM, Juergen Gross wrote:
>>>>> On 09/09/2014 12:27 PM, Andrew Cooper wrote:
>>>>>> On 09/09/14 10:58, Juergen Gross wrote:
>>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>>> currently contains the mfn of the top level page frame of the 3 level
>>>>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>>>>> (and live migration) of pv domains. With three levels of the p2m tree
>>>>>>> it is possible to support up to 512 GB of RAM for a 64 bit pv domain.
>>>>>>> A 32 bit pv domain can support more, as each memory page can hold
>>>>>>> 1024
>>>>>>> instead of 512 entries, leading to a limit of 4 TB. To be able to
>>>>>>> support more RAM on x86-64 an additional level is to be added.
>>>>>>>
>>>>>>> This patch expands struct arch_shared_info with a new p2m tree root
>>>>>>> and the number of levels of the p2m tree. The new information is
>>>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>>>> pfn_to_mfn_frame_list_list (this should be done only if more than
>>>>>>> three levels are needed, of course).
>>>>>>
>>>>>> A small domain feeling a little tight on space could easily opt for
>>>>>> a 2
>>>>>> or even 1 level p2m.  (After all, one advantage of virt is to cram
>>>>>> many
>>>>>> small VMs into a server).
>>>>>>
>>>>>> How is xen and toolstack support for n-level p2ms going to be
>>>>>> advertised
>>>>>> to guests?  Simply assuming the toolstack is capable of dealing with
>>>>>> this new scheme wont work with a new pv guest running on an older Xen.
>>>>>
>>>>> Is it really worth doing such an optimization? This would save only
>>>>> very
>>>>> few pages.
>>>>>
>>>>> If you think it should be done we can add another SIF_* flag to
>>>>> start_info->flags. In this case a domain using this feature could
>>>>> not be
>>>>> migrated to a server with old tools, however. So we would probably end
>>>>> with the need to be able to suppress that flag on a per-domain base.
>>>>
>>>> Any further comments?
>>>>
>>>> Which way should I go?
>>>>
>>>
>>> There are two approaches, with different up/downsides
>>>
>>> 1) continue to use the old method, and use the new method only when
>>> absolutely required.  This will function, but on old toolstacks, suffer
>>> migration/suspend failures when the toolstack fails to find the p2m.
>>>
>>> 2) Provide a Xen feature flag indicating the presence of N-level p2m
>>> support.  Guests which can see this flag are free to use N-level, and
>>> guests which can't are not.
>>>
>>> Ultimately, giving more than 512GB to a current 64bit PV domain is not
>>> going to work, and the choice above depends on which failure mode you
>>> wish a new/old mix to have.
>>
>> I'd prefer solution 1), as it will enable Dom0 with more than 512 GB
>> without requiring a change of any Xen component. Additionally large
>> domains can be started by users who don't care for migrating or
>> suspending them.
>>
>> So I'd rather keep my patch as posted.
>
> PV guests can have extra memory added, beyond their initial limit.
> Supporting this would require option 2.

I don't see why this should require option 2. Option 1 only prohibits
suspending/migrating a domain with more than 512 GB. Just running it is
fine with either option. I should mention, however, that the number of
p2m tree levels will be increased on demand when needed. The tree won't
be created with more than 3 levels if the domain isn't started with more
than 512 GB.
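
For reference, the 512 GB and 4 TB limits quoted from the patch
description can be checked with a few lines of C. This is only a rough
sketch: p2m_max_bytes() and P2M_PAGE_SIZE are names made up for this
illustration and are not part of Xen or Linux. It assumes 4 KiB pages,
a tree in which every page holds P2M_PAGE_SIZE / entry_size entries,
and one guest page per leaf entry.

```c
#include <assert.h>
#include <stdint.h>

#define P2M_PAGE_SIZE 4096ULL   /* assumed x86 page size in bytes */

/*
 * Bytes of guest RAM covered by a p2m tree with `levels` levels.
 * entry_size is the size of one mfn entry: 8 bytes for a 64 bit
 * guest, 4 bytes for a 32 bit guest.  (Hypothetical helper.)
 */
uint64_t p2m_max_bytes(unsigned int levels, unsigned int entry_size)
{
    uint64_t entries_per_page, frames = 1;
    unsigned int i;

    assert(entry_size == 4 || entry_size == 8);
    entries_per_page = P2M_PAGE_SIZE / entry_size;

    /* Each level multiplies the number of reachable frames. */
    for (i = 0; i < levels; i++)
        frames *= entries_per_page;

    return frames * P2M_PAGE_SIZE;
}
```

With these assumptions, three levels of 8 byte entries give 512^3
frames of 4 KiB, i.e. 512 GiB, while three levels of 4 byte entries
give 1024^3 frames, i.e. 4 TiB. A fourth level on x86-64 would raise
the limit by another factor of 512.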


Juergen

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-15  9:52               ` Juergen Gross
@ 2014-09-15 10:30                 ` David Vrabel
  2014-09-15 10:46                   ` Juergen Gross
  0 siblings, 1 reply; 23+ messages in thread
From: David Vrabel @ 2014-09-15 10:30 UTC (permalink / raw)
  To: Juergen Gross, Andrew Cooper, ian.campbell, ian.jackson, jbeulich,
	keir, tim, xen-devel

On 15/09/14 10:52, Juergen Gross wrote:
> On 09/15/2014 11:44 AM, David Vrabel wrote:
>> On 15/09/14 09:52, Juergen Gross wrote:
>>> On 09/15/2014 10:29 AM, Andrew Cooper wrote:
>>>>
>>>> On 12/09/2014 11:31, Juergen Gross wrote:
>>>>> On 09/09/2014 12:49 PM, Juergen Gross wrote:
>>>>>> On 09/09/2014 12:27 PM, Andrew Cooper wrote:
>>>>>>> On 09/09/14 10:58, Juergen Gross wrote:
>>>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>>>> currently contains the mfn of the top level page frame of the 3
>>>>>>>> level
>>>>>>>> p2m tree, which is used by the Xen tools during saving and
>>>>>>>> restoring
>>>>>>>> (and live migration) of pv domains. With three levels of the p2m
>>>>>>>> tree
>>>>>>>> it is possible to support up to 512 GB of RAM for a 64 bit pv
>>>>>>>> domain.
>>>>>>>> A 32 bit pv domain can support more, as each memory page can hold
>>>>>>>> 1024
>>>>>>>> instead of 512 entries, leading to a limit of 4 TB. To be able to
>>>>>>>> support more RAM on x86-64 an additional level is to be added.
>>>>>>>>
>>>>>>>> This patch expands struct arch_shared_info with a new p2m tree root
>>>>>>>> and the number of levels of the p2m tree. The new information is
>>>>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>>>>> pfn_to_mfn_frame_list_list (this should be done only if more than
>>>>>>>> three levels are needed, of course).
>>>>>>>
>>>>>>> A small domain feeling a little tight on space could easily opt for
>>>>>>> a 2
>>>>>>> or even 1 level p2m.  (After all, one advantage of virt is to cram
>>>>>>> many
>>>>>>> small VMs into a server).
>>>>>>>
>>>>>>> How is xen and toolstack support for n-level p2ms going to be
>>>>>>> advertised
>>>>>>> to guests?  Simply assuming the toolstack is capable of dealing with
>>>>>>> this new scheme won't work with a new pv guest running on an older
>>>>>>> Xen.
>>>>>>
>>>>>> Is it really worth doing such an optimization? This would save only
>>>>>> very
>>>>>> few pages.
>>>>>>
>>>>>> If you think it should be done we can add another SIF_* flag to
>>>>>> start_info->flags. In this case a domain using this feature could
>>>>>> not be
>>>>>> migrated to a server with old tools, however. So we would probably
>>>>>> end
>>>>>> with the need to be able to suppress that flag on a per-domain base.
>>>>>
>>>>> Any further comments?
>>>>>
>>>>> Which way should I go?
>>>>>
>>>>
>>>> There are two approaches, with different up/downsides
>>>>
>>>> 1) continue to use the old method, and use the new method only when
>>>> absolutely required.  This will function, but on old toolstacks, suffer
>>>> migration/suspend failures when the toolstack fails to find the p2m.
>>>>
>>>> 2) Provide a Xen feature flag indicating the presence of N-level p2m
>>>> support.  Guests which can see this flag are free to use N-level, and
>>>> guests which can't are not.
>>>>
>>>> Ultimately, giving more than 512GB to a current 64bit PV domain is not
>>>> going to work, and the choice above depends on which failure mode you
>>>> wish a new/old mix to have.
>>>
>>> I'd prefer solution 1), as it will enable Dom0 with more than 512 GB
>>> without requiring a change of any Xen component. Additionally large
>>> domains can be started by users who don't care for migrating or
>>> suspending them.
>>>
>>> So I'd rather keep my patch as posted.
>>
>> PV guests can have extra memory added, beyond their initial limit.
>> Supporting this would require option 2.
> 
> I don't see why this should require option 2.

Um...

> Option 1 only prohibits suspending/migrating a domain with more than 512 GB.

...this is the reason.

With the exception of VMs that have assigned direct access to hardware,
migration is an essential feature and must be supported.

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-15 10:30                 ` David Vrabel
@ 2014-09-15 10:46                   ` Juergen Gross
  2014-09-15 11:29                     ` Jan Beulich
  2014-09-15 14:30                     ` David Vrabel
  0 siblings, 2 replies; 23+ messages in thread
From: Juergen Gross @ 2014-09-15 10:46 UTC (permalink / raw)
  To: David Vrabel, Andrew Cooper, ian.campbell, ian.jackson, jbeulich,
	keir, tim, xen-devel

On 09/15/2014 12:30 PM, David Vrabel wrote:
> On 15/09/14 10:52, Juergen Gross wrote:
>> On 09/15/2014 11:44 AM, David Vrabel wrote:
>>> On 15/09/14 09:52, Juergen Gross wrote:
>>>> On 09/15/2014 10:29 AM, Andrew Cooper wrote:
>>>>>
>>>>> On 12/09/2014 11:31, Juergen Gross wrote:
>>>>>> On 09/09/2014 12:49 PM, Juergen Gross wrote:
>>>>>>> On 09/09/2014 12:27 PM, Andrew Cooper wrote:
>>>>>>>> On 09/09/14 10:58, Juergen Gross wrote:
>>>>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>>>>> currently contains the mfn of the top level page frame of the 3
>>>>>>>>> level
>>>>>>>>> p2m tree, which is used by the Xen tools during saving and
>>>>>>>>> restoring
>>>>>>>>> (and live migration) of pv domains. With three levels of the p2m
>>>>>>>>> tree
>>>>>>>>> it is possible to support up to 512 GB of RAM for a 64 bit pv
>>>>>>>>> domain.
>>>>>>>>> A 32 bit pv domain can support more, as each memory page can hold
>>>>>>>>> 1024
>>>>>>>>> instead of 512 entries, leading to a limit of 4 TB. To be able to
>>>>>>>>> support more RAM on x86-64 an additional level is to be added.
>>>>>>>>>
>>>>>>>>> This patch expands struct arch_shared_info with a new p2m tree root
>>>>>>>>> and the number of levels of the p2m tree. The new information is
>>>>>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>>>>>> pfn_to_mfn_frame_list_list (this should be done only if more than
>>>>>>>>> three levels are needed, of course).
>>>>>>>>
>>>>>>>> A small domain feeling a little tight on space could easily opt for
>>>>>>>> a 2
>>>>>>>> or even 1 level p2m.  (After all, one advantage of virt is to cram
>>>>>>>> many
>>>>>>>> small VMs into a server).
>>>>>>>>
>>>>>>>> How is xen and toolstack support for n-level p2ms going to be
>>>>>>>> advertised
>>>>>>>> to guests?  Simply assuming the toolstack is capable of dealing with
>>>>>>>> this new scheme won't work with a new pv guest running on an older
>>>>>>>> Xen.
>>>>>>>
>>>>>>> Is it really worth doing such an optimization? This would save only
>>>>>>> very
>>>>>>> few pages.
>>>>>>>
>>>>>>> If you think it should be done we can add another SIF_* flag to
>>>>>>> start_info->flags. In this case a domain using this feature could
>>>>>>> not be
>>>>>>> migrated to a server with old tools, however. So we would probably
>>>>>>> end
>>>>>>> with the need to be able to suppress that flag on a per-domain base.
>>>>>>
>>>>>> Any further comments?
>>>>>>
>>>>>> Which way should I go?
>>>>>>
>>>>>
>>>>> There are two approaches, with different up/downsides
>>>>>
>>>>> 1) continue to use the old method, and use the new method only when
>>>>> absolutely required.  This will function, but on old toolstacks, suffer
>>>>> migration/suspend failures when the toolstack fails to find the p2m.
>>>>>
>>>>> 2) Provide a Xen feature flag indicating the presence of N-level p2m
>>>>> support.  Guests which can see this flag are free to use N-level, and
>>>>> guests which can't are not.
>>>>>
>>>>> Ultimately, giving more than 512GB to a current 64bit PV domain is not
>>>>> going to work, and the choice above depends on which failure mode you
>>>>> wish a new/old mix to have.
>>>>
>>>> I'd prefer solution 1), as it will enable Dom0 with more than 512 GB
>>>> without requiring a change of any Xen component. Additionally large
>>>> domains can be started by users who don't care for migrating or
>>>> suspending them.
>>>>
>>>> So I'd rather keep my patch as posted.
>>>
>>> PV guests can have extra memory added, beyond their initial limit.
>>> Supporting this would require option 2.
>>
>> I don't see why this should require option 2.
>
> Um...
>
>> Option 1 only prohibits suspending/migrating a domain with more than 512 GB.
>
> ...this is the reason.
>
> With the exception of VMs that have assigned direct access to hardware,
> migration is an essential feature and must be supported.

So you'd prefer:

1) >512GB pv-domains (including Dom0) will be supported only with new
    Xen (4.6?), no matter if the user requires migration to be supported

to:

2) >512GB pv-domains (especially Dom0 and VMs with direct hw access) can
    be started on current Xen versions, migration is possible only if Xen
    is new (4.6?)

What is the common use case for migration? I doubt it is used very often
for really huge domains.

I'm not really opposed to solution 2, but I doubt it is the correct one.


Juergen

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-15 10:46                   ` Juergen Gross
@ 2014-09-15 11:29                     ` Jan Beulich
  2014-09-15 14:30                     ` David Vrabel
  1 sibling, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2014-09-15 11:29 UTC (permalink / raw)
  To: David Vrabel, Juergen Gross
  Cc: keir, ian.campbell, Andrew Cooper, tim, xen-devel, ian.jackson

>>> On 15.09.14 at 12:46, <JGross@suse.com> wrote:
> On 09/15/2014 12:30 PM, David Vrabel wrote:
>> With the exception of VMs that have assigned direct access to hardware,
>> migration is an essential feature and must be supported.
> 
> So you'd prefer:
> 
> 1) >512GB pv-domains (including Dom0) will be supported only with new
>     Xen (4.6?), no matter if the user requires migration to be supported
> 
> to:
> 
> 2) >512GB pv-domains (especially Dom0 and VMs with direct hw access) can
>     be started on current Xen versions, migration is possible only if Xen
>     is new (4.6?)
> 
> What is the common use case for migration? I doubt it is used very often
> for really huge domains.

Even without guessing at the likelihood or usefulness of migrating huge
domains: since 1) clearly causes a larger reduction in functionality
than 2), I'm having a hard time seeing why anyone would favor 1)
outside of academic/theoretical considerations.

Jan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-15 10:46                   ` Juergen Gross
  2014-09-15 11:29                     ` Jan Beulich
@ 2014-09-15 14:30                     ` David Vrabel
  2014-09-16  3:52                       ` Juergen Gross
  1 sibling, 1 reply; 23+ messages in thread
From: David Vrabel @ 2014-09-15 14:30 UTC (permalink / raw)
  To: Juergen Gross, Andrew Cooper, ian.campbell, ian.jackson, jbeulich,
	keir, tim, xen-devel

On 15/09/14 11:46, Juergen Gross wrote:
> On 09/15/2014 12:30 PM, David Vrabel wrote:
>> On 15/09/14 10:52, Juergen Gross wrote:
>>> On 09/15/2014 11:44 AM, David Vrabel wrote:
>>>> On 15/09/14 09:52, Juergen Gross wrote:
>>>>> On 09/15/2014 10:29 AM, Andrew Cooper wrote:
>>>>>>
>>>>>> On 12/09/2014 11:31, Juergen Gross wrote:
>>>>>>> On 09/09/2014 12:49 PM, Juergen Gross wrote:
>>>>>>>> On 09/09/2014 12:27 PM, Andrew Cooper wrote:
>>>>>>>>> On 09/09/14 10:58, Juergen Gross wrote:
>>>>>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>>>>>> currently contains the mfn of the top level page frame of the 3
>>>>>>>>>> level
>>>>>>>>>> p2m tree, which is used by the Xen tools during saving and
>>>>>>>>>> restoring
>>>>>>>>>> (and live migration) of pv domains. With three levels of the p2m
>>>>>>>>>> tree
>>>>>>>>>> it is possible to support up to 512 GB of RAM for a 64 bit pv
>>>>>>>>>> domain.
>>>>>>>>>> A 32 bit pv domain can support more, as each memory page can hold
>>>>>>>>>> 1024
>>>>>>>>>> instead of 512 entries, leading to a limit of 4 TB. To be able to
>>>>>>>>>> support more RAM on x86-64 an additional level is to be added.
>>>>>>>>>>
>>>>>>>>>> This patch expands struct arch_shared_info with a new p2m tree
>>>>>>>>>> root
>>>>>>>>>> and the number of levels of the p2m tree. The new information is
>>>>>>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>>>>>>> pfn_to_mfn_frame_list_list (this should be done only if more than
>>>>>>>>>> three levels are needed, of course).
>>>>>>>>>
>>>>>>>>> A small domain feeling a little tight on space could easily opt
>>>>>>>>> for
>>>>>>>>> a 2
>>>>>>>>> or even 1 level p2m.  (After all, one advantage of virt is to cram
>>>>>>>>> many
>>>>>>>>> small VMs into a server).
>>>>>>>>>
>>>>>>>>> How is xen and toolstack support for n-level p2ms going to be
>>>>>>>>> advertised
>>>>>>>>> to guests?  Simply assuming the toolstack is capable of dealing
>>>>>>>>> with
>>>>>>>>> this new scheme won't work with a new pv guest running on an older
>>>>>>>>> Xen.
>>>>>>>>
>>>>>>>> Is it really worth doing such an optimization? This would save only
>>>>>>>> very
>>>>>>>> few pages.
>>>>>>>>
>>>>>>>> If you think it should be done we can add another SIF_* flag to
>>>>>>>> start_info->flags. In this case a domain using this feature could
>>>>>>>> not be
>>>>>>>> migrated to a server with old tools, however. So we would probably
>>>>>>>> end
>>>>>>>> with the need to be able to suppress that flag on a per-domain
>>>>>>>> base.
>>>>>>>
>>>>>>> Any further comments?
>>>>>>>
>>>>>>> Which way should I go?
>>>>>>>
>>>>>>
>>>>>> There are two approaches, with different up/downsides
>>>>>>
>>>>>> 1) continue to use the old method, and use the new method only when
>>>>>> absolutely required.  This will function, but on old toolstacks,
>>>>>> suffer
>>>>>> migration/suspend failures when the toolstack fails to find the p2m.
>>>>>>
>>>>>> 2) Provide a Xen feature flag indicating the presence of N-level p2m
>>>>>> support.  Guests which can see this flag are free to use N-level, and
>>>>>> guests which can't are not.
>>>>>>
>>>>>> Ultimately, giving more than 512GB to a current 64bit PV domain is
>>>>>> not
>>>>>> going to work, and the choice above depends on which failure mode you
>>>>>> wish a new/old mix to have.
>>>>>
>>>>> I'd prefer solution 1), as it will enable Dom0 with more than 512 GB
>>>>> without requiring a change of any Xen component. Additionally large
>>>>> domains can be started by users who don't care for migrating or
>>>>> suspending them.
>>>>>
>>>>> So I'd rather keep my patch as posted.
>>>>
>>>> PV guests can have extra memory added, beyond their initial limit.
>>>> Supporting this would require option 2.
>>>
>>> I don't see why this should require option 2.
>>
>> Um...
>>
>>> Option 1 only prohibits suspending/migrating a domain with more than
>>> 512 GB.
>>
>> ...this is the reason.
>>
>> With the exception of VMs that have assigned direct access to hardware,
>> migration is an essential feature and must be supported.
> 
> So you'd prefer:
> 
> 1) >512GB pv-domains (including Dom0) will be supported only with new
>    Xen (4.6?), no matter if the user requires migration to be supported

Yes.  >512 GiB and not being able to migrate are not obviously related
from the point of view of the end user (unlike assigning a PCI device).

Failing at domain save time is most likely too late for the end user.

> to:
> 
> 2) >512GB pv-domains (especially Dom0 and VMs with direct hw access) can
>    be started on current Xen versions, migration is possible only if Xen
>    is new (4.6?)

There's also my preferred option:

3) >512 GiB PV domains are not supported.  Large guests must be PVH or
PVHVM.

> What is the common use case for migration? I doubt it is used very often
> for really huge domains.

XenServer uses it for server pool upgrades with no VM downtime.

Also, today's huge VM is tomorrow's common size.

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-15 14:30                     ` David Vrabel
@ 2014-09-16  3:52                       ` Juergen Gross
  2014-09-16 10:14                         ` David Vrabel
  0 siblings, 1 reply; 23+ messages in thread
From: Juergen Gross @ 2014-09-16  3:52 UTC (permalink / raw)
  To: David Vrabel, Andrew Cooper, ian.campbell, ian.jackson, jbeulich,
	keir, tim, xen-devel

On 09/15/2014 04:30 PM, David Vrabel wrote:
> On 15/09/14 11:46, Juergen Gross wrote:
>> On 09/15/2014 12:30 PM, David Vrabel wrote:
>>> On 15/09/14 10:52, Juergen Gross wrote:
>>>> On 09/15/2014 11:44 AM, David Vrabel wrote:
>>>>> On 15/09/14 09:52, Juergen Gross wrote:
>>>>>> On 09/15/2014 10:29 AM, Andrew Cooper wrote:
>>>>>>>
>>>>>>> On 12/09/2014 11:31, Juergen Gross wrote:
>>>>>>>> On 09/09/2014 12:49 PM, Juergen Gross wrote:
>>>>>>>>> On 09/09/2014 12:27 PM, Andrew Cooper wrote:
>>>>>>>>>> On 09/09/14 10:58, Juergen Gross wrote:
>>>>>>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>>>>>>> currently contains the mfn of the top level page frame of the 3
>>>>>>>>>>> level
>>>>>>>>>>> p2m tree, which is used by the Xen tools during saving and
>>>>>>>>>>> restoring
>>>>>>>>>>> (and live migration) of pv domains. With three levels of the p2m
>>>>>>>>>>> tree
>>>>>>>>>>> it is possible to support up to 512 GB of RAM for a 64 bit pv
>>>>>>>>>>> domain.
>>>>>>>>>>> A 32 bit pv domain can support more, as each memory page can hold
>>>>>>>>>>> 1024
>>>>>>>>>>> instead of 512 entries, leading to a limit of 4 TB. To be able to
>>>>>>>>>>> support more RAM on x86-64 an additional level is to be added.
>>>>>>>>>>>
>>>>>>>>>>> This patch expands struct arch_shared_info with a new p2m tree
>>>>>>>>>>> root
>>>>>>>>>>> and the number of levels of the p2m tree. The new information is
>>>>>>>>>>> indicated by the domain to be valid by storing ~0UL into
>>>>>>>>>>> pfn_to_mfn_frame_list_list (this should be done only if more than
>>>>>>>>>>> three levels are needed, of course).
>>>>>>>>>>
>>>>>>>>>> A small domain feeling a little tight on space could easily opt
>>>>>>>>>> for
>>>>>>>>>> a 2
>>>>>>>>>> or even 1 level p2m.  (After all, one advantage of virt is to cram
>>>>>>>>>> many
>>>>>>>>>> small VMs into a server).
>>>>>>>>>>
>>>>>>>>>> How is xen and toolstack support for n-level p2ms going to be
>>>>>>>>>> advertised
>>>>>>>>>> to guests?  Simply assuming the toolstack is capable of dealing
>>>>>>>>>> with
>>>>>>>>>> this new scheme won't work with a new pv guest running on an older
>>>>>>>>>> Xen.
>>>>>>>>>
>>>>>>>>> Is it really worth doing such an optimization? This would save only
>>>>>>>>> very
>>>>>>>>> few pages.
>>>>>>>>>
>>>>>>>>> If you think it should be done we can add another SIF_* flag to
>>>>>>>>> start_info->flags. In this case a domain using this feature could
>>>>>>>>> not be
>>>>>>>>> migrated to a server with old tools, however. So we would probably
>>>>>>>>> end
>>>>>>>>> with the need to be able to suppress that flag on a per-domain
>>>>>>>>> base.
>>>>>>>>
>>>>>>>> Any further comments?
>>>>>>>>
>>>>>>>> Which way should I go?
>>>>>>>>
>>>>>>>
>>>>>>> There are two approaches, with different up/downsides
>>>>>>>
>>>>>>> 1) continue to use the old method, and use the new method only when
>>>>>>> absolutely required.  This will function, but on old toolstacks,
>>>>>>> suffer
>>>>>>> migration/suspend failures when the toolstack fails to find the p2m.
>>>>>>>
>>>>>>> 2) Provide a Xen feature flag indicating the presence of N-level p2m
>>>>>>> support.  Guests which can see this flag are free to use N-level, and
>>>>>>> guests which can't are not.
>>>>>>>
>>>>>>> Ultimately, giving more than 512GB to a current 64bit PV domain is
>>>>>>> not
>>>>>>> going to work, and the choice above depends on which failure mode you
>>>>>>> wish a new/old mix to have.
>>>>>>
>>>>>> I'd prefer solution 1), as it will enable Dom0 with more than 512 GB
>>>>>> without requiring a change of any Xen component. Additionally large
>>>>>> domains can be started by users who don't care for migrating or
>>>>>> suspending them.
>>>>>>
>>>>>> So I'd rather keep my patch as posted.
>>>>>
>>>>> PV guests can have extra memory added, beyond their initial limit.
>>>>> Supporting this would require option 2.
>>>>
>>>> I don't see why this should require option 2.
>>>
>>> Um...
>>>
>>>> Option 1 only prohibits suspending/migrating a domain with more than
>>>> 512 GB.
>>>
>>> ...this is the reason.
>>>
>>> With the exception of VMs that have assigned direct access to hardware,
>>> migration is an essential feature and must be supported.
>>
>> So you'd prefer:
>>
>> 1) >512GB pv-domains (including Dom0) will be supported only with new
>>     Xen (4.6?), no matter if the user requires migration to be supported
>
> Yes.  >512 GiB and not being able to migrate are not obviously related
> from the point of view of the end user (unlike assigning a PCI device).
>
> Failing at domain save time is most likely too late for the end user.

What would you think about the following compromise:

We add a flag that indicates support of multi-level p2m. Additionally
the Linux kernel can ignore the flag not being set either if started as
Dom0 or if told so via kernel parameter.

>
>> to:
>>
>> 2) >512GB pv-domains (especially Dom0 and VMs with direct hw access) can
>>     be started on current Xen versions, migration is possible only if Xen
>>     is new (4.6?)
>
> There's also my preferred option:
>
> 3) >512 GiB PV domains are not supported.  Large guests must be PVH or
> PVHVM.

In theory okay, but not right now, I think. PVH Dom0 is not production
ready.


Juergen

>
>> What is the common use case for migration? I doubt it is used very often
>> for really huge domains.
>
> XenServer uses it for server pool upgrades with no VM downtime.
>
> Also, today's huge VM is tomorrow's common size.
>
> David
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-16  3:52                       ` Juergen Gross
@ 2014-09-16 10:14                         ` David Vrabel
  2014-09-16 10:38                           ` Juergen Gross
  0 siblings, 1 reply; 23+ messages in thread
From: David Vrabel @ 2014-09-16 10:14 UTC (permalink / raw)
  To: Juergen Gross, David Vrabel, Andrew Cooper, ian.campbell,
	ian.jackson, jbeulich, keir, tim, xen-devel

On 16/09/14 04:52, Juergen Gross wrote:
> On 09/15/2014 04:30 PM, David Vrabel wrote:
>> On 15/09/14 11:46, Juergen Gross wrote:
>>> So you'd prefer:
>>>
>>> 1) >512GB pv-domains (including Dom0) will be supported only with new
>>>     Xen (4.6?), no matter if the user requires migration to be supported
>>
>> Yes.  >512 GiB and not being able to migrate are not obviously related
>> from the point of view of the end user (unlike assigning a PCI device).
>>
>> Failing at domain save time is most likely too late for the end user.
> 
> What would you think about the following compromise:
> 
> We add a flag that indicates support of multi-level p2m. Additionally
> the Linux kernel can ignore the flag not being set either if started as
> Dom0 or if told so via kernel parameter.

This sounds fine but this override should be via the command line
parameter only.  Crash dump analysis tools may not understand the 4
level p2m.
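
The policy being discussed could be sketched roughly as follows. This
is not code from the patch: the structure, its field names and the
function name are hypothetical, and no actual Xen feature flag or
Linux kernel parameter of this kind had been defined at this point in
the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs to the guest's decision. */
struct p2m_policy_input {
    bool xen_advertises_multilevel; /* assumed feature flag from Xen */
    bool cmdline_override;          /* assumed kernel command line switch */
    bool is_dom0;                   /* deliberately not consulted below */
};

/* May the guest build a p2m tree with more than 3 levels? */
bool guest_may_use_multilevel_p2m(const struct p2m_policy_input *in)
{
    assert(in);

    /* Xen and the tools announced support: always allowed. */
    if (in->xen_advertises_multilevel)
        return true;

    /*
     * No flag: only an explicit command line override enables it.
     * Being Dom0 alone is not sufficient, as crash dump tools may
     * not understand the larger tree.
     */
    return in->cmdline_override;
}
```

The point of ignoring is_dom0 is that the administrator has to opt in
knowingly, so a failure of older tooling does not come as a surprise.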

>>> to:
>>>
>>> 2) >512GB pv-domains (especially Dom0 and VMs with direct hw access) can
>>>     be started on current Xen versions, migration is possible only if
>>> Xen
>>>     is new (4.6?)
>>
>> There's also my preferred option:
>>
>> 3) >512 GiB PV domains are not supported.  Large guests must be PVH or
>> PVHVM.
> 
> In theory okay, but not right now, I think. PVH Dom0 is not production
> ready.

I'm not really seeing the need for such a large dom0.

I remain unconvinced that there are sufficient use cases to justify
extending the PV only ABI and increasing complexity of the current
3-level p2m code.

I'm concerned that 4-level p2m support will impact the performance of
guests that do not need the 4 levels.  It may be necessary to use the
alternatives mechanism to select the correct low-level lookup function.

I also think a flat array for the p2m might be better (less complex).
There's plenty of virtual address space in a 64-bit guest to allow for this.
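
To give a feeling for the sizes involved, here is a small sketch of
the virtual address space a flat p2m array would need. It assumes
4 KiB pages and one 8 byte mfn entry per guest pfn; flat_p2m_bytes()
is a name invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size in bytes of a flat (linear) p2m array covering guest_ram_bytes
 * of memory.  (Hypothetical helper, assumptions as stated above.)
 */
uint64_t flat_p2m_bytes(uint64_t guest_ram_bytes)
{
    const uint64_t page_size = 4096;  /* assumed x86 page size */
    const uint64_t entry_size = 8;    /* one 64-bit mfn per pfn */

    assert(guest_ram_bytes % page_size == 0);

    return (guest_ram_bytes / page_size) * entry_size;
}
```

Under these assumptions a 512 GiB guest needs 1 GiB of virtual space
for the array and a 16 TiB guest needs 32 GiB, i.e. the array is always
1/512 of the guest's memory size.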

David

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-16 10:14                         ` David Vrabel
@ 2014-09-16 10:38                           ` Juergen Gross
  2014-09-16 11:56                             ` David Vrabel
  0 siblings, 1 reply; 23+ messages in thread
From: Juergen Gross @ 2014-09-16 10:38 UTC (permalink / raw)
  To: David Vrabel, Andrew Cooper, ian.campbell, ian.jackson, jbeulich,
	keir, tim, xen-devel

On 09/16/2014 12:14 PM, David Vrabel wrote:
> On 16/09/14 04:52, Juergen Gross wrote:
>> On 09/15/2014 04:30 PM, David Vrabel wrote:
>>> On 15/09/14 11:46, Juergen Gross wrote:
>>>> So you'd prefer:
>>>>
>>>> 1) >512GB pv-domains (including Dom0) will be supported only with new
>>>>      Xen (4.6?), no matter if the user requires migration to be supported
>>>
>>> Yes.  >512 GiB and not being able to migrate are not obviously related
>>> from the point of view of the end user (unlike assigning a PCI device).
>>>
>>> Failing at domain save time is most likely too late for the end user.
>>
>> What would you think about the following compromise:
>>
>> We add a flag that indicates support of multi-level p2m. Additionally
>> the Linux kernel can ignore the flag not being set either if started as
>> Dom0 or if told so via kernel parameter.
>
> This sounds fine but this override should be via the command line
> parameter only.  Crash dump analysis tools may not understand the 4
> level p2m.
>
>>>> to:
>>>>
>>>> 2) >512GB pv-domains (especially Dom0 and VMs with direct hw access) can
>>>>      be started on current Xen versions, migration is possible only if
>>>> Xen
>>>>      is new (4.6?)
>>>
>>> There's also my preferred option:
>>>
>>> 3) >512 GiB PV domains are not supported.  Large guests must be PVH or
>>> PVHVM.
>>
>> In theory okay, but not right now, I think. PVH Dom0 is not production
>> ready.
>
> I'm not really seeing the need for such a large dom0.

Okay, then I'd come back to V1 of my patches. This is the minimum
required to be able to boot up a system with Xen and more than 512GB
memory without having to reduce the Dom0 memory via Xen boot parameter.

Otherwise the hypervisor built mfn_list mapped into the initial address
space will be too large.

And no, I don't think setting the boot parameter is the solution here.
Dom0 should be usable on a huge machine without special parameters.

>
> I remain unconvinced that there are sufficient use cases to justify
> extending the PV only ABI and increasing complexity of the current
> 3-level p2m code.
>
> I'm concerned that 4-level p2m support will impact the performance of
> guests that do not need the 4 levels.  It may be necessary to use the
> alternatives mechanism to select the correct low-level lookup function.

I'll try to get some numbers to post together with a patch.

> I also think a flat array for the p2m might be better (less complex).
> There's plenty of virtual address space in a 64-bit guest to allow for this.

Hmm, do you think we could reserve an area of many GBs for Xen in
virtual space? I suspect this would be rejected as another "Xen-ism".

BTW: the mfn_list_list will still be required to be built as a tree.


Juergen

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-16 10:38                           ` Juergen Gross
@ 2014-09-16 11:56                             ` David Vrabel
  2014-09-16 12:44                               ` Juergen Gross
  0 siblings, 1 reply; 23+ messages in thread
From: David Vrabel @ 2014-09-16 11:56 UTC (permalink / raw)
  To: Juergen Gross, David Vrabel, Andrew Cooper, ian.campbell,
	ian.jackson, jbeulich, keir, tim, xen-devel

On 16/09/14 11:38, Juergen Gross wrote:
> On 09/16/2014 12:14 PM, David Vrabel wrote:
>> On 16/09/14 04:52, Juergen Gross wrote:
>>> On 09/15/2014 04:30 PM, David Vrabel wrote:
>>>> On 15/09/14 11:46, Juergen Gross wrote:
>>>>> So you'd prefer:
>>>>>
>>>>> 1) >512GB pv-domains (including Dom0) will be supported only with new
>>>>>      Xen (4.6?), no matter if the user requires migration to be
>>>>> supported
>>>>
>>>> Yes.  >512 GiB and not being able to migrate are not obviously related
>>>> from the point of view of the end user (unlike assigning a PCI device).
>>>>
>>>> Failing at domain save time is most likely too late for the end user.
>>>
>>> What would you think about the following compromise:
>>>
>>> We add a flag that indicates support of multi-level p2m. Additionally
>>> the Linux kernel can ignore the flag not being set either if started as
>>> Dom0 or if told so via kernel parameter.
>>
>> This sounds fine but this override should be via the command line
>> parameter only.  Crash dump analysis tools may not understand the 4
>> level p2m.
>>
>>>>> to:
>>>>>
>>>>> 2) >512GB pv-domains (especially Dom0 and VMs with direct hw
>>>>> access) can
>>>>>      be started on current Xen versions, migration is possible only if
>>>>> Xen
>>>>>      is new (4.6?)
>>>>
>>>> There's also my preferred option:
>>>>
>>>> 3) >512 GiB PV domains are not supported.  Large guests must be PVH or
>>>> PVHVM.
>>>
>>> In theory okay, but not right now, I think. PVH Dom0 is not production
>>> ready.
>>
>> I'm not really seeing the need for such a large dom0.
> 
> Okay, then I'd come back to V1 of my patches. This is the minimum
> required to be able to boot up a system with Xen and more than 512GB
> memory without having to reduce the Dom0 memory via Xen boot parameter.
> 
> Otherwise the hypervisor built mfn_list mapped into the initial address
> space will be too large.
> 
> And no, I don't think setting the boot parameter is the solution here.
> Dom0 should be usable on a huge machine without special parameters.

Ok. The case where dom0's p2m format matters is pretty specialized.

>> I also think a flat array for the p2m might be better (less complex).
>> There's plenty of virtual address space in a 64-bit guest to allow for
>> this.
> 
> Hmm, do you think we could reserve an area of many GBs for Xen in
> virtual space? I suspect this would be rejected as another "Xen-ism".

alloc_vm_area()

> BTW: the mfn_list_list will still be required to be built as a tree.

The tools could be given the guest virtual address and walk the guest
page tables.

This is probably too much of a difference from the existing ABI to be
worth pursuing at this point.

David

* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-16 11:56                             ` David Vrabel
@ 2014-09-16 12:44                               ` Juergen Gross
  2014-09-17  4:25                                 ` Juergen Gross
  0 siblings, 1 reply; 23+ messages in thread
From: Juergen Gross @ 2014-09-16 12:44 UTC (permalink / raw)
  To: David Vrabel, Andrew Cooper, ian.campbell, ian.jackson, jbeulich,
	keir, tim, xen-devel

On 09/16/2014 01:56 PM, David Vrabel wrote:
> On 16/09/14 11:38, Juergen Gross wrote:
>> On 09/16/2014 12:14 PM, David Vrabel wrote:
>>> On 16/09/14 04:52, Juergen Gross wrote:
>>>> On 09/15/2014 04:30 PM, David Vrabel wrote:
>>>>> On 15/09/14 11:46, Juergen Gross wrote:
>>>>>> So you'd prefer:
>>>>>>
>>>>>> 1) >512GB pv-domains (including Dom0) will be supported only with new
>>>>>>       Xen (4.6?), no matter if the user requires migration to be
>>>>>> supported
>>>>>
>>>>> Yes.  >512 GiB and not being able to migrate are not obviously related
>>>>> from the point of view of the end user (unlike assigning a PCI device).
>>>>>
>>>>> Failing at domain save time is most likely too late for the end user.
>>>>
>>>> What would you think about following compromise:
>>>>
>>>> We add a flag that indicates support of multi-level p2m. Additionally
>>>> the Linux kernel can ignore the flag not being set either if started as
>>>> Dom0 or if told so via kernel parameter.
>>>
>>> This sounds fine but this override should be via the command line
>>> parameter only.  Crash dump analysis tools may not understand the 4
>>> level p2m.
>>>
>>>>>> to:
>>>>>>
>>>>>> 2) >512GB pv-domains (especially Dom0 and VMs with direct hw
>>>>>> access) can
>>>>>>       be started on current Xen versions, migration is possible only if
>>>>>> Xen
>>>>>>       is new (4.6?)
>>>>>
>>>>> There's also my preferred option:
>>>>>
>>>>> 3) >512 GiB PV domains are not supported.  Large guests must be PVH or
>>>>> PVHVM.
>>>>
>>>> In theory okay, but not right now, I think. PVH Dom0 is not production
>>>> ready.
>>>
>>> I'm not really seeing the need for such a large dom0.
>>
>> Okay, then I'd come back to V1 of my patches. This is the minimum
>> required to be able to boot up a system with Xen and more than 512GB
>> memory without having to reduce the Dom0 memory via Xen boot parameter.
>>
>> Otherwise the hypervisor built mfn_list mapped into the initial address
>> space will be too large.
>>
>> And no, I don't think setting the boot parameter is the solution here.
>> Dom0 should be usable on a huge machine without special parameters.
>
> Ok. The case where dom0's p2m format matters is pretty specialized.
>
>>> I also think a flat array for the p2m might be better (less complex).
>>> There's plenty of virtual address space in a 64-bit guest to allow for
>>> this.
>>
>> Hmm, do you think we could reserve an area of many GBs for Xen in
>> virtual space? I suspect this would be rejected as another "Xen-ism".
>
> alloc_vm_area()

Nice idea, but alloc_vm_area() allocates ptes for the whole area.
__get_vm_area() would be better, I think.

>
>> BTW: the mfn_list_list will still be required to be built as a tree.
>
> The tools could be given the guest virtual address and walk the guest
> page tables.
>
> This is probably too much of a difference from the existing ABI to be
> worth pursuing at this point.

Okay, coming back to the main question:

What to do regarding support of >512GB domains:

1. we need another level of the p2m map
2. we are trying the linear p2m table
    a) with a 4 level mfn_list_list
    b) with access to the p2m table via page tables
3. my V1 patches are okay, as they enable Dom0 to start on machines
    with huge memory


Juergen

* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-16 12:44                               ` Juergen Gross
@ 2014-09-17  4:25                                 ` Juergen Gross
  0 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2014-09-17  4:25 UTC (permalink / raw)
  To: David Vrabel, Andrew Cooper, ian.campbell, ian.jackson, jbeulich,
	keir, tim, xen-devel

On 09/16/2014 02:44 PM, Juergen Gross wrote:
> On 09/16/2014 01:56 PM, David Vrabel wrote:
>> On 16/09/14 11:38, Juergen Gross wrote:
>>> On 09/16/2014 12:14 PM, David Vrabel wrote:
>>>> On 16/09/14 04:52, Juergen Gross wrote:
>>>>> On 09/15/2014 04:30 PM, David Vrabel wrote:
>>>>>> On 15/09/14 11:46, Juergen Gross wrote:
>>>>>>> So you'd prefer:
>>>>>>>
>>>>>>> 1) >512GB pv-domains (including Dom0) will be supported only with
>>>>>>> new
>>>>>>>       Xen (4.6?), no matter if the user requires migration to be
>>>>>>> supported
>>>>>>
>>>>>> Yes.  >512 GiB and not being able to migrate are not obviously
>>>>>> related
>>>>>> from the point of view of the end user (unlike assigning a PCI
>>>>>> device).
>>>>>>
>>>>>> Failing at domain save time is most likely too late for the end user.
>>>>>
>>>>> What would you think about following compromise:
>>>>>
>>>>> We add a flag that indicates support of multi-level p2m. Additionally
>>>>> the Linux kernel can ignore the flag not being set either if
>>>>> started as
>>>>> Dom0 or if told so via kernel parameter.
>>>>
>>>> This sounds fine but this override should be via the command line
>>>> parameter only.  Crash dump analysis tools may not understand the 4
>>>> level p2m.
>>>>
>>>>>>> to:
>>>>>>>
>>>>>>> 2) >512GB pv-domains (especially Dom0 and VMs with direct hw
>>>>>>> access) can
>>>>>>>       be started on current Xen versions, migration is possible
>>>>>>> only if
>>>>>>> Xen
>>>>>>>       is new (4.6?)
>>>>>>
>>>>>> There's also my preferred option:
>>>>>>
>>>>>> 3) >512 GiB PV domains are not supported.  Large guests must be
>>>>>> PVH or
>>>>>> PVHVM.
>>>>>
>>>>> In theory okay, but not right now, I think. PVH Dom0 is not production
>>>>> ready.
>>>>
>>>> I'm not really seeing the need for such a large dom0.
>>>
>>> Okay, then I'd come back to V1 of my patches. This is the minimum
>>> required to be able to boot up a system with Xen and more than 512GB
>>> memory without having to reduce the Dom0 memory via Xen boot parameter.
>>>
>>> Otherwise the hypervisor built mfn_list mapped into the initial address
>>> space will be too large.
>>>
>>> And no, I don't think setting the boot parameter is the solution here.
>>> Dom0 should be usable on a huge machine without special parameters.
>>
>> Ok. The case where dom0's p2m format matters is pretty specialized.
>>
>>>> I also think a flat array for the p2m might be better (less complex).
>>>> There's plenty of virtual address space in a 64-bit guest to allow for
>>>> this.
>>>
>>> Hmm, do you think we could reserve an area of many GBs for Xen in
>>> virtual space? I suspect this would be rejected as another "Xen-ism".
>>
>> alloc_vm_area()
>
> Nice idea, but alloc_vm_area() allocates ptes for the whole area.
> __get_vm_area() would be better, I think.
>
>>
>>> BTW: the mfn_list_list will still be required to be built as a tree.
>>
>> The tools could be given the guest virtual address and walk the guest
>> page tables.
>>
>> This is probably too much of a difference from the existing ABI to be
>> worth pursuing at this point.
>
> Okay, coming back to the main question:
>
> What to do regarding support of >512GB domains:
>
> 1. we need another level of the p2m map
> 2. we are trying the linear p2m table
>     a) with a 4 level mfn_list_list
>     b) with access to the p2m table via page tables
> 3. my V1 patches are okay, as they enable Dom0 to start on machines
>     with huge memory

I thought a little bit more about this.

I like the idea of using the virtually mapped linear p2m list. It would
remove the need to build the p2m tree at an early boot stage, as the
initial mfn_list supplied by the hypervisor can be used until the kernel
builds its own list.

I'll try to create a patch doing this. As this does not affect the
initial mapping of initrd and mfn_list, I've posted V2 of my patches
to eliminate some of the limitations of those initial mappings.

Whether the mfn_list_list should be kept as a tree or (if a flag
indicates support) be replaced by a page table walk done by the tools
can be decided later.
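
To illustrate why the linear list is the simpler structure, here is a
rough user-space sketch comparing a lookup in a three-level tree with a
lookup in a flat array. It is not kernel code: the fan-out of 4, the
type names and the helpers tree_lookup() and flat_lookup() are all made
up for illustration (a real 64-bit p2m page holds 512 entries).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy fan-out (assumption); a real 64-bit p2m page holds 512 entries. */
#define ENTRIES 4u
#define INVALID_MFN (~(uint64_t)0)

typedef uint64_t  p2m_leaf[ENTRIES];   /* pfn -> mfn entries     */
typedef p2m_leaf *p2m_mid[ENTRIES];    /* pointers to leaf pages */
typedef p2m_mid  *p2m_top[ENTRIES];    /* pointers to mid pages  */

/* Three-level walk: every level may be missing and must be checked. */
static uint64_t tree_lookup(p2m_top top, size_t pfn)
{
    size_t top_idx  = pfn / (ENTRIES * ENTRIES);
    size_t mid_idx  = (pfn / ENTRIES) % ENTRIES;
    size_t leaf_idx = pfn % ENTRIES;

    if (top_idx >= ENTRIES || !top[top_idx])
        return INVALID_MFN;

    p2m_leaf *leaf = (*top[top_idx])[mid_idx];
    if (!leaf)
        return INVALID_MFN;

    return (*leaf)[leaf_idx];
}

/* Flat list: one bounds check and one array access. */
static uint64_t flat_lookup(const uint64_t *p2m, size_t nr_pfns, size_t pfn)
{
    assert(p2m);
    return pfn < nr_pfns ? p2m[pfn] : INVALID_MFN;
}
```

With the flat variant the holes in the array would be handled by the
page tables backing the virtual area rather than by explicit checks at
each tree level.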


Juergen

* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
  2014-09-09  9:58 ` [PATCH V3 1/1] expand x86 arch_shared_info to " Juergen Gross
  2014-09-09 10:27   ` Andrew Cooper
@ 2014-09-30  8:53   ` Jan Beulich
       [not found]   ` <542A8B93020000780003AE7B@suse.com>
  2 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2014-09-30  8:53 UTC (permalink / raw)
  To: David Vrabel, Juergen Gross
  Cc: keir, tim, ian.jackson, ian.campbell, xen-devel

>>> On 09.09.14 at 11:58, <"jgross@suse.com".non-mime.internet> wrote:
> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
> currently contains the mfn of the top level page frame of the 3 level
> p2m tree, which is used by the Xen tools during saving and restoring
> (and live migration) of pv domains. With three levels of the p2m tree
> it is possible to support up to 512 GB of RAM for a 64 bit pv domain.
> A 32 bit pv domain can support more, as each memory page can hold 1024
> instead of 512 entries, leading to a limit of 4 TB. To be able to
> support more RAM on x86-64 an additional level is to be added.
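
The limits quoted above can be checked with a few lines of C. This is
only a sketch of the arithmetic, assuming 4 KiB pages and p2m entries of
8 bytes (64 bit) or 4 bytes (32 bit); the helper name p2m_max_bytes()
and the macro P2M_PAGE_SIZE are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define P2M_PAGE_SIZE 4096ULL   /* assumed page size */

/*
 * Maximum amount of RAM (in bytes) a p2m tree can describe when every
 * page-sized node holds P2M_PAGE_SIZE / entry_size entries.
 */
static uint64_t p2m_max_bytes(unsigned int levels, unsigned int entry_size)
{
    assert(entry_size != 0 && entry_size <= P2M_PAGE_SIZE);

    uint64_t entries_per_page = P2M_PAGE_SIZE / entry_size;
    uint64_t pages = 1;

    /* Each level multiplies the number of reachable pages. */
    for (unsigned int i = 0; i < levels; i++)
        pages *= entries_per_page;

    return pages * P2M_PAGE_SIZE;
}
```

p2m_max_bytes(3, 8) gives 2^39 bytes (512 GiB), p2m_max_bytes(3, 4)
gives 2^42 bytes (4 TiB), and a fourth level on 64 bit,
p2m_max_bytes(4, 8), gives 2^48 bytes (256 TiB).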
> 
> This patch expands struct arch_shared_info with a new p2m tree root
> and the number of levels of the p2m tree. The new information is
> indicated by the domain to be valid by storing ~0UL into
> pfn_to_mfn_frame_list_list (this should be done only if more than
> three levels are needed, of course).
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Still having this in my to-be-committed-or-otherwise list, David -
you had reservations here. Did they get addressed by Jürgen?
Is there any alternative proposal? Or are we setting this aside
until after 4.5?

Thanks, Jan

> ---
>  xen/include/public/arch-x86/xen.h | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
> index f35804b..2ca996c 100644
> --- a/xen/include/public/arch-x86/xen.h
> +++ b/xen/include/public/arch-x86/xen.h
> @@ -224,7 +224,12 @@ struct arch_shared_info {
>      /* Frame containing list of mfns containing list of mfns containing p2m. */
>      xen_pfn_t     pfn_to_mfn_frame_list_list;
>      unsigned long nmi_reason;
> -    uint64_t pad[32];
> +    /*
> +     * Following two fields are valid if pfn_to_mfn_frame_list_list contains
> +     * ~0UL.
> +     */
> +    unsigned long p2m_levels;   /* number of levels of p2m tree */
> +    xen_pfn_t     p2m_root;     /* p2m tree top level mfn */
>  };
>  typedef struct arch_shared_info arch_shared_info_t;
>  
> -- 
> 1.8.4.5
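
For reference, a consumer such as the save/restore code might interpret
the fields proposed in this patch roughly as follows. This is only a
sketch of the ~0UL convention described in the commit message: the
struct below is a cut-down stand-in rather than the real public header,
and struct p2m_desc and p2m_describe() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned long xen_pfn_t;   /* stand-in for the Xen typedef */

/* Cut-down stand-in for struct arch_shared_info as proposed in the patch. */
struct arch_shared_info_sketch {
    xen_pfn_t     pfn_to_mfn_frame_list_list;
    unsigned long nmi_reason;
    unsigned long p2m_levels;   /* number of levels of p2m tree */
    xen_pfn_t     p2m_root;     /* p2m tree top level mfn */
};

/* Hypothetical result type: where the tree starts and how deep it is. */
struct p2m_desc {
    xen_pfn_t     root;
    unsigned long levels;
};

static struct p2m_desc p2m_describe(const struct arch_shared_info_sketch *info)
{
    struct p2m_desc d;

    assert(info);

    if (info->pfn_to_mfn_frame_list_list == ~0UL) {
        /* Marker set: the two new fields are valid. */
        d.root   = info->p2m_root;
        d.levels = info->p2m_levels;
    } else {
        /* Legacy layout: the old field is the root of a 3 level tree. */
        d.root   = info->pfn_to_mfn_frame_list_list;
        d.levels = 3;
    }
    return d;
}
```

A guest that never stores ~0UL in the old field keeps working with
unmodified tools, which is why the marker should only be set when more
than three levels are actually needed.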



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[parent not found: <542A8B93020000780003AE7B@suse.com>]

* Re: [PATCH V3 1/1] expand x86 arch_shared_info to support >3 level p2m tree
       [not found]   ` <542A8B93020000780003AE7B@suse.com>
@ 2014-09-30  8:59     ` Juergen Gross
  0 siblings, 0 replies; 23+ messages in thread
From: Juergen Gross @ 2014-09-30  8:59 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel; +Cc: keir, tim, ian.jackson, ian.campbell, xen-devel

On 09/30/2014 10:53 AM, Jan Beulich wrote:
>>>> On 09.09.14 at 11:58, <"jgross@suse.com".non-mime.internet> wrote:
>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>> currently contains the mfn of the top level page frame of the 3 level
>> p2m tree, which is used by the Xen tools during saving and restoring
>> (and live migration) of pv domains. With three levels of the p2m tree
>> it is possible to support up to 512 GB of RAM for a 64 bit pv domain.
>> A 32 bit pv domain can support more, as each memory page can hold 1024
>> instead of 512 entries, leading to a limit of 4 TB. To be able to
>> support more RAM on x86-64 an additional level is to be added.
>>
>> This patch expands struct arch_shared_info with a new p2m tree root
>> and the number of levels of the p2m tree. The new information is
>> indicated by the domain to be valid by storing ~0UL into
>> pfn_to_mfn_frame_list_list (this should be done only if more than
>> three levels are needed, of course).
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>
> Still having this in my to-be-committed-or-otherwise list, David -
> you had reservations here. Did they get addressed by Jürgen?
> Is there any alternative proposal? Or are we setting this aside
> until after 4.5?

David had the alternative proposal to use a virtually mapped linear
mfn_list with the tools doing the translation as needed. I'm currently
working on this, so please ignore my patch.


Juergen


end of thread, other threads:[~2014-09-30  8:59 UTC | newest]

Thread overview: 23+ messages
2014-09-09  9:58 [PATCH V3 0/1] support >3 level p2m tree Juergen Gross
2014-09-09  9:58 ` [PATCH V3 1/1] expand x86 arch_shared_info to " Juergen Gross
2014-09-09 10:27   ` Andrew Cooper
2014-09-09 10:49     ` Juergen Gross
2014-09-12 10:31       ` Juergen Gross
2014-09-15  8:29         ` Andrew Cooper
2014-09-15  8:52           ` Juergen Gross
2014-09-15  9:42             ` Jan Beulich
2014-09-15  9:48               ` Juergen Gross
2014-09-15  9:44             ` David Vrabel
2014-09-15  9:52               ` Juergen Gross
2014-09-15 10:30                 ` David Vrabel
2014-09-15 10:46                   ` Juergen Gross
2014-09-15 11:29                     ` Jan Beulich
2014-09-15 14:30                     ` David Vrabel
2014-09-16  3:52                       ` Juergen Gross
2014-09-16 10:14                         ` David Vrabel
2014-09-16 10:38                           ` Juergen Gross
2014-09-16 11:56                             ` David Vrabel
2014-09-16 12:44                               ` Juergen Gross
2014-09-17  4:25                                 ` Juergen Gross
2014-09-30  8:53   ` Jan Beulich
     [not found]   ` <542A8B93020000780003AE7B@suse.com>
2014-09-30  8:59     ` Juergen Gross