* [Patch V2] support guest virtual mapped p2m list
@ 2014-12-01 9:29 Juergen Gross
2014-12-01 9:29 ` [Patch V2] expand x86 arch_shared_info to support linear " Juergen Gross
0 siblings, 1 reply; 10+ messages in thread
From: Juergen Gross @ 2014-12-01 9:29 UTC (permalink / raw)
To: keir, Ian.Campbell, andrew.cooper3, ian.jackson, tim,
david.vrabel, xen-devel
Cc: Juergen Gross
The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
currently contains the mfn of the top-level page frame of the 3-level
p2m tree, which is used by the Xen tools during saving and restoring
(and live migration) of pv domains and for crash dump analysis. With
three levels of the p2m tree it is possible to support up to 512 GB of
RAM for a 64-bit pv domain.
A 32-bit pv domain can support more, as each memory page can hold 1024
instead of 512 entries, leading to a limit of 4 TB.
To be able to support more RAM on x86-64, switch to a virtual mapped
p2m list.
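The limits quoted above can be checked with a little arithmetic. The sketch below is not part of the patch; it assumes 4 KiB pages and p2m entries the size of an unsigned long (8 bytes for a 64-bit guest, 4 bytes for a 32-bit guest), and the names PAGE_SIZE_BYTES and p2m_tree_max_bytes are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL /* assumed x86 page size */

/*
 * Maximum guest RAM (in bytes) that a p2m tree with the given number of
 * levels can describe, when every page of the tree holds
 * PAGE_SIZE_BYTES / entry_size entries and each leaf entry maps one page.
 */
static uint64_t p2m_tree_max_bytes(unsigned int levels, unsigned int entry_size)
{
    uint64_t entries_per_page, pfns = 1;
    unsigned int i;

    assert(levels > 0 && entry_size > 0);

    entries_per_page = PAGE_SIZE_BYTES / entry_size;
    for (i = 0; i < levels; i++)
        pfns *= entries_per_page;

    return pfns * PAGE_SIZE_BYTES;
}
```

With three levels, 8-byte entries give 512^3 pages of 4 KiB each (512 GiB), while 4-byte entries give 1024^3 pages (4 TiB), matching the figures in the description.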
Changes in V2:
- add new structure member p2m_generation in arch_shared_info
- rename structure member referencing the p2m address space to p2m_cr3
- add some comments
- removed patches 2-4 as overriding missing XENFEAT_virtual_p2m will be
done via kernel parameter (patch 2 will be resent after Xen 4.5 is out)
Juergen Gross (1):
expand x86 arch_shared_info to support linear p2m list
xen/include/public/arch-x86/xen.h | 22 +++++++++++++++++++++-
xen/include/public/features.h | 3 +++
2 files changed, 24 insertions(+), 1 deletion(-)
--
2.1.2
* [Patch V2] expand x86 arch_shared_info to support linear p2m list
2014-12-01 9:29 [Patch V2] support guest virtual mapped p2m list Juergen Gross
@ 2014-12-01 9:29 ` Juergen Gross
2014-12-01 10:15 ` Jan Beulich
0 siblings, 1 reply; 10+ messages in thread
From: Juergen Gross @ 2014-12-01 9:29 UTC (permalink / raw)
To: keir, Ian.Campbell, andrew.cooper3, ian.jackson, tim,
david.vrabel, xen-devel
Cc: Juergen Gross
The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
currently contains the mfn of the top-level page frame of the 3-level
p2m tree, which is used by the Xen tools during saving and restoring
(and live migration) of pv domains and for crash dump analysis. With
three levels of the p2m tree it is possible to support up to 512 GB of
RAM for a 64-bit pv domain.
A 32-bit pv domain can support more, as each memory page can hold 1024
instead of 512 entries, leading to a limit of 4 TB.
To be able to support more RAM on x86-64, switch to a virtual mapped
p2m list.
This patch expands struct arch_shared_info with the virtual address of
the p2m list, the root of the page table tree mapping it and a p2m
generation count. The domain indicates that the new information is
valid by storing ~0UL into pfn_to_mfn_frame_list_list. The hypervisor
indicates usability of this feature by a new flag XENFEAT_virtual_p2m.
Right now XENFEAT_virtual_p2m will not be set. This will change when
the Xen tools support the virtual mapped p2m list.
Signed-off-by: Juergen Gross <jgross@suse.com>
---
xen/include/public/arch-x86/xen.h | 22 +++++++++++++++++++++-
xen/include/public/features.h | 3 +++
2 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/xen/include/public/arch-x86/xen.h b/xen/include/public/arch-x86/xen.h
index f35804b..14d3090 100644
--- a/xen/include/public/arch-x86/xen.h
+++ b/xen/include/public/arch-x86/xen.h
@@ -224,7 +224,27 @@ struct arch_shared_info {
/* Frame containing list of mfns containing list of mfns containing p2m. */
xen_pfn_t pfn_to_mfn_frame_list_list;
unsigned long nmi_reason;
- uint64_t pad[32];
+ /*
+ * Following three fields are valid if pfn_to_mfn_frame_list_list contains
+ * ~0UL.
+ * p2m_vaddr holds the virtual address of the linear p2m list. All entries
+ * in the range [0...max_pfn[ are accessible via this pointer.
+ * p2m_cr3 is the root of the address space where p2m_vaddr is valid.
+ * p2m_cr3 is in the same format as a cr3 value in the vcpu register state
+ * and holds the folded machine frame number (via xen_pfn_to_cr3) of a
+ * L3 or L4 page table.
+ * p2m_generation will be incremented by the guest before and after each
+ * change of the mappings of the p2m list. p2m_generation starts at 0 and
+ * a value with the least significant bit set indicates that a mapping
+ * update is in progress. This allows guest external software (e.g. in Dom0)
+ * to verify that read mappings are consistent and whether they have changed
+ * since the last check.
+ * Modifying a p2m element in the linear p2m list is allowed via an atomic
+ * write only.
+ */
+ unsigned long p2m_vaddr; /* virtual address of the p2m list */
+ unsigned long p2m_cr3; /* cr3 value of the p2m address space */
+ unsigned long p2m_generation; /* generation count of p2m mapping */
};
typedef struct arch_shared_info arch_shared_info_t;
diff --git a/xen/include/public/features.h b/xen/include/public/features.h
index 16d92aa..ff0b82d 100644
--- a/xen/include/public/features.h
+++ b/xen/include/public/features.h
@@ -99,6 +99,9 @@
#define XENFEAT_grant_map_identity 12
*/
+/* x86: guest may specify virtual address of p2m list */
+#define XENFEAT_virtual_p2m 13
+
#define XENFEAT_NR_SUBMAPS 1
#endif /* __XEN_PUBLIC_FEATURES_H__ */
--
2.1.2
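The p2m_generation comment in the patch describes a sequence-counter style protocol: the guest increments the counter before and after each change, so an odd value means an update is in progress. A reader outside the guest could use it roughly as sketched below. This is only an illustration under assumptions: struct p2m_info is a hypothetical local mirror of the three new fields (not the Xen header), the helper names are invented, and the fences stand in for whatever barriers a real tool would use on a foreign mapping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical local mirror of the three new fields; not the Xen header. */
struct p2m_info {
    unsigned long p2m_vaddr;
    unsigned long p2m_cr3;
    volatile unsigned long p2m_generation;
};

/*
 * Snapshot the generation count before reading the mappings.
 * Returns false if the least significant bit is set, i.e. the guest is
 * in the middle of a mapping update and the caller should try again.
 */
static bool p2m_read_begin(const struct p2m_info *info, unsigned long *gen)
{
    assert(info != NULL && gen != NULL);

    *gen = info->p2m_generation;
    /* Keep the reads of the mappings after the snapshot. */
    atomic_thread_fence(memory_order_acquire);

    return (*gen & 1UL) == 0;
}

/*
 * Check the generation count after reading the mappings.
 * Returns true if the count changed since p2m_read_begin(), meaning the
 * data read in between may be inconsistent and must be discarded.
 */
static bool p2m_read_retry(const struct p2m_info *info, unsigned long gen)
{
    assert(info != NULL);

    /* Keep the reads of the mappings before the re-check. */
    atomic_thread_fence(memory_order_acquire);

    return info->p2m_generation != gen;
}
```

A reader would call p2m_read_begin(), copy the mappings it needs, and then call p2m_read_retry(); if either call signals a problem, it repeats the whole sequence. Keeping the last successful snapshot also answers the "has it changed since the last check" question mentioned in the comment.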
* Re: [Patch V2] expand x86 arch_shared_info to support linear p2m list
2014-12-01 9:29 ` [Patch V2] expand x86 arch_shared_info to support linear " Juergen Gross
@ 2014-12-01 10:15 ` Jan Beulich
2014-12-01 11:19 ` David Vrabel
0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2014-12-01 10:15 UTC (permalink / raw)
To: Juergen Gross
Cc: keir, Ian.Campbell, andrew.cooper3, ian.jackson, tim,
david.vrabel, xen-devel
>>> On 01.12.14 at 10:29, <JGross@suse.com> wrote:
> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
> currently contains the mfn of the top level page frame of the 3 level
> p2m tree, which is used by the Xen tools during saving and restoring
> (and live migration) of pv domains and for crash dump analysis. With
> three levels of the p2m tree it is possible to support up to 512 GB of
> RAM for a 64 bit pv domain.
>
> A 32 bit pv domain can support more, as each memory page can hold 1024
> instead of 512 entries, leading to a limit of 4 TB.
>
> To be able to support more RAM on x86-64 switch to a virtual mapped
> p2m list.
>
> This patch expands struct arch_shared_info with a new p2m list virtual
> address, the root of the page table root and a p2m generation count.
> The new information is indicated by the domain to be valid by storing
> ~0UL into pfn_to_mfn_frame_list_list. The hypervisor indicates
> usability of this feature by a new flag XENFEAT_virtual_p2m.
>
> Right now XENFEAT_virtual_p2m will not be set. This will change when
> the Xen tools support the virtual mapped p2m list.
This seems wrong: XENFEAT_* only reflect hypervisor capabilities.
I.e. the availability of the new functionality may need to be
advertised another way - xenstore perhaps?
Jan
* Re: [Patch V2] expand x86 arch_shared_info to support linear p2m list
2014-12-01 10:15 ` Jan Beulich
@ 2014-12-01 11:19 ` David Vrabel
2014-12-01 11:29 ` Jan Beulich
[not found] ` <547C5F31020000780004BB1F@suse.com>
0 siblings, 2 replies; 10+ messages in thread
From: David Vrabel @ 2014-12-01 11:19 UTC (permalink / raw)
To: Jan Beulich, Juergen Gross
Cc: keir, Ian.Campbell, andrew.cooper3, tim, xen-devel, ian.jackson
On 01/12/14 10:15, Jan Beulich wrote:
>>>> On 01.12.14 at 10:29, <JGross@suse.com> wrote:
>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>> currently contains the mfn of the top level page frame of the 3 level
>> p2m tree, which is used by the Xen tools during saving and restoring
>> (and live migration) of pv domains and for crash dump analysis. With
>> three levels of the p2m tree it is possible to support up to 512 GB of
>> RAM for a 64 bit pv domain.
>>
>> A 32 bit pv domain can support more, as each memory page can hold 1024
>> instead of 512 entries, leading to a limit of 4 TB.
>>
>> To be able to support more RAM on x86-64 switch to a virtual mapped
>> p2m list.
>>
>> This patch expands struct arch_shared_info with a new p2m list virtual
>> address, the root of the page table root and a p2m generation count.
>> The new information is indicated by the domain to be valid by storing
>> ~0UL into pfn_to_mfn_frame_list_list. The hypervisor indicates
>> usability of this feature by a new flag XENFEAT_virtual_p2m.
>>
>> Right now XENFEAT_virtual_p2m will not be set. This will change when
>> the Xen tools support the virtual mapped p2m list.
>
> This seems wrong: XENFEAT_* only reflect hypervisor capabilities.
> I.e. the availability of the new functionality may need to be
> advertised another way - xenstore perhaps?
Xenstore doesn't work for dom0.
Shouldn't this be something the guest kernel reports using an ELF note bit?
When building a domain (either in Xen for dom0 or in the tools), the
builder may provide a linear p2m iff supported by the guest kernel and
then (and only then) can it provide a guest with > 512 GiB.
David
* Re: [Patch V2] expand x86 arch_shared_info to support linear p2m list
2014-12-01 11:19 ` David Vrabel
@ 2014-12-01 11:29 ` Jan Beulich
[not found] ` <547C5F31020000780004BB1F@suse.com>
1 sibling, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2014-12-01 11:29 UTC (permalink / raw)
To: David Vrabel, Juergen Gross
Cc: keir, Ian.Campbell, andrew.cooper3, tim, xen-devel, ian.jackson
>>> On 01.12.14 at 12:19, <david.vrabel@citrix.com> wrote:
> On 01/12/14 10:15, Jan Beulich wrote:
>>>>> On 01.12.14 at 10:29, <JGross@suse.com> wrote:
>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>> currently contains the mfn of the top level page frame of the 3 level
>>> p2m tree, which is used by the Xen tools during saving and restoring
>>> (and live migration) of pv domains and for crash dump analysis. With
>>> three levels of the p2m tree it is possible to support up to 512 GB of
>>> RAM for a 64 bit pv domain.
>>>
>>> A 32 bit pv domain can support more, as each memory page can hold 1024
>>> instead of 512 entries, leading to a limit of 4 TB.
>>>
>>> To be able to support more RAM on x86-64 switch to a virtual mapped
>>> p2m list.
>>>
>>> This patch expands struct arch_shared_info with a new p2m list virtual
>>> address, the root of the page table root and a p2m generation count.
>>> The new information is indicated by the domain to be valid by storing
>>> ~0UL into pfn_to_mfn_frame_list_list. The hypervisor indicates
>>> usability of this feature by a new flag XENFEAT_virtual_p2m.
>>>
>>> Right now XENFEAT_virtual_p2m will not be set. This will change when
>>> the Xen tools support the virtual mapped p2m list.
>>
>> This seems wrong: XENFEAT_* only reflect hypervisor capabilities.
>> I.e. the availability of the new functionality may need to be
>> advertised another way - xenstore perhaps?
>
> Xenstore doesn't work for dom0.
>
> Shouldn't this be something the guest kernel reports using a ELF note bit?
>
> When building a domain (either in Xen for dom0 or in the tools), the
> builder may provide a linear p2m iff supported by the guest kernel and
> then (and only then) can it provide a guest with > 512 GiB.
Yes, surely this flag could act as a kernel capability indicator (via
the XEN_ELFNOTE_SUPPORTED_FEATURES note), like e.g.
XENFEAT_dom0 already does. Jürgen's final statement, however,
suggested to me that this is meant to be only consumed by kernels.
OTOH the P2M provided by both the Dom0 and DomU builders has always
been linear anyway; it's the pv-ops kernel that constructs a tree as
replacement.
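To illustrate the capability-indicator idea, a domain builder could test a feature bit in the bitmap a kernel advertises. The sketch below assumes the XEN_ELFNOTE_SUPPORTED_FEATURES contents have already been parsed into an array of 32-bit submaps indexed by XENFEAT_* number; that representation and the helper name kernel_supports_feature are assumptions made for illustration, and the value 13 is the one proposed in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XENFEAT_virtual_p2m 13 /* value proposed in this patch */

/*
 * Test one feature bit in a bitmap made of 32-bit submaps.
 * Features beyond the supplied submaps are reported as unsupported.
 */
static bool kernel_supports_feature(const uint32_t *submaps,
                                    unsigned int nr_submaps,
                                    unsigned int feature)
{
    unsigned int idx = feature / 32;

    assert(submaps != NULL || nr_submaps == 0);

    if (idx >= nr_submaps)
        return false;

    return ((submaps[idx] >> (feature % 32)) & 1u) != 0;
}
```

A builder following David's suggestion would only offer more than 512 GiB to a guest when kernel_supports_feature(..., XENFEAT_virtual_p2m) is true.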
Jan
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: [Patch V2] expand x86 arch_shared_info to support linear p2m list
[not found] ` <547C5F31020000780004BB1F@suse.com>
@ 2014-12-01 13:11 ` Juergen Gross
2014-12-01 13:37 ` Jan Beulich
[not found] ` <547C7D13020000780004BC3D@suse.com>
0 siblings, 2 replies; 10+ messages in thread
From: Juergen Gross @ 2014-12-01 13:11 UTC (permalink / raw)
To: Jan Beulich, David Vrabel
Cc: keir, Ian.Campbell, andrew.cooper3, tim, xen-devel, ian.jackson
On 12/01/2014 12:29 PM, Jan Beulich wrote:
>>>> On 01.12.14 at 12:19, <david.vrabel@citrix.com> wrote:
>> On 01/12/14 10:15, Jan Beulich wrote:
>>>>>> On 01.12.14 at 10:29, <JGross@suse.com> wrote:
>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>> currently contains the mfn of the top level page frame of the 3 level
>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>> (and live migration) of pv domains and for crash dump analysis. With
>>>> three levels of the p2m tree it is possible to support up to 512 GB of
>>>> RAM for a 64 bit pv domain.
>>>>
>>>> A 32 bit pv domain can support more, as each memory page can hold 1024
>>>> instead of 512 entries, leading to a limit of 4 TB.
>>>>
>>>> To be able to support more RAM on x86-64 switch to a virtual mapped
>>>> p2m list.
>>>>
>>>> This patch expands struct arch_shared_info with a new p2m list virtual
>>>> address, the root of the page table root and a p2m generation count.
>>>> The new information is indicated by the domain to be valid by storing
>>>> ~0UL into pfn_to_mfn_frame_list_list. The hypervisor indicates
>>>> usability of this feature by a new flag XENFEAT_virtual_p2m.
>>>>
>>>> Right now XENFEAT_virtual_p2m will not be set. This will change when
>>>> the Xen tools support the virtual mapped p2m list.
>>>
>>> This seems wrong: XENFEAT_* only reflect hypervisor capabilities.
>>> I.e. the availability of the new functionality may need to be
>>> advertised another way - xenstore perhaps?
>>
>> Xenstore doesn't work for dom0.
>>
>> Shouldn't this be something the guest kernel reports using a ELF note bit?
>>
>> When building a domain (either in Xen for dom0 or in the tools), the
>> builder may provide a linear p2m iff supported by the guest kernel and
>> then (and only then) can it provide a guest with > 512 GiB.
>
> Yes, surely this flag could act as a kernel capability indicator (via
> the XEN_ELFNOTE_SUPPORTED_FEATURES note), like e.g.
> XENFEAT_dom0 already does. Jürgen's final statement, however,
> suggested to me that this is meant to be only consumed by kernels.
Yes. The p2m list built by the domain builder is already linear. It may
just be too small to hold all entries required e.g. for Dom0.
It's Xen-tools and kdump which have to deal with the linear p2m list.
So the guest kernel has to be told if it is allowed to present the
linear list instead of the 3-level tree at pfn_to_mfn_frame_list_list.
As this is true for Dom0 as well, this information must be given by the
hypervisor.
I'm aware that XENFEAT_* is only used for hypervisor capabilities up to
now. As the Xen tools are tightly coupled to the hypervisor I don't see
why the features can't express the capability of the complete Xen
installation instead. Would you prefer introducing another leaf for
that purpose (submap.idx == 1)?
> Otoh the P2M provided by both Dom0 and DomU builders have always
> been linear anyway; it's the pv-ops kernel that constructs a tree as
> replacement.
Correct.
Juergen
* Re: [Patch V2] expand x86 arch_shared_info to support linear p2m list
2014-12-01 13:11 ` Juergen Gross
@ 2014-12-01 13:37 ` Jan Beulich
[not found] ` <547C7D13020000780004BC3D@suse.com>
1 sibling, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2014-12-01 13:37 UTC (permalink / raw)
To: Juergen Gross
Cc: keir, Ian.Campbell, andrew.cooper3, ian.jackson, tim,
David Vrabel, xen-devel
>>> On 01.12.14 at 14:11, <JGross@suse.com> wrote:
> On 12/01/2014 12:29 PM, Jan Beulich wrote:
>>>>> On 01.12.14 at 12:19, <david.vrabel@citrix.com> wrote:
>>> On 01/12/14 10:15, Jan Beulich wrote:
>>>>>>> On 01.12.14 at 10:29, <JGross@suse.com> wrote:
>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>> currently contains the mfn of the top level page frame of the 3 level
>>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>>> (and live migration) of pv domains and for crash dump analysis. With
>>>>> three levels of the p2m tree it is possible to support up to 512 GB of
>>>>> RAM for a 64 bit pv domain.
>>>>>
>>>>> A 32 bit pv domain can support more, as each memory page can hold 1024
>>>>> instead of 512 entries, leading to a limit of 4 TB.
>>>>>
>>>>> To be able to support more RAM on x86-64 switch to a virtual mapped
>>>>> p2m list.
>>>>>
>>>>> This patch expands struct arch_shared_info with a new p2m list virtual
>>>>> address, the root of the page table root and a p2m generation count.
>>>>> The new information is indicated by the domain to be valid by storing
>>>>> ~0UL into pfn_to_mfn_frame_list_list. The hypervisor indicates
>>>>> usability of this feature by a new flag XENFEAT_virtual_p2m.
>>>>>
>>>>> Right now XENFEAT_virtual_p2m will not be set. This will change when
>>>>> the Xen tools support the virtual mapped p2m list.
>>>>
>>>> This seems wrong: XENFEAT_* only reflect hypervisor capabilities.
>>>> I.e. the availability of the new functionality may need to be
>>>> advertised another way - xenstore perhaps?
>>>
>>> Xenstore doesn't work for dom0.
>>>
>>> Shouldn't this be something the guest kernel reports using a ELF note bit?
>>>
>>> When building a domain (either in Xen for dom0 or in the tools), the
>>> builder may provide a linear p2m iff supported by the guest kernel and
>>> then (and only then) can it provide a guest with > 512 GiB.
>>
>> Yes, surely this flag could act as a kernel capability indicator (via
>> the XEN_ELFNOTE_SUPPORTED_FEATURES note), like e.g.
>> XENFEAT_dom0 already does. Jürgen's final statement, however,
>> suggested to me that this is meant to be only consumed by kernels.
>
> Yes. The p2m list built by the domain builder is already linear. It may
> just be to small to hold all entries required e.g. for Dom0.
>
> It's Xen-tools and kdump which have to deal with the linear p2m list.
> So the guest kernel has to be told if it is allowed to present the
> linear list instead of the 3-level tree at pfn_to_mfn_frame_list_list.
>
> As this is true for Dom0 as well, this information must be given by the
> hypervisor.
>
> I'm aware that XENFEAT_* is only used for hypervisor capabilities up to
> now. As the Xen tools are tightly coupled to the hypervisor I don't see
> why the features can't express the capability of the complete Xen
> installation instead. Would you prefer introducing another leaf for
> that purpose (submap.idx == 1) ?
That wouldn't change the odd situation of reporting a capability of
another component. That's even more of a problem for the Dom0
case, where the affected tool (kdump) isn't even under our control
(and shouldn't be).
But in the end - what's wrong with always (or conditionally upon a
CONFIG_* option and/or command line parameter and/or memory
size) filling both the old and new shared info fields? A capable tool
can determine whether the new one is valid, and an incapable tool
won't work on huge memory configs anyway.
Jan
* Re: [Patch V2] expand x86 arch_shared_info to support linear p2m list
[not found] ` <547C7D13020000780004BC3D@suse.com>
@ 2014-12-01 14:33 ` Juergen Gross
2014-12-01 16:28 ` Jan Beulich
0 siblings, 1 reply; 10+ messages in thread
From: Juergen Gross @ 2014-12-01 14:33 UTC (permalink / raw)
To: Jan Beulich
Cc: keir, Ian.Campbell, andrew.cooper3, ian.jackson, tim,
David Vrabel, xen-devel
On 12/01/2014 02:37 PM, Jan Beulich wrote:
>>>> On 01.12.14 at 14:11, <JGross@suse.com> wrote:
>> On 12/01/2014 12:29 PM, Jan Beulich wrote:
>>>>>> On 01.12.14 at 12:19, <david.vrabel@citrix.com> wrote:
>>>> On 01/12/14 10:15, Jan Beulich wrote:
>>>>>>>> On 01.12.14 at 10:29, <JGross@suse.com> wrote:
>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>> currently contains the mfn of the top level page frame of the 3 level
>>>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>>>> (and live migration) of pv domains and for crash dump analysis. With
>>>>>> three levels of the p2m tree it is possible to support up to 512 GB of
>>>>>> RAM for a 64 bit pv domain.
>>>>>>
>>>>>> A 32 bit pv domain can support more, as each memory page can hold 1024
>>>>>> instead of 512 entries, leading to a limit of 4 TB.
>>>>>>
>>>>>> To be able to support more RAM on x86-64 switch to a virtual mapped
>>>>>> p2m list.
>>>>>>
>>>>>> This patch expands struct arch_shared_info with a new p2m list virtual
>>>>>> address, the root of the page table root and a p2m generation count.
>>>>>> The new information is indicated by the domain to be valid by storing
>>>>>> ~0UL into pfn_to_mfn_frame_list_list. The hypervisor indicates
>>>>>> usability of this feature by a new flag XENFEAT_virtual_p2m.
>>>>>>
>>>>>> Right now XENFEAT_virtual_p2m will not be set. This will change when
>>>>>> the Xen tools support the virtual mapped p2m list.
>>>>>
>>>>> This seems wrong: XENFEAT_* only reflect hypervisor capabilities.
>>>>> I.e. the availability of the new functionality may need to be
>>>>> advertised another way - xenstore perhaps?
>>>>
>>>> Xenstore doesn't work for dom0.
>>>>
>>>> Shouldn't this be something the guest kernel reports using a ELF note bit?
>>>>
>>>> When building a domain (either in Xen for dom0 or in the tools), the
>>>> builder may provide a linear p2m iff supported by the guest kernel and
>>>> then (and only then) can it provide a guest with > 512 GiB.
>>>
>>> Yes, surely this flag could act as a kernel capability indicator (via
>>> the XEN_ELFNOTE_SUPPORTED_FEATURES note), like e.g.
>>> XENFEAT_dom0 already does. Jürgen's final statement, however,
>>> suggested to me that this is meant to be only consumed by kernels.
>>
>> Yes. The p2m list built by the domain builder is already linear. It may
>> just be to small to hold all entries required e.g. for Dom0.
>>
>> It's Xen-tools and kdump which have to deal with the linear p2m list.
>> So the guest kernel has to be told if it is allowed to present the
>> linear list instead of the 3-level tree at pfn_to_mfn_frame_list_list.
>>
>> As this is true for Dom0 as well, this information must be given by the
>> hypervisor.
>>
>> I'm aware that XENFEAT_* is only used for hypervisor capabilities up to
>> now. As the Xen tools are tightly coupled to the hypervisor I don't see
>> why the features can't express the capability of the complete Xen
>> installation instead. Would you prefer introducing another leaf for
>> that purpose (submap.idx == 1) ?
>
> That wouldn't change the odd situation of reporting a capability of
> another component. That's even more of a problem for the Dom0
> case, where the affected tool (kdump) isn't even under our control
> (and shouldn't be).
>
> But in the end - what's wrong with always (or conditionally upon a
> CONFIG_* option and/or command line parameter and/or memory
> size) filling both the old and new shared info fields? A capable tool
> can determine whether the new one is valid, and an incapable tool
> won't work on huge memory configs anyway.
Okay, but this would require another way of reporting the validity of
the linear p2m list anchor, as setting pfn_to_mfn_frame_list_list to
an invalid value is no longer an option then.
As the shared info page is always zeroed when the domain is built, we
could use a non-zero value of e.g. the p2m_generation member as the
validity indicator.
Juergen
* Re: [Patch V2] expand x86 arch_shared_info to support linear p2m list
2014-12-01 14:33 ` Juergen Gross
@ 2014-12-01 16:28 ` Jan Beulich
2014-12-01 16:39 ` Jürgen Groß
0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2014-12-01 16:28 UTC (permalink / raw)
To: Juergen Gross
Cc: keir, Ian.Campbell, andrew.cooper3, ian.jackson, tim,
David Vrabel, xen-devel
>>> On 01.12.14 at 15:33, <JGross@suse.com> wrote:
> On 12/01/2014 02:37 PM, Jan Beulich wrote:
>>>>> On 01.12.14 at 14:11, <JGross@suse.com> wrote:
>>> On 12/01/2014 12:29 PM, Jan Beulich wrote:
>>>>>>> On 01.12.14 at 12:19, <david.vrabel@citrix.com> wrote:
>>>>> On 01/12/14 10:15, Jan Beulich wrote:
>>>>>>>>> On 01.12.14 at 10:29, <JGross@suse.com> wrote:
>>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>>> currently contains the mfn of the top level page frame of the 3 level
>>>>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>>>>> (and live migration) of pv domains and for crash dump analysis. With
>>>>>>> three levels of the p2m tree it is possible to support up to 512 GB of
>>>>>>> RAM for a 64 bit pv domain.
>>>>>>>
>>>>>>> A 32 bit pv domain can support more, as each memory page can hold 1024
>>>>>>> instead of 512 entries, leading to a limit of 4 TB.
>>>>>>>
>>>>>>> To be able to support more RAM on x86-64 switch to a virtual mapped
>>>>>>> p2m list.
>>>>>>>
>>>>>>> This patch expands struct arch_shared_info with a new p2m list virtual
>>>>>>> address, the root of the page table root and a p2m generation count.
>>>>>>> The new information is indicated by the domain to be valid by storing
>>>>>>> ~0UL into pfn_to_mfn_frame_list_list. The hypervisor indicates
>>>>>>> usability of this feature by a new flag XENFEAT_virtual_p2m.
>>>>>>>
>>>>>>> Right now XENFEAT_virtual_p2m will not be set. This will change when
>>>>>>> the Xen tools support the virtual mapped p2m list.
>>>>>>
>>>>>> This seems wrong: XENFEAT_* only reflect hypervisor capabilities.
>>>>>> I.e. the availability of the new functionality may need to be
>>>>>> advertised another way - xenstore perhaps?
>>>>>
>>>>> Xenstore doesn't work for dom0.
>>>>>
>>>>> Shouldn't this be something the guest kernel reports using a ELF note bit?
>>>>>
>>>>> When building a domain (either in Xen for dom0 or in the tools), the
>>>>> builder may provide a linear p2m iff supported by the guest kernel and
>>>>> then (and only then) can it provide a guest with > 512 GiB.
>>>>
>>>> Yes, surely this flag could act as a kernel capability indicator (via
>>>> the XEN_ELFNOTE_SUPPORTED_FEATURES note), like e.g.
>>>> XENFEAT_dom0 already does. Jürgen's final statement, however,
>>>> suggested to me that this is meant to be only consumed by kernels.
>>>
>>> Yes. The p2m list built by the domain builder is already linear. It may
>>> just be to small to hold all entries required e.g. for Dom0.
>>>
>>> It's Xen-tools and kdump which have to deal with the linear p2m list.
>>> So the guest kernel has to be told if it is allowed to present the
>>> linear list instead of the 3-level tree at pfn_to_mfn_frame_list_list.
>>>
>>> As this is true for Dom0 as well, this information must be given by the
>>> hypervisor.
>>>
>>> I'm aware that XENFEAT_* is only used for hypervisor capabilities up to
>>> now. As the Xen tools are tightly coupled to the hypervisor I don't see
>>> why the features can't express the capability of the complete Xen
>>> installation instead. Would you prefer introducing another leaf for
>>> that purpose (submap.idx == 1) ?
>>
>> That wouldn't change the odd situation of reporting a capability of
>> another component. That's even more of a problem for the Dom0
>> case, where the affected tool (kdump) isn't even under our control
>> (and shouldn't be).
>>
>> But in the end - what's wrong with always (or conditionally upon a
>> CONFIG_* option and/or command line parameter and/or memory
>> size) filling both the old and new shared info fields? A capable tool
>> can determine whether the new one is valid, and an incapable tool
>> won't work on huge memory configs anyway.
>
> Okay, but this would require another way of reporting the validity of
> the linear p2m list anchor, as setting pfn_to_mfn_frame_list_list to
> an invalid value is no longer an option then.
>
> As the shared info page is always zeroed when the domain is built we
> could use a value different from 0 of e.g. the p2m_generation member
> as an indicator for the validity.
Wouldn't both of the other new fields be guaranteed non-zero
when used?
Jan
* Re: [Patch V2] expand x86 arch_shared_info to support linear p2m list
2014-12-01 16:28 ` Jan Beulich
@ 2014-12-01 16:39 ` Jürgen Groß
0 siblings, 0 replies; 10+ messages in thread
From: Jürgen Groß @ 2014-12-01 16:39 UTC (permalink / raw)
To: Jan Beulich
Cc: keir, Ian.Campbell, andrew.cooper3, ian.jackson, tim,
David Vrabel, xen-devel
On 12/01/2014 05:28 PM, Jan Beulich wrote:
>>>> On 01.12.14 at 15:33, <JGross@suse.com> wrote:
>> On 12/01/2014 02:37 PM, Jan Beulich wrote:
>>>>>> On 01.12.14 at 14:11, <JGross@suse.com> wrote:
>>>> On 12/01/2014 12:29 PM, Jan Beulich wrote:
>>>>>>>> On 01.12.14 at 12:19, <david.vrabel@citrix.com> wrote:
>>>>>> On 01/12/14 10:15, Jan Beulich wrote:
>>>>>>>>>> On 01.12.14 at 10:29, <JGross@suse.com> wrote:
>>>>>>>> The x86 struct arch_shared_info field pfn_to_mfn_frame_list_list
>>>>>>>> currently contains the mfn of the top level page frame of the 3 level
>>>>>>>> p2m tree, which is used by the Xen tools during saving and restoring
>>>>>>>> (and live migration) of pv domains and for crash dump analysis. With
>>>>>>>> three levels of the p2m tree it is possible to support up to 512 GB of
>>>>>>>> RAM for a 64 bit pv domain.
>>>>>>>>
>>>>>>>> A 32 bit pv domain can support more, as each memory page can hold 1024
>>>>>>>> instead of 512 entries, leading to a limit of 4 TB.
>>>>>>>>
>>>>>>>> To be able to support more RAM on x86-64 switch to a virtual mapped
>>>>>>>> p2m list.
>>>>>>>>
>>>>>>>> This patch expands struct arch_shared_info with a new p2m list virtual
>>>>>>>> address, the root of the page table root and a p2m generation count.
>>>>>>>> The new information is indicated by the domain to be valid by storing
>>>>>>>> ~0UL into pfn_to_mfn_frame_list_list. The hypervisor indicates
>>>>>>>> usability of this feature by a new flag XENFEAT_virtual_p2m.
>>>>>>>>
>>>>>>>> Right now XENFEAT_virtual_p2m will not be set. This will change when
>>>>>>>> the Xen tools support the virtual mapped p2m list.
>>>>>>>
>>>>>>> This seems wrong: XENFEAT_* only reflect hypervisor capabilities.
>>>>>>> I.e. the availability of the new functionality may need to be
>>>>>>> advertised another way - xenstore perhaps?
>>>>>>
>>>>>> Xenstore doesn't work for dom0.
>>>>>>
>>>>>> Shouldn't this be something the guest kernel reports using a ELF note bit?
>>>>>>
>>>>>> When building a domain (either in Xen for dom0 or in the tools), the
>>>>>> builder may provide a linear p2m iff supported by the guest kernel and
>>>>>> then (and only then) can it provide a guest with > 512 GiB.
>>>>>
>>>>> Yes, surely this flag could act as a kernel capability indicator (via
>>>>> the XEN_ELFNOTE_SUPPORTED_FEATURES note), like e.g.
>>>>> XENFEAT_dom0 already does. Jürgen's final statement, however,
>>>>> suggested to me that this is meant to be only consumed by kernels.
>>>>
>>>> Yes. The p2m list built by the domain builder is already linear. It may
>>>> just be to small to hold all entries required e.g. for Dom0.
>>>>
>>>> It's Xen-tools and kdump which have to deal with the linear p2m list.
>>>> So the guest kernel has to be told if it is allowed to present the
>>>> linear list instead of the 3-level tree at pfn_to_mfn_frame_list_list.
>>>>
>>>> As this is true for Dom0 as well, this information must be given by the
>>>> hypervisor.
>>>>
>>>> I'm aware that XENFEAT_* is only used for hypervisor capabilities up to
>>>> now. As the Xen tools are tightly coupled to the hypervisor I don't see
>>>> why the features can't express the capability of the complete Xen
>>>> installation instead. Would you prefer introducing another leaf for
>>>> that purpose (submap.idx == 1) ?
>>>
>>> That wouldn't change the odd situation of reporting a capability of
>>> another component. That's even more of a problem for the Dom0
>>> case, where the affected tool (kdump) isn't even under our control
>>> (and shouldn't be).
>>>
>>> But in the end - what's wrong with always (or conditionally upon a
>>> CONFIG_* option and/or command line parameter and/or memory
>>> size) filling both the old and new shared info fields? A capable tool
>>> can determine whether the new one is valid, and an incapable tool
>>> won't work on huge memory configs anyway.
>>
>> Okay, but this would require another way of reporting the validity of
>> the linear p2m list anchor, as setting pfn_to_mfn_frame_list_list to
>> an invalid value is no longer an option then.
>>
>> As the shared info page is always zeroed when the domain is built we
>> could use a value different from 0 of e.g. the p2m_generation member
>> as an indicator for the validity.
>
> Wouldn't both of the other new fields be guaranteed non-zero
> when used?
p2m_cr3 yes, I suppose (mfn 0 is never part of a guest). I'm not sure
about p2m_vaddr. A paravirtualized OS could choose to put the p2m list
at virtual address 0.
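As a sketch of where this leaves a tool when the guest fills both the old and the new fields: since the shared info page starts out zeroed and mfn 0 is never part of a guest, a non-zero p2m_cr3 could serve as the "linear list is valid" indicator, whereas p2m_vaddr cannot. The code below only illustrates that reading of the discussion, not an agreed interface; struct p2m_anchor is a hypothetical local mirror of the relevant members and p2m_pick_source is an invented helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical local mirror of the relevant arch_shared_info members. */
struct p2m_anchor {
    unsigned long pfn_to_mfn_frame_list_list; /* mfn of the 3-level tree root */
    unsigned long p2m_vaddr;      /* may legitimately be 0 */
    unsigned long p2m_cr3;        /* assumed 0 only while unset */
    unsigned long p2m_generation;
};

enum p2m_source { P2M_SOURCE_TREE, P2M_SOURCE_LINEAR };

/*
 * Decide which p2m representation a tool should read.
 * A tool that understands the linear list uses it whenever p2m_cr3 is
 * non-zero; otherwise it falls back to the 3-level tree.
 */
static enum p2m_source p2m_pick_source(const struct p2m_anchor *a,
                                       bool tool_reads_linear)
{
    assert(a != NULL);

    if (tool_reads_linear && a->p2m_cr3 != 0)
        return P2M_SOURCE_LINEAR;

    return P2M_SOURCE_TREE;
}
```

An older tool simply never looks at the new fields and keeps using the tree, which matches the observation that such a tool would not work on huge memory configurations anyway.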
Juergen