* [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
@ 2013-07-09 13:38 David Vrabel
0 siblings, 0 replies; 11+ messages in thread
From: David Vrabel @ 2013-07-09 13:38 UTC (permalink / raw)
To: xen-devel; +Cc: David Vrabel, stable
From: David Vrabel <david.vrabel@citrix.com>
If there are UNUSABLE regions in the machine memory map, dom0 will
attempt to map them 1:1 which is not permitted by Xen and the kernel
will crash.
There isn't anything interesting in the UNUSABLE region that the dom0
kernel needs access to so we can avoid making the 1:1 mapping and
leave the region as RAM.
Since the obtaining the memory map for dom0 and domU are now more
different, refactor each into separate functions.
This fixes a dom0 boot failure if tboot is used (because tboot was
marking a memory region as UNUSABLE).
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: stable@vger.kernel.org
---
arch/x86/xen/setup.c | 114 ++++++++++++++++++++++++++++++++++++++++----------
1 files changed, 91 insertions(+), 23 deletions(-)
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 94eac5c..3fdb6bd 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -55,6 +55,91 @@ unsigned long xen_released_pages;
*/
#define EXTRA_MEM_RATIO (10)
+static int __init xen_get_memory_map_dom0(struct e820entry *map,
+ unsigned *nr_entries)
+{
+ struct xen_memory_map memmap;
+ unsigned i;
+ int ret;
+
+ /*
+ * Dom0 requires access to machine addresses for BIOS data and
+ * MMIO (e.g. PCI) devices. The reset of the kernel expects
+ * to be able to access these through a 1:1 p2m mapping.
+ *
+ * We need to take the pseudo physical memory map and set up
+ * 1:1 mappings corresponding to the RESERVED regions and
+ * holes in the /machine/ memory map, adding/expanding the RAM
+ * region at the end of the map for the relocated RAM.
+ *
+ * This is more easily done if we just start with the machine
+ * memory map.
+ *
+ * UNUSABLE regions are awkward, they are not interesting to
+ * dom0 and Xen won't allow them to be mapped so we want to
+ * leave these as RAM in the pseudo physical map.
+ *
+ * Again, this is easiest if we take the machine memory map
+ * and change the UNUSABLE regions to RAM.
+ */
+
+ memmap.nr_entries = *nr_entries;
+ set_xen_guest_handle(memmap.buffer, map);
+
+ ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap);
+ if (ret < 0)
+ return ret;
+
+ for (i = 0; i < memmap.nr_entries; i++) {
+ if (map[i].type == E820_UNUSABLE)
+ map[i].type = E820_RAM;
+ }
+
+ *nr_entries = memmap.nr_entries;
+ return 0;
+}
+
+static int __init xen_get_memory_map_domu(struct e820entry *map,
+ unsigned *nr_entries)
+{
+ struct xen_memory_map memmap;
+ int ret;
+
+ /*
+ * For domUs, use the psuedo-physical memory map.
+ *
+ * If this is not available, fake up a memory map with a
+ * single region containing the initial number of pages.
+ */
+
+ memmap.nr_entries = *nr_entries;
+ set_xen_guest_handle(memmap.buffer, map);
+
+ ret = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap);
+ if (ret == -ENOSYS) {
+ unsigned long max_pfn = min(MAX_DOMAIN_PAGES,
+ xen_start_info->nr_pages);
+ memmap.nr_entries = 1;
+ map[0].addr = 0ULL;
+ map[0].size = PFN_PHYS(max_pfn);
+ /* 8MB slack (to balance backend allocations). */
+ map[0].size += 8ULL << 20;
+ map[0].type = E820_RAM;
+ } else if (ret < 0)
+ return ret;
+
+ *nr_entries = memmap.nr_entries;
+ return 0;
+}
+
+static int __init xen_get_memory_map(struct e820entry *map, unsigned *nr_entries)
+{
+ if (xen_initial_domain())
+ return xen_get_memory_map_dom0(map, nr_entries);
+ else
+ return xen_get_memory_map_domu(map, nr_entries);
+}
+
static void __init xen_add_extra_mem(u64 start, u64 size)
{
unsigned long pfn;
@@ -319,42 +404,25 @@ static void xen_align_and_add_e820_region(u64 start, u64 size, int type)
char * __init xen_memory_setup(void)
{
static struct e820entry map[E820MAX] __initdata;
+ unsigned map_nr_entries = E820MAX;
unsigned long max_pfn = xen_start_info->nr_pages;
unsigned long long mem_end;
int rc;
- struct xen_memory_map memmap;
unsigned long max_pages;
unsigned long last_pfn = 0;
unsigned long extra_pages = 0;
unsigned long populated;
int i;
- int op;
max_pfn = min(MAX_DOMAIN_PAGES, max_pfn);
mem_end = PFN_PHYS(max_pfn);
- memmap.nr_entries = E820MAX;
- set_xen_guest_handle(memmap.buffer, map);
-
- op = xen_initial_domain() ?
- XENMEM_machine_memory_map :
- XENMEM_memory_map;
- rc = HYPERVISOR_memory_op(op, &memmap);
- if (rc == -ENOSYS) {
- BUG_ON(xen_initial_domain());
- memmap.nr_entries = 1;
- map[0].addr = 0ULL;
- map[0].size = mem_end;
- /* 8MB slack (to balance backend allocations). */
- map[0].size += 8ULL << 20;
- map[0].type = E820_RAM;
- rc = 0;
- }
+ rc = xen_get_memory_map(map, &map_nr_entries);
BUG_ON(rc);
/* Make sure the Xen-supplied memory map is well-ordered. */
- sanitize_e820_map(map, memmap.nr_entries, &memmap.nr_entries);
+ sanitize_e820_map(map, map_nr_entries, &map_nr_entries);
max_pages = xen_get_max_pages();
if (max_pages > max_pfn)
@@ -366,12 +434,12 @@ char * __init xen_memory_setup(void)
* this are first released.
*/
xen_released_pages = xen_set_identity_and_release(
- map, memmap.nr_entries, max_pfn);
+ map, map_nr_entries, max_pfn);
/*
* Populate back the non-RAM pages and E820 gaps that had been
* released. */
- populated = xen_populate_chunk(map, memmap.nr_entries,
+ populated = xen_populate_chunk(map, map_nr_entries,
max_pfn, &last_pfn, xen_released_pages);
xen_released_pages -= populated;
@@ -395,7 +463,7 @@ char * __init xen_memory_setup(void)
extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)),
extra_pages);
i = 0;
- while (i < memmap.nr_entries) {
+ while (i < map_nr_entries) {
u64 addr = map[i].addr;
u64 size = map[i].size;
u32 type = map[i].type;
--
1.7.2.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
[not found] <1373377133-11018-1-git-send-email-david.vrabel@citrix.com>
@ 2013-07-09 14:13 ` Konrad Rzeszutek Wilk
[not found] ` <20130709141329.GC24897@phenom.dumpdata.com>
` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-07-09 14:13 UTC (permalink / raw)
To: David Vrabel; +Cc: stable, xen-devel
On Tue, Jul 09, 2013 at 02:38:53PM +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> If there are UNUSABLE regions in the machine memory map, dom0 will
> attempt to map them 1:1 which is not permitted by Xen and the kernel
> will crash.
>
> There isn't anything interesting in the UNUSABLE region that the dom0
> kernel needs access to so we can avoid making the 1:1 mapping and
> leave the region as RAM.
>
> Since the obtaining the memory map for dom0 and domU are now more
> different, refactor each into separate functions.
>
> This fixes a dom0 boot failure if tboot is used (because tboot was
> marking a memory region as UNUSABLE).
Please also include the serial log that shows the crash.
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: stable@vger.kernel.org
> ---
> arch/x86/xen/setup.c | 114 ++++++++++++++++++++++++++++++++++++++++----------
> 1 files changed, 91 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 94eac5c..3fdb6bd 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -55,6 +55,91 @@ unsigned long xen_released_pages;
> */
> #define EXTRA_MEM_RATIO (10)
>
> +static int __init xen_get_memory_map_dom0(struct e820entry *map,
> + unsigned *nr_entries)
> +{
> + struct xen_memory_map memmap;
> + unsigned i;
> + int ret;
> +
> + /*
> + * Dom0 requires access to machine addresses for BIOS data and
> + * MMIO (e.g. PCI) devices. The reset of the kernel expects
> + * to be able to access these through a 1:1 p2m mapping.
> + *
> + * We need to take the pseudo physical memory map and set up
> + * 1:1 mappings corresponding to the RESERVED regions and
> + * holes in the /machine/ memory map, adding/expanding the RAM
> + * region at the end of the map for the relocated RAM.
> + *
> + * This is more easily done if we just start with the machine
> + * memory map.
> + *
> + * UNUSABLE regions are awkward, they are not interesting to
> + * dom0 and Xen won't allow them to be mapped so we want to
> + * leave these as RAM in the pseudo physical map.
We just want them as such in the P2M but not do any PTE creation for it?
Why do we care about it? We are not creating any page tables for
E820_UNUSABLE regions.
> + *
> + * Again, this is easiest if we take the machine memory map
> + * and change the UNUSABLE regions to RAM.
Won't then Linux try to map them then? In 3.9 (or was it 3.8?) and further
the page table creation starts ignoring any E820 entries that are not RAM.
See init_range_memory_mapping and its comment:
/*
* We need to iterate through the E820 memory map and create direct mappings
* for only E820_RAM and E820_KERN_RESERVED regions. We cannot simply
* create direct mappings for all pfns from [0 to max_low_pfn) and
* [4GB to max_pfn) because of possible memory holes in high addresses
* that cannot be marked as UC by fixed/variable range MTRRs.
* Depending on the alignment of E820 ranges, this may possibly result
* in using smaller size (i.e. 4K instead of 2M or 1G) page tables.
*
So in effect you are now altering them.
> + */
> +
> + memmap.nr_entries = *nr_entries;
> + set_xen_guest_handle(memmap.buffer, map);
> +
> + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap);
> + if (ret < 0)
> + return ret;
> +
> + for (i = 0; i < memmap.nr_entries; i++) {
> + if (map[i].type == E820_UNUSABLE)
What if the E820_UNUSABLE regions were manufactured by the BIOS? Or
somebody booted Xen with mem=3G (in which we clip the memory) on a 16GB
box.
> + map[i].type = E820_RAM;
> + }
> +
> + *nr_entries = memmap.nr_entries;
> + return 0;
> +}
> +
> +static int __init xen_get_memory_map_domu(struct e820entry *map,
> + unsigned *nr_entries)
> +{
> + struct xen_memory_map memmap;
> + int ret;
> +
> + /*
> + * For domUs, use the psuedo-physical memory map.
> + *
> + * If this is not available, fake up a memory map with a
> + * single region containing the initial number of pages.
> + */
> +
> + memmap.nr_entries = *nr_entries;
> + set_xen_guest_handle(memmap.buffer, map);
> +
> + ret = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap);
> + if (ret == -ENOSYS) {
> + unsigned long max_pfn = min(MAX_DOMAIN_PAGES,
> + xen_start_info->nr_pages);
> + memmap.nr_entries = 1;
> + map[0].addr = 0ULL;
> + map[0].size = PFN_PHYS(max_pfn);
> + /* 8MB slack (to balance backend allocations). */
> + map[0].size += 8ULL << 20;
> + map[0].type = E820_RAM;
> + } else if (ret < 0)
> + return ret;
> +
> + *nr_entries = memmap.nr_entries;
> + return 0;
> +}
> +
> +static int __init xen_get_memory_map(struct e820entry *map, unsigned *nr_entries)
> +{
> + if (xen_initial_domain())
> + return xen_get_memory_map_dom0(map, nr_entries);
> + else
> + return xen_get_memory_map_domu(map, nr_entries);
> +}
> +
> static void __init xen_add_extra_mem(u64 start, u64 size)
> {
> unsigned long pfn;
> @@ -319,42 +404,25 @@ static void xen_align_and_add_e820_region(u64 start, u64 size, int type)
> char * __init xen_memory_setup(void)
> {
> static struct e820entry map[E820MAX] __initdata;
> + unsigned map_nr_entries = E820MAX;
>
> unsigned long max_pfn = xen_start_info->nr_pages;
> unsigned long long mem_end;
> int rc;
> - struct xen_memory_map memmap;
> unsigned long max_pages;
> unsigned long last_pfn = 0;
> unsigned long extra_pages = 0;
> unsigned long populated;
> int i;
> - int op;
>
> max_pfn = min(MAX_DOMAIN_PAGES, max_pfn);
> mem_end = PFN_PHYS(max_pfn);
>
> - memmap.nr_entries = E820MAX;
> - set_xen_guest_handle(memmap.buffer, map);
> -
> - op = xen_initial_domain() ?
> - XENMEM_machine_memory_map :
> - XENMEM_memory_map;
> - rc = HYPERVISOR_memory_op(op, &memmap);
> - if (rc == -ENOSYS) {
> - BUG_ON(xen_initial_domain());
> - memmap.nr_entries = 1;
> - map[0].addr = 0ULL;
> - map[0].size = mem_end;
> - /* 8MB slack (to balance backend allocations). */
> - map[0].size += 8ULL << 20;
> - map[0].type = E820_RAM;
> - rc = 0;
> - }
> + rc = xen_get_memory_map(map, &map_nr_entries);
> BUG_ON(rc);
>
> /* Make sure the Xen-supplied memory map is well-ordered. */
> - sanitize_e820_map(map, memmap.nr_entries, &memmap.nr_entries);
> + sanitize_e820_map(map, map_nr_entries, &map_nr_entries);
>
> max_pages = xen_get_max_pages();
> if (max_pages > max_pfn)
> @@ -366,12 +434,12 @@ char * __init xen_memory_setup(void)
> * this are first released.
> */
> xen_released_pages = xen_set_identity_and_release(
> - map, memmap.nr_entries, max_pfn);
> + map, map_nr_entries, max_pfn);
>
> /*
> * Populate back the non-RAM pages and E820 gaps that had been
> * released. */
> - populated = xen_populate_chunk(map, memmap.nr_entries,
> + populated = xen_populate_chunk(map, map_nr_entries,
> max_pfn, &last_pfn, xen_released_pages);
>
> xen_released_pages -= populated;
> @@ -395,7 +463,7 @@ char * __init xen_memory_setup(void)
> extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)),
> extra_pages);
> i = 0;
> - while (i < memmap.nr_entries) {
> + while (i < map_nr_entries) {
> u64 addr = map[i].addr;
> u64 size = map[i].size;
> u32 type = map[i].type;
> --
> 1.7.2.5
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
[not found] ` <20130709141329.GC24897@phenom.dumpdata.com>
@ 2013-07-09 14:44 ` David Vrabel
[not found] ` <51DC21D6.4000107@citrix.com>
1 sibling, 0 replies; 11+ messages in thread
From: David Vrabel @ 2013-07-09 14:44 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: stable, xen-devel
On 09/07/13 15:13, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 09, 2013 at 02:38:53PM +0100, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> If there are UNUSABLE regions in the machine memory map, dom0 will
>> attempt to map them 1:1 which is not permitted by Xen and the kernel
>> will crash.
>>
>> There isn't anything interesting in the UNUSABLE region that the dom0
>> kernel needs access to so we can avoid making the 1:1 mapping and
>> leave the region as RAM.
>>
>> Since the obtaining the memory map for dom0 and domU are now more
>> different, refactor each into separate functions.
>>
>> This fixes a dom0 boot failure if tboot is used (because tboot was
>> marking a memory region as UNUSABLE).
>
> Please also include the serial log that shows the crash.
It's a domain crash by Xen and it isn't useful as none of the stack is
decoded.
>> +static int __init xen_get_memory_map_dom0(struct e820entry *map,
>> + unsigned *nr_entries)
>> +{
>> + struct xen_memory_map memmap;
>> + unsigned i;
>> + int ret;
>> +
>> + /*
>> + * Dom0 requires access to machine addresses for BIOS data and
>> + * MMIO (e.g. PCI) devices. The reset of the kernel expects
>> + * to be able to access these through a 1:1 p2m mapping.
>> + *
>> + * We need to take the pseudo physical memory map and set up
>> + * 1:1 mappings corresponding to the RESERVED regions and
>> + * holes in the /machine/ memory map, adding/expanding the RAM
>> + * region at the end of the map for the relocated RAM.
This is the key paragraph. The apparent use of the machine memory map
for dom0 is a confusing fiction.
>> + *
>> + * This is more easily done if we just start with the machine
>> + * memory map.
>> + *
>> + * UNUSABLE regions are awkward, they are not interesting to
>> + * dom0 and Xen won't allow them to be mapped so we want to
>> + * leave these as RAM in the pseudo physical map.
>
> We just want them as such in the P2M but not do any PTE creation for it?
> Why do we care about it? We are not creating any page tables for
> E820_UNUSABLE regions.
I don't follow what you're asking here.
In dom0, UNUSABLE regions in the machine memory map are RAM regions on
the pseudo-physical memory map. So instead of playing games and making
these regions special in the pseudo-physical map we just leave them as RAM.
>> + *
>> + * Again, this is easiest if we take the machine memory map
>> + * and change the UNUSABLE regions to RAM.
>
> Won't then Linux try to map them then? In 3.9 (or was it 3.8?) and further
> the page table creation starts ignoring any E820 entries that are not RAM.
> See init_range_memory_mapping and its comment:
Yes. They are just regular RAM in the pseudo-physical map.
> /*
> * We need to iterate through the E820 memory map and create direct mappings
> * for only E820_RAM and E820_KERN_RESERVED regions. We cannot simply
> * create direct mappings for all pfns from [0 to max_low_pfn) and
> * [4GB to max_pfn) because of possible memory holes in high addresses
> * that cannot be marked as UC by fixed/variable range MTRRs.
> * Depending on the alignment of E820 ranges, this may possibly result
> * in using smaller size (i.e. 4K instead of 2M or 1G) page tables.
> *
>
>
> So in effect you are now altering them.
No.
>> + */
>> +
>> + memmap.nr_entries = *nr_entries;
>> + set_xen_guest_handle(memmap.buffer, map);
>> +
>> + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap);
>> + if (ret < 0)
>> + return ret;
>> +
>> + for (i = 0; i < memmap.nr_entries; i++) {
>> + if (map[i].type == E820_UNUSABLE)
>
> What if the E820_UNUSABLE regions were manufactured by the BIOS? Or
> somebody booted Xen with mem=3G (in which we clip the memory) on a 16GB
> box.
The resulting memory map should be clipped by the result of the call to
xen_get_max_pages().
David
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
[not found] <1373377133-11018-1-git-send-email-david.vrabel@citrix.com>
2013-07-09 14:13 ` [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE Konrad Rzeszutek Wilk
[not found] ` <20130709141329.GC24897@phenom.dumpdata.com>
@ 2013-07-09 17:00 ` Aurelien Chartier
[not found] ` <51DC41A1.6010909@citrix.com>
3 siblings, 0 replies; 11+ messages in thread
From: Aurelien Chartier @ 2013-07-09 17:00 UTC (permalink / raw)
To: David Vrabel; +Cc: stable, xen-devel
On 09/07/13 14:38, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> If there are UNUSABLE regions in the machine memory map, dom0 will
> attempt to map them 1:1 which is not permitted by Xen and the kernel
> will crash.
>
> There isn't anything interesting in the UNUSABLE region that the dom0
> kernel needs access to so we can avoid making the 1:1 mapping and
> leave the region as RAM.
>
> Since the obtaining the memory map for dom0 and domU are now more
> different, refactor each into separate functions.
>
> This fixes a dom0 boot failure if tboot is used (because tboot was
> marking a memory region as UNUSABLE).
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: stable@vger.kernel.org
I tested it with Xen 4.1.3-rc and Linux 3.8.13.4 and it is fixing the
early boot crash I reported.
Tested-by: Aurelien Chartier <aurelien.chartier@citrix.com>
> ---
> arch/x86/xen/setup.c | 114 ++++++++++++++++++++++++++++++++++++++++----------
> 1 files changed, 91 insertions(+), 23 deletions(-)
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index 94eac5c..3fdb6bd 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -55,6 +55,91 @@ unsigned long xen_released_pages;
> */
> #define EXTRA_MEM_RATIO (10)
>
> +static int __init xen_get_memory_map_dom0(struct e820entry *map,
> + unsigned *nr_entries)
> +{
> + struct xen_memory_map memmap;
> + unsigned i;
> + int ret;
> +
> + /*
> + * Dom0 requires access to machine addresses for BIOS data and
> + * MMIO (e.g. PCI) devices. The reset of the kernel expects
> + * to be able to access these through a 1:1 p2m mapping.
> + *
> + * We need to take the pseudo physical memory map and set up
> + * 1:1 mappings corresponding to the RESERVED regions and
> + * holes in the /machine/ memory map, adding/expanding the RAM
> + * region at the end of the map for the relocated RAM.
> + *
> + * This is more easily done if we just start with the machine
> + * memory map.
> + *
> + * UNUSABLE regions are awkward, they are not interesting to
> + * dom0 and Xen won't allow them to be mapped so we want to
> + * leave these as RAM in the pseudo physical map.
> + *
> + * Again, this is easiest if we take the machine memory map
> + * and change the UNUSABLE regions to RAM.
> + */
> +
> + memmap.nr_entries = *nr_entries;
> + set_xen_guest_handle(memmap.buffer, map);
> +
> + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap);
> + if (ret < 0)
> + return ret;
> +
> + for (i = 0; i < memmap.nr_entries; i++) {
> + if (map[i].type == E820_UNUSABLE)
> + map[i].type = E820_RAM;
> + }
> +
> + *nr_entries = memmap.nr_entries;
> + return 0;
> +}
> +
> +static int __init xen_get_memory_map_domu(struct e820entry *map,
> + unsigned *nr_entries)
> +{
> + struct xen_memory_map memmap;
> + int ret;
> +
> + /*
> + * For domUs, use the psuedo-physical memory map.
Small typo here.
> + *
> + * If this is not available, fake up a memory map with a
> + * single region containing the initial number of pages.
> + */
> +
> + memmap.nr_entries = *nr_entries;
> + set_xen_guest_handle(memmap.buffer, map);
> +
> + ret = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap);
> + if (ret == -ENOSYS) {
> + unsigned long max_pfn = min(MAX_DOMAIN_PAGES,
> + xen_start_info->nr_pages);
> + memmap.nr_entries = 1;
> + map[0].addr = 0ULL;
> + map[0].size = PFN_PHYS(max_pfn);
> + /* 8MB slack (to balance backend allocations). */
> + map[0].size += 8ULL << 20;
> + map[0].type = E820_RAM;
> + } else if (ret < 0)
> + return ret;
> +
> + *nr_entries = memmap.nr_entries;
> + return 0;
> +}
> +
> +static int __init xen_get_memory_map(struct e820entry *map, unsigned *nr_entries)
> +{
> + if (xen_initial_domain())
> + return xen_get_memory_map_dom0(map, nr_entries);
> + else
> + return xen_get_memory_map_domu(map, nr_entries);
> +}
> +
> static void __init xen_add_extra_mem(u64 start, u64 size)
> {
> unsigned long pfn;
> @@ -319,42 +404,25 @@ static void xen_align_and_add_e820_region(u64 start, u64 size, int type)
> char * __init xen_memory_setup(void)
> {
> static struct e820entry map[E820MAX] __initdata;
> + unsigned map_nr_entries = E820MAX;
>
> unsigned long max_pfn = xen_start_info->nr_pages;
> unsigned long long mem_end;
> int rc;
> - struct xen_memory_map memmap;
> unsigned long max_pages;
> unsigned long last_pfn = 0;
> unsigned long extra_pages = 0;
> unsigned long populated;
> int i;
> - int op;
>
> max_pfn = min(MAX_DOMAIN_PAGES, max_pfn);
> mem_end = PFN_PHYS(max_pfn);
>
> - memmap.nr_entries = E820MAX;
> - set_xen_guest_handle(memmap.buffer, map);
> -
> - op = xen_initial_domain() ?
> - XENMEM_machine_memory_map :
> - XENMEM_memory_map;
> - rc = HYPERVISOR_memory_op(op, &memmap);
> - if (rc == -ENOSYS) {
> - BUG_ON(xen_initial_domain());
> - memmap.nr_entries = 1;
> - map[0].addr = 0ULL;
> - map[0].size = mem_end;
> - /* 8MB slack (to balance backend allocations). */
> - map[0].size += 8ULL << 20;
> - map[0].type = E820_RAM;
> - rc = 0;
> - }
> + rc = xen_get_memory_map(map, &map_nr_entries);
> BUG_ON(rc);
>
> /* Make sure the Xen-supplied memory map is well-ordered. */
> - sanitize_e820_map(map, memmap.nr_entries, &memmap.nr_entries);
> + sanitize_e820_map(map, map_nr_entries, &map_nr_entries);
>
> max_pages = xen_get_max_pages();
> if (max_pages > max_pfn)
> @@ -366,12 +434,12 @@ char * __init xen_memory_setup(void)
> * this are first released.
> */
> xen_released_pages = xen_set_identity_and_release(
> - map, memmap.nr_entries, max_pfn);
> + map, map_nr_entries, max_pfn);
>
> /*
> * Populate back the non-RAM pages and E820 gaps that had been
> * released. */
> - populated = xen_populate_chunk(map, memmap.nr_entries,
> + populated = xen_populate_chunk(map, map_nr_entries,
> max_pfn, &last_pfn, xen_released_pages);
>
> xen_released_pages -= populated;
> @@ -395,7 +463,7 @@ char * __init xen_memory_setup(void)
> extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)),
> extra_pages);
> i = 0;
> - while (i < memmap.nr_entries) {
> + while (i < map_nr_entries) {
> u64 addr = map[i].addr;
> u64 size = map[i].size;
> u32 type = map[i].type;
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
[not found] ` <51DC41A1.6010909@citrix.com>
@ 2013-07-09 18:36 ` Konrad Rzeszutek Wilk
[not found] ` <20130709183647.GA10188@phenom.dumpdata.com>
1 sibling, 0 replies; 11+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-07-09 18:36 UTC (permalink / raw)
To: Aurelien Chartier; +Cc: David Vrabel, stable, xen-devel
On Tue, Jul 09, 2013 at 06:00:17PM +0100, Aurelien Chartier wrote:
> On 09/07/13 14:38, David Vrabel wrote:
> > From: David Vrabel <david.vrabel@citrix.com>
> >
> > If there are UNUSABLE regions in the machine memory map, dom0 will
> > attempt to map them 1:1 which is not permitted by Xen and the kernel
> > will crash.
> >
> > There isn't anything interesting in the UNUSABLE region that the dom0
> > kernel needs access to so we can avoid making the 1:1 mapping and
> > leave the region as RAM.
> >
> > Since the obtaining the memory map for dom0 and domU are now more
> > different, refactor each into separate functions.
> >
> > This fixes a dom0 boot failure if tboot is used (because tboot was
> > marking a memory region as UNUSABLE).
> >
> > Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> > Cc: stable@vger.kernel.org
>
> I tested it with Xen 4.1.3-rc and Linux 3.8.13.4 and it is fixing the
> early boot crash I reported.
How did you test it? How can one reproduce this crash? Do you see
this failure with v3.10 or just with v3.8 (and earlier)?
>
> Tested-by: Aurelien Chartier <aurelien.chartier@citrix.com>
>
> > ---
> > arch/x86/xen/setup.c | 114 ++++++++++++++++++++++++++++++++++++++++----------
> > 1 files changed, 91 insertions(+), 23 deletions(-)
> >
> > diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> > index 94eac5c..3fdb6bd 100644
> > --- a/arch/x86/xen/setup.c
> > +++ b/arch/x86/xen/setup.c
> > @@ -55,6 +55,91 @@ unsigned long xen_released_pages;
> > */
> > #define EXTRA_MEM_RATIO (10)
> >
> > +static int __init xen_get_memory_map_dom0(struct e820entry *map,
> > + unsigned *nr_entries)
> > +{
> > + struct xen_memory_map memmap;
> > + unsigned i;
> > + int ret;
> > +
> > + /*
> > + * Dom0 requires access to machine addresses for BIOS data and
> > + * MMIO (e.g. PCI) devices. The reset of the kernel expects
> > + * to be able to access these through a 1:1 p2m mapping.
> > + *
> > + * We need to take the pseudo physical memory map and set up
> > + * 1:1 mappings corresponding to the RESERVED regions and
> > + * holes in the /machine/ memory map, adding/expanding the RAM
> > + * region at the end of the map for the relocated RAM.
> > + *
> > + * This is more easily done if we just start with the machine
> > + * memory map.
> > + *
> > + * UNUSABLE regions are awkward, they are not interesting to
> > + * dom0 and Xen won't allow them to be mapped so we want to
> > + * leave these as RAM in the pseudo physical map.
> > + *
> > + * Again, this is easiest if we take the machine memory map
> > + * and change the UNUSABLE regions to RAM.
> > + */
> > +
> > + memmap.nr_entries = *nr_entries;
> > + set_xen_guest_handle(memmap.buffer, map);
> > +
> > + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap);
> > + if (ret < 0)
> > + return ret;
> > +
> > + for (i = 0; i < memmap.nr_entries; i++) {
> > + if (map[i].type == E820_UNUSABLE)
> > + map[i].type = E820_RAM;
> > + }
> > +
> > + *nr_entries = memmap.nr_entries;
> > + return 0;
> > +}
> > +
> > +static int __init xen_get_memory_map_domu(struct e820entry *map,
> > + unsigned *nr_entries)
> > +{
> > + struct xen_memory_map memmap;
> > + int ret;
> > +
> > + /*
> > + * For domUs, use the psuedo-physical memory map.
>
> Small typo here.
>
> > + *
> > + * If this is not available, fake up a memory map with a
> > + * single region containing the initial number of pages.
> > + */
> > +
> > + memmap.nr_entries = *nr_entries;
> > + set_xen_guest_handle(memmap.buffer, map);
> > +
> > + ret = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap);
> > + if (ret == -ENOSYS) {
> > + unsigned long max_pfn = min(MAX_DOMAIN_PAGES,
> > + xen_start_info->nr_pages);
> > + memmap.nr_entries = 1;
> > + map[0].addr = 0ULL;
> > + map[0].size = PFN_PHYS(max_pfn);
> > + /* 8MB slack (to balance backend allocations). */
> > + map[0].size += 8ULL << 20;
> > + map[0].type = E820_RAM;
> > + } else if (ret < 0)
> > + return ret;
> > +
> > + *nr_entries = memmap.nr_entries;
> > + return 0;
> > +}
> > +
> > +static int __init xen_get_memory_map(struct e820entry *map, unsigned *nr_entries)
> > +{
> > + if (xen_initial_domain())
> > + return xen_get_memory_map_dom0(map, nr_entries);
> > + else
> > + return xen_get_memory_map_domu(map, nr_entries);
> > +}
> > +
> > static void __init xen_add_extra_mem(u64 start, u64 size)
> > {
> > unsigned long pfn;
> > @@ -319,42 +404,25 @@ static void xen_align_and_add_e820_region(u64 start, u64 size, int type)
> > char * __init xen_memory_setup(void)
> > {
> > static struct e820entry map[E820MAX] __initdata;
> > + unsigned map_nr_entries = E820MAX;
> >
> > unsigned long max_pfn = xen_start_info->nr_pages;
> > unsigned long long mem_end;
> > int rc;
> > - struct xen_memory_map memmap;
> > unsigned long max_pages;
> > unsigned long last_pfn = 0;
> > unsigned long extra_pages = 0;
> > unsigned long populated;
> > int i;
> > - int op;
> >
> > max_pfn = min(MAX_DOMAIN_PAGES, max_pfn);
> > mem_end = PFN_PHYS(max_pfn);
> >
> > - memmap.nr_entries = E820MAX;
> > - set_xen_guest_handle(memmap.buffer, map);
> > -
> > - op = xen_initial_domain() ?
> > - XENMEM_machine_memory_map :
> > - XENMEM_memory_map;
> > - rc = HYPERVISOR_memory_op(op, &memmap);
> > - if (rc == -ENOSYS) {
> > - BUG_ON(xen_initial_domain());
> > - memmap.nr_entries = 1;
> > - map[0].addr = 0ULL;
> > - map[0].size = mem_end;
> > - /* 8MB slack (to balance backend allocations). */
> > - map[0].size += 8ULL << 20;
> > - map[0].type = E820_RAM;
> > - rc = 0;
> > - }
> > + rc = xen_get_memory_map(map, &map_nr_entries);
> > BUG_ON(rc);
> >
> > /* Make sure the Xen-supplied memory map is well-ordered. */
> > - sanitize_e820_map(map, memmap.nr_entries, &memmap.nr_entries);
> > + sanitize_e820_map(map, map_nr_entries, &map_nr_entries);
> >
> > max_pages = xen_get_max_pages();
> > if (max_pages > max_pfn)
> > @@ -366,12 +434,12 @@ char * __init xen_memory_setup(void)
> > * this are first released.
> > */
> > xen_released_pages = xen_set_identity_and_release(
> > - map, memmap.nr_entries, max_pfn);
> > + map, map_nr_entries, max_pfn);
> >
> > /*
> > * Populate back the non-RAM pages and E820 gaps that had been
> > * released. */
> > - populated = xen_populate_chunk(map, memmap.nr_entries,
> > + populated = xen_populate_chunk(map, map_nr_entries,
> > max_pfn, &last_pfn, xen_released_pages);
> >
> > xen_released_pages -= populated;
> > @@ -395,7 +463,7 @@ char * __init xen_memory_setup(void)
> > extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)),
> > extra_pages);
> > i = 0;
> > - while (i < memmap.nr_entries) {
> > + while (i < map_nr_entries) {
> > u64 addr = map[i].addr;
> > u64 size = map[i].size;
> > u32 type = map[i].type;
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
[not found] ` <51DC21D6.4000107@citrix.com>
@ 2013-07-09 18:45 ` Konrad Rzeszutek Wilk
[not found] ` <20130709184538.GB10188@phenom.dumpdata.com>
1 sibling, 0 replies; 11+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-07-09 18:45 UTC (permalink / raw)
To: David Vrabel; +Cc: stable, xen-devel
On Tue, Jul 09, 2013 at 03:44:38PM +0100, David Vrabel wrote:
> On 09/07/13 15:13, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jul 09, 2013 at 02:38:53PM +0100, David Vrabel wrote:
> >> From: David Vrabel <david.vrabel@citrix.com>
> >>
> >> If there are UNUSABLE regions in the machine memory map, dom0 will
> >> attempt to map them 1:1 which is not permitted by Xen and the kernel
> >> will crash.
> >>
> >> There isn't anything interesting in the UNUSABLE region that the dom0
> >> kernel needs access to so we can avoid making the 1:1 mapping and
> >> leave the region as RAM.
> >>
> >> Since the obtaining the memory map for dom0 and domU are now more
> >> different, refactor each into separate functions.
> >>
> >> This fixes a dom0 boot failure if tboot is used (because tboot was
> >> marking a memory region as UNUSABLE).
> >
> > Please also include the serial log that shows the crash.
>
> It's a domain crash by Xen and it isn't useful as none of the stack is
> decoded.
Could you include the E820 at least to get a sense of where and how
this looks? As in - without tboot and then with tboot?
>
> >> +static int __init xen_get_memory_map_dom0(struct e820entry *map,
> >> + unsigned *nr_entries)
> >> +{
> >> + struct xen_memory_map memmap;
> >> + unsigned i;
> >> + int ret;
> >> +
> >> + /*
> >> + * Dom0 requires access to machine addresses for BIOS data and
> >> + * MMIO (e.g. PCI) devices. The reset of the kernel expects
> >> + * to be able to access these through a 1:1 p2m mapping.
> >> + *
> >> + * We need to take the pseudo physical memory map and set up
> >> + * 1:1 mappings corresponding to the RESERVED regions and
> >> + * holes in the /machine/ memory map, adding/expanding the RAM
> >> + * region at the end of the map for the relocated RAM.
>
> This is the key paragraph. The apparent use of the machine memory map
> for dom0 is a confusing fiction.
OK, but I don't follow when dom0 would be using the E820_UNUSED regions.
Is it the xen_do_chunk that is failing on said PFNs? Or is it in this
code xen_set_identity_and_release_chunk:
"217 /*
218 * If the PFNs are currently mapped, the VA mapping also needs
219 * to be updated to be 1:1.
220 */
221 for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
222 (void)HYPERVISOR_update_va_mapping(
223 (unsigned long)__va(pfn << PAGE_SHIFT),
224 mfn_pte(pfn, PAGE_KERNEL_IO), 0);
225 "
which updates the initial PTE's with the 1-1 PFN and the E820_UNUSABLE is
somehow in between two E820_RAM regions?
>
> >> + *
> >> + * This is more easily done if we just start with the machine
> >> + * memory map.
> >> + *
> >> + * UNUSABLE regions are awkward, they are not interesting to
> >> + * dom0 and Xen won't allow them to be mapped so we want to
> >> + * leave these as RAM in the pseudo physical map.
> >
> > We just want them as such in the P2M but not do any PTE creation for it?
> > Why do we care about it? We are not creating any page tables for
> > E820_UNUSABLE regions.
>
> I don't follow what you're asking here.
What code maps said PFNs.
>
> In dom0, UNUSABLE regions in the machine memory map are RAM regions on
> the pseudo-physical memory map. So instead of playing games and making
> these regions special in the pseudo-physical map we just leave them as RAM.
.. And then exposing them to the kernel to be used as normal RAM?
>
> >> + *
> >> + * Again, this is easiest if we take the machine memory map
> >> + * and change the UNUSABLE regions to RAM.
> >
> > Won't then Linux try to map them then? In 3.9 (or was it 3.8?) and further
> > the page table creation starts ignoring any E820 entries that are not RAM.
> > See init_range_memory_mapping and its comment:
>
> Yes. They are just regular RAM in the pseudo-physical map.
With your change it is. But without your change it would not map it.
>
> > /*
> > * We need to iterate through the E820 memory map and create direct mappings
> > * for only E820_RAM and E820_KERN_RESERVED regions. We cannot simply
> > * create direct mappings for all pfns from [0 to max_low_pfn) and
> > * [4GB to max_pfn) because of possible memory holes in high addresses
> > * that cannot be marked as UC by fixed/variable range MTRRs.
> > * Depending on the alignment of E820 ranges, this may possibly result
> > * in using smaller size (i.e. 4K instead of 2M or 1G) page tables.
> > *
> >
> >
> > So in effect you are now altering them.
>
> No.
>
> >> + */
> >> +
> >> + memmap.nr_entries = *nr_entries;
> >> + set_xen_guest_handle(memmap.buffer, map);
> >> +
> >> + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap);
> >> + if (ret < 0)
> >> + return ret;
> >> +
> >> + for (i = 0; i < memmap.nr_entries; i++) {
> >> + if (map[i].type == E820_UNUSABLE)
> >
> > What if the E820_UNUSABLE regions were manufactured by the BIOS? Or
> > somebody booted Xen with mem=3G (in which we clip the memory) on a 16GB
> > box.
>
> The resulting memory map should be clipped by the result of the call to
> xen_get_max_pages().
OK. What about the BIOS manufacturing it?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
[not found] ` <20130709183647.GA10188@phenom.dumpdata.com>
@ 2013-07-10 14:24 ` Aurelien Chartier
0 siblings, 0 replies; 11+ messages in thread
From: Aurelien Chartier @ 2013-07-10 14:24 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: David Vrabel, stable, xen-devel
On 09/07/13 19:36, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 09, 2013 at 06:00:17PM +0100, Aurelien Chartier wrote:
>> On 09/07/13 14:38, David Vrabel wrote:
>>> From: David Vrabel <david.vrabel@citrix.com>
>>>
>>> If there are UNUSABLE regions in the machine memory map, dom0 will
>>> attempt to map them 1:1 which is not permitted by Xen and the kernel
>>> will crash.
>>>
>>> There isn't anything interesting in the UNUSABLE region that the dom0
>>> kernel needs access to so we can avoid making the 1:1 mapping and
>>> leave the region as RAM.
>>>
>>> Since the obtaining the memory map for dom0 and domU are now more
>>> different, refactor each into separate functions.
>>>
>>> This fixes a dom0 boot failure if tboot is used (because tboot was
>>> marking a memory region as UNUSABLE).
>>>
>>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>>> Cc: stable@vger.kernel.org
>> I tested it with Xen 4.1.3-rc and Linux 3.8.13.4 and it is fixing the
>> early boot crash I reported.
> How did you test it? How can one reproduce this crash? Do you see
> this failure with v3.10 or just with v3.8 (and earlier)?
I tested it by booting Xen and Linux over tboot (1.7.0) with TXT/TPM
enabled. The issue is always reproducible.
I posted a log of the kernel panic on xen-devel in May if than can help:
http://lists.xen.org/archives/html/xen-devel/2013-05/msg01534.html
I did not test with v3.10 yet, but I did not see any commit that could
have fixed it. I'll give it a try though.
>> Tested-by: Aurelien Chartier <aurelien.chartier@citrix.com>
>>
>>> ---
>>> arch/x86/xen/setup.c | 114 ++++++++++++++++++++++++++++++++++++++++----------
>>> 1 files changed, 91 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
>>> index 94eac5c..3fdb6bd 100644
>>> --- a/arch/x86/xen/setup.c
>>> +++ b/arch/x86/xen/setup.c
>>> @@ -55,6 +55,91 @@ unsigned long xen_released_pages;
>>> */
>>> #define EXTRA_MEM_RATIO (10)
>>>
>>> +static int __init xen_get_memory_map_dom0(struct e820entry *map,
>>> + unsigned *nr_entries)
>>> +{
>>> + struct xen_memory_map memmap;
>>> + unsigned i;
>>> + int ret;
>>> +
>>> + /*
>>> + * Dom0 requires access to machine addresses for BIOS data and
>>> + * MMIO (e.g. PCI) devices. The reset of the kernel expects
>>> + * to be able to access these through a 1:1 p2m mapping.
>>> + *
>>> + * We need to take the pseudo physical memory map and set up
>>> + * 1:1 mappings corresponding to the RESERVED regions and
>>> + * holes in the /machine/ memory map, adding/expanding the RAM
>>> + * region at the end of the map for the relocated RAM.
>>> + *
>>> + * This is more easily done if we just start with the machine
>>> + * memory map.
>>> + *
>>> + * UNUSABLE regions are awkward, they are not interesting to
>>> + * dom0 and Xen won't allow them to be mapped so we want to
>>> + * leave these as RAM in the pseudo physical map.
>>> + *
>>> + * Again, this is easiest if we take the machine memory map
>>> + * and change the UNUSABLE regions to RAM.
>>> + */
>>> +
>>> + memmap.nr_entries = *nr_entries;
>>> + set_xen_guest_handle(memmap.buffer, map);
>>> +
>>> + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap);
>>> + if (ret < 0)
>>> + return ret;
>>> +
>>> + for (i = 0; i < memmap.nr_entries; i++) {
>>> + if (map[i].type == E820_UNUSABLE)
>>> + map[i].type = E820_RAM;
>>> + }
>>> +
>>> + *nr_entries = memmap.nr_entries;
>>> + return 0;
>>> +}
>>> +
>>> +static int __init xen_get_memory_map_domu(struct e820entry *map,
>>> + unsigned *nr_entries)
>>> +{
>>> + struct xen_memory_map memmap;
>>> + int ret;
>>> +
>>> + /*
>>> + * For domUs, use the psuedo-physical memory map.
>> Small typo here.
>>
>>> + *
>>> + * If this is not available, fake up a memory map with a
>>> + * single region containing the initial number of pages.
>>> + */
>>> +
>>> + memmap.nr_entries = *nr_entries;
>>> + set_xen_guest_handle(memmap.buffer, map);
>>> +
>>> + ret = HYPERVISOR_memory_op(XENMEM_memory_map, &memmap);
>>> + if (ret == -ENOSYS) {
>>> + unsigned long max_pfn = min(MAX_DOMAIN_PAGES,
>>> + xen_start_info->nr_pages);
>>> + memmap.nr_entries = 1;
>>> + map[0].addr = 0ULL;
>>> + map[0].size = PFN_PHYS(max_pfn);
>>> + /* 8MB slack (to balance backend allocations). */
>>> + map[0].size += 8ULL << 20;
>>> + map[0].type = E820_RAM;
>>> + } else if (ret < 0)
>>> + return ret;
>>> +
>>> + *nr_entries = memmap.nr_entries;
>>> + return 0;
>>> +}
>>> +
>>> +static int __init xen_get_memory_map(struct e820entry *map, unsigned *nr_entries)
>>> +{
>>> + if (xen_initial_domain())
>>> + return xen_get_memory_map_dom0(map, nr_entries);
>>> + else
>>> + return xen_get_memory_map_domu(map, nr_entries);
>>> +}
>>> +
>>> static void __init xen_add_extra_mem(u64 start, u64 size)
>>> {
>>> unsigned long pfn;
>>> @@ -319,42 +404,25 @@ static void xen_align_and_add_e820_region(u64 start, u64 size, int type)
>>> char * __init xen_memory_setup(void)
>>> {
>>> static struct e820entry map[E820MAX] __initdata;
>>> + unsigned map_nr_entries = E820MAX;
>>>
>>> unsigned long max_pfn = xen_start_info->nr_pages;
>>> unsigned long long mem_end;
>>> int rc;
>>> - struct xen_memory_map memmap;
>>> unsigned long max_pages;
>>> unsigned long last_pfn = 0;
>>> unsigned long extra_pages = 0;
>>> unsigned long populated;
>>> int i;
>>> - int op;
>>>
>>> max_pfn = min(MAX_DOMAIN_PAGES, max_pfn);
>>> mem_end = PFN_PHYS(max_pfn);
>>>
>>> - memmap.nr_entries = E820MAX;
>>> - set_xen_guest_handle(memmap.buffer, map);
>>> -
>>> - op = xen_initial_domain() ?
>>> - XENMEM_machine_memory_map :
>>> - XENMEM_memory_map;
>>> - rc = HYPERVISOR_memory_op(op, &memmap);
>>> - if (rc == -ENOSYS) {
>>> - BUG_ON(xen_initial_domain());
>>> - memmap.nr_entries = 1;
>>> - map[0].addr = 0ULL;
>>> - map[0].size = mem_end;
>>> - /* 8MB slack (to balance backend allocations). */
>>> - map[0].size += 8ULL << 20;
>>> - map[0].type = E820_RAM;
>>> - rc = 0;
>>> - }
>>> + rc = xen_get_memory_map(map, &map_nr_entries);
>>> BUG_ON(rc);
>>>
>>> /* Make sure the Xen-supplied memory map is well-ordered. */
>>> - sanitize_e820_map(map, memmap.nr_entries, &memmap.nr_entries);
>>> + sanitize_e820_map(map, map_nr_entries, &map_nr_entries);
>>>
>>> max_pages = xen_get_max_pages();
>>> if (max_pages > max_pfn)
>>> @@ -366,12 +434,12 @@ char * __init xen_memory_setup(void)
>>> * this are first released.
>>> */
>>> xen_released_pages = xen_set_identity_and_release(
>>> - map, memmap.nr_entries, max_pfn);
>>> + map, map_nr_entries, max_pfn);
>>>
>>> /*
>>> * Populate back the non-RAM pages and E820 gaps that had been
>>> * released. */
>>> - populated = xen_populate_chunk(map, memmap.nr_entries,
>>> + populated = xen_populate_chunk(map, map_nr_entries,
>>> max_pfn, &last_pfn, xen_released_pages);
>>>
>>> xen_released_pages -= populated;
>>> @@ -395,7 +463,7 @@ char * __init xen_memory_setup(void)
>>> extra_pages = min(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)),
>>> extra_pages);
>>> i = 0;
>>> - while (i < memmap.nr_entries) {
>>> + while (i < map_nr_entries) {
>>> u64 addr = map[i].addr;
>>> u64 size = map[i].size;
>>> u32 type = map[i].type;
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
[not found] ` <20130709184538.GB10188@phenom.dumpdata.com>
@ 2013-07-11 11:21 ` David Vrabel
2013-07-12 19:33 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 11+ messages in thread
From: David Vrabel @ 2013-07-11 11:21 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: David Vrabel, xen-devel
On 09/07/13 19:45, Konrad Rzeszutek Wilk wrote:
> On Tue, Jul 09, 2013 at 03:44:38PM +0100, David Vrabel wrote:
>> On 09/07/13 15:13, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Jul 09, 2013 at 02:38:53PM +0100, David Vrabel wrote:
>>>> From: David Vrabel <david.vrabel@citrix.com>
>>>>
>>>> If there are UNUSABLE regions in the machine memory map, dom0 will
>>>> attempt to map them 1:1 which is not permitted by Xen and the kernel
>>>> will crash.
>>>>
>>>> There isn't anything interesting in the UNUSABLE region that the dom0
>>>> kernel needs access to so we can avoid making the 1:1 mapping and
>>>> leave the region as RAM.
>>>>
>>>> Since the obtaining the memory map for dom0 and domU are now more
>>>> different, refactor each into separate functions.
>>>>
>>>> This fixes a dom0 boot failure if tboot is used (because tboot was
>>>> marking a memory region as UNUSABLE).
>>>
>>> Please also include the serial log that shows the crash.
>>
>> It's a domain crash by Xen and it isn't useful as none of the stack is
>> decoded.
>
> Could you include the E820 at least to get a sense of where and how
> this looks? As in - without tboot and then with tboot?
This would take time to set up the host again and I don't think
including a specific example of an E820 map with an UNUSABLE region
really adds anything useful to the commit log.
You can look at some of the previous threads for examples.
>>>> +static int __init xen_get_memory_map_dom0(struct e820entry *map,
>>>> + unsigned *nr_entries)
>>>> +{
>>>> + struct xen_memory_map memmap;
>>>> + unsigned i;
>>>> + int ret;
>>>> +
>>>> + /*
>>>> + * Dom0 requires access to machine addresses for BIOS data and
>>>> + * MMIO (e.g. PCI) devices. The reset of the kernel expects
>>>> + * to be able to access these through a 1:1 p2m mapping.
>>>> + *
>>>> + * We need to take the pseudo physical memory map and set up
>>>> + * 1:1 mappings corresponding to the RESERVED regions and
>>>> + * holes in the /machine/ memory map, adding/expanding the RAM
>>>> + * region at the end of the map for the relocated RAM.
>>
>> This is the key paragraph. The apparent use of the machine memory map
>> for dom0 is a confusing fiction.
>
> OK, but I don't follow when dom0 would be using the E820_UNUSED regions.
> Is it the xen_do_chunk that is failing on said PFNs? Or is it in this
> code xen_set_identity_and_release_chunk:
>
> "217 /*
> 218 * If the PFNs are currently mapped, the VA mapping also needs
> 219 * to be updated to be 1:1.
> 220 */
> 221 for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
> 222 (void)HYPERVISOR_update_va_mapping(
> 223 (unsigned long)__va(pfn << PAGE_SHIFT),
> 224 mfn_pte(pfn, PAGE_KERNEL_IO), 0);
> 225 "
>
> which updates the initial PTE's with the 1-1 PFN and the E820_UNUSABLE is
> somehow in between two E820_RAM regions?
It's here, yes.
>>>> + *
>>>> + * This is more easily done if we just start with the machine
>>>> + * memory map.
>>>> + *
>>>> + * UNUSABLE regions are awkward, they are not interesting to
>>>> + * dom0 and Xen won't allow them to be mapped so we want to
>>>> + * leave these as RAM in the pseudo physical map.
>>>
>>> We just want them as such in the P2M but not do any PTE creation for it?
>>> Why do we care about it? We are not creating any page tables for
>>> E820_UNUSABLE regions.
>>
>> I don't follow what you're asking here.
>
> What code maps said PFNs.
See above.
>> In dom0, UNUSABLE regions in the machine memory map are RAM regions on
>> the pseudo-physical memory map. So instead of playing games and making
>> these regions special in the pseudo-physical map we just leave them as RAM.
>
> .. And then exposing them to the kernel to be used as normal RAM?
Yes.
> With your change it is. But without your change it would not map it.
Incorrect. See above.
>>>> + */
>>>> +
>>>> + memmap.nr_entries = *nr_entries;
>>>> + set_xen_guest_handle(memmap.buffer, map);
>>>> +
>>>> + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap);
>>>> + if (ret < 0)
>>>> + return ret;
>>>> +
>>>> + for (i = 0; i < memmap.nr_entries; i++) {
>>>> + if (map[i].type == E820_UNUSABLE)
>>>
>>> What if the E820_UNUSABLE regions were manufactured by the BIOS? Or
>>> somebody booted Xen with mem=3G (in which we clip the memory) on a 16GB
>>> box.
>>
>> The resulting memory map should be clipped by the result of the call to
>> xen_get_max_pages().
>
> OK. What about the BIOS manufacturing it?
What about it? As a PV guest we don't care what the machine memory map
looks like, /except/ as a means to find interesting bits of hardware
that we want 1:1 mappings for.
David
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
2013-07-11 11:21 ` David Vrabel
@ 2013-07-12 19:33 ` Konrad Rzeszutek Wilk
2013-07-12 21:38 ` David Vrabel
0 siblings, 1 reply; 11+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-07-12 19:33 UTC (permalink / raw)
To: David Vrabel; +Cc: xen-devel
On Thu, Jul 11, 2013 at 12:21:42PM +0100, David Vrabel wrote:
> On 09/07/13 19:45, Konrad Rzeszutek Wilk wrote:
> > On Tue, Jul 09, 2013 at 03:44:38PM +0100, David Vrabel wrote:
> >> On 09/07/13 15:13, Konrad Rzeszutek Wilk wrote:
> >>> On Tue, Jul 09, 2013 at 02:38:53PM +0100, David Vrabel wrote:
> >>>> From: David Vrabel <david.vrabel@citrix.com>
> >>>>
> >>>> If there are UNUSABLE regions in the machine memory map, dom0 will
> >>>> attempt to map them 1:1 which is not permitted by Xen and the kernel
> >>>> will crash.
> >>>>
> >>>> There isn't anything interesting in the UNUSABLE region that the dom0
> >>>> kernel needs access to so we can avoid making the 1:1 mapping and
> >>>> leave the region as RAM.
> >>>>
> >>>> Since the obtaining the memory map for dom0 and domU are now more
> >>>> different, refactor each into separate functions.
> >>>>
> >>>> This fixes a dom0 boot failure if tboot is used (because tboot was
> >>>> marking a memory region as UNUSABLE).
> >>>
> >>> Please also include the serial log that shows the crash.
> >>
> >> It's a domain crash by Xen and it isn't useful as none of the stack is
> >> decoded.
> >
> > Could you include the E820 at least to get a sense of where and how
> > this looks? As in - without tboot and then with tboot?
>
> This would take time to set up the host again and I don't think
> including a specific example of an E820 map with an UNUSABLE region
> really adds anything useful to the commit log.
>
> You can look at some of the previous threads for examples.
>
> >>>> +static int __init xen_get_memory_map_dom0(struct e820entry *map,
> >>>> + unsigned *nr_entries)
> >>>> +{
> >>>> + struct xen_memory_map memmap;
> >>>> + unsigned i;
> >>>> + int ret;
> >>>> +
> >>>> + /*
> >>>> + * Dom0 requires access to machine addresses for BIOS data and
> >>>> + * MMIO (e.g. PCI) devices. The reset of the kernel expects
> >>>> + * to be able to access these through a 1:1 p2m mapping.
> >>>> + *
> >>>> + * We need to take the pseudo physical memory map and set up
> >>>> + * 1:1 mappings corresponding to the RESERVED regions and
> >>>> + * holes in the /machine/ memory map, adding/expanding the RAM
> >>>> + * region at the end of the map for the relocated RAM.
> >>
> >> This is the key paragraph. The apparent use of the machine memory map
> >> for dom0 is a confusing fiction.
> >
> > OK, but I don't follow when dom0 would be using the E820_UNUSED regions.
> > Is it the xen_do_chunk that is failing on said PFNs? Or is it in this
> > code xen_set_identity_and_release_chunk:
> >
> > "217 /*
> > 218 * If the PFNs are currently mapped, the VA mapping also needs
> > 219 * to be updated to be 1:1.
> > 220 */
> > 221 for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
> > 222 (void)HYPERVISOR_update_va_mapping(
> > 223 (unsigned long)__va(pfn << PAGE_SHIFT),
> > 224 mfn_pte(pfn, PAGE_KERNEL_IO), 0);
> > 225 "
> >
> > which updates the initial PTE's with the 1-1 PFN and the E820_UNUSABLE is
> > somehow in between two E820_RAM regions?
>
> It's here, yes.
>
> >>>> + *
> >>>> + * This is more easily done if we just start with the machine
> >>>> + * memory map.
> >>>> + *
> >>>> + * UNUSABLE regions are awkward, they are not interesting to
> >>>> + * dom0 and Xen won't allow them to be mapped so we want to
> >>>> + * leave these as RAM in the pseudo physical map.
> >>>
> >>> We just want them as such in the P2M but not do any PTE creation for it?
> >>> Why do we care about it? We are not creating any page tables for
> >>> E820_UNUSABLE regions.
> >>
> >> I don't follow what you're asking here.
> >
> > What code maps said PFNs.
>
> See above.
So the 'HYPERVISOR_update_va_mapping' fails b/c we include it in the
xen_set_identify_and_release_chunk. Why not make the logic that sets "gaps"
and E820_RESERVED regions to omit E820_UNUSABLE regions? That would solve
it as well - and we won't be messing with the E820.
>
> >> In dom0, UNUSABLE regions in the machine memory map are RAM regions on
> >> the pseudo-physical memory map. So instead of playing games and making
> >> these regions special in the pseudo-physical map we just leave them as RAM.
> >
> > .. And then exposing them to the kernel to be used as normal RAM?
>
> Yes.
>
> > With your change it is. But without your change it would not map it.
>
> Incorrect. See above.
It would map it using the hypercall. But it would not create pagetables for it
(the generic code that is it).
>
> >>>> + */
> >>>> +
> >>>> + memmap.nr_entries = *nr_entries;
> >>>> + set_xen_guest_handle(memmap.buffer, map);
> >>>> +
> >>>> + ret = HYPERVISOR_memory_op(XENMEM_machine_memory_map, &memmap);
> >>>> + if (ret < 0)
> >>>> + return ret;
> >>>> +
> >>>> + for (i = 0; i < memmap.nr_entries; i++) {
> >>>> + if (map[i].type == E820_UNUSABLE)
> >>>
> >>> What if the E820_UNUSABLE regions were manufactured by the BIOS? Or
> >>> somebody booted Xen with mem=3G (in which we clip the memory) on a 16GB
> >>> box.
> >>
> >> The resulting memory map should be clipped by the result of the call to
> >> xen_get_max_pages().
> >
> > OK. What about the BIOS manufacturing it?
>
> What about it? As a PV guest we don't care what the machine memory map
> looks like, /except/ as a means to find interesting bits of hardware
> that we want 1:1 mappings for.
Right but now you are converting it from 1:1 to a RAM region - where we
don't do 1:1.
>
> David
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
2013-07-12 19:33 ` Konrad Rzeszutek Wilk
@ 2013-07-12 21:38 ` David Vrabel
2013-07-15 12:48 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 11+ messages in thread
From: David Vrabel @ 2013-07-12 21:38 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: David Vrabel, xen-devel
On 12/07/2013 20:33, Konrad Rzeszutek Wilk wrote:
>
> So the 'HYPERVISOR_update_va_mapping' fails b/c we include it in the
> xen_set_identify_and_release_chunk.
That's what I understand.
> Why not make the logic that sets "gaps"
> and E820_RESERVED regions to omit E820_UNUSABLE regions? That would solve
> it as well - and we won't be messing with the E820.
I suppose we could do what you suggest but there would have to be a
xen_release_chunk() function added, otherwise you would waste the frames
in that region.
I remain unconvinced that adding pointless unusable regions into the
dom0's memory map makes any sense.
>>> OK. What about the BIOS manufacturing [unusable regions?
>>
>> What about it? As a PV guest we don't care what the machine memory map
>> looks like, /except/ as a means to find interesting bits of hardware
>> that we want 1:1 mappings for.
>
> Right but now you are converting it from 1:1 to a RAM region - where we
> don't do 1:1.
No, it's leaving it as a RAM region, as setup by Xen (and as marked as
such in the pseudo-physical map).
I guess I still haven't explained very well what the (confusing) setup
code is trying to do.
1. Get psuedo-physical memory map.
2. Get machine memory map.
3. For each "interesting" (reserved or MMIO hole) region in the machine
memory map:
a. Add reserved region or hole to pseudo-physical memory map.
b. Release memory.
c. Set as 1:1 in p2m.
d. Update any existing mappings.
The code as-is doesn't work like that but the end result should be the same.
David
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE
2013-07-12 21:38 ` David Vrabel
@ 2013-07-15 12:48 ` Konrad Rzeszutek Wilk
0 siblings, 0 replies; 11+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-07-15 12:48 UTC (permalink / raw)
To: David Vrabel; +Cc: David Vrabel, xen-devel
On Fri, Jul 12, 2013 at 10:38:13PM +0100, David Vrabel wrote:
> On 12/07/2013 20:33, Konrad Rzeszutek Wilk wrote:
> >
> > So the 'HYPERVISOR_update_va_mapping' fails b/c we include it in the
> > xen_set_identify_and_release_chunk.
>
> That's what I understand.
>
> > Why not make the logic that sets "gaps"
> > and E820_RESERVED regions to omit E820_UNUSABLE regions? That would solve
> > it as well - and we won't be messing with the E820.
>
> I suppose we could do what you suggest but there would have to be a
> xen_release_chunk() function added, otherwise you would waste the frames
> in that region.
That is a bit different logic - but yes that would still have to work
as before and release the frames.
>
> I remain unconvinced that adding pointless unusable regions into the
> dom0's memory map makes any sense.
Please also keep in mind that domU with PCI passthrough can have an E820
that looks like the host. Meaning this is not just for dom0.
>
> >>> OK. What about the BIOS manufacturing [unusable regions?
> >>
> >> What about it? As a PV guest we don't care what the machine memory map
> >> looks like, /except/ as a means to find interesting bits of hardware
> >> that we want 1:1 mappings for.
> >
> > Right but now you are converting it from 1:1 to a RAM region - where we
> > don't do 1:1.
>
> No, it's leaving it as a RAM region, as setup by Xen (and as marked as
> such in the pseudo-physical map).
>
> I guess I still haven't explained very well what the (confusing) setup
> code is trying to do.
>
> 1. Get psuedo-physical memory map.
>
> 2. Get machine memory map.
>
> 3. For each "interesting" (reserved or MMIO hole) region in the machine
> memory map:
>
> a. Add reserved region or hole to pseudo-physical memory map.
> b. Release memory.
> c. Set as 1:1 in p2m.
> d. Update any existing mappings.
>
> The code as-is doesn't work like that but the end result should be the same.
It does right now (thought the operations are in a different order - b, d, a, c).
But your point about the E820_UNUSED throws a wrench in here. The amount of pages
that a guest has should (after re-organizing the P2M to represent the E820)
equal the amount of E820_RAM regions. I am ignoring the balloon case here.
That is still the case with 'tboot' as 'tboot' consumes some of the E820_RAM regions
and converts them in E820_UNUSED. The amount of pages that the guest gets is the
total_pages - <pages consumed for E820_UNUSED>. With your patch, you convert the
E820_UNUSED to E820_RAM. I believe that what you end up is that the amount of
E820_RAM PFNS >= nr_pages. In this case I am ignoring the usage of 'dom0_memory_max'
parameter.
Since the only code (since v3.9) that maps E820_UNUSED is the special code you
added - the HYPERVISOR_update_va_mapping in xen_set_identity_and_release_chunk,
perhaps a better mechanism is for said code to consult the E820 and not call
said update_va_mapping for the E820_UNUSED regions. After all, it is OK
_not_ to call that for 1-1 regions. It worked before - albeit it was buggy with
special (shared) 1-1 regions.
Heck, we could make that loop consult the M2P and _only_ call the update_va_mapping
on the M2P[pfn] = 0x55555555555.
Either way I think that there are two easy solutions that are also candidates for
stable tree and also make it possible to release the frames:
- Either add some extra logic in xen_set_identity_and_release_chunk to not do the
hypercall for E820_UNUSED region
- Or do a check in the M2P[pfn] and skip if M2P[pfn]==shared_mfn entry value (thought
I have no idea how hard-coded that is in the ABI) in the xen_set_identity_and_release_chunk.
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2013-07-15 12:48 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1373377133-11018-1-git-send-email-david.vrabel@citrix.com>
2013-07-09 14:13 ` [PATCH] x86/xen: do not identity map E820 memory regions that are UNUSABLE Konrad Rzeszutek Wilk
[not found] ` <20130709141329.GC24897@phenom.dumpdata.com>
2013-07-09 14:44 ` David Vrabel
[not found] ` <51DC21D6.4000107@citrix.com>
2013-07-09 18:45 ` Konrad Rzeszutek Wilk
[not found] ` <20130709184538.GB10188@phenom.dumpdata.com>
2013-07-11 11:21 ` David Vrabel
2013-07-12 19:33 ` Konrad Rzeszutek Wilk
2013-07-12 21:38 ` David Vrabel
2013-07-15 12:48 ` Konrad Rzeszutek Wilk
2013-07-09 17:00 ` Aurelien Chartier
[not found] ` <51DC41A1.6010909@citrix.com>
2013-07-09 18:36 ` Konrad Rzeszutek Wilk
[not found] ` <20130709183647.GA10188@phenom.dumpdata.com>
2013-07-10 14:24 ` Aurelien Chartier
2013-07-09 13:38 David Vrabel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).