public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7]  xen: fix dom0 PV boot on some AMD machines
@ 2024-09-10 10:39 Juergen Gross
  2024-09-10 10:39 ` [PATCH v3 1/7] xen: use correct end address of kernel for conflict checking Juergen Gross
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Juergen Gross @ 2024-09-10 10:39 UTC (permalink / raw)
  To: linux-kernel, x86, linux-acpi
  Cc: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, xen-devel,
	Rafael J. Wysocki, Len Brown

There have been reports of failed boots with Xen due to an overlap of
the kernel's memory with ACPI NVS reported in the E820 map of the host.

This series fixes this issue by moving the NVS area in dom0 to some
higher address.

Changes in V2:
- split of v1 patch 5
- new patch 6

Changes in V3:
- addressed comments

Juergen Gross (7):
  xen: use correct end address of kernel for conflict checking
  xen: introduce generic helper checking for memory map conflicts
  xen: move checks for e820 conflicts further up
  xen: move max_pfn in xen_memory_setup() out of function scope
  xen: add capability to remap non-RAM pages to different PFNs
  xen: allow mapping ACPI data using a different physical address
  xen: tolerate ACPI NVS memory overlapping with Xen allocated memory

 arch/x86/include/asm/acpi.h        |   8 ++
 arch/x86/kernel/acpi/boot.c        |  10 ++
 arch/x86/kernel/mmconf-fam10h_64.c |   2 +-
 arch/x86/kernel/x86_init.c         |   2 +-
 arch/x86/xen/mmu_pv.c              |   5 +-
 arch/x86/xen/p2m.c                 |  99 ++++++++++++++
 arch/x86/xen/setup.c               | 202 ++++++++++++++++++++++-------
 arch/x86/xen/xen-ops.h             |   6 +-
 8 files changed, 282 insertions(+), 52 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v3 1/7] xen: use correct end address of kernel for conflict checking
  2024-09-10 10:39 [PATCH v3 0/7] xen: fix dom0 PV boot on some AMD machines Juergen Gross
@ 2024-09-10 10:39 ` Juergen Gross
  2024-09-10 10:39 ` [PATCH v3 2/7] xen: introduce generic helper checking for memory map conflicts Juergen Gross
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Juergen Gross @ 2024-09-10 10:39 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, xen-devel,
	Marek Marczykowski-Górecki, Jan Beulich

When running as a Xen PV dom0 the kernel is loaded by the hypervisor
using a different memory map than that of the host. In order to
minimize the required changes in the kernel, the kernel adapts its
memory map to that of the host. In order to do that it is checking
for conflicts of its load address with the host memory map.

Unfortunately the tested memory range does not include the .brk
area, which might result in crashes or memory corruption when this
area does conflict with the memory map of the host.

Fix the test by using the _end label instead of __bss_stop.

Fixes: 808fdb71936c ("xen: check for kernel memory conflicting with memory layout")

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 arch/x86/xen/setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 806ddb2391d9..4bcc70a71b7d 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -825,7 +825,7 @@ char * __init xen_memory_setup(void)
 	 * to relocating (and even reusing) pages with kernel text or data.
 	 */
 	if (xen_is_e820_reserved(__pa_symbol(_text),
-			__pa_symbol(__bss_stop) - __pa_symbol(_text))) {
+				 __pa_symbol(_end) - __pa_symbol(_text))) {
 		xen_raw_console_write("Xen hypervisor allocated kernel memory conflicts with E820 map\n");
 		BUG();
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 2/7] xen: introduce generic helper checking for memory map conflicts
  2024-09-10 10:39 [PATCH v3 0/7] xen: fix dom0 PV boot on some AMD machines Juergen Gross
  2024-09-10 10:39 ` [PATCH v3 1/7] xen: use correct end address of kernel for conflict checking Juergen Gross
@ 2024-09-10 10:39 ` Juergen Gross
  2024-09-10 10:39 ` [PATCH v3 3/7] xen: move checks for e820 conflicts further up Juergen Gross
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Juergen Gross @ 2024-09-10 10:39 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, xen-devel,
	Marek Marczykowski-Górecki, Jan Beulich

When booting as a Xen PV dom0 the memory layout of the dom0 is
modified to match that of the host, as this requires less changes in
the kernel for supporting Xen.

There are some cases, though, which are problematic, as it is the Xen
hypervisor selecting the kernel's load address plus some other data,
which might conflict with the host's memory map.

These conflicts are detected at boot time and result in a boot error.
In order to support handling at least some of these conflicts in
future, introduce a generic helper function which will later gain the
ability to adapt the memory layout when possible.

Add the missing check for the xen_start_info area.

Note that possible p2m map and initrd memory conflicts are handled
already by copying the data to memory areas not conflicting with the
memory map. The initial stack allocated by Xen doesn't need to be
checked, as early boot code is switching to the statically allocated
initial kernel stack. Initial page tables and the kernel itself will
be handled later.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 arch/x86/xen/mmu_pv.c  |  5 +----
 arch/x86/xen/setup.c   | 34 ++++++++++++++++++++++++++++------
 arch/x86/xen/xen-ops.h |  3 ++-
 3 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index f1ce39d6d32c..839e6613753d 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2018,10 +2018,7 @@ void __init xen_reserve_special_pages(void)
 
 void __init xen_pt_check_e820(void)
 {
-	if (xen_is_e820_reserved(xen_pt_base, xen_pt_size)) {
-		xen_raw_console_write("Xen hypervisor allocated page table memory conflicts with E820 map\n");
-		BUG();
-	}
+	xen_chk_is_e820_usable(xen_pt_base, xen_pt_size, "page table");
 }
 
 static unsigned char dummy_mapping[PAGE_SIZE] __page_aligned_bss;
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 4bcc70a71b7d..96765180514b 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -567,7 +567,7 @@ static void __init xen_ignore_unusable(void)
 	}
 }
 
-bool __init xen_is_e820_reserved(phys_addr_t start, phys_addr_t size)
+static bool __init xen_is_e820_reserved(phys_addr_t start, phys_addr_t size)
 {
 	struct e820_entry *entry;
 	unsigned mapcnt;
@@ -624,6 +624,23 @@ phys_addr_t __init xen_find_free_area(phys_addr_t size)
 	return 0;
 }
 
+/*
+ * Check for an area in physical memory to be usable for non-movable purposes.
+ * An area is considered to usable if the used E820 map lists it to be RAM.
+ * In case the area is not usable, crash the system with an error message.
+ */
+void __init xen_chk_is_e820_usable(phys_addr_t start, phys_addr_t size,
+				   const char *component)
+{
+	if (!xen_is_e820_reserved(start, size))
+		return;
+
+	xen_raw_console_write("Xen hypervisor allocated ");
+	xen_raw_console_write(component);
+	xen_raw_console_write(" memory conflicts with E820 map\n");
+	BUG();
+}
+
 /*
  * Like memcpy, but with physical addresses for dest and src.
  */
@@ -824,11 +841,16 @@ char * __init xen_memory_setup(void)
 	 * Failing now is better than running into weird problems later due
 	 * to relocating (and even reusing) pages with kernel text or data.
 	 */
-	if (xen_is_e820_reserved(__pa_symbol(_text),
-				 __pa_symbol(_end) - __pa_symbol(_text))) {
-		xen_raw_console_write("Xen hypervisor allocated kernel memory conflicts with E820 map\n");
-		BUG();
-	}
+	xen_chk_is_e820_usable(__pa_symbol(_text),
+			       __pa_symbol(_end) - __pa_symbol(_text),
+			       "kernel");
+
+	/*
+	 * Check for a conflict of the xen_start_info memory with the target
+	 * E820 map.
+	 */
+	xen_chk_is_e820_usable(__pa(xen_start_info), sizeof(*xen_start_info),
+			       "xen_start_info");
 
 	/*
 	 * Check for a conflict of the hypervisor supplied page tables with
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 0cf16fc79e0b..9a27d1d653d3 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -48,7 +48,8 @@ void xen_mm_unpin_all(void);
 void __init xen_relocate_p2m(void);
 #endif
 
-bool __init xen_is_e820_reserved(phys_addr_t start, phys_addr_t size);
+void __init xen_chk_is_e820_usable(phys_addr_t start, phys_addr_t size,
+				   const char *component);
 unsigned long __ref xen_chk_extra_mem(unsigned long pfn);
 void __init xen_inv_extra_mem(void);
 void __init xen_remap_memory(void);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 3/7] xen: move checks for e820 conflicts further up
  2024-09-10 10:39 [PATCH v3 0/7] xen: fix dom0 PV boot on some AMD machines Juergen Gross
  2024-09-10 10:39 ` [PATCH v3 1/7] xen: use correct end address of kernel for conflict checking Juergen Gross
  2024-09-10 10:39 ` [PATCH v3 2/7] xen: introduce generic helper checking for memory map conflicts Juergen Gross
@ 2024-09-10 10:39 ` Juergen Gross
  2024-09-10 10:39 ` [PATCH v3 4/7] xen: move max_pfn in xen_memory_setup() out of function scope Juergen Gross
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Juergen Gross @ 2024-09-10 10:39 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, xen-devel,
	Marek Marczykowski-Górecki, Jan Beulich

Move the checks for e820 memory map conflicts using the
xen_chk_is_e820_usable() helper further up in order to prepare
resolving some of the possible conflicts by doing some e820 map
modifications, which must happen before evaluating the RAM layout.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 arch/x86/xen/setup.c | 44 ++++++++++++++++++++++----------------------
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 96765180514b..dba68951ed6b 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -764,6 +764,28 @@ char * __init xen_memory_setup(void)
 	/* Make sure the Xen-supplied memory map is well-ordered. */
 	e820__update_table(&xen_e820_table);
 
+	/*
+	 * Check whether the kernel itself conflicts with the target E820 map.
+	 * Failing now is better than running into weird problems later due
+	 * to relocating (and even reusing) pages with kernel text or data.
+	 */
+	xen_chk_is_e820_usable(__pa_symbol(_text),
+			       __pa_symbol(_end) - __pa_symbol(_text),
+			       "kernel");
+
+	/*
+	 * Check for a conflict of the xen_start_info memory with the target
+	 * E820 map.
+	 */
+	xen_chk_is_e820_usable(__pa(xen_start_info), sizeof(*xen_start_info),
+			       "xen_start_info");
+
+	/*
+	 * Check for a conflict of the hypervisor supplied page tables with
+	 * the target E820 map.
+	 */
+	xen_pt_check_e820();
+
 	max_pages = xen_get_max_pages();
 
 	/* How many extra pages do we need due to remapping? */
@@ -836,28 +858,6 @@ char * __init xen_memory_setup(void)
 
 	e820__update_table(e820_table);
 
-	/*
-	 * Check whether the kernel itself conflicts with the target E820 map.
-	 * Failing now is better than running into weird problems later due
-	 * to relocating (and even reusing) pages with kernel text or data.
-	 */
-	xen_chk_is_e820_usable(__pa_symbol(_text),
-			       __pa_symbol(_end) - __pa_symbol(_text),
-			       "kernel");
-
-	/*
-	 * Check for a conflict of the xen_start_info memory with the target
-	 * E820 map.
-	 */
-	xen_chk_is_e820_usable(__pa(xen_start_info), sizeof(*xen_start_info),
-			       "xen_start_info");
-
-	/*
-	 * Check for a conflict of the hypervisor supplied page tables with
-	 * the target E820 map.
-	 */
-	xen_pt_check_e820();
-
 	xen_reserve_xen_mfnlist();
 
 	/* Check for a conflict of the initrd with the target E820 map. */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 4/7] xen: move max_pfn in xen_memory_setup() out of function scope
  2024-09-10 10:39 [PATCH v3 0/7] xen: fix dom0 PV boot on some AMD machines Juergen Gross
                   ` (2 preceding siblings ...)
  2024-09-10 10:39 ` [PATCH v3 3/7] xen: move checks for e820 conflicts further up Juergen Gross
@ 2024-09-10 10:39 ` Juergen Gross
  2024-09-10 10:39 ` [PATCH v3 5/7] xen: add capability to remap non-RAM pages to different PFNs Juergen Gross
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Juergen Gross @ 2024-09-10 10:39 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, xen-devel,
	Marek Marczykowski-Górecki, Jan Beulich

Instead of having max_pfn as a local variable of xen_memory_setup(),
make it a static variable in setup.c instead. This avoids having to
pass it to subfunctions, which will be needed in more cases in future.

Rename it to ini_nr_pages, as the value denotes the currently usable
number of memory pages as passed from the hypervisor at boot time.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 arch/x86/xen/setup.c | 52 ++++++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index dba68951ed6b..2c79bb5a9cd0 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -46,6 +46,9 @@ bool xen_pv_pci_possible;
 /* E820 map used during setting up memory. */
 static struct e820_table xen_e820_table __initdata;
 
+/* Number of initially usable memory pages. */
+static unsigned long ini_nr_pages __initdata;
+
 /*
  * Buffer used to remap identity mapped pages. We only need the virtual space.
  * The physical page behind this address is remapped as needed to different
@@ -212,7 +215,7 @@ static int __init xen_free_mfn(unsigned long mfn)
  * as a fallback if the remapping fails.
  */
 static void __init xen_set_identity_and_release_chunk(unsigned long start_pfn,
-			unsigned long end_pfn, unsigned long nr_pages)
+						      unsigned long end_pfn)
 {
 	unsigned long pfn, end;
 	int ret;
@@ -220,7 +223,7 @@ static void __init xen_set_identity_and_release_chunk(unsigned long start_pfn,
 	WARN_ON(start_pfn > end_pfn);
 
 	/* Release pages first. */
-	end = min(end_pfn, nr_pages);
+	end = min(end_pfn, ini_nr_pages);
 	for (pfn = start_pfn; pfn < end; pfn++) {
 		unsigned long mfn = pfn_to_mfn(pfn);
 
@@ -341,15 +344,14 @@ static void __init xen_do_set_identity_and_remap_chunk(
  * to Xen and not remapped.
  */
 static unsigned long __init xen_set_identity_and_remap_chunk(
-	unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages,
-	unsigned long remap_pfn)
+	unsigned long start_pfn, unsigned long end_pfn, unsigned long remap_pfn)
 {
 	unsigned long pfn;
 	unsigned long i = 0;
 	unsigned long n = end_pfn - start_pfn;
 
 	if (remap_pfn == 0)
-		remap_pfn = nr_pages;
+		remap_pfn = ini_nr_pages;
 
 	while (i < n) {
 		unsigned long cur_pfn = start_pfn + i;
@@ -358,19 +360,19 @@ static unsigned long __init xen_set_identity_and_remap_chunk(
 		unsigned long remap_range_size;
 
 		/* Do not remap pages beyond the current allocation */
-		if (cur_pfn >= nr_pages) {
+		if (cur_pfn >= ini_nr_pages) {
 			/* Identity map remaining pages */
 			set_phys_range_identity(cur_pfn, cur_pfn + size);
 			break;
 		}
-		if (cur_pfn + size > nr_pages)
-			size = nr_pages - cur_pfn;
+		if (cur_pfn + size > ini_nr_pages)
+			size = ini_nr_pages - cur_pfn;
 
 		remap_range_size = xen_find_pfn_range(&remap_pfn);
 		if (!remap_range_size) {
 			pr_warn("Unable to find available pfn range, not remapping identity pages\n");
 			xen_set_identity_and_release_chunk(cur_pfn,
-						cur_pfn + left, nr_pages);
+							   cur_pfn + left);
 			break;
 		}
 		/* Adjust size to fit in current e820 RAM region */
@@ -397,18 +399,18 @@ static unsigned long __init xen_set_identity_and_remap_chunk(
 }
 
 static unsigned long __init xen_count_remap_pages(
-	unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages,
+	unsigned long start_pfn, unsigned long end_pfn,
 	unsigned long remap_pages)
 {
-	if (start_pfn >= nr_pages)
+	if (start_pfn >= ini_nr_pages)
 		return remap_pages;
 
-	return remap_pages + min(end_pfn, nr_pages) - start_pfn;
+	return remap_pages + min(end_pfn, ini_nr_pages) - start_pfn;
 }
 
-static unsigned long __init xen_foreach_remap_area(unsigned long nr_pages,
+static unsigned long __init xen_foreach_remap_area(
 	unsigned long (*func)(unsigned long start_pfn, unsigned long end_pfn,
-			      unsigned long nr_pages, unsigned long last_val))
+			      unsigned long last_val))
 {
 	phys_addr_t start = 0;
 	unsigned long ret_val = 0;
@@ -436,8 +438,7 @@ static unsigned long __init xen_foreach_remap_area(unsigned long nr_pages,
 				end_pfn = PFN_UP(entry->addr);
 
 			if (start_pfn < end_pfn)
-				ret_val = func(start_pfn, end_pfn, nr_pages,
-					       ret_val);
+				ret_val = func(start_pfn, end_pfn, ret_val);
 			start = end;
 		}
 	}
@@ -700,7 +701,7 @@ static void __init xen_reserve_xen_mfnlist(void)
  **/
 char * __init xen_memory_setup(void)
 {
-	unsigned long max_pfn, pfn_s, n_pfns;
+	unsigned long pfn_s, n_pfns;
 	phys_addr_t mem_end, addr, size, chunk_size;
 	u32 type;
 	int rc;
@@ -712,9 +713,8 @@ char * __init xen_memory_setup(void)
 	int op;
 
 	xen_parse_512gb();
-	max_pfn = xen_get_pages_limit();
-	max_pfn = min(max_pfn, xen_start_info->nr_pages);
-	mem_end = PFN_PHYS(max_pfn);
+	ini_nr_pages = min(xen_get_pages_limit(), xen_start_info->nr_pages);
+	mem_end = PFN_PHYS(ini_nr_pages);
 
 	memmap.nr_entries = ARRAY_SIZE(xen_e820_table.entries);
 	set_xen_guest_handle(memmap.buffer, xen_e820_table.entries);
@@ -789,10 +789,10 @@ char * __init xen_memory_setup(void)
 	max_pages = xen_get_max_pages();
 
 	/* How many extra pages do we need due to remapping? */
-	max_pages += xen_foreach_remap_area(max_pfn, xen_count_remap_pages);
+	max_pages += xen_foreach_remap_area(xen_count_remap_pages);
 
-	if (max_pages > max_pfn)
-		extra_pages += max_pages - max_pfn;
+	if (max_pages > ini_nr_pages)
+		extra_pages += max_pages - ini_nr_pages;
 
 	/*
 	 * Clamp the amount of extra memory to a EXTRA_MEM_RATIO
@@ -801,8 +801,8 @@ char * __init xen_memory_setup(void)
 	 * Make sure we have no memory above max_pages, as this area
 	 * isn't handled by the p2m management.
 	 */
-	maxmem_pages = EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM));
-	extra_pages = min3(maxmem_pages, extra_pages, max_pages - max_pfn);
+	maxmem_pages = EXTRA_MEM_RATIO * min(ini_nr_pages, PFN_DOWN(MAXMEM));
+	extra_pages = min3(maxmem_pages, extra_pages, max_pages - ini_nr_pages);
 	i = 0;
 	addr = xen_e820_table.entries[0].addr;
 	size = xen_e820_table.entries[0].size;
@@ -885,7 +885,7 @@ char * __init xen_memory_setup(void)
 	 * Set identity map on non-RAM pages and prepare remapping the
 	 * underlying RAM.
 	 */
-	xen_foreach_remap_area(max_pfn, xen_set_identity_and_remap_chunk);
+	xen_foreach_remap_area(xen_set_identity_and_remap_chunk);
 
 	pr_info("Released %ld page(s)\n", xen_released_pages);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 5/7] xen: add capability to remap non-RAM pages to different PFNs
  2024-09-10 10:39 [PATCH v3 0/7] xen: fix dom0 PV boot on some AMD machines Juergen Gross
                   ` (3 preceding siblings ...)
  2024-09-10 10:39 ` [PATCH v3 4/7] xen: move max_pfn in xen_memory_setup() out of function scope Juergen Gross
@ 2024-09-10 10:39 ` Juergen Gross
  2024-09-10 12:26   ` Jan Beulich
  2024-09-10 10:39 ` [PATCH v3 6/7] xen: allow mapping ACPI data using a different physical address Juergen Gross
  2024-09-10 10:39 ` [PATCH v3 7/7] xen: tolerate ACPI NVS memory overlapping with Xen allocated memory Juergen Gross
  6 siblings, 1 reply; 13+ messages in thread
From: Juergen Gross @ 2024-09-10 10:39 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, xen-devel

When running as a Xen PV dom0 it can happen that the kernel is being
loaded to a guest physical address conflicting with the host memory
map.

In order to be able to resolve this conflict, add the capability to
remap non-RAM areas to different guest PFNs. A function to use this
remapping information for other purposes than doing the remap will be
added when needed.

As the number of conflicts should be rather low (currently only
machines with max. 1 conflict are known), save the remap data in a
small statically allocated array.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- split off from patch 5 of V1 of the series
- moved to p2m.c
V3:
- add const qualifier (Jan Beulich)
- change remap size type to size_t (Jan Beulich)
- add __ro_after_init qualifier (Jan Beulich)
- log frame numbers in hex (Jan Beulich)
- always issue message regarding number of remapped pages (Jan Beulich)
- add sanity check (Jan Beulich)
- fix remap loop (Jan Beulich)

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 arch/x86/xen/p2m.c     | 65 ++++++++++++++++++++++++++++++++++++++++++
 arch/x86/xen/xen-ops.h |  3 ++
 2 files changed, 68 insertions(+)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 7c735b730acd..5b2aeae6f9e4 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -80,6 +80,7 @@
 #include <asm/xen/hypervisor.h>
 #include <xen/balloon.h>
 #include <xen/grant_table.h>
+#include <xen/hvc-console.h>
 
 #include "xen-ops.h"
 
@@ -792,6 +793,70 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
 	return ret;
 }
 
+/* Remapped non-RAM areas */
+#define NR_NONRAM_REMAP 4
+static struct nonram_remap {
+	phys_addr_t maddr;
+	phys_addr_t paddr;
+	size_t size;
+} xen_nonram_remap[NR_NONRAM_REMAP] __ro_after_init;
+static unsigned int nr_nonram_remap __ro_after_init;
+
+/*
+ * Do the real remapping of non-RAM regions as specified in the
+ * xen_nonram_remap[] array.
+ * In case of an error just crash the system.
+ */
+void __init xen_do_remap_nonram(void)
+{
+	unsigned int i;
+	unsigned int remapped = 0;
+	const struct nonram_remap *remap = xen_nonram_remap;
+	unsigned long pfn, mfn, end_pfn;
+
+	for (i = 0; i < nr_nonram_remap; i++) {
+		end_pfn = PFN_UP(remap->paddr + remap->size);
+		pfn = PFN_DOWN(remap->paddr);
+		mfn = PFN_DOWN(remap->maddr);
+		while (pfn < end_pfn) {
+			if (!set_phys_to_machine(pfn, mfn)) {
+				pr_err("Failed to set p2m mapping for pfn=%lx mfn=%lx\n",
+				       pfn, mfn);
+				BUG();
+			}
+
+			pfn++;
+			mfn++;
+			remapped++;
+		}
+
+		remap++;
+	}
+
+	pr_info("Remapped %u non-RAM page(s)\n", remapped);
+}
+
+/*
+ * Add a new non-RAM remap entry.
+ * In case of no free entry found, just crash the system.
+ */
+void __init xen_add_remap_nonram(phys_addr_t maddr, phys_addr_t paddr,
+				 unsigned long size)
+{
+	BUG_ON((maddr & ~PAGE_MASK) != (paddr & ~PAGE_MASK));
+
+	if (nr_nonram_remap == NR_NONRAM_REMAP) {
+		xen_raw_console_write("Number of required E820 entry remapping actions exceed maximum value\n");
+		BUG();
+	}
+
+	xen_nonram_remap[nr_nonram_remap].maddr = maddr;
+	xen_nonram_remap[nr_nonram_remap].paddr = paddr;
+	xen_nonram_remap[nr_nonram_remap].size = size;
+
+	nr_nonram_remap++;
+}
+
 #ifdef CONFIG_XEN_DEBUG_FS
 #include <linux/debugfs.h>
 static int p2m_dump_show(struct seq_file *m, void *v)
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 9a27d1d653d3..e1b782e823e6 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -47,6 +47,9 @@ void xen_mm_unpin_all(void);
 #ifdef CONFIG_X86_64
 void __init xen_relocate_p2m(void);
 #endif
+void __init xen_do_remap_nonram(void);
+void __init xen_add_remap_nonram(phys_addr_t maddr, phys_addr_t paddr,
+				 unsigned long size);
 
 void __init xen_chk_is_e820_usable(phys_addr_t start, phys_addr_t size,
 				   const char *component);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 6/7] xen: allow mapping ACPI data using a different physical address
  2024-09-10 10:39 [PATCH v3 0/7] xen: fix dom0 PV boot on some AMD machines Juergen Gross
                   ` (4 preceding siblings ...)
  2024-09-10 10:39 ` [PATCH v3 5/7] xen: add capability to remap non-RAM pages to different PFNs Juergen Gross
@ 2024-09-10 10:39 ` Juergen Gross
  2024-09-10 12:34   ` Jan Beulich
  2024-09-10 10:39 ` [PATCH v3 7/7] xen: tolerate ACPI NVS memory overlapping with Xen allocated memory Juergen Gross
  6 siblings, 1 reply; 13+ messages in thread
From: Juergen Gross @ 2024-09-10 10:39 UTC (permalink / raw)
  To: linux-kernel, x86, linux-acpi
  Cc: Juergen Gross, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Rafael J. Wysocki, Len Brown,
	Boris Ostrovsky, xen-devel

When running as a Xen PV dom0 the system needs to map ACPI data of the
host using host physical addresses, while those addresses can conflict
with the guest physical addresses of the loaded linux kernel. The same
problem might apply in case a PV guest is configured to use the host
memory map.

This conflict can be solved by mapping the ACPI data to a different
guest physical address, but mapping the data via acpi_os_ioremap()
must still be possible using the host physical address, as this
address might be generated by AML when referencing some of the ACPI
data.

When configured to support running as a Xen PV domain, have an
implementation of acpi_os_ioremap() being aware of the possibility to
need above mentioned translation of a host physical address to the
guest physical address.

This modification requires to fix some #include of asm/acpi.h in x86
code to use linux/acpi.h instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch (Jan Beulich)
V3:
- add const attribute (Jan Beulich)
- guard ACPI related code with CONFIG_ACPI (Jan Beulich)
- use CONFIG_XEN_PV instead of CONFIG_XEN_PV_DOM0
---
 arch/x86/include/asm/acpi.h        |  8 +++++++
 arch/x86/kernel/acpi/boot.c        | 10 +++++++++
 arch/x86/kernel/mmconf-fam10h_64.c |  2 +-
 arch/x86/kernel/x86_init.c         |  2 +-
 arch/x86/xen/p2m.c                 | 34 ++++++++++++++++++++++++++++++
 arch/x86/xen/setup.c               |  2 +-
 6 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 21bc53f5ed0c..5ab1a4598d00 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -174,6 +174,14 @@ void acpi_generic_reduced_hw_init(void);
 void x86_default_set_root_pointer(u64 addr);
 u64 x86_default_get_root_pointer(void);
 
+#ifdef CONFIG_XEN_PV
+/* A Xen PV domain needs a special acpi_os_ioremap() handling. */
+extern void __iomem * (*acpi_os_ioremap)(acpi_physical_address phys,
+					 acpi_size size);
+void __iomem *x86_acpi_os_ioremap(acpi_physical_address phys, acpi_size size);
+#define acpi_os_ioremap acpi_os_ioremap
+#endif
+
 #else /* !CONFIG_ACPI */
 
 #define acpi_lapic 0
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9f4618dcd704..2de8510c56dd 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -1778,3 +1778,13 @@ u64 x86_default_get_root_pointer(void)
 {
 	return boot_params.acpi_rsdp_addr;
 }
+
+#ifdef CONFIG_XEN_PV
+void __iomem *x86_acpi_os_ioremap(acpi_physical_address phys, acpi_size size)
+{
+	return ioremap_cache(phys, size);
+}
+
+void __iomem * (*acpi_os_ioremap)(acpi_physical_address phys, acpi_size size) =
+	x86_acpi_os_ioremap;
+#endif
diff --git a/arch/x86/kernel/mmconf-fam10h_64.c b/arch/x86/kernel/mmconf-fam10h_64.c
index c94dec6a1834..8347a29f9db4 100644
--- a/arch/x86/kernel/mmconf-fam10h_64.c
+++ b/arch/x86/kernel/mmconf-fam10h_64.c
@@ -9,12 +9,12 @@
 #include <linux/pci.h>
 #include <linux/dmi.h>
 #include <linux/range.h>
+#include <linux/acpi.h>
 
 #include <asm/pci-direct.h>
 #include <linux/sort.h>
 #include <asm/io.h>
 #include <asm/msr.h>
-#include <asm/acpi.h>
 #include <asm/mmconfig.h>
 #include <asm/pci_x86.h>
 
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 82b128d3f309..47ef8af23101 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -8,8 +8,8 @@
 #include <linux/ioport.h>
 #include <linux/export.h>
 #include <linux/pci.h>
+#include <linux/acpi.h>
 
-#include <asm/acpi.h>
 #include <asm/bios_ebda.h>
 #include <asm/paravirt.h>
 #include <asm/pci_x86.h>
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 5b2aeae6f9e4..a64e9562733e 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -70,6 +70,7 @@
 #include <linux/memblock.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+#include <linux/acpi.h>
 
 #include <asm/cache.h>
 #include <asm/setup.h>
@@ -836,6 +837,33 @@ void __init xen_do_remap_nonram(void)
 	pr_info("Remapped %u non-RAM page(s)\n", remapped);
 }
 
+#ifdef CONFIG_ACPI
+/*
+ * Xen variant of acpi_os_ioremap() taking potentially remapped non-RAM
+ * regions into acount.
+ * Any attempt to map an area crossing a remap boundary will produce a
+ * WARN() splat.
+ */
+static void __iomem *xen_acpi_os_ioremap(acpi_physical_address phys,
+					 acpi_size size)
+{
+	unsigned int i;
+	const struct nonram_remap *remap = xen_nonram_remap;
+
+	for (i = 0; i < nr_nonram_remap; i++) {
+		if (phys + size > remap->maddr &&
+		    phys < remap->maddr + remap->size) {
+			WARN_ON(phys < remap->maddr ||
+				phys + size > remap->maddr + remap->size);
+			phys = remap->paddr + phys - remap->maddr;
+			break;
+		}
+	}
+
+	return x86_acpi_os_ioremap(phys, size);
+}
+#endif /* CONFIG_ACPI */
+
 /*
  * Add a new non-RAM remap entry.
  * In case of no free entry found, just crash the system.
@@ -850,6 +878,12 @@ void __init xen_add_remap_nonram(phys_addr_t maddr, phys_addr_t paddr,
 		BUG();
 	}
 
+#ifdef CONFIG_ACPI
+	/* Switch to the Xen acpi_os_ioremap() variant. */
+	if (nr_nonram_remap == 0)
+		acpi_os_ioremap = xen_acpi_os_ioremap;
+#endif
+
 	xen_nonram_remap[nr_nonram_remap].maddr = maddr;
 	xen_nonram_remap[nr_nonram_remap].paddr = paddr;
 	xen_nonram_remap[nr_nonram_remap].size = size;
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 2c79bb5a9cd0..1114e49937da 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -15,12 +15,12 @@
 #include <linux/cpuidle.h>
 #include <linux/cpufreq.h>
 #include <linux/memory_hotplug.h>
+#include <linux/acpi.h>
 
 #include <asm/elf.h>
 #include <asm/vdso.h>
 #include <asm/e820/api.h>
 #include <asm/setup.h>
-#include <asm/acpi.h>
 #include <asm/numa.h>
 #include <asm/idtentry.h>
 #include <asm/xen/hypervisor.h>
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v3 7/7] xen: tolerate ACPI NVS memory overlapping with Xen allocated memory
  2024-09-10 10:39 [PATCH v3 0/7] xen: fix dom0 PV boot on some AMD machines Juergen Gross
                   ` (5 preceding siblings ...)
  2024-09-10 10:39 ` [PATCH v3 6/7] xen: allow mapping ACPI data using a different physical address Juergen Gross
@ 2024-09-10 10:39 ` Juergen Gross
  2024-09-10 12:36   ` Jan Beulich
  6 siblings, 1 reply; 13+ messages in thread
From: Juergen Gross @ 2024-09-10 10:39 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, xen-devel,
	Marek Marczykowski-Górecki

In order to minimize required special handling for running as Xen PV
dom0, the memory layout is modified to match that of the host. This
requires to have only RAM at the locations where Xen allocated memory
is living. Unfortunately there seem to be some machines, where ACPI
NVS is located at 64 MB, resulting in a conflict with the loaded
kernel or the initial page tables built by Xen.

Avoid this conflict by swapping the ACPI NVS area in the memory map
with unused RAM. This is possible via modification of the dom0 P2M map.
Accesses to the ACPI NVS area are done either for saving and restoring
it across suspend operations (this will work the same way as before),
or by ACPI code when NVS memory is referenced from other ACPI tables.
The latter case is handled by a Xen specific indirection of
acpi_os_ioremap().

While the E820 map can (and should) be modified right away, the P2M
map can be updated only after memory allocation is working, as the P2M
map might need to be extended.

Fixes: 808fdb71936c ("xen: check for kernel memory conflicting with memory layout")
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
V2:
- remap helpers split off into other patch
V3:
- adjust commit message (Jan Beulich)
---
 arch/x86/xen/setup.c | 92 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 91 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 1114e49937da..c3db71d96c43 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -495,6 +495,8 @@ void __init xen_remap_memory(void)
 	set_pte_mfn(buf, mfn_save, PAGE_KERNEL);
 
 	pr_info("Remapped %ld page(s)\n", remapped);
+
+	xen_do_remap_nonram();
 }
 
 static unsigned long __init xen_get_pages_limit(void)
@@ -625,14 +627,102 @@ phys_addr_t __init xen_find_free_area(phys_addr_t size)
 	return 0;
 }
 
+/*
+ * Swap a non-RAM E820 map entry with RAM above ini_nr_pages.
+ * Note that the E820 map is modified accordingly, but the P2M map isn't yet.
+ * The adaption of the P2M must be deferred until page allocation is possible.
+ */
+static void __init xen_e820_swap_entry_with_ram(struct e820_entry *swap_entry)
+{
+	struct e820_entry *entry;
+	unsigned int mapcnt;
+	phys_addr_t mem_end = PFN_PHYS(ini_nr_pages);
+	phys_addr_t swap_addr, swap_size, entry_end;
+
+	swap_addr = PAGE_ALIGN_DOWN(swap_entry->addr);
+	swap_size = PAGE_ALIGN(swap_entry->addr - swap_addr + swap_entry->size);
+	entry = xen_e820_table.entries;
+
+	for (mapcnt = 0; mapcnt < xen_e820_table.nr_entries; mapcnt++) {
+		entry_end = entry->addr + entry->size;
+		if (entry->type == E820_TYPE_RAM && entry->size >= swap_size &&
+		    entry_end - swap_size >= mem_end) {
+			/* Reduce RAM entry by needed space (whole pages). */
+			entry->size -= swap_size;
+
+			/* Add new entry at the end of E820 map. */
+			entry = xen_e820_table.entries +
+				xen_e820_table.nr_entries;
+			xen_e820_table.nr_entries++;
+
+			/* Fill new entry (keep size and page offset). */
+			entry->type = swap_entry->type;
+			entry->addr = entry_end - swap_size +
+				      swap_addr - swap_entry->addr;
+			entry->size = swap_entry->size;
+
+			/* Convert old entry to RAM, align to pages. */
+			swap_entry->type = E820_TYPE_RAM;
+			swap_entry->addr = swap_addr;
+			swap_entry->size = swap_size;
+
+			/* Remember PFN<->MFN relation for P2M update. */
+			xen_add_remap_nonram(swap_addr, entry_end - swap_size,
+					     swap_size);
+
+			/* Order E820 table and merge entries. */
+			e820__update_table(&xen_e820_table);
+
+			return;
+		}
+
+		entry++;
+	}
+
+	xen_raw_console_write("No suitable area found for required E820 entry remapping action\n");
+	BUG();
+}
+
+/*
+ * Look for non-RAM memory types in a specific guest physical area and move
+ * those away if possible (ACPI NVS only for now).
+ */
+static void __init xen_e820_resolve_conflicts(phys_addr_t start,
+					      phys_addr_t size)
+{
+	struct e820_entry *entry;
+	unsigned int mapcnt;
+	phys_addr_t end;
+
+	if (!size)
+		return;
+
+	end = start + size;
+	entry = xen_e820_table.entries;
+
+	for (mapcnt = 0; mapcnt < xen_e820_table.nr_entries; mapcnt++) {
+		if (entry->addr >= end)
+			return;
+
+		if (entry->addr + entry->size > start &&
+		    entry->type == E820_TYPE_NVS)
+			xen_e820_swap_entry_with_ram(entry);
+
+		entry++;
+	}
+}
+
 /*
  * Check for an area in physical memory to be usable for non-movable purposes.
- * An area is considered to usable if the used E820 map lists it to be RAM.
+ * An area is considered to usable if the used E820 map lists it to be RAM or
+ * some other type which can be moved to higher PFNs while keeping the MFNs.
  * In case the area is not usable, crash the system with an error message.
  */
 void __init xen_chk_is_e820_usable(phys_addr_t start, phys_addr_t size,
 				   const char *component)
 {
+	xen_e820_resolve_conflicts(start, size);
+
 	if (!xen_is_e820_reserved(start, size))
 		return;
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 5/7] xen: add capability to remap non-RAM pages to different PFNs
  2024-09-10 10:39 ` [PATCH v3 5/7] xen: add capability to remap non-RAM pages to different PFNs Juergen Gross
@ 2024-09-10 12:26   ` Jan Beulich
  2024-09-10 12:51     ` Jürgen Groß
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2024-09-10 12:26 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Boris Ostrovsky, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, xen-devel, linux-kernel, x86

On 10.09.2024 12:39, Juergen Gross wrote:
> When running as a Xen PV dom0 it can happen that the kernel is being
> loaded to a guest physical address conflicting with the host memory
> map.
> 
> In order to be able to resolve this conflict, add the capability to
> remap non-RAM areas to different guest PFNs. A function to use this
> remapping information for other purposes than doing the remap will be
> added when needed.
> 
> As the number of conflicts should be rather low (currently only
> machines with max. 1 conflict are known), save the remap data in a
> small statically allocated array.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with two cosmetic remarks:

> @@ -792,6 +793,70 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
>  	return ret;
>  }
>  
> +/* Remapped non-RAM areas */
> +#define NR_NONRAM_REMAP 4
> +static struct nonram_remap {
> +	phys_addr_t maddr;
> +	phys_addr_t paddr;
> +	size_t size;
> +} xen_nonram_remap[NR_NONRAM_REMAP] __ro_after_init;
> +static unsigned int nr_nonram_remap __ro_after_init;

I take it this is the canonical placement of section attributes in the
kernel? (In Xen I'd ask for the attributes to be moved ahead of the
identifiers being declared.)

> +/*
> + * Do the real remapping of non-RAM regions as specified in the
> + * xen_nonram_remap[] array.
> + * In case of an error just crash the system.
> + */
> +void __init xen_do_remap_nonram(void)
> +{
> +	unsigned int i;
> +	unsigned int remapped = 0;
> +	const struct nonram_remap *remap = xen_nonram_remap;
> +	unsigned long pfn, mfn, end_pfn;
> +
> +	for (i = 0; i < nr_nonram_remap; i++) {
> +		end_pfn = PFN_UP(remap->paddr + remap->size);
> +		pfn = PFN_DOWN(remap->paddr);
> +		mfn = PFN_DOWN(remap->maddr);
> +		while (pfn < end_pfn) {
> +			if (!set_phys_to_machine(pfn, mfn)) {
> +				pr_err("Failed to set p2m mapping for pfn=%lx mfn=%lx\n",
> +				       pfn, mfn);
> +				BUG();

Wouldn't panic() get you both in one operation? Or do you expect the call
trace / register state to be of immediate relevance for analysis?

Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 6/7] xen: allow mapping ACPI data using a different physical address
  2024-09-10 10:39 ` [PATCH v3 6/7] xen: allow mapping ACPI data using a different physical address Juergen Gross
@ 2024-09-10 12:34   ` Jan Beulich
  2024-09-10 12:52     ` Jürgen Groß
  0 siblings, 1 reply; 13+ messages in thread
From: Jan Beulich @ 2024-09-10 12:34 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Rafael J. Wysocki, Len Brown, Boris Ostrovsky,
	xen-devel, linux-kernel, x86, linux-acpi

On 10.09.2024 12:39, Juergen Gross wrote:
> When running as a Xen PV dom0 the system needs to map ACPI data of the
> host using host physical addresses, while those addresses can conflict
> with the guest physical addresses of the loaded linux kernel. The same
> problem might apply in case a PV guest is configured to use the host
> memory map.
> 
> This conflict can be solved by mapping the ACPI data to a different
> guest physical address, but mapping the data via acpi_os_ioremap()
> must still be possible using the host physical address, as this
> address might be generated by AML when referencing some of the ACPI
> data.
> 
> When configured to support running as a Xen PV domain, have an
> implementation of acpi_os_ioremap() being aware of the possibility to
> need above mentioned translation of a host physical address to the
> guest physical address.
> 
> This modification requires to fix some #include of asm/acpi.h in x86
> code to use linux/acpi.h instead.
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with a request to comment a tiny bit more:

> @@ -836,6 +837,33 @@ void __init xen_do_remap_nonram(void)
>  	pr_info("Remapped %u non-RAM page(s)\n", remapped);
>  }
>  
> +#ifdef CONFIG_ACPI
> +/*
> + * Xen variant of acpi_os_ioremap() taking potentially remapped non-RAM
> + * regions into acount.

(Nit: account)

> + * Any attempt to map an area crossing a remap boundary will produce a
> + * WARN() splat.
> + */
> +static void __iomem *xen_acpi_os_ioremap(acpi_physical_address phys,
> +					 acpi_size size)
> +{
> +	unsigned int i;
> +	const struct nonram_remap *remap = xen_nonram_remap;
> +
> +	for (i = 0; i < nr_nonram_remap; i++) {
> +		if (phys + size > remap->maddr &&
> +		    phys < remap->maddr + remap->size) {
> +			WARN_ON(phys < remap->maddr ||
> +				phys + size > remap->maddr + remap->size);
> +			phys = remap->paddr + phys - remap->maddr;

This might be slightly easier / more logical to read as

			phys += remap->paddr - remap->maddr;

Also because of "phys" not consistently expressing a physical address
(when you need convert it, the incoming value is a machine address) a
comment may help here. In fact at the first glance (and despite having
seen the code before) I thought the translation was done the wrong way
round, simply because of the name of the variable.

Jan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 7/7] xen: tolerate ACPI NVS memory overlapping with Xen allocated memory
  2024-09-10 10:39 ` [PATCH v3 7/7] xen: tolerate ACPI NVS memory overlapping with Xen allocated memory Juergen Gross
@ 2024-09-10 12:36   ` Jan Beulich
  0 siblings, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2024-09-10 12:36 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Boris Ostrovsky, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, xen-devel,
	Marek Marczykowski-Górecki, linux-kernel, x86

On 10.09.2024 12:39, Juergen Gross wrote:
> In order to minimize required special handling for running as Xen PV
> dom0, the memory layout is modified to match that of the host. This
> requires to have only RAM at the locations where Xen allocated memory
> is living. Unfortunately there seem to be some machines, where ACPI
> NVS is located at 64 MB, resulting in a conflict with the loaded
> kernel or the initial page tables built by Xen.
> 
> Avoid this conflict by swapping the ACPI NVS area in the memory map
> with unused RAM. This is possible via modification of the dom0 P2M map.
> Accesses to the ACPI NVS area are done either for saving and restoring
> it across suspend operations (this will work the same way as before),
> or by ACPI code when NVS memory is referenced from other ACPI tables.
> The latter case is handled by a Xen specific indirection of
> acpi_os_ioremap().
> 
> While the E820 map can (and should) be modified right away, the P2M
> map can be updated only after memory allocation is working, as the P2M
> map might need to be extended.
> 
> Fixes: 808fdb71936c ("xen: check for kernel memory conflicting with memory layout")
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 5/7] xen: add capability to remap non-RAM pages to different PFNs
  2024-09-10 12:26   ` Jan Beulich
@ 2024-09-10 12:51     ` Jürgen Groß
  0 siblings, 0 replies; 13+ messages in thread
From: Jürgen Groß @ 2024-09-10 12:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Boris Ostrovsky, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, xen-devel, linux-kernel, x86


[-- Attachment #1.1.1: Type: text/plain, Size: 2523 bytes --]

On 10.09.24 14:26, Jan Beulich wrote:
> On 10.09.2024 12:39, Juergen Gross wrote:
>> When running as a Xen PV dom0 it can happen that the kernel is being
>> loaded to a guest physical address conflicting with the host memory
>> map.
>>
>> In order to be able to resolve this conflict, add the capability to
>> remap non-RAM areas to different guest PFNs. A function to use this
>> remapping information for other purposes than doing the remap will be
>> added when needed.
>>
>> As the number of conflicts should be rather low (currently only
>> machines with max. 1 conflict are known), save the remap data in a
>> small statically allocated array.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with two cosmetic remarks:
> 
>> @@ -792,6 +793,70 @@ int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops,
>>   	return ret;
>>   }
>>   
>> +/* Remapped non-RAM areas */
>> +#define NR_NONRAM_REMAP 4
>> +static struct nonram_remap {
>> +	phys_addr_t maddr;
>> +	phys_addr_t paddr;
>> +	size_t size;
>> +} xen_nonram_remap[NR_NONRAM_REMAP] __ro_after_init;
>> +static unsigned int nr_nonram_remap __ro_after_init;
> 
> I take it this is the canonical placement of section attributes in the
> kernel? (In Xen I'd ask for the attributes to be moved ahead of the
> identifiers being declared.)

I didn't find an explicit mentioning in the coding style, but most
examples I've found place the section attribute after the name of the
variable.

> 
>> +/*
>> + * Do the real remapping of non-RAM regions as specified in the
>> + * xen_nonram_remap[] array.
>> + * In case of an error just crash the system.
>> + */
>> +void __init xen_do_remap_nonram(void)
>> +{
>> +	unsigned int i;
>> +	unsigned int remapped = 0;
>> +	const struct nonram_remap *remap = xen_nonram_remap;
>> +	unsigned long pfn, mfn, end_pfn;
>> +
>> +	for (i = 0; i < nr_nonram_remap; i++) {
>> +		end_pfn = PFN_UP(remap->paddr + remap->size);
>> +		pfn = PFN_DOWN(remap->paddr);
>> +		mfn = PFN_DOWN(remap->maddr);
>> +		while (pfn < end_pfn) {
>> +			if (!set_phys_to_machine(pfn, mfn)) {
>> +				pr_err("Failed to set p2m mapping for pfn=%lx mfn=%lx\n",
>> +				       pfn, mfn);
>> +				BUG();
> 
> Wouldn't panic() get you both in one operation? Or do you expect the call
> trace / register state to be of immediate relevance for analysis?

You are right, in this case panic() is a better option.


Juergen


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 6/7] xen: allow mapping ACPI data using a different physical address
  2024-09-10 12:34   ` Jan Beulich
@ 2024-09-10 12:52     ` Jürgen Groß
  0 siblings, 0 replies; 13+ messages in thread
From: Jürgen Groß @ 2024-09-10 12:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Rafael J. Wysocki, Len Brown, Boris Ostrovsky,
	xen-devel, linux-kernel, x86, linux-acpi


[-- Attachment #1.1.1: Type: text/plain, Size: 2637 bytes --]

On 10.09.24 14:34, Jan Beulich wrote:
> On 10.09.2024 12:39, Juergen Gross wrote:
>> When running as a Xen PV dom0 the system needs to map ACPI data of the
>> host using host physical addresses, while those addresses can conflict
>> with the guest physical addresses of the loaded linux kernel. The same
>> problem might apply in case a PV guest is configured to use the host
>> memory map.
>>
>> This conflict can be solved by mapping the ACPI data to a different
>> guest physical address, but mapping the data via acpi_os_ioremap()
>> must still be possible using the host physical address, as this
>> address might be generated by AML when referencing some of the ACPI
>> data.
>>
>> When configured to support running as a Xen PV domain, have an
>> implementation of acpi_os_ioremap() being aware of the possibility to
>> need above mentioned translation of a host physical address to the
>> guest physical address.
>>
>> This modification requires to fix some #include of asm/acpi.h in x86
>> code to use linux/acpi.h instead.
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with a request to comment a tiny bit more:
> 
>> @@ -836,6 +837,33 @@ void __init xen_do_remap_nonram(void)
>>   	pr_info("Remapped %u non-RAM page(s)\n", remapped);
>>   }
>>   
>> +#ifdef CONFIG_ACPI
>> +/*
>> + * Xen variant of acpi_os_ioremap() taking potentially remapped non-RAM
>> + * regions into acount.
> 
> (Nit: account)

Indeed.

> 
>> + * Any attempt to map an area crossing a remap boundary will produce a
>> + * WARN() splat.
>> + */
>> +static void __iomem *xen_acpi_os_ioremap(acpi_physical_address phys,
>> +					 acpi_size size)
>> +{
>> +	unsigned int i;
>> +	const struct nonram_remap *remap = xen_nonram_remap;
>> +
>> +	for (i = 0; i < nr_nonram_remap; i++) {
>> +		if (phys + size > remap->maddr &&
>> +		    phys < remap->maddr + remap->size) {
>> +			WARN_ON(phys < remap->maddr ||
>> +				phys + size > remap->maddr + remap->size);
>> +			phys = remap->paddr + phys - remap->maddr;
> 
> This might be slightly easier / more logical to read as
> 
> 			phys += remap->paddr - remap->maddr;
> 
> Also because of "phys" not consistently expressing a physical address
> (when you need convert it, the incoming value is a machine address) a
> comment may help here. In fact at the first glance (and despite having
> seen the code before) I thought the translation was done the wrong way
> round, simply because of the name of the variable.

Will add a comment and change the line as you suggest.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-09-10 12:52 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-10 10:39 [PATCH v3 0/7] xen: fix dom0 PV boot on some AMD machines Juergen Gross
2024-09-10 10:39 ` [PATCH v3 1/7] xen: use correct end address of kernel for conflict checking Juergen Gross
2024-09-10 10:39 ` [PATCH v3 2/7] xen: introduce generic helper checking for memory map conflicts Juergen Gross
2024-09-10 10:39 ` [PATCH v3 3/7] xen: move checks for e820 conflicts further up Juergen Gross
2024-09-10 10:39 ` [PATCH v3 4/7] xen: move max_pfn in xen_memory_setup() out of function scope Juergen Gross
2024-09-10 10:39 ` [PATCH v3 5/7] xen: add capability to remap non-RAM pages to different PFNs Juergen Gross
2024-09-10 12:26   ` Jan Beulich
2024-09-10 12:51     ` Jürgen Groß
2024-09-10 10:39 ` [PATCH v3 6/7] xen: allow mapping ACPI data using a different physical address Juergen Gross
2024-09-10 12:34   ` Jan Beulich
2024-09-10 12:52     ` Jürgen Groß
2024-09-10 10:39 ` [PATCH v3 7/7] xen: tolerate ACPI NVS memory overlapping with Xen allocated memory Juergen Gross
2024-09-10 12:36   ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox