public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/5] x86/pvh: Make PVH entry relocatable
@ 2024-04-10 19:48 Jason Andryuk
  2024-04-10 19:48 ` [PATCH 1/5] xen: sync elfnote.h from xen tree Jason Andryuk
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Jason Andryuk @ 2024-04-10 19:48 UTC (permalink / raw)
  To: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Stefano Stabellini, Oleksandr Tyshchenko, Paolo Bonzini
  Cc: xen-devel, linux-kernel, Jason Andryuk

Using the PVH entry point, the uncompressed vmlinux is loaded at
LOAD_PHYSICAL_ADDR, and execution starts in 32bit mode at the
address in XEN_ELFNOTE_PHYS32_ENTRY, pvh_start_xen, with paging
disabled.

Loading at LOAD_PHYSICAL_ADDR has not been a problem in the past as
virtual machines don't have conflicting memory maps.  But Xen now
supports a PVH dom0, which uses the host memory map, and there are
Coreboot/EDK2 firmwares that have reserved regions conflicting with
LOAD_PHYSICAL_ADDR.  Xen recently added XEN_ELFNOTE_PHYS32_RELOC to
specify an alignment, minimum and maximum load address when
LOAD_PHYSICAL_ADDR cannot be used.  This patch series makes the PVH
entry path PIC to support relocation.

Only x86-64 is converted.  The 32bit entry path calling into vmlinux,
which is not PIC, will not support relocation.

The entry path needs page tables to switch to 64bit mode.  A new
pvh_init_top_pgt is added to carry the boot into startup_64, where the
regular init_top_pgt page tables are set up.  This duplication is
unfortunate, but it keeps the changes simpler.  __startup_64() can't be
used to set up init_top_pgt for PVH entry because it is 64bit code - the
32bit entry code doesn't have page tables to use.

This is a straightforward implementation to make it work.  Other
approaches could be pursued.
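As a rough sketch of the load-address selection the series enables, the
PHYS32_RELOC constraints (alignment, minimum start, maximum end) can be
applied like this.  This is a hypothetical helper for illustration, not
the actual Xen toolstack or hypervisor code:

```c
#include <stdint.h>

/*
 * Illustrative sketch: pick a load address inside a free region that
 * honors the XEN_ELFNOTE_PHYS32_RELOC constraints.  Returns 0 when the
 * image cannot be placed.  Hypothetical; not Xen's implementation.
 */
static uint32_t pick_load_addr(uint32_t region_start, uint32_t region_end,
                               uint32_t align, uint32_t min_addr,
                               uint32_t max_addr, uint32_t image_size)
{
    uint32_t start = region_start;

    if (start < min_addr)
        start = min_addr;
    /* Round up to the required alignment (align is a power of two). */
    start = (start + align - 1) & ~(align - 1);
    /* The last byte of the image must not exceed max_addr or the region. */
    if (start + image_size - 1 > max_addr ||
        start + image_size - 1 > region_end)
        return 0;
    return start;
}
```

For example, with a free region starting just above a reserved area at
16MB, 2MB alignment, and a 32MB image, the helper would return
0x1200000, the first suitably aligned address.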

checkpatch.pl gives an error: "ERROR: Macros with multiple statements
should be enclosed in a do - while loop" about the moved PMDS macro.
But PMDS is an assembler macro, so it's not applicable.  There are some
false positive warnings "WARNING: space prohibited between function name
and open parenthesis '('" about the macro, too.

Jason Andryuk (5):
  xen: sync elfnote.h from xen tree
  x86/pvh: Make PVH entrypoint PIC for x86-64
  x86/pvh: Set phys_base when calling xen_prepare_pvh()
  x86/kernel: Move page table macros to new header
  x86/pvh: Add 64bit relocation page tables

 arch/x86/kernel/head_64.S            |  22 +---
 arch/x86/kernel/pgtable_64_helpers.h |  28 +++++
 arch/x86/platform/pvh/head.S         | 157 +++++++++++++++++++++++++--
 include/xen/interface/elfnote.h      |  93 +++++++++++++++-
 4 files changed, 265 insertions(+), 35 deletions(-)
 create mode 100644 arch/x86/kernel/pgtable_64_helpers.h

-- 
2.44.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/5] xen: sync elfnote.h from xen tree
  2024-04-10 19:48 [PATCH 0/5] x86/pvh: Make PVH entry relocatable Jason Andryuk
@ 2024-04-10 19:48 ` Jason Andryuk
  2024-05-10  8:09   ` Jürgen Groß
  2024-04-10 19:48 ` [PATCH 2/5] x86/pvh: Make PVH entrypoint PIC for x86-64 Jason Andryuk
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Jason Andryuk @ 2024-04-10 19:48 UTC (permalink / raw)
  To: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Stefano Stabellini, Oleksandr Tyshchenko, Paolo Bonzini
  Cc: xen-devel, linux-kernel, Jason Andryuk

Sync Xen's elfnote.h header from xen.git to pull in the
XEN_ELFNOTE_PHYS32_RELOC define.

xen commit dfc9fab00378 ("x86/PVH: Support relocatable dom0 kernels")

This is a copy except for the removal of the emacs editor config at the
end of the file.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
 include/xen/interface/elfnote.h | 93 +++++++++++++++++++++++++++++++--
 1 file changed, 88 insertions(+), 5 deletions(-)

diff --git a/include/xen/interface/elfnote.h b/include/xen/interface/elfnote.h
index 38deb1214613..918f47d87d7a 100644
--- a/include/xen/interface/elfnote.h
+++ b/include/xen/interface/elfnote.h
@@ -11,7 +11,9 @@
 #define __XEN_PUBLIC_ELFNOTE_H__
 
 /*
- * The notes should live in a SHT_NOTE segment and have "Xen" in the
+ * `incontents 200 elfnotes ELF notes
+ *
+ * The notes should live in a PT_NOTE segment and have "Xen" in the
  * name field.
  *
  * Numeric types are either 4 or 8 bytes depending on the content of
@@ -22,6 +24,8 @@
  *
  * String values (for non-legacy) are NULL terminated ASCII, also known
  * as ASCIZ type.
+ *
+ * Xen only uses ELF Notes contained in x86 binaries.
  */
 
 /*
@@ -52,7 +56,7 @@
 #define XEN_ELFNOTE_VIRT_BASE      3
 
 /*
- * The offset of the ELF paddr field from the acutal required
+ * The offset of the ELF paddr field from the actual required
  * pseudo-physical address (numeric).
  *
  * This is used to maintain backwards compatibility with older kernels
@@ -92,7 +96,12 @@
 #define XEN_ELFNOTE_LOADER         8
 
 /*
- * The kernel supports PAE (x86/32 only, string = "yes" or "no").
+ * The kernel supports PAE (x86/32 only, string = "yes", "no" or
+ * "bimodal").
+ *
+ * For compatibility with Xen 3.0.3 and earlier the "bimodal" setting
+ * may be given as "yes,bimodal" which will cause older Xen to treat
+ * this kernel as PAE.
  *
  * LEGACY: PAE (n.b. The legacy interface included a provision to
  * indicate 'extended-cr3' support allowing L3 page tables to be
@@ -149,7 +158,9 @@
  * The (non-default) location the initial phys-to-machine map should be
  * placed at by the hypervisor (Dom0) or the tools (DomU).
  * The kernel must be prepared for this mapping to be established using
- * large pages, despite such otherwise not being available to guests.
+ * large pages, despite such otherwise not being available to guests. Note
+ * that these large pages may be misaligned in PFN space (they'll obviously
+ * be aligned in MFN and virtual address spaces).
  * The kernel must also be able to handle the page table pages used for
  * this mapping not being accessible through the initial mapping.
  * (Only x86-64 supports this at present.)
@@ -185,9 +196,81 @@
  */
 #define XEN_ELFNOTE_PHYS32_ENTRY 18
 
+/*
+ * Physical loading constraints for PVH kernels
+ *
+ * The presence of this note indicates the kernel supports relocating itself.
+ *
+ * The note may include up to three 32bit values to place constraints on the
+ * guest physical loading addresses and alignment for a PVH kernel.  Values
+ * are read in the following order:
+ *  - a required start alignment (default 0x200000)
+ *  - a minimum address for the start of the image (default 0; see below)
+ *  - a maximum address for the last byte of the image (default 0xffffffff)
+ *
+ * When this note specifies an alignment value, it is used.  Otherwise the
+ * maximum p_align value from loadable ELF Program Headers is used, if it is
+ * greater than or equal to 4k (0x1000).  Otherwise, the default is used.
+ */
+#define XEN_ELFNOTE_PHYS32_RELOC 19
+
 /*
  * The number of the highest elfnote defined.
  */
-#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_ENTRY
+#define XEN_ELFNOTE_MAX XEN_ELFNOTE_PHYS32_RELOC
+
+/*
+ * System information exported through crash notes.
+ *
+ * The kexec / kdump code will create one XEN_ELFNOTE_CRASH_INFO
+ * note in case of a system crash. This note will contain various
+ * information about the system, see xen/include/xen/elfcore.h.
+ */
+#define XEN_ELFNOTE_CRASH_INFO 0x1000001
+
+/*
+ * System registers exported through crash notes.
+ *
+ * The kexec / kdump code will create one XEN_ELFNOTE_CRASH_REGS
+ * note per cpu in case of a system crash. This note is architecture
+ * specific and will contain registers not saved in the "CORE" note.
+ * See xen/include/xen/elfcore.h for more information.
+ */
+#define XEN_ELFNOTE_CRASH_REGS 0x1000002
+
+
+/*
+ * xen dump-core none note.
+ * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_NONE
+ * in its dump file to indicate that the file is xen dump-core
+ * file. This note doesn't have any other information.
+ * See tools/libxc/xc_core.h for more information.
+ */
+#define XEN_ELFNOTE_DUMPCORE_NONE               0x2000000
+
+/*
+ * xen dump-core header note.
+ * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_HEADER
+ * in its dump file.
+ * See tools/libxc/xc_core.h for more information.
+ */
+#define XEN_ELFNOTE_DUMPCORE_HEADER             0x2000001
+
+/*
+ * xen dump-core xen version note.
+ * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_XEN_VERSION
+ * in its dump file. It contains the xen version obtained via the
+ * XENVER hypercall.
+ * See tools/libxc/xc_core.h for more information.
+ */
+#define XEN_ELFNOTE_DUMPCORE_XEN_VERSION        0x2000002
+
+/*
+ * xen dump-core format version note.
+ * xm dump-core code will create one XEN_ELFNOTE_DUMPCORE_FORMAT_VERSION
+ * in its dump file. It contains a format version identifier.
+ * See tools/libxc/xc_core.h for more information.
+ */
+#define XEN_ELFNOTE_DUMPCORE_FORMAT_VERSION     0x2000003
 
 #endif /* __XEN_PUBLIC_ELFNOTE_H__ */
-- 
2.44.0


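The alignment-selection order described in the new
XEN_ELFNOTE_PHYS32_RELOC comment can be sketched in C.  This is
illustrative only, not Xen's actual code:

```c
#include <stdint.h>

#define PVH_DEFAULT_ALIGN 0x200000u  /* 2 MiB, the note's stated default */

/*
 * Illustrative model of the selection order in the note's comment:
 * 1. an alignment given in the note wins;
 * 2. otherwise the maximum p_align of the loadable ELF program headers
 *    is used, if it is at least 4k (0x1000);
 * 3. otherwise the default applies.
 */
static uint32_t pvh_align(uint32_t note_align, uint32_t max_p_align)
{
    if (note_align)
        return note_align;
    if (max_p_align >= 0x1000)
        return max_p_align;
    return PVH_DEFAULT_ALIGN;
}
```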

* [PATCH 2/5] x86/pvh: Make PVH entrypoint PIC for x86-64
  2024-04-10 19:48 [PATCH 0/5] x86/pvh: Make PVH entry relocatable Jason Andryuk
  2024-04-10 19:48 ` [PATCH 1/5] xen: sync elfnote.h from xen tree Jason Andryuk
@ 2024-04-10 19:48 ` Jason Andryuk
  2024-04-10 21:00   ` Brian Gerst
  2024-04-10 19:48 ` [PATCH 3/5] x86/pvh: Set phys_base when calling xen_prepare_pvh() Jason Andryuk
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Jason Andryuk @ 2024-04-10 19:48 UTC (permalink / raw)
  To: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Stefano Stabellini, Oleksandr Tyshchenko, Paolo Bonzini
  Cc: xen-devel, linux-kernel, Jason Andryuk

The PVH entrypoint is 32bit non-PIC code running the uncompressed
vmlinux at its load address CONFIG_PHYSICAL_START - default 0x1000000
(16MB).  The kernel is loaded at that physical address inside the VM by
the VMM software (Xen/QEMU).

When running a Xen PVH Dom0, the host reserved addresses are mapped 1-1
into the PVH container.  There exist system firmwares (Coreboot/EDK2)
with reserved memory at 16MB.  This creates a conflict where the PVH
kernel cannot be loaded at that address.

Modify the PVH entrypoint to be position-independent to allow flexibility
in load address.  Only the 64bit entry path is converted.  A 32bit
kernel is not PIC, so calls into other parts of the kernel, like
xen_prepare_pvh() and mk_pgtable_32(), don't work properly when
relocated.

This makes the code PIC, but the page tables need to be updated as well
to handle running from the kernel high map.

The UNWIND_HINT_END_OF_STACK is to silence:
vmlinux.o: warning: objtool: pvh_start_xen+0x7f: unreachable instruction
after the lret into 64bit code.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
---
 arch/x86/platform/pvh/head.S | 44 ++++++++++++++++++++++++++++--------
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index f7235ef87bc3..bb1e582e32b1 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -7,6 +7,7 @@
 	.code32
 	.text
 #define _pa(x)          ((x) - __START_KERNEL_map)
+#define rva(x)          ((x) - pvh_start_xen)
 
 #include <linux/elfnote.h>
 #include <linux/init.h>
@@ -54,7 +55,25 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
 	UNWIND_HINT_END_OF_STACK
 	cld
 
-	lgdt (_pa(gdt))
+	/*
+	 * See the comment for startup_32 for more details.  We need to
+	 * execute a call to get the execution address to be position
+	 * independent, but we don't have a stack.  Save and restore the
+	 * magic field of start_info in ebx, and use that as the stack.
+	 */
+	mov  (%ebx), %eax
+	leal 4(%ebx), %esp
+	ANNOTATE_INTRA_FUNCTION_CALL
+	call 1f
+1:	popl %ebp
+	mov  %eax, (%ebx)
+	subl $rva(1b), %ebp
+	movl $0, %esp
+
+	leal rva(gdt)(%ebp), %eax
+	leal rva(gdt_start)(%ebp), %ecx
+	movl %ecx, 2(%eax)
+	lgdt (%eax)
 
 	mov $PVH_DS_SEL,%eax
 	mov %eax,%ds
@@ -62,14 +81,14 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
 	mov %eax,%ss
 
 	/* Stash hvm_start_info. */
-	mov $_pa(pvh_start_info), %edi
+	leal rva(pvh_start_info)(%ebp), %edi
 	mov %ebx, %esi
-	mov _pa(pvh_start_info_sz), %ecx
+	movl rva(pvh_start_info_sz)(%ebp), %ecx
 	shr $2,%ecx
 	rep
 	movsl
 
-	mov $_pa(early_stack_end), %esp
+	leal rva(early_stack_end)(%ebp), %esp
 
 	/* Enable PAE mode. */
 	mov %cr4, %eax
@@ -84,28 +103,33 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
 	wrmsr
 
 	/* Enable pre-constructed page tables. */
-	mov $_pa(init_top_pgt), %eax
+	leal rva(init_top_pgt)(%ebp), %eax
 	mov %eax, %cr3
 	mov $(X86_CR0_PG | X86_CR0_PE), %eax
 	mov %eax, %cr0
 
 	/* Jump to 64-bit mode. */
-	ljmp $PVH_CS_SEL, $_pa(1f)
+	pushl $PVH_CS_SEL
+	leal  rva(1f)(%ebp), %eax
+	pushl %eax
+	lretl
 
 	/* 64-bit entry point. */
 	.code64
 1:
+	UNWIND_HINT_END_OF_STACK
+
 	/* Set base address in stack canary descriptor. */
 	mov $MSR_GS_BASE,%ecx
-	mov $_pa(canary), %eax
+	leal rva(canary)(%ebp), %eax
 	xor %edx, %edx
 	wrmsr
 
 	call xen_prepare_pvh
 
 	/* startup_64 expects boot_params in %rsi. */
-	mov $_pa(pvh_bootparams), %rsi
-	mov $_pa(startup_64), %rax
+	lea rva(pvh_bootparams)(%ebp), %rsi
+	lea rva(startup_64)(%ebp), %rax
 	ANNOTATE_RETPOLINE_SAFE
 	jmp *%rax
 
@@ -143,7 +167,7 @@ SYM_CODE_END(pvh_start_xen)
 	.balign 8
 SYM_DATA_START_LOCAL(gdt)
 	.word gdt_end - gdt_start
-	.long _pa(gdt_start)
+	.long _pa(gdt_start) /* x86-64 will overwrite if relocated. */
 	.word 0
 SYM_DATA_END(gdt)
 SYM_DATA_START_LOCAL(gdt_start)
-- 
2.44.0


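The addressing change in this patch, replacing link-time _pa() constants
with %ebp-relative rva() offsets, can be modeled in C.  A simplified
sketch with hypothetical names:

```c
#include <stdint.h>

/*
 * Conceptual model, for illustration only.  An absolute _pa(sym)
 * constant bakes in the link-time load address and breaks when the
 * image is loaded elsewhere.  The PIC version keeps the runtime
 * address of pvh_start_xen in %ebp and addresses everything as an
 * offset from it: rva(sym) = sym - pvh_start_xen.
 */
static uint64_t runtime_addr(uint64_t sym_link_addr,
                             uint64_t entry_link_addr,
                             uint64_t entry_runtime_addr)
{
    uint64_t rva = sym_link_addr - entry_link_addr; /* rva(sym) */
    return entry_runtime_addr + rva;                /* rva(sym)(%ebp) */
}
```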

* [PATCH 3/5] x86/pvh: Set phys_base when calling xen_prepare_pvh()
  2024-04-10 19:48 [PATCH 0/5] x86/pvh: Make PVH entry relocatable Jason Andryuk
  2024-04-10 19:48 ` [PATCH 1/5] xen: sync elfnote.h from xen tree Jason Andryuk
  2024-04-10 19:48 ` [PATCH 2/5] x86/pvh: Make PVH entrypoint PIC for x86-64 Jason Andryuk
@ 2024-04-10 19:48 ` Jason Andryuk
  2024-05-23 11:14   ` Jürgen Groß
  2024-04-10 19:48 ` [PATCH 4/5] x86/kernel: Move page table macros to new header Jason Andryuk
  2024-04-10 19:48 ` [PATCH 5/5] x86/pvh: Add 64bit relocation page tables Jason Andryuk
  4 siblings, 1 reply; 15+ messages in thread
From: Jason Andryuk @ 2024-04-10 19:48 UTC (permalink / raw)
  To: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Stefano Stabellini, Oleksandr Tyshchenko, Paolo Bonzini
  Cc: xen-devel, linux-kernel, Jason Andryuk

phys_base needs to be set for __pa() to work in xen_pvh_init() when
finding the hypercall page.  Set it before calling into
xen_prepare_pvh(), which calls xen_pvh_init().  Clear it afterward to
avoid __startup_64() adding to it and creating an incorrect value.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
Instead of setting and clearing phys_base, a dedicated variable could be
used just for the hypercall page.  Having phys_base set properly may
avoid further issues if the use of phys_base or __pa() grows.
---
 arch/x86/platform/pvh/head.S | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index bb1e582e32b1..c08d08d8cc92 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -125,7 +125,17 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
 	xor %edx, %edx
 	wrmsr
 
+	/* Calculate load offset from LOAD_PHYSICAL_ADDR and store in
+	 * phys_base.  __pa() needs phys_base set to calculate the
+	 * hypercall page in xen_pvh_init(). */
+	movq %rbp, %rbx
+	subq $LOAD_PHYSICAL_ADDR, %rbx
+	movq %rbx, phys_base(%rip)
 	call xen_prepare_pvh
+	/* Clear phys_base.  __startup_64 will *add* to its value,
+	 * so reset to 0. */
+	xor  %rbx, %rbx
+	movq %rbx, phys_base(%rip)
 
 	/* startup_64 expects boot_params in %rsi. */
 	lea rva(pvh_bootparams)(%ebp), %rsi
-- 
2.44.0


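The phys_base arithmetic this patch relies on can be sketched in C.  A
simplified model for kernel-image virtual addresses; the kernel's real
__pa() handles more cases:

```c
#include <stdint.h>

#define START_KERNEL_MAP   0xffffffff80000000ull  /* __START_KERNEL_map */
#define LOAD_PHYS_ADDR     0x1000000ull           /* default 16MB */

/*
 * Illustrative model: for addresses in the kernel high map,
 *     __pa(x) = x - __START_KERNEL_map + phys_base
 * where phys_base = (actual load address) - LOAD_PHYSICAL_ADDR.
 * With phys_base left at 0 while relocated, __pa() would return the
 * unrelocated physical address, which is why the patch sets it before
 * xen_prepare_pvh() needs __pa() for the hypercall page.
 */
static uint64_t pa(uint64_t vaddr, uint64_t phys_base)
{
    return vaddr - START_KERNEL_MAP + phys_base;
}

static uint64_t calc_phys_base(uint64_t actual_load)
{
    return actual_load - LOAD_PHYS_ADDR;
}
```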

* [PATCH 4/5] x86/kernel: Move page table macros to new header
  2024-04-10 19:48 [PATCH 0/5] x86/pvh: Make PVH entry relocatable Jason Andryuk
                   ` (2 preceding siblings ...)
  2024-04-10 19:48 ` [PATCH 3/5] x86/pvh: Set phys_base when calling xen_prepare_pvh() Jason Andryuk
@ 2024-04-10 19:48 ` Jason Andryuk
  2024-05-23 11:40   ` Juergen Gross
  2024-05-23 13:59   ` Thomas Gleixner
  2024-04-10 19:48 ` [PATCH 5/5] x86/pvh: Add 64bit relocation page tables Jason Andryuk
  4 siblings, 2 replies; 15+ messages in thread
From: Jason Andryuk @ 2024-04-10 19:48 UTC (permalink / raw)
  To: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Stefano Stabellini, Oleksandr Tyshchenko, Paolo Bonzini
  Cc: xen-devel, linux-kernel, Jason Andryuk

The PVH entry point will need an additional set of prebuilt page tables.
Move the macros and defines to a new header so they can be re-used.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
checkpatch.pl gives an error: "ERROR: Macros with multiple statements
should be enclosed in a do - while loop" about the moved PMDS macro.
But PMDS is an assembler macro, so it's not applicable.
---
 arch/x86/kernel/head_64.S            | 22 ++--------------------
 arch/x86/kernel/pgtable_64_helpers.h | 28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+), 20 deletions(-)
 create mode 100644 arch/x86/kernel/pgtable_64_helpers.h

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d4918d03efb4..4b036f3220f2 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -27,17 +27,12 @@
 #include <asm/fixmap.h>
 #include <asm/smp.h>
 
+#include "pgtable_64_helpers.h"
+
 /*
  * We are not able to switch in one step to the final KERNEL ADDRESS SPACE
  * because we need identity-mapped pages.
  */
-#define l4_index(x)	(((x) >> 39) & 511)
-#define pud_index(x)	(((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
-
-L4_PAGE_OFFSET = l4_index(__PAGE_OFFSET_BASE_L4)
-L4_START_KERNEL = l4_index(__START_KERNEL_map)
-
-L3_START_KERNEL = pud_index(__START_KERNEL_map)
 
 	.text
 	__HEAD
@@ -619,9 +614,6 @@ SYM_CODE_START_NOALIGN(vc_no_ghcb)
 SYM_CODE_END(vc_no_ghcb)
 #endif
 
-#define SYM_DATA_START_PAGE_ALIGNED(name)			\
-	SYM_START(name, SYM_L_GLOBAL, .balign PAGE_SIZE)
-
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 /*
  * Each PGD needs to be 8k long and 8k aligned.  We do not
@@ -643,14 +635,6 @@ SYM_CODE_END(vc_no_ghcb)
 #define PTI_USER_PGD_FILL	0
 #endif
 
-/* Automate the creation of 1 to 1 mapping pmd entries */
-#define PMDS(START, PERM, COUNT)			\
-	i = 0 ;						\
-	.rept (COUNT) ;					\
-	.quad	(START) + (i << PMD_SHIFT) + (PERM) ;	\
-	i = i + 1 ;					\
-	.endr
-
 	__INITDATA
 	.balign 4
 
@@ -749,8 +733,6 @@ SYM_DATA_START_PAGE_ALIGNED(level1_fixmap_pgt)
 	.endr
 SYM_DATA_END(level1_fixmap_pgt)
 
-#undef PMDS
-
 	.data
 	.align 16
 
diff --git a/arch/x86/kernel/pgtable_64_helpers.h b/arch/x86/kernel/pgtable_64_helpers.h
new file mode 100644
index 000000000000..0ae87d768ce2
--- /dev/null
+++ b/arch/x86/kernel/pgtable_64_helpers.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PGTABLES_64_H__
+#define __PGTABLES_64_H__
+
+#ifdef __ASSEMBLY__
+
+#define l4_index(x)	(((x) >> 39) & 511)
+#define pud_index(x)	(((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
+
+L4_PAGE_OFFSET = l4_index(__PAGE_OFFSET_BASE_L4)
+L4_START_KERNEL = l4_index(__START_KERNEL_map)
+
+L3_START_KERNEL = pud_index(__START_KERNEL_map)
+
+#define SYM_DATA_START_PAGE_ALIGNED(name)			\
+	SYM_START(name, SYM_L_GLOBAL, .balign PAGE_SIZE)
+
+/* Automate the creation of 1 to 1 mapping pmd entries */
+#define PMDS(START, PERM, COUNT)			\
+	i = 0 ;						\
+	.rept (COUNT) ;					\
+	.quad	(START) + (i << PMD_SHIFT) + (PERM) ;	\
+	i = i + 1 ;					\
+	.endr
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __PGTABLES_64_H__ */
-- 
2.44.0


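What the moved PMDS assembler macro emits can be modeled in C.  An
illustrative sketch (the permission value below is hypothetical):

```c
#include <stdint.h>

/*
 * C model of the PMDS assembler macro, for illustration: emit COUNT
 * page-table entries mapping successive 2MB large pages starting at
 * physical address START, each with permission bits PERM.
 */
static void pmds(uint64_t *out, uint64_t start, uint64_t perm, int count)
{
    const int PMD_SHIFT = 21; /* 2MB large pages */
    int i;

    for (i = 0; i < count; i++)
        out[i] = start + ((uint64_t)i << PMD_SHIFT) + perm;
}
```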

* [PATCH 5/5] x86/pvh: Add 64bit relocation page tables
  2024-04-10 19:48 [PATCH 0/5] x86/pvh: Make PVH entry relocatable Jason Andryuk
                   ` (3 preceding siblings ...)
  2024-04-10 19:48 ` [PATCH 4/5] x86/kernel: Move page table macros to new header Jason Andryuk
@ 2024-04-10 19:48 ` Jason Andryuk
  2024-05-23 12:11   ` Juergen Gross
  4 siblings, 1 reply; 15+ messages in thread
From: Jason Andryuk @ 2024-04-10 19:48 UTC (permalink / raw)
  To: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Stefano Stabellini, Oleksandr Tyshchenko, Paolo Bonzini
  Cc: xen-devel, linux-kernel, Jason Andryuk

The PVH entry point is 32bit.  For a 64bit kernel, the entry point must
switch to 64bit mode, which requires a set of page tables.  In the past,
PVH used init_top_pgt.

This works fine when the kernel is loaded at LOAD_PHYSICAL_ADDR, as the
page tables are prebuilt for this address.  If the kernel is loaded at a
different address, they need to be adjusted.

__startup_64() adjusts the prebuilt page tables for the physical load
address, but it is 64bit code.  The 32bit PVH entry code can't call it
to adjust the page tables, so it can't readily be re-used.

64bit PVH entry needs page tables set up for the identity map, the
kernel high map and the direct map.  pvh_start_xen() is entered
identity mapped.  Inside xen_prepare_pvh(), it jumps through a pv_ops
function pointer into the high map.  The direct map is used for __va()
on the initramfs and other guest physical addresses.

Add a dedicated set of prebuilt page tables for PVH entry.  They are
adjusted in assembly before loading.

Add XEN_ELFNOTE_PHYS32_RELOC to indicate support for relocation
along with the kernel's loading constraints.  The maximum load address,
KERNEL_IMAGE_SIZE - 1, is determined by a single pvh_level2_ident_pgt
page.  It could be larger with more pages.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
Instead of adding 5 pages of prebuilt page tables, they could be
constructed dynamically in the .bss area.  They are then only used for
PVH entry and until transitioning to init_top_pgt.  The .bss is later
cleared.  It's safer to add the dedicated pages, so that is done here.
---
 arch/x86/platform/pvh/head.S | 105 ++++++++++++++++++++++++++++++++++-
 1 file changed, 104 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index c08d08d8cc92..4af3cfbcf2f8 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -21,6 +21,8 @@
 #include <asm/nospec-branch.h>
 #include <xen/interface/elfnote.h>
 
+#include "../kernel/pgtable_64_helpers.h"
+
 	__HEAD
 
 /*
@@ -102,8 +104,47 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
 	btsl $_EFER_LME, %eax
 	wrmsr
 
+	mov %ebp, %ebx
+	subl $LOAD_PHYSICAL_ADDR, %ebx /* offset */
+	jz .Lpagetable_done
+
+	/* Fixup page-tables for relocation. */
+	leal rva(pvh_init_top_pgt)(%ebp), %edi
+	movl $512, %ecx
+2:
+	testl $_PAGE_PRESENT, 0x00(%edi)
+	jz 1f
+	addl %ebx, 0x00(%edi)
+1:
+	addl $8, %edi
+	decl %ecx
+	jnz 2b
+
+	/* L3 ident has a single entry. */
+	leal rva(pvh_level3_ident_pgt)(%ebp), %edi
+	addl %ebx, 0x00(%edi)
+
+	leal rva(pvh_level3_kernel_pgt)(%ebp), %edi
+	addl %ebx, (4096 - 16)(%edi)
+	addl %ebx, (4096 - 8)(%edi)
+
+	/* pvh_level2_ident_pgt is fine - large pages */
+
+	/* pvh_level2_kernel_pgt needs adjustment - large pages */
+	leal rva(pvh_level2_kernel_pgt)(%ebp), %edi
+	movl $512, %ecx
+2:
+	testl $_PAGE_PRESENT, 0x00(%edi)
+	jz 1f
+	addl %ebx, 0x00(%edi)
+1:
+	addl $8, %edi
+	decl %ecx
+	jnz 2b
+
+.Lpagetable_done:
 	/* Enable pre-constructed page tables. */
-	leal rva(init_top_pgt)(%ebp), %eax
+	leal rva(pvh_init_top_pgt)(%ebp), %eax
 	mov %eax, %cr3
 	mov $(X86_CR0_PG | X86_CR0_PE), %eax
 	mov %eax, %cr0
@@ -197,5 +238,67 @@ SYM_DATA_START_LOCAL(early_stack)
 	.fill BOOT_STACK_SIZE, 1, 0
 SYM_DATA_END_LABEL(early_stack, SYM_L_LOCAL, early_stack_end)
 
+#ifdef CONFIG_X86_64
+/*
+ * Xen PVH needs a set of identity mapped and kernel high mapping
+ * page tables.  pvh_start_xen starts running on the identity mapped
+ * page tables, but xen_prepare_pvh calls into the high mapping.
+ * These page tables need to be relocatable and are only used until
+ * startup_64 transitions to init_top_pgt.
+ */
+SYM_DATA_START_PAGE_ALIGNED(pvh_init_top_pgt)
+	.quad   pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	.org    pvh_init_top_pgt + L4_PAGE_OFFSET*8, 0
+	.quad   pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	.org    pvh_init_top_pgt + L4_START_KERNEL*8, 0
+	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+	.quad   pvh_level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
+SYM_DATA_END(pvh_init_top_pgt)
+
+SYM_DATA_START_PAGE_ALIGNED(pvh_level3_ident_pgt)
+	.quad	pvh_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	.fill	511, 8, 0
+SYM_DATA_END(pvh_level3_ident_pgt)
+SYM_DATA_START_PAGE_ALIGNED(pvh_level2_ident_pgt)
+	/*
+	 * Since I easily can, map the first 1G.
+	 * Don't set NX because code runs from these pages.
+	 *
+	 * Note: This sets _PAGE_GLOBAL despite whether
+	 * the CPU supports it or it is enabled.  But,
+	 * the CPU should ignore the bit.
+	 */
+	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
+SYM_DATA_END(pvh_level2_ident_pgt)
+SYM_DATA_START_PAGE_ALIGNED(pvh_level3_kernel_pgt)
+	.fill	L3_START_KERNEL,8,0
+	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
+	.quad	pvh_level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	.quad	0 /* no fixmap */
+SYM_DATA_END(pvh_level3_kernel_pgt)
+
+SYM_DATA_START_PAGE_ALIGNED(pvh_level2_kernel_pgt)
+	/*
+	 * Kernel high mapping.
+	 *
+	 * The kernel code+data+bss must be located below KERNEL_IMAGE_SIZE in
+	 * virtual address space, which is 1 GiB if RANDOMIZE_BASE is enabled,
+	 * 512 MiB otherwise.
+	 *
+	 * (NOTE: after that starts the module area, see MODULES_VADDR.)
+	 *
+	 * This table is eventually used by the kernel during normal runtime.
+	 * Care must be taken to clear out undesired bits later, like _PAGE_RW
+	 * or _PAGE_GLOBAL in some cases.
+	 */
+	PMDS(0, __PAGE_KERNEL_LARGE_EXEC, KERNEL_IMAGE_SIZE/PMD_SIZE)
+SYM_DATA_END(pvh_level2_kernel_pgt)
+
+	ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_RELOC,
+		     .long CONFIG_PHYSICAL_ALIGN;
+		     .long LOAD_PHYSICAL_ADDR;
+		     .long KERNEL_IMAGE_SIZE - 1)
+#endif
+
 	ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY,
 	             _ASM_PTR (pvh_start_xen - __START_KERNEL_map))
-- 
2.44.0


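The page-table fixup loops in this patch can be modeled in C.  A
simplified sketch: the assembly adds the 32bit load offset to the low
dword of each present entry, which is equivalent for sub-4G addresses:

```c
#include <stdint.h>

#define PAGE_PRESENT 0x1ull  /* _PAGE_PRESENT */

/*
 * Illustrative model of the relocation fixup: walk the 512 entries of
 * a page table and add the load offset to every present entry, so the
 * physical addresses they reference match the relocated image.
 */
static void fixup_pgtable(uint64_t pgt[512], uint64_t offset)
{
    int i;

    for (i = 0; i < 512; i++)
        if (pgt[i] & PAGE_PRESENT)
            pgt[i] += offset;
}
```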

* Re: [PATCH 2/5] x86/pvh: Make PVH entrypoint PIC for x86-64
  2024-04-10 19:48 ` [PATCH 2/5] x86/pvh: Make PVH entrypoint PIC for x86-64 Jason Andryuk
@ 2024-04-10 21:00   ` Brian Gerst
  2024-04-11 15:26     ` Jason Andryuk
  0 siblings, 1 reply; 15+ messages in thread
From: Brian Gerst @ 2024-04-10 21:00 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Stefano Stabellini, Oleksandr Tyshchenko, Paolo Bonzini,
	xen-devel, linux-kernel

On Wed, Apr 10, 2024 at 3:50 PM Jason Andryuk <jason.andryuk@amd.com> wrote:
>
> The PVH entrypoint is 32bit non-PIC code running the uncompressed
> vmlinux at its load address CONFIG_PHYSICAL_START - default 0x1000000
> (16MB).  The kernel is loaded at that physical address inside the VM by
> the VMM software (Xen/QEMU).
>
> When running a Xen PVH Dom0, the host reserved addresses are mapped 1-1
> into the PVH container.  There exist system firmwares (Coreboot/EDK2)
> with reserved memory at 16MB.  This creates a conflict where the PVH
> kernel cannot be loaded at that address.
>
> Modify the PVH entrypoint to be position-indepedent to allow flexibility
> in load address.  Only the 64bit entry path is converted.  A 32bit
> kernel is not PIC, so calling into other parts of the kernel, like
> xen_prepare_pvh() and mk_pgtable_32(), don't work properly when
> relocated.
>
> This makes the code PIC, but the page tables need to be updated as well
> to handle running from the kernel high map.
>
> The UNWIND_HINT_END_OF_STACK is to silence:
> vmlinux.o: warning: objtool: pvh_start_xen+0x7f: unreachable instruction
> after the lret into 64bit code.
>
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> ---
> ---
>  arch/x86/platform/pvh/head.S | 44 ++++++++++++++++++++++++++++--------
>  1 file changed, 34 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
> index f7235ef87bc3..bb1e582e32b1 100644
> --- a/arch/x86/platform/pvh/head.S
> +++ b/arch/x86/platform/pvh/head.S
> @@ -7,6 +7,7 @@
>         .code32
>         .text
>  #define _pa(x)          ((x) - __START_KERNEL_map)
> +#define rva(x)          ((x) - pvh_start_xen)
>
>  #include <linux/elfnote.h>
>  #include <linux/init.h>
> @@ -54,7 +55,25 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
>         UNWIND_HINT_END_OF_STACK
>         cld
>
> -       lgdt (_pa(gdt))
> +       /*
> +        * See the comment for startup_32 for more details.  We need to
> +        * execute a call to get the execution address to be position
> +        * independent, but we don't have a stack.  Save and restore the
> +        * magic field of start_info in ebx, and use that as the stack.
> +        */
> +       mov  (%ebx), %eax
> +       leal 4(%ebx), %esp
> +       ANNOTATE_INTRA_FUNCTION_CALL
> +       call 1f
> +1:     popl %ebp
> +       mov  %eax, (%ebx)
> +       subl $rva(1b), %ebp
> +       movl $0, %esp
> +
> +       leal rva(gdt)(%ebp), %eax
> +       leal rva(gdt_start)(%ebp), %ecx
> +       movl %ecx, 2(%eax)
> +       lgdt (%eax)
>
>         mov $PVH_DS_SEL,%eax
>         mov %eax,%ds
> @@ -62,14 +81,14 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
>         mov %eax,%ss
>
>         /* Stash hvm_start_info. */
> -       mov $_pa(pvh_start_info), %edi
> +       leal rva(pvh_start_info)(%ebp), %edi
>         mov %ebx, %esi
> -       mov _pa(pvh_start_info_sz), %ecx
> +       movl rva(pvh_start_info_sz)(%ebp), %ecx
>         shr $2,%ecx
>         rep
>         movsl
>
> -       mov $_pa(early_stack_end), %esp
> +       leal rva(early_stack_end)(%ebp), %esp
>
>         /* Enable PAE mode. */
>         mov %cr4, %eax
> @@ -84,28 +103,33 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
>         wrmsr
>
>         /* Enable pre-constructed page tables. */
> -       mov $_pa(init_top_pgt), %eax
> +       leal rva(init_top_pgt)(%ebp), %eax
>         mov %eax, %cr3
>         mov $(X86_CR0_PG | X86_CR0_PE), %eax
>         mov %eax, %cr0
>
>         /* Jump to 64-bit mode. */
> -       ljmp $PVH_CS_SEL, $_pa(1f)
> +       pushl $PVH_CS_SEL
> +       leal  rva(1f)(%ebp), %eax
> +       pushl %eax
> +       lretl
>
>         /* 64-bit entry point. */
>         .code64
>  1:
> +       UNWIND_HINT_END_OF_STACK
> +
>         /* Set base address in stack canary descriptor. */
>         mov $MSR_GS_BASE,%ecx
> -       mov $_pa(canary), %eax
> +       leal rva(canary)(%ebp), %eax

Since this is in 64-bit mode, RIP-relative addressing can be used.

>         xor %edx, %edx
>         wrmsr
>
>         call xen_prepare_pvh
>
>         /* startup_64 expects boot_params in %rsi. */
> -       mov $_pa(pvh_bootparams), %rsi
> -       mov $_pa(startup_64), %rax
> +       lea rva(pvh_bootparams)(%ebp), %rsi
> +       lea rva(startup_64)(%ebp), %rax

RIP-relative here too.

>         ANNOTATE_RETPOLINE_SAFE
>         jmp *%rax
>
> @@ -143,7 +167,7 @@ SYM_CODE_END(pvh_start_xen)
>         .balign 8
>  SYM_DATA_START_LOCAL(gdt)
>         .word gdt_end - gdt_start
> -       .long _pa(gdt_start)
> +       .long _pa(gdt_start) /* x86-64 will overwrite if relocated. */
>         .word 0
>  SYM_DATA_END(gdt)
>  SYM_DATA_START_LOCAL(gdt_start)
> --
> 2.44.0
>
>

Brian Gerst
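For readers following the address arithmetic in this conversion: at entry, %ebp is assumed to hold the runtime address of pvh_start_xen, and rva(x) expands to the link-time offset (x - pvh_start_xen), so `leal rva(sym)(%ebp)` yields the runtime address of sym wherever the image was loaded. A minimal sketch of that math, with made-up example addresses:

```python
# Sketch of the math behind the leal rva(sym)(%ebp) pattern in this patch.
# All addresses are hypothetical example values, not real kernel layout.
LOAD_PHYSICAL_ADDR = 0x1000000      # link-time load address (16 MiB)
PVH_START_XEN_LINK = 0x1000200      # assumed link address of pvh_start_xen
ACTUAL_LOAD_BASE   = 0x4000000      # assumed relocated load address

def runtime_addr(sym_link_addr):
    """leal rva(sym)(%ebp), %reg == runtime base of pvh_start_xen plus
    the symbol's link-time offset from pvh_start_xen."""
    ebp = ACTUAL_LOAD_BASE + (PVH_START_XEN_LINK - LOAD_PHYSICAL_ADDR)
    rva = sym_link_addr - PVH_START_XEN_LINK
    return ebp + rva

# A symbol linked 0x3000 past LOAD_PHYSICAL_ADDR lands 0x3000 past the
# actual load base:
assert runtime_addr(LOAD_PHYSICAL_ADDR + 0x3000) == ACTUAL_LOAD_BASE + 0x3000
```

This is also why RIP-relative addressing is equivalent once in 64-bit mode: the assembler/CPU compute the same "current location plus link-time delta" without needing %ebp at all.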


* Re: [PATCH 2/5] x86/pvh: Make PVH entrypoint PIC for x86-64
  2024-04-10 21:00   ` Brian Gerst
@ 2024-04-11 15:26     ` Jason Andryuk
  2024-04-11 18:15       ` Brian Gerst
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Andryuk @ 2024-04-11 15:26 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Stefano Stabellini, Oleksandr Tyshchenko, Paolo Bonzini,
	xen-devel, linux-kernel

On 2024-04-10 17:00, Brian Gerst wrote:
> On Wed, Apr 10, 2024 at 3:50 PM Jason Andryuk <jason.andryuk@amd.com> wrote:

>>          /* 64-bit entry point. */
>>          .code64
>>   1:
>> +       UNWIND_HINT_END_OF_STACK
>> +
>>          /* Set base address in stack canary descriptor. */
>>          mov $MSR_GS_BASE,%ecx
>> -       mov $_pa(canary), %eax
>> +       leal rva(canary)(%ebp), %eax
> 
> Since this is in 64-bit mode, RIP-relative addressing can be used.
> 
>>          xor %edx, %edx
>>          wrmsr
>>
>>          call xen_prepare_pvh
>>
>>          /* startup_64 expects boot_params in %rsi. */
>> -       mov $_pa(pvh_bootparams), %rsi
>> -       mov $_pa(startup_64), %rax
>> +       lea rva(pvh_bootparams)(%ebp), %rsi
>> +       lea rva(startup_64)(%ebp), %rax
> 
> RIP-relative here too.

Yes, thanks for catching that.  With the RIP-relative conversion, there 
is now:
vmlinux.o: warning: objtool: pvh_start_xen+0x10d: relocation to !ENDBR: 
startup_64+0x0

I guess RIP-relative made it visible.  That can be quieted by adding 
ANNOTATE_NOENDBR to startup_64.

Thanks,
Jason


* Re: [PATCH 2/5] x86/pvh: Make PVH entrypoint PIC for x86-64
  2024-04-11 15:26     ` Jason Andryuk
@ 2024-04-11 18:15       ` Brian Gerst
  0 siblings, 0 replies; 15+ messages in thread
From: Brian Gerst @ 2024-04-11 18:15 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Juergen Gross, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Stefano Stabellini, Oleksandr Tyshchenko, Paolo Bonzini,
	xen-devel, linux-kernel

On Thu, Apr 11, 2024 at 11:26 AM Jason Andryuk <jason.andryuk@amd.com> wrote:
>
> On 2024-04-10 17:00, Brian Gerst wrote:
> > On Wed, Apr 10, 2024 at 3:50 PM Jason Andryuk <jason.andryuk@amd.com> wrote:
>
> >>          /* 64-bit entry point. */
> >>          .code64
> >>   1:
> >> +       UNWIND_HINT_END_OF_STACK
> >> +
> >>          /* Set base address in stack canary descriptor. */
> >>          mov $MSR_GS_BASE,%ecx
> >> -       mov $_pa(canary), %eax
> >> +       leal rva(canary)(%ebp), %eax
> >
> > Since this is in 64-bit mode, RIP-relative addressing can be used.
> >
> >>          xor %edx, %edx
> >>          wrmsr
> >>
> >>          call xen_prepare_pvh
> >>
> >>          /* startup_64 expects boot_params in %rsi. */
> >> -       mov $_pa(pvh_bootparams), %rsi
> >> -       mov $_pa(startup_64), %rax
> >> +       lea rva(pvh_bootparams)(%ebp), %rsi
> >> +       lea rva(startup_64)(%ebp), %rax
> >
> > RIP-relative here too.
>
> Yes, thanks for catching that.  With the RIP-relative conversion, there
> is now:
> vmlinux.o: warning: objtool: pvh_start_xen+0x10d: relocation to !ENDBR:
> startup_64+0x0
>
> I guess RIP-relative made it visible.  That can be quieted by adding
> ANNOTATE_NOENDBR to startup_64.

Change it to a direct jump, since branches are always RIP-relative.

Brian Gerst


* Re: [PATCH 1/5] xen: sync elfnote.h from xen tree
  2024-04-10 19:48 ` [PATCH 1/5] xen: sync elfnote.h from xen tree Jason Andryuk
@ 2024-05-10  8:09   ` Jürgen Groß
  0 siblings, 0 replies; 15+ messages in thread
From: Jürgen Groß @ 2024-05-10  8:09 UTC (permalink / raw)
  To: Jason Andryuk, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Stefano Stabellini, Oleksandr Tyshchenko, Paolo Bonzini
  Cc: xen-devel, linux-kernel

On 10.04.24 21:48, Jason Andryuk wrote:
> Sync Xen's elfnote.h header from xen.git to pull in the
> XEN_ELFNOTE_PHYS32_RELOC define.
> 
> xen commit dfc9fab00378 ("x86/PVH: Support relocatable dom0 kernels")
> 
> This is a copy except for the removal of the emacs editor config at the
> end of the file.
> 
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen



* Re: [PATCH 3/5] x86/pvh: Set phys_base when calling xen_prepare_pvh()
  2024-04-10 19:48 ` [PATCH 3/5] x86/pvh: Set phys_base when calling xen_prepare_pvh() Jason Andryuk
@ 2024-05-23 11:14   ` Jürgen Groß
  0 siblings, 0 replies; 15+ messages in thread
From: Jürgen Groß @ 2024-05-23 11:14 UTC (permalink / raw)
  To: Jason Andryuk, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Stefano Stabellini, Oleksandr Tyshchenko, Paolo Bonzini
  Cc: xen-devel, linux-kernel

On 10.04.24 21:48, Jason Andryuk wrote:
> phys_base needs to be set for __pa() to work in xen_pvh_init() when
> finding the hypercall page.  Set it before calling into
> xen_prepare_pvh(), which calls xen_pvh_init().  Clear it afterward to
> avoid __startup_64() adding to it and creating an incorrect value.
> 
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> ---
> Instead of setting and clearing phys_base, a dedicated variable could be
> used just for the hypercall page.  Having phys_base set properly may
> avoid further issues if the use of phys_base or __pa() grows.
> ---
>   arch/x86/platform/pvh/head.S | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
> index bb1e582e32b1..c08d08d8cc92 100644
> --- a/arch/x86/platform/pvh/head.S
> +++ b/arch/x86/platform/pvh/head.S
> @@ -125,7 +125,17 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
>   	xor %edx, %edx
>   	wrmsr
>   
> +	/* Calculate load offset from LOAD_PHYSICAL_ADDR and store in
> +	 * phys_base.  __pa() needs phys_base set to calculate the
> +	 * hypercall page in xen_pvh_init(). */

Please use the correct style for multi-line comments:

	/*
	 * comment lines
	 * comment lines
	 */

> +	movq %rbp, %rbx
> +	subq $LOAD_PHYSICAL_ADDR, %rbx
> +	movq %rbx, phys_base(%rip)
>   	call xen_prepare_pvh
> +	/* Clear phys_base.  __startup_64 will *add* to its value,
> +	 * so reset to 0. */

Comment style again.

> +	xor  %rbx, %rbx
> +	movq %rbx, phys_base(%rip)
>   
>   	/* startup_64 expects boot_params in %rsi. */
>   	lea rva(pvh_bootparams)(%ebp), %rsi

With above fixed:

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen


* Re: [PATCH 4/5] x86/kernel: Move page table macros to new header
  2024-04-10 19:48 ` [PATCH 4/5] x86/kernel: Move page table macros to new header Jason Andryuk
@ 2024-05-23 11:40   ` Juergen Gross
  2024-05-23 13:59   ` Thomas Gleixner
  1 sibling, 0 replies; 15+ messages in thread
From: Juergen Gross @ 2024-05-23 11:40 UTC (permalink / raw)
  To: Jason Andryuk, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Stefano Stabellini, Oleksandr Tyshchenko, Paolo Bonzini
  Cc: xen-devel, linux-kernel



On 10.04.24 21:48, Jason Andryuk wrote:
> The PVH entry point will need an additional set of prebuilt page tables.
> Move the macros and defines to a new header so they can be re-used.
> 
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>

With the one nit below addressed:

Reviewed-by: Juergen Gross <jgross@suse.com>

...

> diff --git a/arch/x86/kernel/pgtable_64_helpers.h b/arch/x86/kernel/pgtable_64_helpers.h
> new file mode 100644
> index 000000000000..0ae87d768ce2
> --- /dev/null
> +++ b/arch/x86/kernel/pgtable_64_helpers.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __PGTABLES_64_H__
> +#define __PGTABLES_64_H__
> +
> +#ifdef __ASSEMBLY__
> +
> +#define l4_index(x)	(((x) >> 39) & 511)
> +#define pud_index(x)	(((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))

Please fix the minor style issue in this line by s/-/ - /


Juergen
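For reference, the two macros being moved are straightforward page-table index extractions; a Python model using the standard x86-64 4-level paging constants (the shift/mask values are taken from the quoted header) shows they agree with the "= 511" and "= 510" comments in the pvh page-table definitions later in this series:

```python
# Model of the l4_index()/pud_index() macros from the quoted header,
# with the usual x86-64 constants (PUD_SHIFT = 30, PTRS_PER_PUD = 512).
PUD_SHIFT = 30
PTRS_PER_PUD = 512

def l4_index(x):
    return (x >> 39) & 511

def pud_index(x):
    return (x >> PUD_SHIFT) & (PTRS_PER_PUD - 1)

# __START_KERNEL_map = 0xffffffff80000000 sits in the last L4 slot and
# the second-to-last PUD slot:
assert l4_index(0xffffffff80000000) == 511
assert pud_index(0xffffffff80000000) == 510
# Same 511 via the 48-bit canonical arithmetic used in the comments:
assert (2**48 - 2 * 2**30) // 2**39 == 511
```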



* Re: [PATCH 5/5] x86/pvh: Add 64bit relocation page tables
  2024-04-10 19:48 ` [PATCH 5/5] x86/pvh: Add 64bit relocation page tables Jason Andryuk
@ 2024-05-23 12:11   ` Juergen Gross
  0 siblings, 0 replies; 15+ messages in thread
From: Juergen Gross @ 2024-05-23 12:11 UTC (permalink / raw)
  To: Jason Andryuk, Boris Ostrovsky, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Stefano Stabellini, Oleksandr Tyshchenko, Paolo Bonzini
  Cc: xen-devel, linux-kernel



On 10.04.24 21:48, Jason Andryuk wrote:
> The PVH entry point is 32bit.  For a 64bit kernel, the entry point must
> switch to 64bit mode, which requires a set of page tables.  In the past,
> PVH used init_top_pgt.
> 
> This works fine when the kernel is loaded at LOAD_PHYSICAL_ADDR, as the
> page tables are prebuilt for this address.  If the kernel is loaded at a
> different address, they need to be adjusted.
> 
> __startup_64() adjusts the prebuilt page tables for the physical load
> address, but it is 64bit code.  The 32bit PVH entry code can't call it
> to adjust the page tables, so it can't readily be re-used.
> 
> 64bit PVH entry needs page tables set up for identity map, the kernel
> high map and the direct map.  pvh_start_xen() enters identity mapped.
> Inside xen_prepare_pvh(), it jumps through a pv_ops function pointer
> into the highmap.  The direct map is used for __va() on the initramfs
> and other guest physical addresses.
> 
> Add a dedicated set of prebuilt page tables for PVH entry.  They are
> adjusted in assembly before loading.
> 
> Add XEN_ELFNOTE_PHYS32_RELOC to indicate support for relocation
> along with the kernel's loading constraints.  The maximum load address,
> KERNEL_IMAGE_SIZE - 1, is determined by a single pvh_level2_ident_pgt
> page.  It could be larger with more pages.
> 
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> ---
> Instead of adding 5 pages of prebuilt page tables, they could be
> constructed dynamically in the .bss area.  They are then only used for
> PVH entry and until transitioning to init_top_pgt.  The .bss is later
> cleared.  It's safer to add the dedicated pages, so that is done here.
> ---
>   arch/x86/platform/pvh/head.S | 105 ++++++++++++++++++++++++++++++++++-
>   1 file changed, 104 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
> index c08d08d8cc92..4af3cfbcf2f8 100644
> --- a/arch/x86/platform/pvh/head.S
> +++ b/arch/x86/platform/pvh/head.S
> @@ -21,6 +21,8 @@
>   #include <asm/nospec-branch.h>
>   #include <xen/interface/elfnote.h>
>   
> +#include "../kernel/pgtable_64_helpers.h"
> +
>   	__HEAD
>   
>   /*
> @@ -102,8 +104,47 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
>   	btsl $_EFER_LME, %eax
>   	wrmsr
>   
> +	mov %ebp, %ebx
> +	subl $LOAD_PHYSICAL_ADDR, %ebx /* offset */
> +	jz .Lpagetable_done
> +
> +	/* Fixup page-tables for relocation. */
> +	leal rva(pvh_init_top_pgt)(%ebp), %edi
> +	movl $512, %ecx

Please use PTRS_PER_PGD instead of the literal 512. Similar issue below.

> +2:
> +	testl $_PAGE_PRESENT, 0x00(%edi)
> +	jz 1f
> +	addl %ebx, 0x00(%edi)
> +1:
> +	addl $8, %edi
> +	decl %ecx
> +	jnz 2b
> +
> +	/* L3 ident has a single entry. */
> +	leal rva(pvh_level3_ident_pgt)(%ebp), %edi
> +	addl %ebx, 0x00(%edi)
> +
> +	leal rva(pvh_level3_kernel_pgt)(%ebp), %edi
> +	addl %ebx, (4096 - 16)(%edi)
> +	addl %ebx, (4096 - 8)(%edi)

PAGE_SIZE instead of 4096, please.

> +
> +	/* pvh_level2_ident_pgt is fine - large pages */
> +
> +	/* pvh_level2_kernel_pgt needs adjustment - large pages */
> +	leal rva(pvh_level2_kernel_pgt)(%ebp), %edi
> +	movl $512, %ecx
> +2:
> +	testl $_PAGE_PRESENT, 0x00(%edi)
> +	jz 1f
> +	addl %ebx, 0x00(%edi)
> +1:
> +	addl $8, %edi
> +	decl %ecx
> +	jnz 2b
> +
> +.Lpagetable_done:
>   	/* Enable pre-constructed page tables. */
> -	leal rva(init_top_pgt)(%ebp), %eax
> +	leal rva(pvh_init_top_pgt)(%ebp), %eax
>   	mov %eax, %cr3
>   	mov $(X86_CR0_PG | X86_CR0_PE), %eax
>   	mov %eax, %cr0
> @@ -197,5 +238,67 @@ SYM_DATA_START_LOCAL(early_stack)
>   	.fill BOOT_STACK_SIZE, 1, 0
>   SYM_DATA_END_LABEL(early_stack, SYM_L_LOCAL, early_stack_end)
>   
> +#ifdef CONFIG_X86_64
> +/*
> + * Xen PVH needs a set of identity mapped and kernel high mapping
> + * page tables.  pvh_start_xen starts running on the identity mapped
> + * page tables, but xen_prepare_pvh calls into the high mapping.
> + * These page tables need to be relocatable and are only used until
> + * startup_64 transitions to init_top_pgt.
> + */
> +SYM_DATA_START_PAGE_ALIGNED(pvh_init_top_pgt)
> +	.quad   pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
> +	.org    pvh_init_top_pgt + L4_PAGE_OFFSET*8, 0

Please add a space before and after the '*'.

> +	.quad   pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
> +	.org    pvh_init_top_pgt + L4_START_KERNEL*8, 0
> +	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
> +	.quad   pvh_level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
> +SYM_DATA_END(pvh_init_top_pgt)
> +
> +SYM_DATA_START_PAGE_ALIGNED(pvh_level3_ident_pgt)
> +	.quad	pvh_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
> +	.fill	511, 8, 0
> +SYM_DATA_END(pvh_level3_ident_pgt)
> +SYM_DATA_START_PAGE_ALIGNED(pvh_level2_ident_pgt)
> +	/*
> +	 * Since I easily can, map the first 1G.
> +	 * Don't set NX because code runs from these pages.
> +	 *
> +	 * Note: This sets _PAGE_GLOBAL despite whether
> +	 * the CPU supports it or it is enabled.  But,
> +	 * the CPU should ignore the bit.
> +	 */
> +	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
> +SYM_DATA_END(pvh_level2_ident_pgt)
> +SYM_DATA_START_PAGE_ALIGNED(pvh_level3_kernel_pgt)
> +	.fill	L3_START_KERNEL,8,0

Spaces after the commas.

> +	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
> +	.quad	pvh_level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
> +	.quad	0 /* no fixmap */
> +SYM_DATA_END(pvh_level3_kernel_pgt)
> +
> +SYM_DATA_START_PAGE_ALIGNED(pvh_level2_kernel_pgt)
> +	/*
> +	 * Kernel high mapping.
> +	 *
> +	 * The kernel code+data+bss must be located below KERNEL_IMAGE_SIZE in
> +	 * virtual address space, which is 1 GiB if RANDOMIZE_BASE is enabled,
> +	 * 512 MiB otherwise.
> +	 *
> +	 * (NOTE: after that starts the module area, see MODULES_VADDR.)
> +	 *
> +	 * This table is eventually used by the kernel during normal runtime.
> +	 * Care must be taken to clear out undesired bits later, like _PAGE_RW
> +	 * or _PAGE_GLOBAL in some cases.
> +	 */
> +	PMDS(0, __PAGE_KERNEL_LARGE_EXEC, KERNEL_IMAGE_SIZE/PMD_SIZE)

Spaces around '/'.

> +SYM_DATA_END(pvh_level2_kernel_pgt)
> +
> +	ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_RELOC,
> +		     .long CONFIG_PHYSICAL_ALIGN;
> +		     .long LOAD_PHYSICAL_ADDR;
> +		     .long KERNEL_IMAGE_SIZE - 1)
> +#endif
> +
>   	ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY,
>   	             _ASM_PTR (pvh_start_xen - __START_KERNEL_map))


Juergen
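The fixup loop in this patch walks one page of 512 8-byte entries and adds the load offset to every entry whose _PAGE_PRESENT bit is set; because the offset is page-aligned, the low flag bits survive the addition. A small emulation of that loop, with made-up table contents:

```python
# Sketch of the 'testl $_PAGE_PRESENT / addl %ebx' fixup loop above.
# Entry values and the offset are illustrative.
_PAGE_PRESENT = 0x1

def fixup_table(entries, offset):
    """Add the relocation offset to each present entry, as the asm
    loop does; non-present slots are left untouched."""
    return [e + offset if e & _PAGE_PRESENT else e for e in entries]

offset = 0x3000000                  # %ebx = %ebp - LOAD_PHYSICAL_ADDR
table = [0] * 512                   # one page of 64-bit entries
table[0] = 0x1001000 | 0x63         # present, _KERNPG_TABLE-style flags
table[511] = 0x1002000 | 0x63

fixed = fixup_table(table, offset)
assert fixed[0] == (0x4001000 | 0x63)   # page-aligned offset keeps flags
assert fixed[1] == 0                    # empty slot untouched
assert fixed[511] == (0x4002000 | 0x63)
```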



* Re: [PATCH 4/5] x86/kernel: Move page table macros to new header
  2024-04-10 19:48 ` [PATCH 4/5] x86/kernel: Move page table macros to new header Jason Andryuk
  2024-05-23 11:40   ` Juergen Gross
@ 2024-05-23 13:59   ` Thomas Gleixner
  2024-05-23 14:07     ` Borislav Petkov
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2024-05-23 13:59 UTC (permalink / raw)
  To: Jason Andryuk, Juergen Gross, Boris Ostrovsky, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
	Stefano Stabellini, Oleksandr Tyshchenko, Paolo Bonzini
  Cc: xen-devel, linux-kernel, Jason Andryuk

On Wed, Apr 10 2024 at 15:48, Jason Andryuk wrote:
> ---
>  arch/x86/kernel/head_64.S            | 22 ++--------------------
>  arch/x86/kernel/pgtable_64_helpers.h | 28 ++++++++++++++++++++++++++++

That's the wrong place as you want to include it from arch/x86/platform.

arch/x86/include/asm/....

Thanks,

        tglx


* Re: [PATCH 4/5] x86/kernel: Move page table macros to new header
  2024-05-23 13:59   ` Thomas Gleixner
@ 2024-05-23 14:07     ` Borislav Petkov
  0 siblings, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2024-05-23 14:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jason Andryuk, Juergen Gross, Boris Ostrovsky, Ingo Molnar,
	Dave Hansen, x86, H. Peter Anvin, Stefano Stabellini,
	Oleksandr Tyshchenko, Paolo Bonzini, xen-devel, linux-kernel

On Thu, May 23, 2024 at 03:59:43PM +0200, Thomas Gleixner wrote:
> On Wed, Apr 10 2024 at 15:48, Jason Andryuk wrote:
> > ---
> >  arch/x86/kernel/head_64.S            | 22 ++--------------------
> >  arch/x86/kernel/pgtable_64_helpers.h | 28 ++++++++++++++++++++++++++++
> 
> That's the wrong place as you want to include it from arch/x86/platform.
> 
> arch/x86/include/asm/....

... and there already is a header waiting:

arch/x86/include/asm/pgtable_64.h

so no need for a new one.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

