All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] x86/pvh: Support relocating dom0 kernel
@ 2024-03-13 19:30 Jason Andryuk
  2024-03-13 19:30 ` [PATCH v2 1/3] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
                   ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Jason Andryuk @ 2024-03-13 19:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini

Xen tries to load a PVH dom0 kernel at the fixed guest physical address
from the elf headers.  For Linux, this defaults to 0x1000000 (16MB), but
it can be configured.

Unfortunately there exist firmwares that have reserved regions at this
address, so Xen fails to load the dom0 kernel since it's not RAM.

The other issue is that the Linux PVH entry point is not
position-independent.  It expects to run at the compiled
CONFIG_PHYSICAL_ADDRESS.

This patch set expands the PVH dom0 builder to try to relocate the
kernel if needed and possible.  XEN_ELFNOTE_PVH_RELOCATION is added for
kernels to indicate they are relocatable and their acceptable address
range and alignment.

The first patch reverts "xen/x86: bzImage parse kernel_alignment" since
the alignment will be specified by the ELF note.

The second patch expands ELF note printing so arrays of values can be
handled.

The third patch expands the pvh dom0 kernel placement code.

I'll post an additional patch showing the Linux changes to make PVH
relocatable.

Jason Andryuk (3):
  Revert "xen/x86: bzImage parse kernel_alignment"
  libelf: Expand ELF note printing
  x86/PVH: Support relocatable dom0 kernels

 xen/arch/x86/bzimage.c             |   4 +-
 xen/arch/x86/hvm/dom0_build.c      | 112 ++++++++++++++++++++++++++++-
 xen/arch/x86/include/asm/bzimage.h |   2 +-
 xen/arch/x86/pv/dom0_build.c       |   2 +-
 xen/common/libelf/libelf-dominfo.c |  72 +++++++++++++------
 xen/include/public/elfnote.h       |  11 +++
 xen/include/xen/libelf.h           |   3 +
 7 files changed, 177 insertions(+), 29 deletions(-)

-- 
2.44.0



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 1/3] Revert "xen/x86: bzImage parse kernel_alignment"
  2024-03-13 19:30 [PATCH v2 0/3] x86/pvh: Support relocating dom0 kernel Jason Andryuk
@ 2024-03-13 19:30 ` Jason Andryuk
  2024-03-14  7:11   ` Jan Beulich
  2024-03-13 19:30 ` [PATCH v2 2/3] libelf: Expand ELF note printing Jason Andryuk
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Jason Andryuk @ 2024-03-13 19:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu

A new ELF note will specify the alignment for a relocatable PVH kernel.
ELF notes are suitable for vmlinux and other ELF files, so this
Linux-specific bzImage parsing in unnecessary.

This reverts commit c44cac229067faeec8f49247d1cf281723ac2d40.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
 xen/arch/x86/bzimage.c             | 4 +---
 xen/arch/x86/hvm/dom0_build.c      | 4 +---
 xen/arch/x86/include/asm/bzimage.h | 2 +-
 xen/arch/x86/pv/dom0_build.c       | 2 +-
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
index 0f4cfc49f7..ac4fd428be 100644
--- a/xen/arch/x86/bzimage.c
+++ b/xen/arch/x86/bzimage.c
@@ -105,7 +105,7 @@ unsigned long __init bzimage_headroom(void *image_start,
 }
 
 int __init bzimage_parse(void *image_base, void **image_start,
-                         unsigned long *image_len, unsigned int *align)
+                         unsigned long *image_len)
 {
     struct setup_header *hdr = (struct setup_header *)(*image_start);
     int err = bzimage_check(hdr, *image_len);
@@ -118,8 +118,6 @@ int __init bzimage_parse(void *image_base, void **image_start,
     {
         *image_start += (hdr->setup_sects + 1) * 512 + hdr->payload_offset;
         *image_len = hdr->payload_length;
-        if ( align )
-            *align = hdr->kernel_alignment;
     }
 
     if ( elf_is_elfbinary(*image_start, *image_len) )
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index bbae8a5645..0ceda4140b 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -548,14 +548,12 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
     struct elf_binary elf;
     struct elf_dom_parms parms;
     paddr_t last_addr;
-    unsigned int align = 0;
     struct hvm_start_info start_info = { 0 };
     struct hvm_modlist_entry mod = { 0 };
     struct vcpu *v = d->vcpu[0];
     int rc;
 
-    rc = bzimage_parse(image_base, &image_start, &image_len, &align);
-    if ( rc != 0 )
+    if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 )
     {
         printk("Error trying to detect bz compressed kernel\n");
         return rc;
diff --git a/xen/arch/x86/include/asm/bzimage.h b/xen/arch/x86/include/asm/bzimage.h
index 2e04f5cc7b..7ed69d3910 100644
--- a/xen/arch/x86/include/asm/bzimage.h
+++ b/xen/arch/x86/include/asm/bzimage.h
@@ -6,6 +6,6 @@
 unsigned long bzimage_headroom(void *image_start, unsigned long image_length);
 
 int bzimage_parse(void *image_base, void **image_start,
-                  unsigned long *image_len, unsigned int *align);
+                  unsigned long *image_len);
 
 #endif /* __X86_BZIMAGE_H__ */
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index e9fa8a9a82..d8043fa58a 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -416,7 +416,7 @@ int __init dom0_construct_pv(struct domain *d,
 
     d->max_pages = ~0U;
 
-    if ( (rc = bzimage_parse(image_base, &image_start, &image_len, NULL)) != 0 )
+    if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 )
         return rc;
 
     if ( (rc = elf_init(&elf, image_start, image_len)) != 0 )
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 2/3] libelf: Expand ELF note printing
  2024-03-13 19:30 [PATCH v2 0/3] x86/pvh: Support relocating dom0 kernel Jason Andryuk
  2024-03-13 19:30 ` [PATCH v2 1/3] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
@ 2024-03-13 19:30 ` Jason Andryuk
  2024-03-14 13:16   ` Jan Beulich
  2024-03-13 19:30 ` [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
  2024-03-13 19:46 ` [PATCH] RFC: x86/pvh: Make Xen PVH entrypoint PIC for x86-64 Jason Andryuk
  3 siblings, 1 reply; 32+ messages in thread
From: Jason Andryuk @ 2024-03-13 19:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Andrew Cooper, George Dunlap, Jan Beulich,
	Julien Grall, Stefano Stabellini, Wei Liu

The XEN_ELFNOTE_L1_MFN_VALID is an arrray of two values, but it is
printed as:
(XEN) ELF: note: L1_MFN_VALID = 0

Expand the printing to handle an array.
(XEN) ELF: note: L1_MFN_VALID = mask: 0x1 val: 0x1

ELFNOTE_OTHER prints the name and " = ", but not the value.
Implementing a switch case is needed to show the value, which should
include a newline.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
 xen/common/libelf/libelf-dominfo.c | 59 +++++++++++++++++++-----------
 1 file changed, 38 insertions(+), 21 deletions(-)

diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index a13a5e4db6..7cc7b18a51 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -101,26 +101,30 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
 /* *INDENT-OFF* */
     static const struct {
         const char *name;
-        bool str;
+        enum {
+            ELFNOTE_INT,
+            ELFNOTE_STRING,
+            ELFNOTE_OTHER
+        } type;
     } note_desc[] = {
-        [XEN_ELFNOTE_ENTRY] = { "ENTRY", 0},
-        [XEN_ELFNOTE_HYPERCALL_PAGE] = { "HYPERCALL_PAGE", 0},
-        [XEN_ELFNOTE_VIRT_BASE] = { "VIRT_BASE", 0},
-        [XEN_ELFNOTE_INIT_P2M] = { "INIT_P2M", 0},
-        [XEN_ELFNOTE_PADDR_OFFSET] = { "PADDR_OFFSET", 0},
-        [XEN_ELFNOTE_HV_START_LOW] = { "HV_START_LOW", 0},
-        [XEN_ELFNOTE_XEN_VERSION] = { "XEN_VERSION", 1},
-        [XEN_ELFNOTE_GUEST_OS] = { "GUEST_OS", 1},
-        [XEN_ELFNOTE_GUEST_VERSION] = { "GUEST_VERSION", 1},
-        [XEN_ELFNOTE_LOADER] = { "LOADER", 1},
-        [XEN_ELFNOTE_PAE_MODE] = { "PAE_MODE", 1},
-        [XEN_ELFNOTE_FEATURES] = { "FEATURES", 1},
-        [XEN_ELFNOTE_SUPPORTED_FEATURES] = { "SUPPORTED_FEATURES", 0},
-        [XEN_ELFNOTE_BSD_SYMTAB] = { "BSD_SYMTAB", 1},
-        [XEN_ELFNOTE_L1_MFN_VALID] = { "L1_MFN_VALID", false },
-        [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", 0 },
-        [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", 0 },
-        [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", 0 },
+        [XEN_ELFNOTE_ENTRY] = { "ENTRY", ELFNOTE_INT },
+        [XEN_ELFNOTE_HYPERCALL_PAGE] = { "HYPERCALL_PAGE", ELFNOTE_INT },
+        [XEN_ELFNOTE_VIRT_BASE] = { "VIRT_BASE", ELFNOTE_INT },
+        [XEN_ELFNOTE_INIT_P2M] = { "INIT_P2M", ELFNOTE_INT },
+        [XEN_ELFNOTE_PADDR_OFFSET] = { "PADDR_OFFSET", ELFNOTE_INT },
+        [XEN_ELFNOTE_HV_START_LOW] = { "HV_START_LOW", ELFNOTE_INT },
+        [XEN_ELFNOTE_XEN_VERSION] = { "XEN_VERSION", ELFNOTE_STRING },
+        [XEN_ELFNOTE_GUEST_OS] = { "GUEST_OS", ELFNOTE_STRING },
+        [XEN_ELFNOTE_GUEST_VERSION] = { "GUEST_VERSION", ELFNOTE_STRING },
+        [XEN_ELFNOTE_LOADER] = { "LOADER", ELFNOTE_STRING },
+        [XEN_ELFNOTE_PAE_MODE] = { "PAE_MODE", ELFNOTE_STRING },
+        [XEN_ELFNOTE_FEATURES] = { "FEATURES", ELFNOTE_STRING },
+        [XEN_ELFNOTE_SUPPORTED_FEATURES] = { "SUPPORTED_FEATURES", ELFNOTE_INT },
+        [XEN_ELFNOTE_BSD_SYMTAB] = { "BSD_SYMTAB", ELFNOTE_STRING },
+        [XEN_ELFNOTE_L1_MFN_VALID] = { "L1_MFN_VALID", ELFNOTE_OTHER },
+        [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", ELFNOTE_INT },
+        [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", ELFNOTE_INT },
+        [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", ELFNOTE_INT },
     };
 /* *INDENT-ON* */
 
@@ -136,7 +140,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
         return 0;
     }
 
-    if ( note_desc[type].str )
+    if ( note_desc[type].type == ELFNOTE_STRING )
     {
         str = elf_strval(elf, elf_note_desc(elf, note));
         if (str == NULL)
@@ -146,13 +150,17 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
         parms->elf_notes[type].type = XEN_ENT_STR;
         parms->elf_notes[type].data.str = str;
     }
-    else
+    else if ( note_desc[type].type == ELFNOTE_INT )
     {
         val = elf_note_numeric(elf, note);
         elf_msg(elf, "ELF: note: %s = %#" PRIx64 "\n", note_desc[type].name, val);
         parms->elf_notes[type].type = XEN_ENT_LONG;
         parms->elf_notes[type].data.num = val;
     }
+    else
+    {
+        elf_msg(elf, "ELF: note: %s = ", note_desc[type].name);
+    }
     parms->elf_notes[type].name = note_desc[type].name;
 
     switch ( type )
@@ -217,6 +225,15 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
     case XEN_ELFNOTE_PHYS32_ENTRY:
         parms->phys_entry = val;
         break;
+
+    case XEN_ELFNOTE_L1_MFN_VALID:
+        if ( elf_uval(elf, note, descsz) != 2 * sizeof(uint64_t) )
+            return -1;
+
+        elf_msg(elf, "mask: %#"PRIx64" val: %#"PRIx64"\n",
+                elf_note_numeric_array(elf, note, 8, 0),
+                elf_note_numeric_array(elf, note, 8, 1));
+        break;
     }
     return 0;
 }
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-13 19:30 [PATCH v2 0/3] x86/pvh: Support relocating dom0 kernel Jason Andryuk
  2024-03-13 19:30 ` [PATCH v2 1/3] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
  2024-03-13 19:30 ` [PATCH v2 2/3] libelf: Expand ELF note printing Jason Andryuk
@ 2024-03-13 19:30 ` Jason Andryuk
  2024-03-13 21:02   ` Jason Andryuk
                     ` (3 more replies)
  2024-03-13 19:46 ` [PATCH] RFC: x86/pvh: Make Xen PVH entrypoint PIC for x86-64 Jason Andryuk
  3 siblings, 4 replies; 32+ messages in thread
From: Jason Andryuk @ 2024-03-13 19:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné,
	Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini

Xen tries to load a PVH dom0 kernel at the fixed guest physical address
from the elf headers.  For Linux, this defaults to 0x1000000 (16MB), but
it can be configured.

Unfortunately there exist firmwares that have reserved regions at this
address, so Xen fails to load the dom0 kernel since it's not RAM.

The PVH entry code is not relocatable - it loads from absolute
addresses, which fail when the kernel is loaded at a different address.
With a suitably modified kernel, a reloctable entry point is possible.

Add XEN_ELFNOTE_PVH_RELOCATION which specifies the minimum, maximum and
alignment needed for the kernel.  The presence of the NOTE indicates the
kernel supports a relocatable entry path.

Change the loading to check for an acceptable load address.  If the
kernel is relocatable, support finding an alternate load address.

Link: https://gitlab.com/xen-project/xen/-/issues/180
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
ELF Note printing looks like:
(XEN) ELF: note: PVH_RELOCATION = min: 0x1000000 max: 0xffffffff align: 0x200000

v2:
Use elfnote for min, max & align - use 64bit values.
Print original and relocated memory addresses
Use check_and_adjust_load_address() name
Return relocated base instead of offset
Use PAGE_ALIGN
Don't load above max_phys (expected to be 4GB in kernel elf note)
Use single line comments
Exit check_load_address loop earlier
Add __init to find_kernel_memory()
---
 xen/arch/x86/hvm/dom0_build.c      | 108 +++++++++++++++++++++++++++++
 xen/common/libelf/libelf-dominfo.c |  13 ++++
 xen/include/public/elfnote.h       |  11 +++
 xen/include/xen/libelf.h           |   3 +
 4 files changed, 135 insertions(+)

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 0ceda4140b..5c6c0d2db3 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -537,6 +537,108 @@ static paddr_t __init find_memory(
     return INVALID_PADDR;
 }
 
+static bool __init check_load_address(
+    const struct domain *d, const struct elf_binary *elf)
+{
+    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
+    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
+    unsigned int i;
+
+    /*
+     * The memory map is sorted and all RAM regions starts and sizes are
+     * aligned to page boundaries.
+     */
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        paddr_t start = d->arch.e820[i].addr;
+        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
+
+        if ( start >= kernel_end )
+            return false;
+
+        if ( start <= kernel_start &&
+             end >= kernel_end &&
+             d->arch.e820[i].type == E820_RAM )
+            return true;
+    }
+
+    return false;
+}
+
+/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
+static paddr_t __init find_kernel_memory(
+    const struct domain *d, struct elf_binary *elf,
+    const struct elf_dom_parms *parms)
+{
+    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
+    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
+    paddr_t kernel_size = kernel_end - kernel_start;
+    unsigned int i;
+
+    /*
+     * The memory map is sorted and all RAM regions starts and sizes are
+     * aligned to page boundaries.
+     */
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        paddr_t start = d->arch.e820[i].addr;
+        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
+        paddr_t kstart, kend;
+
+        if ( d->arch.e820[i].type != E820_RAM )
+            continue;
+
+        if ( d->arch.e820[i].size < kernel_size )
+            continue;
+
+        kstart = ROUNDUP(start, parms->phys_align);
+        kstart = kstart < parms->phys_min ? parms->phys_min : kstart;
+        kend = kstart + kernel_size;
+
+        if ( kend > parms->phys_max )
+            return 0;
+
+        if ( kend <= end )
+            return kstart;
+    }
+
+    return 0;
+}
+
+/* Check the kernel load address, and adjust if necessary and possible. */
+static bool __init check_and_adjust_load_address(
+    const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms)
+{
+    paddr_t reloc_base;
+
+    if ( check_load_address(d, elf) )
+        return true;
+
+    if ( parms->phys_align == UNSET_ADDR )
+    {
+        printk("Address conflict and %pd kernel is not relocatable\n", d);
+        return false;
+    }
+
+    reloc_base = find_kernel_memory(d, elf, parms);
+    if ( reloc_base == 0 )
+    {
+        printk("Failed find a load address for the kernel\n");
+        return false;
+    }
+
+    if ( opt_dom0_verbose )
+        printk("Relocating kernel from [%lx, %lx] -> [%lx, %lx]\n",
+               (paddr_t)elf->dest_base,
+               (paddr_t)elf->dest_base + elf->dest_size,
+               reloc_base, reloc_base + elf->dest_size);
+
+    parms->phys_entry += (reloc_base - (paddr_t)elf->dest_base);
+    elf->dest_base = (char *)reloc_base;
+
+    return true;
+}
+
 static int __init pvh_load_kernel(struct domain *d, const module_t *image,
                                   unsigned long image_headroom,
                                   module_t *initrd, void *image_base,
@@ -585,6 +687,12 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
     elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base);
     elf.dest_size = parms.virt_kend - parms.virt_kstart;
 
+    if ( !check_and_adjust_load_address(d, &elf, &parms) )
+    {
+        printk("Unable to load kernel\n");
+        return -ENOMEM;
+    }
+
     elf_set_vcpu(&elf, v);
     rc = elf_load_binary(&elf);
     if ( rc < 0 )
diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
index 7cc7b18a51..837a1b0f21 100644
--- a/xen/common/libelf/libelf-dominfo.c
+++ b/xen/common/libelf/libelf-dominfo.c
@@ -125,6 +125,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
         [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", ELFNOTE_INT },
         [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", ELFNOTE_INT },
         [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", ELFNOTE_INT },
+        [XEN_ELFNOTE_PVH_RELOCATION] = { "PVH_RELOCATION", ELFNOTE_OTHER },
     };
 /* *INDENT-ON* */
 
@@ -234,6 +235,17 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
                 elf_note_numeric_array(elf, note, 8, 0),
                 elf_note_numeric_array(elf, note, 8, 1));
         break;
+
+    case XEN_ELFNOTE_PVH_RELOCATION:
+        if ( elf_uval(elf, note, descsz) != 3 * sizeof(uint64_t) )
+            return -1;
+
+        parms->phys_min = elf_note_numeric_array(elf, note, 8, 0);
+        parms->phys_max = elf_note_numeric_array(elf, note, 8, 1);
+        parms->phys_align = elf_note_numeric_array(elf, note, 8, 2);
+        elf_msg(elf, "min: %#"PRIx64" max: %#"PRIx64" align: %#"PRIx64"\n",
+                parms->phys_min, parms->phys_max, parms->phys_align);
+        break;
     }
     return 0;
 }
@@ -545,6 +557,7 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,
     parms->p2m_base = UNSET_ADDR;
     parms->elf_paddr_offset = UNSET_ADDR;
     parms->phys_entry = UNSET_ADDR32;
+    parms->phys_align = UNSET_ADDR;
 
     /* Find and parse elf notes. */
     count = elf_phdr_count(elf);
diff --git a/xen/include/public/elfnote.h b/xen/include/public/elfnote.h
index 8bf54d035b..70bb93499c 100644
--- a/xen/include/public/elfnote.h
+++ b/xen/include/public/elfnote.h
@@ -194,6 +194,17 @@
  */
 #define XEN_ELFNOTE_PHYS32_ENTRY 18
 
+/*
+ * Physical loading constraints for PVH kernels
+ *
+ * Used to place constraints on the guest physical loading addresses and
+ * alignment for a PVH kernel.  This note's value is 3 64bit values in
+ * the following order: minimum, maximum and alignment.
+ *
+ * The presence of this note indicates the kernel is relocatable.
+ */
+#define XEN_ELFNOTE_PVH_RELOCATION 19
+
 /*
  * The number of the highest elfnote defined.
  */
diff --git a/xen/include/xen/libelf.h b/xen/include/xen/libelf.h
index 1c77e3df31..d55a839411 100644
--- a/xen/include/xen/libelf.h
+++ b/xen/include/xen/libelf.h
@@ -431,6 +431,9 @@ struct elf_dom_parms {
     uint64_t virt_hv_start_low;
     uint64_t p2m_base;
     uint64_t elf_paddr_offset;
+    uint64_t phys_min;
+    uint64_t phys_max;
+    uint64_t phys_align;
     uint32_t f_supported[XENFEAT_NR_SUBMAPS];
     uint32_t f_required[XENFEAT_NR_SUBMAPS];
     uint32_t phys_entry;
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH] RFC: x86/pvh: Make Xen PVH entrypoint PIC for x86-64
  2024-03-13 19:30 [PATCH v2 0/3] x86/pvh: Support relocating dom0 kernel Jason Andryuk
                   ` (2 preceding siblings ...)
  2024-03-13 19:30 ` [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
@ 2024-03-13 19:46 ` Jason Andryuk
  3 siblings, 0 replies; 32+ messages in thread
From: Jason Andryuk @ 2024-03-13 19:46 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Roger Pau Monné, Andrew Cooper, Juergen Gross,
	Boris Ostrovsky, Jason Andryuk

The Xen PVH entrypoint is 32bit non-PIC code running at a default load
address of 0x1000000 (16MB) (CONFIG_PHYSICAL_START).  Xen loads the
kernel at that physical address inside the PVH container.

When running a PVH Dom0, the system reserved addresses are mapped 1-1
into the PVH container.  There exist system firmwares (Coreboot/EDK2)
with reserved memory at 16MB.  This creates a conflict where the PVH
kernel cannot be loaded at that address.

Modify the PVH entrypoint to be position-indepedent to allow flexibility
in load address.  Only the 64bit entry path is converted.  A 32bit
kernel is not PIC, so calling into other parts of the kernel, like
xen_prepare_pvh() and mk_pgtable_32(), don't work properly when
relocated.

Initial PVH entry runs at the physical addresses and then transitions to
the identity mapped address.  While executing xen_prepare_pvh() calls
through pv_ops function pointers transition to the high mapped
addresses.  Additionally, __va() is called on some hvm_start_info
physical addresses, we need the directmap address range is used.  So we
need to run page tables with all of those ranges mapped.

Modifying init_top_pgt tables ran into issue since
startup_64/__startup_64() will modify those page tables again.  Use a
dedicated set of page tables - pvh_init_top_pgt  - for the PVH entry to
avoid unwanted interactions.

In xen_pvh_init(), __pa() is called to find the physical address of the
hypercall page.  Set phys_base temporarily before calling into
xen_prepare_pvh(), which calls xen_pvh_init(), and clear it afterwards.
__startup_64() assumes phys_base is zero and adds load_delta to it.  If
phys_base is already set, the calculation results in an incorrect cr3.

TODO: Sync elfnote.h from xen.git commit xxxxxxxxxx when it is
commited.

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
---
Put this out as an example for the Xen modifications

Instead of setting and clearing phys_base, add a dedicated variable?
Clearing phys_base is a little weird, but it leaves the kernel more
consistent when running non-entry code.

Make __startup_64() exit if phys_base is already set to allow calling
multiple times, and use that and init_top_pgt instead of adding
additional page tables?  That won't work.  __startup_64 is 64bit code,
and pvh needs to create page tables in 32bit code before it can
transition to 64bit long mode.  Hence it can't be re-used to relocate
page tables.
---
 arch/x86/platform/pvh/head.S    | 182 ++++++++++++++++++++++++++++++--
 include/xen/interface/elfnote.h |  10 ++
 2 files changed, 181 insertions(+), 11 deletions(-)

diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index f7235ef87bc3..cfb7fdc8c784 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -50,11 +50,32 @@
 #define PVH_CS_SEL		(PVH_GDT_ENTRY_CS * 8)
 #define PVH_DS_SEL		(PVH_GDT_ENTRY_DS * 8)
 
+#define rva(x) ((x) - pvh_start_xen)
+
 SYM_CODE_START_LOCAL(pvh_start_xen)
 	UNWIND_HINT_END_OF_STACK
 	cld
 
-	lgdt (_pa(gdt))
+	/*
+	 * See the comment for startup_32 for more details.  We need to
+	 * execute a call to get the execution address to be position
+	 * independent, but we don't have a stack.  Save and restore the
+	 * magic field of start_info in ebx, and use that as the stack.
+	 */
+	mov	(%ebx), %eax
+	leal	4(%ebx), %esp
+	ANNOTATE_INTRA_FUNCTION_CALL
+	call	1f
+1:	popl	%ebp
+	mov	%eax, (%ebx)
+	subl	$ rva(1b), %ebp
+	movl	$0, %esp
+
+	leal	rva(gdt)(%ebp), %eax
+	movl	%eax, %ecx
+	leal	rva(gdt_start)(%ebp), %ecx
+	movl	%ecx, 2(%eax)
+	lgdt	(%eax)
 
 	mov $PVH_DS_SEL,%eax
 	mov %eax,%ds
@@ -62,14 +83,14 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
 	mov %eax,%ss
 
 	/* Stash hvm_start_info. */
-	mov $_pa(pvh_start_info), %edi
+	leal rva(pvh_start_info)(%ebp), %edi
 	mov %ebx, %esi
-	mov _pa(pvh_start_info_sz), %ecx
+	movl rva(pvh_start_info_sz)(%ebp), %ecx
 	shr $2,%ecx
 	rep
 	movsl
 
-	mov $_pa(early_stack_end), %esp
+	leal rva(early_stack_end)(%ebp), %esp
 
 	/* Enable PAE mode. */
 	mov %cr4, %eax
@@ -83,29 +104,83 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
 	btsl $_EFER_LME, %eax
 	wrmsr
 
+	mov %ebp, %ebx
+	subl $LOAD_PHYSICAL_ADDR, %ebx /* offset */
+	jz .Lpagetable_done
+
+	/* Fixup page-tables for relocation. */
+	leal rva(pvh_init_top_pgt)(%ebp), %edi
+	movl $512, %ecx
+2:
+	movl 0x00(%edi), %eax
+	addl 0x04(%edi), %eax
+	jz 1f
+	addl %ebx, 0x00(%edi)
+1:
+	addl $8, %edi
+	decl %ecx
+	jnz 2b
+
+	/* L3 ident has a single entry. */
+	leal rva(pvh_level3_ident_pgt)(%ebp), %edi
+	addl %ebx, 0x00(%edi)
+
+	leal rva(pvh_level3_kernel_pgt)(%ebp), %edi
+	addl %ebx, (4096 - 16)(%edi)
+	addl %ebx, (4096 - 8)(%edi)
+
+	/* pvh_level2_ident_pgt is fine - large pages */
+
+	/* pvh_level2_kernel_pgt needs adjustment - large pages */
+	leal rva(pvh_level2_kernel_pgt)(%ebp), %edi
+	movl $512, %ecx
+2:
+	movl 0x00(%edi), %eax
+	addl 0x04(%edi), %eax
+	jz 1f
+	addl %ebx, 0x00(%edi)
+1:
+	addl $8, %edi
+	decl %ecx
+	jnz 2b
+
+.Lpagetable_done:
 	/* Enable pre-constructed page tables. */
-	mov $_pa(init_top_pgt), %eax
+	leal rva(pvh_init_top_pgt)(%ebp), %eax
 	mov %eax, %cr3
 	mov $(X86_CR0_PG | X86_CR0_PE), %eax
 	mov %eax, %cr0
 
 	/* Jump to 64-bit mode. */
-	ljmp $PVH_CS_SEL, $_pa(1f)
+	pushl $PVH_CS_SEL
+	leal  rva(1f)(%ebp), %eax
+	pushl %eax
+	lretl
 
 	/* 64-bit entry point. */
 	.code64
 1:
 	/* Set base address in stack canary descriptor. */
 	mov $MSR_GS_BASE,%ecx
-	mov $_pa(canary), %eax
+	leal rva(canary)(%ebp), %eax
 	xor %edx, %edx
 	wrmsr
 
+	/* Calculate load offset from LOAD_PHYSICAL_ADDR and store in
+	 * phys_base.  __pa() needs phys_base set to calculate the the
+	 * hypercall page in xen_pvh_init(). */
+	movq %rbp, %rbx
+	subq $LOAD_PHYSICAL_ADDR, %rbx
+	movq %rbx, phys_base(%rip)
 	call xen_prepare_pvh
+	/* Clear phys_base.  startup_64/__startup_64 will *add* to its value,
+	   so start from 0. */
+	xor  %rbx, %rbx
+	movq %rbx, phys_base(%rip)
 
 	/* startup_64 expects boot_params in %rsi. */
-	mov $_pa(pvh_bootparams), %rsi
-	mov $_pa(startup_64), %rax
+	lea rva(pvh_bootparams)(%ebp), %rsi
+	lea rva(startup_64)(%ebp), %rax
 	ANNOTATE_RETPOLINE_SAFE
 	jmp *%rax
 
@@ -137,13 +212,14 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
 
 	ljmp $PVH_CS_SEL, $_pa(startup_32)
 #endif
+
 SYM_CODE_END(pvh_start_xen)
 
 	.section ".init.data","aw"
 	.balign 8
 SYM_DATA_START_LOCAL(gdt)
-	.word gdt_end - gdt_start
-	.long _pa(gdt_start)
+	.word gdt_end - gdt_start - 1
+	.long _pa(gdt_start) /* x86-64 will overwrite if relocated. */
 	.word 0
 SYM_DATA_END(gdt)
 SYM_DATA_START_LOCAL(gdt_start)
@@ -163,5 +239,89 @@ SYM_DATA_START_LOCAL(early_stack)
 	.fill BOOT_STACK_SIZE, 1, 0
 SYM_DATA_END_LABEL(early_stack, SYM_L_LOCAL, early_stack_end)
 
+#ifdef CONFIG_X86_64
+/*
+ * We are not able to switch in one step to the final KERNEL ADDRESS SPACE
+ * because we need identity-mapped pages.
+ */
+#define l4_index(x)     (((x) >> 39) & 511)
+#define pud_index(x)    (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))
+
+L4_PAGE_OFFSET  = l4_index(__PAGE_OFFSET_BASE_L4)
+L4_START_KERNEL = l4_index(__START_KERNEL_map)
+L3_START_KERNEL = pud_index(__START_KERNEL_map)
+
+#define SYM_DATA_START_PAGE_ALIGNED(name)			\
+	SYM_START(name, SYM_L_GLOBAL, .balign PAGE_SIZE)
+
+/* Automate the creation of 1 to 1 mapping pmd entries */
+#define PMDS(START, PERM, COUNT)			\
+	i = 0 ;						\
+	.rept (COUNT) ;					\
+	.quad	(START) + (i << PMD_SHIFT) + (PERM) ;	\
+	i = i + 1 ;					\
+	.endr
+
+/*
+ * Xen PVH needs a set of identity mapped and kernel high mapping
+ * page tables.  pvh_start_xen starts running on the identity mapped
+ * page tables, but xen_prepare_pvh calls into the high mapping.
+ * These page tables need to be relocatable and are only used until
+ * startup_64 transitions to init_top_pgt.
+ */
+SYM_DATA_START_PAGE_ALIGNED(pvh_init_top_pgt)
+	.quad   pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	.org    pvh_init_top_pgt + L4_PAGE_OFFSET*8, 0
+	.quad   pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	.org    pvh_init_top_pgt + L4_START_KERNEL*8, 0
+	/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+	.quad   pvh_level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
+SYM_DATA_END(pvh_init_top_pgt)
+
+SYM_DATA_START_PAGE_ALIGNED(pvh_level3_ident_pgt)
+	.quad	pvh_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	.fill	511, 8, 0
+SYM_DATA_END(pvh_level3_ident_pgt)
+SYM_DATA_START_PAGE_ALIGNED(pvh_level2_ident_pgt)
+	/*
+	 * Since I easily can, map the first 1G.
+	 * Don't set NX because code runs from these pages.
+	 *
+	 * Note: This sets _PAGE_GLOBAL despite whether
+	 * the CPU supports it or it is enabled.  But,
+	 * the CPU should ignore the bit.
+	 */
+	PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD)
+SYM_DATA_END(pvh_level2_ident_pgt)
+SYM_DATA_START_PAGE_ALIGNED(pvh_level3_kernel_pgt)
+	.fill	L3_START_KERNEL,8,0
+	/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
+	.quad	pvh_level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+	.quad	0 /* no fixmap */
+SYM_DATA_END(pvh_level3_kernel_pgt)
+
+SYM_DATA_START_PAGE_ALIGNED(pvh_level2_kernel_pgt)
+	/*
+	 * Kernel high mapping.
+	 *
+	 * The kernel code+data+bss must be located below KERNEL_IMAGE_SIZE in
+	 * virtual address space, which is 1 GiB if RANDOMIZE_BASE is enabled,
+	 * 512 MiB otherwise.
+	 *
+	 * (NOTE: after that starts the module area, see MODULES_VADDR.)
+	 *
+	 * This table is eventually used by the kernel during normal runtime.
+	 * Care must be taken to clear out undesired bits later, like _PAGE_RW
+	 * or _PAGE_GLOBAL in some cases.
+	 */
+	PMDS(0, __PAGE_KERNEL_LARGE_EXEC, KERNEL_IMAGE_SIZE/PMD_SIZE)
+SYM_DATA_END(pvh_level2_kernel_pgt)
+
+	ELFNOTE(Xen, XEN_ELFNOTE_PVH_RELOCATION,
+		     .quad LOAD_PHYSICAL_ADDR;
+		     .quad 0xffffffff;
+		     .quad CONFIG_PHYSICAL_ALIGN)
+#endif
+
 	ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY,
 	             _ASM_PTR (pvh_start_xen - __START_KERNEL_map))
diff --git a/include/xen/interface/elfnote.h b/include/xen/interface/elfnote.h
index 38deb1214613..f2b418db4261 100644
--- a/include/xen/interface/elfnote.h
+++ b/include/xen/interface/elfnote.h
@@ -185,6 +185,16 @@
  */
 #define XEN_ELFNOTE_PHYS32_ENTRY 18
 
+/*
+ * Physical loading constraints for PVH kernels
+ *
+ * Used to place constraints on the guest physical loading addresses and
+ * alignment for a PVH kernel.  This note's value is 3 64bit values in
+ * the following order: minimum, maximum and alignment.
+ *
+ * The presence of this note indicates the kernel is relocatable.
+ */
+#define XEN_ELFNOTE_PVH_RELOCATION 19
 /*
  * The number of the highest elfnote defined.
  */
-- 
2.44.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-13 19:30 ` [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
@ 2024-03-13 21:02   ` Jason Andryuk
  2024-03-14  7:12     ` Jan Beulich
  2024-03-14  9:48   ` Roger Pau Monné
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Jason Andryuk @ 2024-03-13 21:02 UTC (permalink / raw)
  To: xen-devel
  Cc: Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu,
	George Dunlap, Julien Grall, Stefano Stabellini

On 2024-03-13 15:30, Jason Andryuk wrote:
> +/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
> +static paddr_t __init find_kernel_memory(
> +    const struct domain *d, struct elf_binary *elf,
> +    const struct elf_dom_parms *parms)
> +{
> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
> +    paddr_t kernel_size = kernel_end - kernel_start;
> +    unsigned int i;
> +
> +    /*
> +     * The memory map is sorted and all RAM regions starts and sizes are
> +     * aligned to page boundaries.
> +     */
> +    for ( i = 0; i < d->arch.nr_e820; i++ )
> +    {
> +        paddr_t start = d->arch.e820[i].addr;
> +        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
> +        paddr_t kstart, kend;
> +
> +        if ( d->arch.e820[i].type != E820_RAM )
> +            continue;
> +
> +        if ( d->arch.e820[i].size < kernel_size )
> +            continue;
> +
> +        kstart = ROUNDUP(start, parms->phys_align);
> +        kstart = kstart < parms->phys_min ? parms->phys_min : kstart;

This should be
         kstart = MAX(kstart, parms->phys_min);

Regards,
Jason

> +        kend = kstart + kernel_size;
> +
> +        if ( kend > parms->phys_max )
> +            return 0;
> +
> +        if ( kend <= end )
> +            return kstart;
> +    }
> +
> +    return 0;
> +}


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/3] Revert "xen/x86: bzImage parse kernel_alignment"
  2024-03-13 19:30 ` [PATCH v2 1/3] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
@ 2024-03-14  7:11   ` Jan Beulich
  2024-03-14 13:01     ` Jason Andryuk
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2024-03-14  7:11 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: Andrew Cooper, Roger Pau Monné, xen-devel, Wei Liu

On 13.03.2024 20:30, Jason Andryuk wrote:
> A new ELF note will specify the alignment for a relocatable PVH kernel.
> ELF notes are suitable for vmlinux and other ELF files, so this
> Linux-specific bzImage parsing in unnecessary.

Hmm, shouldn't the order of attempts to figure the alignment be ELF note,
ELF header, and then 2Mb?

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-13 21:02   ` Jason Andryuk
@ 2024-03-14  7:12     ` Jan Beulich
  2024-03-14 12:46       ` Jason Andryuk
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2024-03-14  7:12 UTC (permalink / raw)
  To: Jason Andryuk, xen-devel
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini

On 13.03.2024 22:02, Jason Andryuk wrote:
> On 2024-03-13 15:30, Jason Andryuk wrote:
>> +/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
>> +static paddr_t __init find_kernel_memory(
>> +    const struct domain *d, struct elf_binary *elf,
>> +    const struct elf_dom_parms *parms)
>> +{
>> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
>> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
>> +    paddr_t kernel_size = kernel_end - kernel_start;
>> +    unsigned int i;
>> +
>> +    /*
>> +     * The memory map is sorted and all RAM regions starts and sizes are
>> +     * aligned to page boundaries.
>> +     */
>> +    for ( i = 0; i < d->arch.nr_e820; i++ )
>> +    {
>> +        paddr_t start = d->arch.e820[i].addr;
>> +        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
>> +        paddr_t kstart, kend;
>> +
>> +        if ( d->arch.e820[i].type != E820_RAM )
>> +            continue;
>> +
>> +        if ( d->arch.e820[i].size < kernel_size )
>> +            continue;
>> +
>> +        kstart = ROUNDUP(start, parms->phys_align);
>> +        kstart = kstart < parms->phys_min ? parms->phys_min : kstart;
> 
> This should be
>          kstart = MAX(kstart, parms->phys_min);

Except that MAX() really ought to be the last resort; max() and max_t()
want considering in preference.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-13 19:30 ` [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
  2024-03-13 21:02   ` Jason Andryuk
@ 2024-03-14  9:48   ` Roger Pau Monné
  2024-03-14 13:51     ` Jason Andryuk
  2024-03-14 13:21   ` Jan Beulich
  2024-03-14 13:31   ` Jan Beulich
  3 siblings, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2024-03-14  9:48 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini

On Wed, Mar 13, 2024 at 03:30:21PM -0400, Jason Andryuk wrote:
> Xen tries to load a PVH dom0 kernel at the fixed guest physical address
> from the elf headers.  For Linux, this defaults to 0x1000000 (16MB), but
> it can be configured.
> 
> Unfortunately there exist firmwares that have reserved regions at this
> address, so Xen fails to load the dom0 kernel since it's not RAM.
> 
> The PVH entry code is not relocatable - it loads from absolute
> addresses, which fail when the kernel is loaded at a different address.
> With a suitably modified kernel, a reloctable entry point is possible.
> 
> Add XEN_ELFNOTE_PVH_RELOCATION which specifies the minimum, maximum and
> alignment needed for the kernel.  The presence of the NOTE indicates the
> kernel supports a relocatable entry path.
> 
> Change the loading to check for an acceptable load address.  If the
> kernel is relocatable, support finding an alternate load address.
> 
> Link: https://gitlab.com/xen-project/xen/-/issues/180
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> ---
> ELF Note printing looks like:
> (XEN) ELF: note: PVH_RELOCATION = min: 0x1000000 max: 0xffffffff align: 0x200000
> 
> v2:
> Use elfnote for min, max & align - use 64bit values.
> Print original and relocated memory addresses
> Use check_and_adjust_load_address() name
> Return relocated base instead of offset
> Use PAGE_ALIGN
> Don't load above max_phys (expected to be 4GB in kernel elf note)
> Use single line comments
> Exit check_load_address loop earlier
> Add __init to find_kernel_memory()
> ---
>  xen/arch/x86/hvm/dom0_build.c      | 108 +++++++++++++++++++++++++++++
>  xen/common/libelf/libelf-dominfo.c |  13 ++++
>  xen/include/public/elfnote.h       |  11 +++
>  xen/include/xen/libelf.h           |   3 +
>  4 files changed, 135 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> index 0ceda4140b..5c6c0d2db3 100644
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -537,6 +537,108 @@ static paddr_t __init find_memory(
>      return INVALID_PADDR;
>  }
>  
> +static bool __init check_load_address(
> +    const struct domain *d, const struct elf_binary *elf)
> +{
> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;

Are you sure this is correct?  If a program header specifies a non-4K
aligned load address we should still try to honor it.  I think this is
very unlikely, but still we shouldn't apply non-requested alignments
to addresses coming from the ELF headers.

> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
> +    unsigned int i;
> +
> +    /*
> +     * The memory map is sorted and all RAM regions starts and sizes are
> +     * aligned to page boundaries.

Relying on sizes to be page aligned seems fragile: it might work now
because of the order in which pvh_setup_vmx_realmode_helpers() first
reserves memory for the TSS and afterwards for the identity page
tables, but it's not a property this code should assume.

> +     */
> +    for ( i = 0; i < d->arch.nr_e820; i++ )
> +    {
> +        paddr_t start = d->arch.e820[i].addr;
> +        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
> +
> +        if ( start >= kernel_end )
> +            return false;
> +
> +        if ( start <= kernel_start &&
> +             end >= kernel_end &&
> +             d->arch.e820[i].type == E820_RAM )

Nit: I would maybe do the type check before the boundary ones, as it's
pointless to do boundary checking on a region of a non-RAM type.  But
I guess you could also see it the other way around.

> +            return true;
> +    }
> +
> +    return false;
> +}
> +
> +/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
> +static paddr_t __init find_kernel_memory(
> +    const struct domain *d, struct elf_binary *elf,

const for elf also.

> +    const struct elf_dom_parms *parms)
> +{
> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
> +    paddr_t kernel_size = kernel_end - kernel_start;

Hm, I'm again unsure about the alignments applied here.

I think if anything we want to assert that dest_base is aligned to phys_align.

> +    unsigned int i;
> +
> +    /*
> +     * The memory map is sorted and all RAM regions starts and sizes are
> +     * aligned to page boundaries.
> +     */
> +    for ( i = 0; i < d->arch.nr_e820; i++ )
> +    {
> +        paddr_t start = d->arch.e820[i].addr;
> +        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
> +        paddr_t kstart, kend;
> +
> +        if ( d->arch.e820[i].type != E820_RAM )
> +            continue;
> +
> +        if ( d->arch.e820[i].size < kernel_size )
> +            continue;

You can unify both checks in a single condition.

> +
> +        kstart = ROUNDUP(start, parms->phys_align);
> +        kstart = kstart < parms->phys_min ? parms->phys_min : kstart;
> +        kend = kstart + kernel_size;
> +
> +        if ( kend > parms->phys_max )
> +            return 0;
> +
> +        if ( kend <= end )
> +            return kstart;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Check the kernel load address, and adjust if necessary and possible. */
> +static bool __init check_and_adjust_load_address(
> +    const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms)
> +{
> +    paddr_t reloc_base;
> +
> +    if ( check_load_address(d, elf) )
> +        return true;
> +
> +    if ( parms->phys_align == UNSET_ADDR )
> +    {
> +        printk("Address conflict and %pd kernel is not relocatable\n", d);
> +        return false;
> +    }
> +
> +    reloc_base = find_kernel_memory(d, elf, parms);
> +    if ( reloc_base == 0 )
> +    {
> +        printk("Failed find a load address for the kernel\n");

Since you print the domain in the error message prior to this one, I
would consider also printing it here (or not printing it in both
cases).

> +        return false;
> +    }
> +
> +    if ( opt_dom0_verbose )
> +        printk("Relocating kernel from [%lx, %lx] -> [%lx, %lx]\n",
> +               (paddr_t)elf->dest_base,
> +               (paddr_t)elf->dest_base + elf->dest_size,
> +               reloc_base, reloc_base + elf->dest_size);

I think you need `- 1` for the end calculation if you use inclusive
ranges [, ].  Otherwise [, ) should be used.

> +
> +    parms->phys_entry += (reloc_base - (paddr_t)elf->dest_base);

This seems to assume that the image is always relocated at a higher
address that the original one?

parms->phys_entry = reloc_base + (parms->phys_entry - elf->dest_base);

> +    elf->dest_base = (char *)reloc_base;
> +
> +    return true;
> +}
> +
>  static int __init pvh_load_kernel(struct domain *d, const module_t *image,
>                                    unsigned long image_headroom,
>                                    module_t *initrd, void *image_base,
> @@ -585,6 +687,12 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
>      elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base);
>      elf.dest_size = parms.virt_kend - parms.virt_kstart;
>  
> +    if ( !check_and_adjust_load_address(d, &elf, &parms) )
> +    {
> +        printk("Unable to load kernel\n");

check_and_adjust_load_address() already prints an error message,
probably no need to print another message.

> +        return -ENOMEM;
> +    }
> +
>      elf_set_vcpu(&elf, v);
>      rc = elf_load_binary(&elf);
>      if ( rc < 0 )
> diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
> index 7cc7b18a51..837a1b0f21 100644
> --- a/xen/common/libelf/libelf-dominfo.c
> +++ b/xen/common/libelf/libelf-dominfo.c
> @@ -125,6 +125,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>          [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", ELFNOTE_INT },
>          [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", ELFNOTE_INT },
>          [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", ELFNOTE_INT },
> +        [XEN_ELFNOTE_PVH_RELOCATION] = { "PVH_RELOCATION", ELFNOTE_OTHER },
>      };
>  /* *INDENT-ON* */
>  
> @@ -234,6 +235,17 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>                  elf_note_numeric_array(elf, note, 8, 0),
>                  elf_note_numeric_array(elf, note, 8, 1));
>          break;
> +
> +    case XEN_ELFNOTE_PVH_RELOCATION:
> +        if ( elf_uval(elf, note, descsz) != 3 * sizeof(uint64_t) )
> +            return -1;
> +
> +        parms->phys_min = elf_note_numeric_array(elf, note, 8, 0);
> +        parms->phys_max = elf_note_numeric_array(elf, note, 8, 1);
> +        parms->phys_align = elf_note_numeric_array(elf, note, 8, 2);

Size for those needs to be 4 (32bits) as the entry point is in 32bit
mode?  I don't see how we can start past the 4GB boundary.

> +        elf_msg(elf, "min: %#"PRIx64" max: %#"PRIx64" align: %#"PRIx64"\n",
> +                parms->phys_min, parms->phys_max, parms->phys_align);
> +        break;
>      }
>      return 0;
>  }
> @@ -545,6 +557,7 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,
>      parms->p2m_base = UNSET_ADDR;
>      parms->elf_paddr_offset = UNSET_ADDR;
>      parms->phys_entry = UNSET_ADDR32;
> +    parms->phys_align = UNSET_ADDR;

For correctness I would also init phys_{min,max}.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-14  7:12     ` Jan Beulich
@ 2024-03-14 12:46       ` Jason Andryuk
  0 siblings, 0 replies; 32+ messages in thread
From: Jason Andryuk @ 2024-03-14 12:46 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini

On 2024-03-14 03:12, Jan Beulich wrote:
> On 13.03.2024 22:02, Jason Andryuk wrote:
>> On 2024-03-13 15:30, Jason Andryuk wrote:
>>> +/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
>>> +static paddr_t __init find_kernel_memory(
>>> +    const struct domain *d, struct elf_binary *elf,
>>> +    const struct elf_dom_parms *parms)
>>> +{
>>> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
>>> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
>>> +    paddr_t kernel_size = kernel_end - kernel_start;
>>> +    unsigned int i;
>>> +
>>> +    /*
>>> +     * The memory map is sorted and all RAM regions starts and sizes are
>>> +     * aligned to page boundaries.
>>> +     */
>>> +    for ( i = 0; i < d->arch.nr_e820; i++ )
>>> +    {
>>> +        paddr_t start = d->arch.e820[i].addr;
>>> +        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
>>> +        paddr_t kstart, kend;
>>> +
>>> +        if ( d->arch.e820[i].type != E820_RAM )
>>> +            continue;
>>> +
>>> +        if ( d->arch.e820[i].size < kernel_size )
>>> +            continue;
>>> +
>>> +        kstart = ROUNDUP(start, parms->phys_align);
>>> +        kstart = kstart < parms->phys_min ? parms->phys_min : kstart;
>>
>> This should be
>>           kstart = MAX(kstart, parms->phys_min);
> 
> Except that MAX() really ought to be the last resort; max() and max_t()
> want considering in preference.

Yes, thanks.  max() will do.

Regards,
Jason


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/3] Revert "xen/x86: bzImage parse kernel_alignment"
  2024-03-14  7:11   ` Jan Beulich
@ 2024-03-14 13:01     ` Jason Andryuk
  2024-03-14 13:33       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Andryuk @ 2024-03-14 13:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Roger Pau Monné, xen-devel, Wei Liu

On 2024-03-14 03:11, Jan Beulich wrote:
> On 13.03.2024 20:30, Jason Andryuk wrote:
>> A new ELF note will specify the alignment for a relocatable PVH kernel.
>> ELF notes are suitable for vmlinux and other ELF files, so this
>> Linux-specific bzImage parsing in unnecessary.
> 
> Hmm, shouldn't the order of attempts to figure the alignment be ELF note,
> ELF header, and then 2Mb?

This patch series makes Xen only relocate when the ELF Note is present. 
Linux PVH entry points today are not relocatable, so that needs to be 
specified some way.  This new note also includes the alignment, so there 
is no need to try other means of obtaining the alignment.

Also, the ELF header values didn't change with CONFIG_PHYSICAL_ALIGN, so 
that didn't seem reliable.

Thanks,
Jason


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/3] libelf: Expand ELF note printing
  2024-03-13 19:30 ` [PATCH v2 2/3] libelf: Expand ELF note printing Jason Andryuk
@ 2024-03-14 13:16   ` Jan Beulich
  2024-03-14 20:36     ` Jason Andryuk
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2024-03-14 13:16 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, xen-devel

On 13.03.2024 20:30, Jason Andryuk wrote:
> @@ -217,6 +225,15 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>      case XEN_ELFNOTE_PHYS32_ENTRY:
>          parms->phys_entry = val;
>          break;
> +
> +    case XEN_ELFNOTE_L1_MFN_VALID:
> +        if ( elf_uval(elf, note, descsz) != 2 * sizeof(uint64_t) )
> +            return -1;

elf_note_numeric() use sites don't have such a check. Why would we need
one here, and even more so causing a error to be raised when in reality
the supplied values (still) aren't consumed? Furthermore the documentation
says "pairs" (plural) for a reason. Finally maddr_t-sized only happens to
mean uint64_t on all architectures we presently support.

Jan

> +        elf_msg(elf, "mask: %#"PRIx64" val: %#"PRIx64"\n",
> +                elf_note_numeric_array(elf, note, 8, 0),
> +                elf_note_numeric_array(elf, note, 8, 1));
> +        break;
>      }
>      return 0;
>  }



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-13 19:30 ` [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
  2024-03-13 21:02   ` Jason Andryuk
  2024-03-14  9:48   ` Roger Pau Monné
@ 2024-03-14 13:21   ` Jan Beulich
  2024-03-14 14:13     ` Jason Andryuk
  2024-03-14 13:31   ` Jan Beulich
  3 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2024-03-14 13:21 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, xen-devel

On 13.03.2024 20:30, Jason Andryuk wrote:
> --- a/xen/include/public/elfnote.h
> +++ b/xen/include/public/elfnote.h
> @@ -194,6 +194,17 @@
>   */
>  #define XEN_ELFNOTE_PHYS32_ENTRY 18
>  
> +/*
> + * Physical loading constraints for PVH kernels
> + *
> + * Used to place constraints on the guest physical loading addresses and
> + * alignment for a PVH kernel.  This note's value is 3 64bit values in
> + * the following order: minimum, maximum and alignment.

Along the lines of what I said on another sub-thread, I think at least
alignment wants to be optional here. Perhaps, with max going first, min
could also be optional.

As indicated in different context by Roger, the values being uniformly
64-bit ones also is questionable.

> + * The presence of this note indicates the kernel is relocatable.

I think it wants making explicit here that the act of relocating is still
left to the kernel.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-13 19:30 ` [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
                     ` (2 preceding siblings ...)
  2024-03-14 13:21   ` Jan Beulich
@ 2024-03-14 13:31   ` Jan Beulich
  2024-03-14 19:19     ` Jason Andryuk
  3 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2024-03-14 13:31 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, xen-devel

On 13.03.2024 20:30, Jason Andryuk wrote:
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -537,6 +537,108 @@ static paddr_t __init find_memory(
>      return INVALID_PADDR;
>  }
>  
> +static bool __init check_load_address(
> +    const struct domain *d, const struct elf_binary *elf)
> +{
> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);

Both casts act on a pointer value. Such cannot legitimately be converted
to paddr_t; it only so happens that paddr_t is effectively the same as
uintptr_t right now. (Same issue again further down.) That said, I notice
we have pre-existing examples of this ...

> +/* Check the kernel load address, and adjust if necessary and possible. */
> +static bool __init check_and_adjust_load_address(
> +    const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms)
> +{
> +    paddr_t reloc_base;
> +
> +    if ( check_load_address(d, elf) )
> +        return true;
> +
> +    if ( parms->phys_align == UNSET_ADDR )
> +    {
> +        printk("Address conflict and %pd kernel is not relocatable\n", d);
> +        return false;
> +    }
> +
> +    reloc_base = find_kernel_memory(d, elf, parms);
> +    if ( reloc_base == 0 )
> +    {
> +        printk("Failed find a load address for the kernel\n");
> +        return false;
> +    }
> +
> +    if ( opt_dom0_verbose )
> +        printk("Relocating kernel from [%lx, %lx] -> [%lx, %lx]\n",
> +               (paddr_t)elf->dest_base,
> +               (paddr_t)elf->dest_base + elf->dest_size,

By using %p you clearly can avoid the casts here.

> +               reloc_base, reloc_base + elf->dest_size);

I'm not convinced %lx is really appropriate for paddr_t.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/3] Revert "xen/x86: bzImage parse kernel_alignment"
  2024-03-14 13:01     ` Jason Andryuk
@ 2024-03-14 13:33       ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2024-03-14 13:33 UTC (permalink / raw)
  To: Jason Andryuk; +Cc: Andrew Cooper, Roger Pau Monné, xen-devel, Wei Liu

On 14.03.2024 14:01, Jason Andryuk wrote:
> On 2024-03-14 03:11, Jan Beulich wrote:
>> On 13.03.2024 20:30, Jason Andryuk wrote:
>>> A new ELF note will specify the alignment for a relocatable PVH kernel.
>>> ELF notes are suitable for vmlinux and other ELF files, so this
>>> Linux-specific bzImage parsing in unnecessary.
>>
>> Hmm, shouldn't the order of attempts to figure the alignment be ELF note,
>> ELF header, and then 2Mb?
> 
> This patch series makes Xen only relocate when the ELF Note is present. 
> Linux PVH entry points today are not relocatable, so that needs to be 
> specified some way.  This new note also includes the alignment, so there 
> is no need to try other means of obtaining the alignment.

Going yet beyond what I said in reply to the patch: The mere presence of
the note, with no data at all, could be sufficient to indicate the
kernel is relocatable, when no other restrictions (min, max, align) apply.

> Also, the ELF header values didn't change with CONFIG_PHYSICAL_ALIGN, so 
> that didn't seem reliable.

Well, that's an observation for Linux. But the interface here ought to be
OS-agnostic.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-14  9:48   ` Roger Pau Monné
@ 2024-03-14 13:51     ` Jason Andryuk
  2024-03-14 14:33       ` Roger Pau Monné
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Andryuk @ 2024-03-14 13:51 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini

On 2024-03-14 05:48, Roger Pau Monné wrote:
> On Wed, Mar 13, 2024 at 03:30:21PM -0400, Jason Andryuk wrote:
>> Xen tries to load a PVH dom0 kernel at the fixed guest physical address
>> from the elf headers.  For Linux, this defaults to 0x1000000 (16MB), but
>> it can be configured.
>>
>> Unfortunately there exist firmwares that have reserved regions at this
>> address, so Xen fails to load the dom0 kernel since it's not RAM.
>>
>> The PVH entry code is not relocatable - it loads from absolute
>> addresses, which fail when the kernel is loaded at a different address.
>> With a suitably modified kernel, a reloctable entry point is possible.
>>
>> Add XEN_ELFNOTE_PVH_RELOCATION which specifies the minimum, maximum and
>> alignment needed for the kernel.  The presence of the NOTE indicates the
>> kernel supports a relocatable entry path.
>>
>> Change the loading to check for an acceptable load address.  If the
>> kernel is relocatable, support finding an alternate load address.
>>
>> Link: https://gitlab.com/xen-project/xen/-/issues/180
>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
>> ---
>> ELF Note printing looks like:
>> (XEN) ELF: note: PVH_RELOCATION = min: 0x1000000 max: 0xffffffff align: 0x200000
>>
>> v2:
>> Use elfnote for min, max & align - use 64bit values.
>> Print original and relocated memory addresses
>> Use check_and_adjust_load_address() name
>> Return relocated base instead of offset
>> Use PAGE_ALIGN
>> Don't load above max_phys (expected to be 4GB in kernel elf note)
>> Use single line comments
>> Exit check_load_address loop earlier
>> Add __init to find_kernel_memory()
>> ---
>>   xen/arch/x86/hvm/dom0_build.c      | 108 +++++++++++++++++++++++++++++
>>   xen/common/libelf/libelf-dominfo.c |  13 ++++
>>   xen/include/public/elfnote.h       |  11 +++
>>   xen/include/xen/libelf.h           |   3 +
>>   4 files changed, 135 insertions(+)
>>
>> diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
>> index 0ceda4140b..5c6c0d2db3 100644
>> --- a/xen/arch/x86/hvm/dom0_build.c
>> +++ b/xen/arch/x86/hvm/dom0_build.c
>> @@ -537,6 +537,108 @@ static paddr_t __init find_memory(
>>       return INVALID_PADDR;
>>   }
>>   
>> +static bool __init check_load_address(
>> +    const struct domain *d, const struct elf_binary *elf)
>> +{
>> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
> 
> Are you sure this is correct?  If a program header specifies a non-4K
> aligned load address we should still try to honor it.  I think this is
> very unlikely, but still we shouldn't apply non-requested alignments
> to addresses coming from the ELF headers.

I think it's correct in terms of checking the e820 table.  Since the 
memory map is limited to 4k granularity, the bounds need to be rounded 
accordingly.

>> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
>> +    unsigned int i;
>> +
>> +    /*
>> +     * The memory map is sorted and all RAM regions starts and sizes are
>> +     * aligned to page boundaries.
> 
> Relying on sizes to be page aligned seems fragile: it might work now
> because of the order in which pvh_setup_vmx_realmode_helpers() first
> reserves memory for the TSS and afterwards for the identity page
> tables, but it's not a property this code should assume.

That can be removed.  It would just eliminate the early exit...

>> +     */
>> +    for ( i = 0; i < d->arch.nr_e820; i++ )
>> +    {
>> +        paddr_t start = d->arch.e820[i].addr;
>> +        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
>> +
>> +        if ( start >= kernel_end )
>> +            return false;

... here.

>> +        if ( start <= kernel_start &&
>> +             end >= kernel_end &&
>> +             d->arch.e820[i].type == E820_RAM )
> 
> Nit: I would maybe do the type check before the boundary ones, as it's
> pointless to do boundary checking on a region of a non-RAM type.  But
> I guess you could also see it the other way around.
> 
>> +            return true;
>> +    }
>> +
>> +    return false;
>> +}
>> +
>> +/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
>> +static paddr_t __init find_kernel_memory(
>> +    const struct domain *d, struct elf_binary *elf,
> 
> const for elf also.

Yes, thanks.

>> +    const struct elf_dom_parms *parms)
>> +{
>> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
>> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
>> +    paddr_t kernel_size = kernel_end - kernel_start;
> 
> Hm, I'm again unsure about the alignments applied here.

Same as above regarding 4k granularity.

> I think if anything we want to assert that dest_base is aligned to phys_align.

That would indicate the kernel image is inconsistent.

>> +    unsigned int i;
>> +
>> +    /*
>> +     * The memory map is sorted and all RAM regions starts and sizes are
>> +     * aligned to page boundaries.
>> +     */
>> +    for ( i = 0; i < d->arch.nr_e820; i++ )
>> +    {
>> +        paddr_t start = d->arch.e820[i].addr;
>> +        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
>> +        paddr_t kstart, kend;
>> +
>> +        if ( d->arch.e820[i].type != E820_RAM )
>> +            continue;
>> +
>> +        if ( d->arch.e820[i].size < kernel_size )
>> +            continue;
> 
> You can unify both checks in a single condition.

Ok.

>> +
>> +        kstart = ROUNDUP(start, parms->phys_align);
>> +        kstart = kstart < parms->phys_min ? parms->phys_min : kstart;
>> +        kend = kstart + kernel_size;
>> +
>> +        if ( kend > parms->phys_max )
>> +            return 0;
>> +
>> +        if ( kend <= end )
>> +            return kstart;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/* Check the kernel load address, and adjust if necessary and possible. */
>> +static bool __init check_and_adjust_load_address(
>> +    const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms)
>> +{
>> +    paddr_t reloc_base;
>> +
>> +    if ( check_load_address(d, elf) )
>> +        return true;
>> +
>> +    if ( parms->phys_align == UNSET_ADDR )
>> +    {
>> +        printk("Address conflict and %pd kernel is not relocatable\n", d);
>> +        return false;
>> +    }
>> +
>> +    reloc_base = find_kernel_memory(d, elf, parms);
>> +    if ( reloc_base == 0 )
>> +    {
>> +        printk("Failed find a load address for the kernel\n");
> 
> Since you print the domain in the error message prior to this one, I
> would consider also printing it here (or not printing it in both
> cases).

I'll add the %pd.

>> +        return false;
>> +    }
>> +
>> +    if ( opt_dom0_verbose )
>> +        printk("Relocating kernel from [%lx, %lx] -> [%lx, %lx]\n",
>> +               (paddr_t)elf->dest_base,
>> +               (paddr_t)elf->dest_base + elf->dest_size,
>> +               reloc_base, reloc_base + elf->dest_size);
> 
> I think you need `- 1` for the end calculation if you use inclusive
> ranges [, ].  Otherwise [, ) should be used.

Ok, I'll do [, ] with the -1.

>> +
>> +    parms->phys_entry += (reloc_base - (paddr_t)elf->dest_base);
> 
> This seems to assume that the image is always relocated at a higher
> address that the original one?
> 
> parms->phys_entry = reloc_base + (parms->phys_entry - elf->dest_base);

Ok

>> +    elf->dest_base = (char *)reloc_base;
>> +
>> +    return true;
>> +}
>> +
>>   static int __init pvh_load_kernel(struct domain *d, const module_t *image,
>>                                     unsigned long image_headroom,
>>                                     module_t *initrd, void *image_base,
>> @@ -585,6 +687,12 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
>>       elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base);
>>       elf.dest_size = parms.virt_kend - parms.virt_kstart;
>>   
>> +    if ( !check_and_adjust_load_address(d, &elf, &parms) )
>> +    {
>> +        printk("Unable to load kernel\n");
> 
> check_and_adjust_load_address() already prints an error message,
> probably no need to print another message.

Ok

>> +        return -ENOMEM;
>> +    }
>> +
>>       elf_set_vcpu(&elf, v);
>>       rc = elf_load_binary(&elf);
>>       if ( rc < 0 )
>> diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
>> index 7cc7b18a51..837a1b0f21 100644
>> --- a/xen/common/libelf/libelf-dominfo.c
>> +++ b/xen/common/libelf/libelf-dominfo.c
>> @@ -125,6 +125,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>>           [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", ELFNOTE_INT },
>>           [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", ELFNOTE_INT },
>>           [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", ELFNOTE_INT },
>> +        [XEN_ELFNOTE_PVH_RELOCATION] = { "PVH_RELOCATION", ELFNOTE_OTHER },
>>       };
>>   /* *INDENT-ON* */
>>   
>> @@ -234,6 +235,17 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>>                   elf_note_numeric_array(elf, note, 8, 0),
>>                   elf_note_numeric_array(elf, note, 8, 1));
>>           break;
>> +
>> +    case XEN_ELFNOTE_PVH_RELOCATION:
>> +        if ( elf_uval(elf, note, descsz) != 3 * sizeof(uint64_t) )
>> +            return -1;
>> +
>> +        parms->phys_min = elf_note_numeric_array(elf, note, 8, 0);
>> +        parms->phys_max = elf_note_numeric_array(elf, note, 8, 1);
>> +        parms->phys_align = elf_note_numeric_array(elf, note, 8, 2);
> 
> Size for those needs to be 4 (32bits) as the entry point is in 32bit
> mode?  I don't see how we can start past the 4GB boundary.

I specified the note as 3x 64bit values.  It seemed simpler than trying 
to support both 32bit and 64bit depending on the kernel arch.  Also, 
just using 64bit provides room in case it is needed in the future.

Do you want the note to be changed to 3x 32bit values?

>> +        elf_msg(elf, "min: %#"PRIx64" max: %#"PRIx64" align: %#"PRIx64"\n",
>> +                parms->phys_min, parms->phys_max, parms->phys_align);
>> +        break;
>>       }
>>       return 0;
>>   }
>> @@ -545,6 +557,7 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,
>>       parms->p2m_base = UNSET_ADDR;
>>       parms->elf_paddr_offset = UNSET_ADDR;
>>       parms->phys_entry = UNSET_ADDR32;
>> +    parms->phys_align = UNSET_ADDR;
> 
> For correctness I would also init phys_{min,max}.

There is a memset() out of context above to zero the structure.  I 
thought leaving them both 0 would be fine.

I chose to initialize phys_align as the 64bit UNSET_ADDR since that is 
clearly invalid...  Though we don't have any checking that phys_align is 
a power of 2.

Regards,
Jason


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-14 13:21   ` Jan Beulich
@ 2024-03-14 14:13     ` Jason Andryuk
  2024-03-14 14:19       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Andryuk @ 2024-03-14 14:13 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, xen-devel

On 2024-03-14 09:21, Jan Beulich wrote:
> On 13.03.2024 20:30, Jason Andryuk wrote:
>> --- a/xen/include/public/elfnote.h
>> +++ b/xen/include/public/elfnote.h
>> @@ -194,6 +194,17 @@
>>    */
>>   #define XEN_ELFNOTE_PHYS32_ENTRY 18
>>   
>> +/*
>> + * Physical loading constraints for PVH kernels
>> + *
>> + * Used to place constraints on the guest physical loading addresses and
>> + * alignment for a PVH kernel.  This note's value is 3 64bit values in
>> + * the following order: minimum, maximum and alignment.
> 
> Along the lines of what I said on another sub-thread, I think at least
> alignment wants to be optional here. Perhaps, with max going first, min
> could also be optional.

Interesting idea.

> As indicated in different context by Roger, the values being uniformly
> 64-bit ones also is questionable.
> 
>> + * The presence of this note indicates the kernel is relocatable.
> 
> I think it wants making explicit here that the act of relocating is still
> left to the kernel.

Ok.

How is this for a new description?

"""
Physical loading constraints for PVH kernels

Used to place constraints on the guest physical loading addresses and 
alignment for a PVH kernel.

The presence of this note indicates the kernel supports relocating itself.

The note may include up to three 32bit values.
  - a maximum address for the entire image to be loaded below (default 
0xfffffff)
  - a minimum address for the start of the image (default 0)
  - a required start alignment (default 1)
"""

I think if we can agree on the ELF note, the rest will fall into place.

Thanks,
Jason


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-14 14:13     ` Jason Andryuk
@ 2024-03-14 14:19       ` Jan Beulich
  2024-03-18 21:19         ` Jason Andryuk
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2024-03-14 14:19 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, xen-devel

On 14.03.2024 15:13, Jason Andryuk wrote:
> On 2024-03-14 09:21, Jan Beulich wrote:
>> On 13.03.2024 20:30, Jason Andryuk wrote:
>>> --- a/xen/include/public/elfnote.h
>>> +++ b/xen/include/public/elfnote.h
>>> @@ -194,6 +194,17 @@
>>>    */
>>>   #define XEN_ELFNOTE_PHYS32_ENTRY 18
>>>   
>>> +/*
>>> + * Physical loading constraints for PVH kernels
>>> + *
>>> + * Used to place constraints on the guest physical loading addresses and
>>> + * alignment for a PVH kernel.  This note's value is 3 64bit values in
>>> + * the following order: minimum, maximum and alignment.
>>
>> Along the lines of what I said on another sub-thread, I think at least
>> alignment wants to be optional here. Perhaps, with max going first, min
>> could also be optional.
> 
> Interesting idea.
> 
>> As indicated in different context by Roger, the values being uniformly
>> 64-bit ones also is questionable.
>>
>>> + * The presence of this note indicates the kernel is relocatable.
>>
>> I think it wants making explicit here that the act of relocating is still
>> left to the kernel.
> 
> Ok.
> 
> How is this for a new description?
> 
> """
> Physical loading constraints for PVH kernels
> 
> Used to place constraints on the guest physical loading addresses and 
> alignment for a PVH kernel.
> 
> The presence of this note indicates the kernel supports relocating itself.
> 
> The note may include up to three 32bit values.

I'm as unsure about always 32-bit as I am on it being uniformly 64-bit.
One question here is whether this note is intended to be x86-specific.

>   - a maximum address for the entire image to be loaded below (default 
> 0xfffffff)

One f too few?

>   - a minimum address for the start of the image (default 0)
>   - a required start alignment (default 1)
> """
> 
> I think if we can agree on the ELF note, the rest will fall into place.

Presumably, yes.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-14 13:51     ` Jason Andryuk
@ 2024-03-14 14:33       ` Roger Pau Monné
  2024-03-14 15:30         ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Roger Pau Monné @ 2024-03-14 14:33 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini

On Thu, Mar 14, 2024 at 09:51:22AM -0400, Jason Andryuk wrote:
> On 2024-03-14 05:48, Roger Pau Monné wrote:
> > On Wed, Mar 13, 2024 at 03:30:21PM -0400, Jason Andryuk wrote:
> > > Xen tries to load a PVH dom0 kernel at the fixed guest physical address
> > > from the elf headers.  For Linux, this defaults to 0x1000000 (16MB), but
> > > it can be configured.
> > > 
> > > Unfortunately there exist firmwares that have reserved regions at this
> > > address, so Xen fails to load the dom0 kernel since it's not RAM.
> > > 
> > > The PVH entry code is not relocatable - it loads from absolute
> > > addresses, which fail when the kernel is loaded at a different address.
> > > With a suitably modified kernel, a reloctable entry point is possible.
> > > 
> > > Add XEN_ELFNOTE_PVH_RELOCATION which specifies the minimum, maximum and
> > > alignment needed for the kernel.  The presence of the NOTE indicates the
> > > kernel supports a relocatable entry path.
> > > 
> > > Change the loading to check for an acceptable load address.  If the
> > > kernel is relocatable, support finding an alternate load address.
> > > 
> > > Link: https://gitlab.com/xen-project/xen/-/issues/180
> > > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> > > ---
> > > ELF Note printing looks like:
> > > (XEN) ELF: note: PVH_RELOCATION = min: 0x1000000 max: 0xffffffff align: 0x200000
> > > 
> > > v2:
> > > Use elfnote for min, max & align - use 64bit values.
> > > Print original and relocated memory addresses
> > > Use check_and_adjust_load_address() name
> > > Return relocated base instead of offset
> > > Use PAGE_ALIGN
> > > Don't load above max_phys (expected to be 4GB in kernel elf note)
> > > Use single line comments
> > > Exit check_load_address loop earlier
> > > Add __init to find_kernel_memory()
> > > ---
> > >   xen/arch/x86/hvm/dom0_build.c      | 108 +++++++++++++++++++++++++++++
> > >   xen/common/libelf/libelf-dominfo.c |  13 ++++
> > >   xen/include/public/elfnote.h       |  11 +++
> > >   xen/include/xen/libelf.h           |   3 +
> > >   4 files changed, 135 insertions(+)
> > > 
> > > diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> > > index 0ceda4140b..5c6c0d2db3 100644
> > > --- a/xen/arch/x86/hvm/dom0_build.c
> > > +++ b/xen/arch/x86/hvm/dom0_build.c
> > > @@ -537,6 +537,108 @@ static paddr_t __init find_memory(
> > >       return INVALID_PADDR;
> > >   }
> > > +static bool __init check_load_address(
> > > +    const struct domain *d, const struct elf_binary *elf)
> > > +{
> > > +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
> > 
> > Are you sure this is correct?  If a program header specifies a non-4K
> > aligned load address we should still try to honor it.  I think this is
> > very unlikely, but still we shouldn't apply non-requested alignments
> > to addresses coming from the ELF headers.
> 
> I think it's correct in terms of checking the e820 table.  Since the memory
> map is limited to 4k granularity, the bounds need to be rounded accordingly.

That's for populating the p2m, but I don't see why the kernel load
area should be limited by this?

There's AFAICt no issue from a kernel requesting that it's start load
address is not page aligned (granted that's very unlikely), but I
don't see why we would impose an unneeded restriction here.

The kernel load area doesn't affect how the p2m is populated, that's
mandated by the e820.

> > > +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
> > > +    unsigned int i;
> > > +
> > > +    /*
> > > +     * The memory map is sorted and all RAM regions starts and sizes are
> > > +     * aligned to page boundaries.
> > 
> > Relying on sizes to be page aligned seems fragile: it might work now
> > because of the order in which pvh_setup_vmx_realmode_helpers() first
> > reserves memory for the TSS and afterwards for the identity page
> > tables, but it's not a property this code should assume.
> 
> That can be removed.  It would just eliminate the early exit...
> 
> > > +     */
> > > +    for ( i = 0; i < d->arch.nr_e820; i++ )
> > > +    {
> > > +        paddr_t start = d->arch.e820[i].addr;
> > > +        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
> > > +
> > > +        if ( start >= kernel_end )
> > > +            return false;
> 
> ... here.

I think the sorted aspect is fine, the aligned part is the one I'm
complaining about, so the check above can stay.

> > > +    const struct elf_dom_parms *parms)
> > > +{
> > > +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
> > > +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
> > > +    paddr_t kernel_size = kernel_end - kernel_start;
> > 
> > Hm, I'm again unsure about the alignments applied here.
> 
> Same as above regarding 4k granularity.
> 
> > I think if anything we want to assert that dest_base is aligned to phys_align.
> 
> That would indicate the kernel image is inconsistent.

Indeed.  I think doing that sanity check would be worth.

> > > diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
> > > index 7cc7b18a51..837a1b0f21 100644
> > > --- a/xen/common/libelf/libelf-dominfo.c
> > > +++ b/xen/common/libelf/libelf-dominfo.c
> > > @@ -125,6 +125,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
> > >           [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", ELFNOTE_INT },
> > >           [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", ELFNOTE_INT },
> > >           [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", ELFNOTE_INT },
> > > +        [XEN_ELFNOTE_PVH_RELOCATION] = { "PVH_RELOCATION", ELFNOTE_OTHER },
> > >       };
> > >   /* *INDENT-ON* */
> > > @@ -234,6 +235,17 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
> > >                   elf_note_numeric_array(elf, note, 8, 0),
> > >                   elf_note_numeric_array(elf, note, 8, 1));
> > >           break;
> > > +
> > > +    case XEN_ELFNOTE_PVH_RELOCATION:
> > > +        if ( elf_uval(elf, note, descsz) != 3 * sizeof(uint64_t) )
> > > +            return -1;
> > > +
> > > +        parms->phys_min = elf_note_numeric_array(elf, note, 8, 0);
> > > +        parms->phys_max = elf_note_numeric_array(elf, note, 8, 1);
> > > +        parms->phys_align = elf_note_numeric_array(elf, note, 8, 2);
> > 
> > Size for those needs to be 4 (32bits) as the entry point is in 32bit
> > mode?  I don't see how we can start past the 4GB boundary.
> 
> I specified the note as 3x 64bit values.  It seemed simpler than trying to
> support both 32bit and 64bit depending on the kernel arch.  Also, just using
> 64bit provides room in case it is needed in the future.

Why do you say depending on the kernel arch?

PVH doesn't know the bitness of the kernel, as the kernel entry point
is always started in protected 32bit mode.  We should just support
32bit values, regardless of the kernel bitness, because that's the
only range that's suitable in order to jump into the entry point.

Note how XEN_ELFNOTE_PHYS32_ENTRY is also unconditionally a 32bit
integer.

> Do you want the note to be changed to 3x 32bit values?

Unless anyone objects, yes, that's would be my preference.

> > > +        elf_msg(elf, "min: %#"PRIx64" max: %#"PRIx64" align: %#"PRIx64"\n",
> > > +                parms->phys_min, parms->phys_max, parms->phys_align);
> > > +        break;
> > >       }
> > >       return 0;
> > >   }
> > > @@ -545,6 +557,7 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,
> > >       parms->p2m_base = UNSET_ADDR;
> > >       parms->elf_paddr_offset = UNSET_ADDR;
> > >       parms->phys_entry = UNSET_ADDR32;
> > > +    parms->phys_align = UNSET_ADDR;
> > 
> > For correctness I would also init phys_{min,max}.
> 
> There is a memset() out of context above to zero the structure.  I thought
> leaving them both 0 would be fine.

0 would be a valid value, hence it's best to use UNSET_ADDR to clearly
notice when a value has been provided by the parsed binary or not.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-14 14:33       ` Roger Pau Monné
@ 2024-03-14 15:30         ` Jan Beulich
  2024-03-14 16:48           ` Roger Pau Monné
  2024-03-14 16:59           ` Jason Andryuk
  0 siblings, 2 replies; 32+ messages in thread
From: Jan Beulich @ 2024-03-14 15:30 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Jason Andryuk

On 14.03.2024 15:33, Roger Pau Monné wrote:
> On Thu, Mar 14, 2024 at 09:51:22AM -0400, Jason Andryuk wrote:
>> On 2024-03-14 05:48, Roger Pau Monné wrote:
>>> On Wed, Mar 13, 2024 at 03:30:21PM -0400, Jason Andryuk wrote:
>>>> @@ -234,6 +235,17 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>>>>                   elf_note_numeric_array(elf, note, 8, 0),
>>>>                   elf_note_numeric_array(elf, note, 8, 1));
>>>>           break;
>>>> +
>>>> +    case XEN_ELFNOTE_PVH_RELOCATION:
>>>> +        if ( elf_uval(elf, note, descsz) != 3 * sizeof(uint64_t) )
>>>> +            return -1;
>>>> +
>>>> +        parms->phys_min = elf_note_numeric_array(elf, note, 8, 0);
>>>> +        parms->phys_max = elf_note_numeric_array(elf, note, 8, 1);
>>>> +        parms->phys_align = elf_note_numeric_array(elf, note, 8, 2);
>>>
>>> Size for those needs to be 4 (32bits) as the entry point is in 32bit
>>> mode?  I don't see how we can start past the 4GB boundary.
>>
>> I specified the note as 3x 64bit values.  It seemed simpler than trying to
>> support both 32bit and 64bit depending on the kernel arch.  Also, just using
>> 64bit provides room in case it is needed in the future.
> 
> Why do you say depending on the kernel arch?
> 
> PVH doesn't know the bitness of the kernel, as the kernel entry point
> is always started in protected 32bit mode.  We should just support
> 32bit values, regardless of the kernel bitness, because that's the
> only range that's suitable in order to jump into the entry point.
> 
> Note how XEN_ELFNOTE_PHYS32_ENTRY is also unconditionally a 32bit
> integer.
> 
>> Do you want the note to be changed to 3x 32bit values?
> 
> Unless anyone objects, yes, that's would be my preference.

As mentioned elsewhere, unless the entire note is meant to be x86-specific,
this fixed-32-bit property then would want limiting to x86.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-14 15:30         ` Jan Beulich
@ 2024-03-14 16:48           ` Roger Pau Monné
  2024-03-14 16:59           ` Jason Andryuk
  1 sibling, 0 replies; 32+ messages in thread
From: Roger Pau Monné @ 2024-03-14 16:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini, Jason Andryuk

On Thu, Mar 14, 2024 at 04:30:05PM +0100, Jan Beulich wrote:
> On 14.03.2024 15:33, Roger Pau Monné wrote:
> > On Thu, Mar 14, 2024 at 09:51:22AM -0400, Jason Andryuk wrote:
> >> On 2024-03-14 05:48, Roger Pau Monné wrote:
> >>> On Wed, Mar 13, 2024 at 03:30:21PM -0400, Jason Andryuk wrote:
> >>>> @@ -234,6 +235,17 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
> >>>>                   elf_note_numeric_array(elf, note, 8, 0),
> >>>>                   elf_note_numeric_array(elf, note, 8, 1));
> >>>>           break;
> >>>> +
> >>>> +    case XEN_ELFNOTE_PVH_RELOCATION:
> >>>> +        if ( elf_uval(elf, note, descsz) != 3 * sizeof(uint64_t) )
> >>>> +            return -1;
> >>>> +
> >>>> +        parms->phys_min = elf_note_numeric_array(elf, note, 8, 0);
> >>>> +        parms->phys_max = elf_note_numeric_array(elf, note, 8, 1);
> >>>> +        parms->phys_align = elf_note_numeric_array(elf, note, 8, 2);
> >>>
> >>> Size for those needs to be 4 (32bits) as the entry point is in 32bit
> >>> mode?  I don't see how we can start past the 4GB boundary.
> >>
> >> I specified the note as 3x 64bit values.  It seemed simpler than trying to
> >> support both 32bit and 64bit depending on the kernel arch.  Also, just using
> >> 64bit provides room in case it is needed in the future.
> > 
> > Why do you say depending on the kernel arch?
> > 
> > PVH doesn't know the bitness of the kernel, as the kernel entry point
> > is always started in protected 32bit mode.  We should just support
> > 32bit values, regardless of the kernel bitness, because that's the
> > only range that's suitable in order to jump into the entry point.
> > 
> > Note how XEN_ELFNOTE_PHYS32_ENTRY is also unconditionally a 32bit
> > integer.
> > 
> >> Do you want the note to be changed to 3x 32bit values?
> > 
> > Unless anyone objects, yes, that's would be my preference.
> 
> As mentioned elsewhere, unless the entire note is meant to be x86-specific,
> this fixed-32-bit property then would want limiting to x86.

Elfnotes are used only on x86 so far.  I don't see why if/when another
architecture wants to use the same elfnotes names with different field
sizes that would be an issue.  When such a need arises we could
clarify that 32-bit size is only for x86 and also specify the size for
the other architecture.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-14 15:30         ` Jan Beulich
  2024-03-14 16:48           ` Roger Pau Monné
@ 2024-03-14 16:59           ` Jason Andryuk
  2024-03-14 17:02             ` Jan Beulich
  2024-03-15  8:45             ` Roger Pau Monné
  1 sibling, 2 replies; 32+ messages in thread
From: Jason Andryuk @ 2024-03-14 16:59 UTC (permalink / raw)
  To: Jan Beulich, Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini

On 2024-03-14 11:30, Jan Beulich wrote:
> On 14.03.2024 15:33, Roger Pau Monné wrote:
>> On Thu, Mar 14, 2024 at 09:51:22AM -0400, Jason Andryuk wrote:
>>> On 2024-03-14 05:48, Roger Pau Monné wrote:
>>>> On Wed, Mar 13, 2024 at 03:30:21PM -0400, Jason Andryuk wrote:
>>>>> @@ -234,6 +235,17 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>>>>>                    elf_note_numeric_array(elf, note, 8, 0),
>>>>>                    elf_note_numeric_array(elf, note, 8, 1));
>>>>>            break;
>>>>> +
>>>>> +    case XEN_ELFNOTE_PVH_RELOCATION:
>>>>> +        if ( elf_uval(elf, note, descsz) != 3 * sizeof(uint64_t) )
>>>>> +            return -1;
>>>>> +
>>>>> +        parms->phys_min = elf_note_numeric_array(elf, note, 8, 0);
>>>>> +        parms->phys_max = elf_note_numeric_array(elf, note, 8, 1);
>>>>> +        parms->phys_align = elf_note_numeric_array(elf, note, 8, 2);
>>>>
>>>> Size for those needs to be 4 (32bits) as the entry point is in 32bit
>>>> mode?  I don't see how we can start past the 4GB boundary.
>>>
>>> I specified the note as 3x 64bit values.  It seemed simpler than trying to
>>> support both 32bit and 64bit depending on the kernel arch.  Also, just using
>>> 64bit provides room in case it is needed in the future.
>>
>> Why do you say depending on the kernel arch?
>>
>> PVH doesn't know the bitness of the kernel, as the kernel entry point
>> is always started in protected 32bit mode.  We should just support
>> 32bit values, regardless of the kernel bitness, because that's the
>> only range that's suitable in order to jump into the entry point.
>>
>> Note how XEN_ELFNOTE_PHYS32_ENTRY is also unconditionally a 32bit
>> integer.

Linux defines PHYS32_ENTRY with _ASM_PTR, so it's 32 or 64 bit to match 
how the kernel is compiled.  The Xen code parses the integer according 
to the size of the note.

>>> Do you want the note to be changed to 3x 32bit values?
>>
>> Unless anyone objects, yes, that's would be my preference.
> 
> As mentioned elsewhere, unless the entire note is meant to be x86-specific,
> this fixed-32-bit property then would want limiting to x86.

PVH is only implemented for x86 today.  Are you saying that the comment 
should just specify the values are 32bit for x86?  If the note is reused 
for other arches, then they can specify their usage?

If this note is to be a variably sized array of values, then the 
elements should be of fixed size.  Otherwise parsing is ambiguous 
without, say, another field specifying element size.

Maybe XEN_ELFNOTE_PHYS32_RELOC would be a better name to complement the 
PHYS32_ENTRY?

Regards,
Jason


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-14 16:59           ` Jason Andryuk
@ 2024-03-14 17:02             ` Jan Beulich
  2024-03-15  8:45             ` Roger Pau Monné
  1 sibling, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2024-03-14 17:02 UTC (permalink / raw)
  To: Jason Andryuk, Roger Pau Monné
  Cc: xen-devel, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall,
	Stefano Stabellini

On 14.03.2024 17:59, Jason Andryuk wrote:
> On 2024-03-14 11:30, Jan Beulich wrote:
>> On 14.03.2024 15:33, Roger Pau Monné wrote:
>>> On Thu, Mar 14, 2024 at 09:51:22AM -0400, Jason Andryuk wrote:
>>>> On 2024-03-14 05:48, Roger Pau Monné wrote:
>>>>> On Wed, Mar 13, 2024 at 03:30:21PM -0400, Jason Andryuk wrote:
>>>>>> @@ -234,6 +235,17 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>>>>>>                    elf_note_numeric_array(elf, note, 8, 0),
>>>>>>                    elf_note_numeric_array(elf, note, 8, 1));
>>>>>>            break;
>>>>>> +
>>>>>> +    case XEN_ELFNOTE_PVH_RELOCATION:
>>>>>> +        if ( elf_uval(elf, note, descsz) != 3 * sizeof(uint64_t) )
>>>>>> +            return -1;
>>>>>> +
>>>>>> +        parms->phys_min = elf_note_numeric_array(elf, note, 8, 0);
>>>>>> +        parms->phys_max = elf_note_numeric_array(elf, note, 8, 1);
>>>>>> +        parms->phys_align = elf_note_numeric_array(elf, note, 8, 2);
>>>>>
>>>>> Size for those needs to be 4 (32bits) as the entry point is in 32bit
>>>>> mode?  I don't see how we can start past the 4GB boundary.
>>>>
>>>> I specified the note as 3x 64bit values.  It seemed simpler than trying to
>>>> support both 32bit and 64bit depending on the kernel arch.  Also, just using
>>>> 64bit provides room in case it is needed in the future.
>>>
>>> Why do you say depending on the kernel arch?
>>>
>>> PVH doesn't know the bitness of the kernel, as the kernel entry point
>>> is always started in protected 32bit mode.  We should just support
>>> 32bit values, regardless of the kernel bitness, because that's the
>>> only range that's suitable in order to jump into the entry point.
>>>
>>> Note how XEN_ELFNOTE_PHYS32_ENTRY is also unconditionally a 32bit
>>> integer.
> 
> Linux defines PHYS32_ENTRY with _ASM_PTR, so it's 32 or 64 bit to match 
> how the kernel is compiled.  The Xen code parses the integer according 
> to the size of the note.
> 
>>>> Do you want the note to be changed to 3x 32bit values?
>>>
>>> Unless anyone objects, yes, that's would be my preference.
>>
>> As mentioned elsewhere, unless the entire note is meant to be x86-specific,
>> this fixed-32-bit property then would want limiting to x86.
> 
> PVH is only implemented for x86 today.  Are you saying that the comment 
> should just specify the values are 32bit for x86?  If the note is reused 
> for other arches, then they can specify their usage?

Along these lines. But looks like Roger isn't concerned and would be
happy to leave that to the future.

> If this note is to be a variably sized array of values, then the 
> elements should be of fixed size.  Otherwise parsing is ambiguous 
> without, say, another field specifying element size.
> 
> Maybe XEN_ELFNOTE_PHYS32_RELOC would be a better name to complement the 
> PHYS32_ENTRY?

Perhaps, yes.

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-14 13:31   ` Jan Beulich
@ 2024-03-14 19:19     ` Jason Andryuk
  2024-03-15  9:48       ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Andryuk @ 2024-03-14 19:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, xen-devel

On 2024-03-14 09:31, Jan Beulich wrote:
> On 13.03.2024 20:30, Jason Andryuk wrote:
>> --- a/xen/arch/x86/hvm/dom0_build.c
>> +++ b/xen/arch/x86/hvm/dom0_build.c
>> @@ -537,6 +537,108 @@ static paddr_t __init find_memory(
>>       return INVALID_PADDR;
>>   }
>>   
>> +static bool __init check_load_address(
>> +    const struct domain *d, const struct elf_binary *elf)
>> +{
>> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
>> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
> 
> Both casts act on a pointer value. Such cannot legitimately be converted
> to paddr_t; it only so happens that paddr_t is effectively the same as
> uintptr_t right now. (Same issue again further down.) That said, I notice
> we have pre-existing examples of this ...

Yes, I followed existing code.  Do you want dest_base to be switched to 
a uintptr_t?

>> +/* Check the kernel load address, and adjust if necessary and possible. */
>> +static bool __init check_and_adjust_load_address(
>> +    const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms)
>> +{
>> +    paddr_t reloc_base;
>> +
>> +    if ( check_load_address(d, elf) )
>> +        return true;
>> +
>> +    if ( parms->phys_align == UNSET_ADDR )
>> +    {
>> +        printk("Address conflict and %pd kernel is not relocatable\n", d);
>> +        return false;
>> +    }
>> +
>> +    reloc_base = find_kernel_memory(d, elf, parms);
>> +    if ( reloc_base == 0 )
>> +    {
>> +        printk("Failed find a load address for the kernel\n");
>> +        return false;
>> +    }
>> +
>> +    if ( opt_dom0_verbose )
>> +        printk("Relocating kernel from [%lx, %lx] -> [%lx, %lx]\n",
>> +               (paddr_t)elf->dest_base,
>> +               (paddr_t)elf->dest_base + elf->dest_size,
> 
> By using %p you clearly can avoid the casts here.

Ok.

>> +               reloc_base, reloc_base + elf->dest_size);
> 
> I'm not convinced %lx is really appropriate for paddr_t.

PRIpaddr exists.  It's "016lx" for x86.  Using that and %p add lots of 0s:
(XEN) Relocating kernel from [0000000001000000, 000000000202ffff] -> 
[0000000002200000, 000000000322ffff]

Regards,
Jason


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/3] libelf: Expand ELF note printing
  2024-03-14 13:16   ` Jan Beulich
@ 2024-03-14 20:36     ` Jason Andryuk
  0 siblings, 0 replies; 32+ messages in thread
From: Jason Andryuk @ 2024-03-14 20:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini,
	Wei Liu, xen-devel

On 2024-03-14 09:16, Jan Beulich wrote:
> On 13.03.2024 20:30, Jason Andryuk wrote:
>> @@ -217,6 +225,15 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>>       case XEN_ELFNOTE_PHYS32_ENTRY:
>>           parms->phys_entry = val;
>>           break;
>> +
>> +    case XEN_ELFNOTE_L1_MFN_VALID:
>> +        if ( elf_uval(elf, note, descsz) != 2 * sizeof(uint64_t) )
>> +            return -1;
> 
> elf_note_numeric() use sites don't have such a check. Why would we need
> one here, and even more so causing a error to be raised when in reality
> the supplied values (still) aren't consumed? Furthermore the documentation
> says "pairs" (plural) for a reason. Finally maddr_t-sized only happens to
> mean uint64_t on all architectures we presently support.

I failed to pay attention to the definition stating plural pairs.  I saw 
Linux stored two 64bit values and printed them.

I added the size check to avoid going out of bounds.  elf_note_numeric() 
handles 1, 2, 4 or 8 bytes and returns 0 otherwise, so it won't overrun 
boundaries.  That's why it was getting printed as "ELF: note: 
L1_MFN_VALID = 0"

What motivated this was seeing "PVH_RELOCATION = 0".  That confusingly 
looks like relocation is disabled.

I'm fine dropping this attempt at printing the L1_MFN_VALID note. 
maddr_t is not defined, and it looks the note values were 32bit in 
non-PAE kernels.  It could just be printed as "ELF: note: L1_MFN_VALID" 
with no details.

Regards,
Jason

>> +        elf_msg(elf, "mask: %#"PRIx64" val: %#"PRIx64"\n",
>> +                elf_note_numeric_array(elf, note, 8, 0),
>> +                elf_note_numeric_array(elf, note, 8, 1));
>> +        break;
>>       }
>>       return 0;
>>   }
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-14 16:59           ` Jason Andryuk
  2024-03-14 17:02             ` Jan Beulich
@ 2024-03-15  8:45             ` Roger Pau Monné
  1 sibling, 0 replies; 32+ messages in thread
From: Roger Pau Monné @ 2024-03-15  8:45 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Jan Beulich, xen-devel, Andrew Cooper, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini

On Thu, Mar 14, 2024 at 12:59:19PM -0400, Jason Andryuk wrote:
> On 2024-03-14 11:30, Jan Beulich wrote:
> > On 14.03.2024 15:33, Roger Pau Monné wrote:
> > > On Thu, Mar 14, 2024 at 09:51:22AM -0400, Jason Andryuk wrote:
> > > > On 2024-03-14 05:48, Roger Pau Monné wrote:
> > > > > On Wed, Mar 13, 2024 at 03:30:21PM -0400, Jason Andryuk wrote:
> > > > > > @@ -234,6 +235,17 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
> > > > > >                    elf_note_numeric_array(elf, note, 8, 0),
> > > > > >                    elf_note_numeric_array(elf, note, 8, 1));
> > > > > >            break;
> > > > > > +
> > > > > > +    case XEN_ELFNOTE_PVH_RELOCATION:
> > > > > > +        if ( elf_uval(elf, note, descsz) != 3 * sizeof(uint64_t) )
> > > > > > +            return -1;
> > > > > > +
> > > > > > +        parms->phys_min = elf_note_numeric_array(elf, note, 8, 0);
> > > > > > +        parms->phys_max = elf_note_numeric_array(elf, note, 8, 1);
> > > > > > +        parms->phys_align = elf_note_numeric_array(elf, note, 8, 2);
> > > > > 
> > > > > Size for those needs to be 4 (32bits) as the entry point is in 32bit
> > > > > mode?  I don't see how we can start past the 4GB boundary.
> > > > 
> > > > I specified the note as 3x 64bit values.  It seemed simpler than trying to
> > > > support both 32bit and 64bit depending on the kernel arch.  Also, just using
> > > > 64bit provides room in case it is needed in the future.
> > > 
> > > Why do you say depending on the kernel arch?
> > > 
> > > PVH doesn't know the bitness of the kernel, as the kernel entry point
> > > is always started in protected 32bit mode.  We should just support
> > > 32bit values, regardless of the kernel bitness, because that's the
> > > only range that's suitable in order to jump into the entry point.
> > > 
> > > Note how XEN_ELFNOTE_PHYS32_ENTRY is also unconditionally a 32bit
> > > integer.
> 
> Linux defines PHYS32_ENTRY with _ASM_PTR, so it's 32 or 64 bit to match how
> the kernel is compiled.  The Xen code parses the integer according to the
> size of the note.

I think that's wrong, PHYS32_ENTRY should strictly be a 32bit integer,
in fact the field in struct elf_dom_parms is an uint32_t, so Linux
using _ASM_PTR seems bogus, it should unconditionally use .long
regardless of the kernel bitness.

> > > > Do you want the note to be changed to 3x 32bit values?
> > > 
> > > Unless anyone objects, yes, that's would be my preference.
> > 
> > As mentioned elsewhere, unless the entire note is meant to be x86-specific,
> > this fixed-32-bit property then would want limiting to x86.
> 
> PVH is only implemented for x86 today.  Are you saying that the comment
> should just specify the values are 32bit for x86?  If the note is reused for
> other arches, then they can specify their usage?
> 
> If this note is to be a variably sized array of values, then the elements
> should be of fixed size.  Otherwise parsing is ambiguous without, say,
> another field specifying element size.
> 
> Maybe XEN_ELFNOTE_PHYS32_RELOC would be a better name to complement the
> PHYS32_ENTRY?

IMO the '32' part of PHYS32_ENTRY is kind of redundant, given the CPU
state when using such entry point it's impossible to use 64bit
addresses.  I would be fine with using XEN_ELFNOTE_PHYS_RELOC or some
such.  Anyway, this is just a name so I'm not going to opposed if Jan
and yourself prefer to keep using PHYS32.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-14 19:19     ` Jason Andryuk
@ 2024-03-15  9:48       ` Jan Beulich
  2024-03-18 21:21         ` Jason Andryuk
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2024-03-15  9:48 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, xen-devel

On 14.03.2024 20:19, Jason Andryuk wrote:
> On 2024-03-14 09:31, Jan Beulich wrote:
>> On 13.03.2024 20:30, Jason Andryuk wrote:
>>> --- a/xen/arch/x86/hvm/dom0_build.c
>>> +++ b/xen/arch/x86/hvm/dom0_build.c
>>> @@ -537,6 +537,108 @@ static paddr_t __init find_memory(
>>>       return INVALID_PADDR;
>>>   }
>>>   
>>> +static bool __init check_load_address(
>>> +    const struct domain *d, const struct elf_binary *elf)
>>> +{
>>> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
>>> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
>>
>> Both casts act on a pointer value. Such cannot legitimately be converted
>> to paddr_t; it only so happens that paddr_t is effectively the same as
>> uintptr_t right now. (Same issue again further down.) That said, I notice
>> we have pre-existing examples of this ...
> 
> Yes, I followed existing code.  Do you want dest_base to be switched to 
> a uintptr_t?

I think it was deliberately switched to being a pointer at some point,
maybe even in a security fix.

>>> +/* Check the kernel load address, and adjust if necessary and possible. */
>>> +static bool __init check_and_adjust_load_address(
>>> +    const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms)
>>> +{
>>> +    paddr_t reloc_base;
>>> +
>>> +    if ( check_load_address(d, elf) )
>>> +        return true;
>>> +
>>> +    if ( parms->phys_align == UNSET_ADDR )
>>> +    {
>>> +        printk("Address conflict and %pd kernel is not relocatable\n", d);
>>> +        return false;
>>> +    }
>>> +
>>> +    reloc_base = find_kernel_memory(d, elf, parms);
>>> +    if ( reloc_base == 0 )
>>> +    {
>>> +        printk("Failed find a load address for the kernel\n");
>>> +        return false;
>>> +    }
>>> +
>>> +    if ( opt_dom0_verbose )
>>> +        printk("Relocating kernel from [%lx, %lx] -> [%lx, %lx]\n",
>>> +               (paddr_t)elf->dest_base,
>>> +               (paddr_t)elf->dest_base + elf->dest_size,
>>
>> By using %p you clearly can avoid the casts here.
> 
> Ok.
> 
>>> +               reloc_base, reloc_base + elf->dest_size);
>>
>> I'm not convinced %lx is really appropriate for paddr_t.
> 
> PRIpaddr exists.  It's "016lx" for x86.  Using that and %p add lots of 0s:
> (XEN) Relocating kernel from [0000000001000000, 000000000202ffff] -> 
> [0000000002200000, 000000000322ffff]

That's to be accepted, I guess.

Looking at it again, is "Relocating" in the log message perhaps misleading?
We don't relocate it, that's something the kernel itself has to do. We only
put it at other than the default position. Maybe "Moving" instead?

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-14 14:19       ` Jan Beulich
@ 2024-03-18 21:19         ` Jason Andryuk
  2024-03-19  8:11           ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Andryuk @ 2024-03-18 21:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, xen-devel

On 2024-03-14 10:19, Jan Beulich wrote:
> On 14.03.2024 15:13, Jason Andryuk wrote:
>> On 2024-03-14 09:21, Jan Beulich wrote:
>>> On 13.03.2024 20:30, Jason Andryuk wrote:
>>>> --- a/xen/include/public/elfnote.h
>>>> +++ b/xen/include/public/elfnote.h
>>>> @@ -194,6 +194,17 @@
>>>>     */
>>>>    #define XEN_ELFNOTE_PHYS32_ENTRY 18
>>>>    
>>>> +/*
>>>> + * Physical loading constraints for PVH kernels
>>>> + *
>>>> + * Used to place constraints on the guest physical loading addresses and
>>>> + * alignment for a PVH kernel.  This note's value is 3 64bit values in
>>>> + * the following order: minimum, maximum and alignment.
>>>
>>> Along the lines of what I said on another sub-thread, I think at least
>>> alignment wants to be optional here. Perhaps, with max going first, min
>>> could also be optional.
>>
>> Interesting idea.
>>
>>> As indicated in different context by Roger, the values being uniformly
>>> 64-bit ones also is questionable.
>>>
>>>> + * The presence of this note indicates the kernel is relocatable.
>>>
>>> I think it wants making explicit here that the act of relocating is still
>>> left to the kernel.
>>
>> Ok.
>>
>> How is this for a new description?
>>
>> """
>> Physical loading constraints for PVH kernels
>>
>> Used to place constraints on the guest physical loading addresses and
>> alignment for a PVH kernel.
>>
>> The presence of this note indicates the kernel supports relocating itself.
>>
>> The note may include up to three 32bit values.
> 
> I'm as unsure about always 32-bit as I am on it being uniformly 64-bit.
> One question here is whether this note is intended to be x86-specific.
> 
>>    - a maximum address for the entire image to be loaded below (default
>> 0xfffffff)
> 
> One f too few?

Whoops - yes.

>>    - a minimum address for the start of the image (default 0)
>>    - a required start alignment (default 1)

Jan, in the discussion of patch 1, you wrote "Hmm, shouldn't the order 
of attempts to figure the alignment be ELF note, ELF header, and then 
2Mb?"  My latest revision initializes phys_alignment to 1 and updates 
that if PHYS32_RELOC specifies an alignment.  Do you still want these 
other locations checked for alignment values?

Regards,
Jason


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-15  9:48       ` Jan Beulich
@ 2024-03-18 21:21         ` Jason Andryuk
  2024-03-19  8:15           ` Jan Beulich
  0 siblings, 1 reply; 32+ messages in thread
From: Jason Andryuk @ 2024-03-18 21:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, xen-devel

On 2024-03-15 05:48, Jan Beulich wrote:
> On 14.03.2024 20:19, Jason Andryuk wrote:
>> On 2024-03-14 09:31, Jan Beulich wrote:
>>> On 13.03.2024 20:30, Jason Andryuk wrote:
>>>> --- a/xen/arch/x86/hvm/dom0_build.c
>>>> +++ b/xen/arch/x86/hvm/dom0_build.c
>>>> @@ -537,6 +537,108 @@ static paddr_t __init find_memory(
>>>>        return INVALID_PADDR;
>>>>    }
>>>>    
>>>> +static bool __init check_load_address(
>>>> +    const struct domain *d, const struct elf_binary *elf)
>>>> +{
>>>> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
>>>> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
>>>
>>> Both casts act on a pointer value. Such cannot legitimately be converted
>>> to paddr_t; it only so happens that paddr_t is effectively the same as
>>> uintptr_t right now. (Same issue again further down.) That said, I notice
>>> we have pre-existing examples of this ...
>>
>> Yes, I followed existing code.  Do you want dest_base to be switched to
>> a uintptr_t?
> 
> I think it was deliberately switched to being a pointer at some point,
> maybe even in a security fix.

commit 65808a8ed41cc7c044f588bd6cab5af0fdc0e029 "libelf: check all 
pointer accesses", part of XSA-55, switched from a single dest pointer 
to dest_base & dest_size.

For libxenguest, it's a pointer to guest-mapped memory to copy in the 
kernel.  For PV dom0 creation, it's a pointer - Xen switches to the dom0 
page tables and performs the copy with it as-is.  For PVH dom0, it's a 
guest physical address.

Are you looking for this to be addressed in this series?

>>>> +/* Check the kernel load address, and adjust if necessary and possible. */
>>>> +static bool __init check_and_adjust_load_address(
>>>> +    const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms)
>>>> +{
>>>> +    paddr_t reloc_base;
>>>> +
>>>> +    if ( check_load_address(d, elf) )
>>>> +        return true;
>>>> +
>>>> +    if ( parms->phys_align == UNSET_ADDR )
>>>> +    {
>>>> +        printk("Address conflict and %pd kernel is not relocatable\n", d);
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    reloc_base = find_kernel_memory(d, elf, parms);
>>>> +    if ( reloc_base == 0 )
>>>> +    {
>>>> +        printk("Failed find a load address for the kernel\n");
>>>> +        return false;
>>>> +    }
>>>> +
>>>> +    if ( opt_dom0_verbose )
>>>> +        printk("Relocating kernel from [%lx, %lx] -> [%lx, %lx]\n",
>>>> +               (paddr_t)elf->dest_base,
>>>> +               (paddr_t)elf->dest_base + elf->dest_size,
>>>
>>> By using %p you clearly can avoid the casts here.
>>
>> Ok.
>>
>>>> +               reloc_base, reloc_base + elf->dest_size);
>>>
>>> I'm not convinced %lx is really appropriate for paddr_t.
>>
>> PRIpaddr exists.  It's "016lx" for x86.  Using that and %p add lots of 0s:
>> (XEN) Relocating kernel from [0000000001000000, 000000000202ffff] ->
>> [0000000002200000, 000000000322ffff]
> 
> That's to be accepted, I guess.
> 
> Looking at it again, is "Relocating" in the log message perhaps misleading?
> We don't relocate it, that's something the kernel itself has to do. We only
> put it at other than the default position. Maybe "Moving" instead?

Yes, "Moving" sounds better.  I guess I'll drop the "from" and insert a 
%pd for:

(XEN) Moving d0 kernel [0000000001000000, 000000000202ffff] -> 
[0000000002200000, 000000000322ffff]

Regards,
Jason


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-18 21:19         ` Jason Andryuk
@ 2024-03-19  8:11           ` Jan Beulich
  0 siblings, 0 replies; 32+ messages in thread
From: Jan Beulich @ 2024-03-19  8:11 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, xen-devel

On 18.03.2024 22:19, Jason Andryuk wrote:
> On 2024-03-14 10:19, Jan Beulich wrote:
>> On 14.03.2024 15:13, Jason Andryuk wrote:
>>> On 2024-03-14 09:21, Jan Beulich wrote:
>>>> On 13.03.2024 20:30, Jason Andryuk wrote:
>>>>> --- a/xen/include/public/elfnote.h
>>>>> +++ b/xen/include/public/elfnote.h
>>>>> @@ -194,6 +194,17 @@
>>>>>     */
>>>>>    #define XEN_ELFNOTE_PHYS32_ENTRY 18
>>>>>    
>>>>> +/*
>>>>> + * Physical loading constraints for PVH kernels
>>>>> + *
>>>>> + * Used to place constraints on the guest physical loading addresses and
>>>>> + * alignment for a PVH kernel.  This note's value is 3 64bit values in
>>>>> + * the following order: minimum, maximum and alignment.
>>>>
>>>> Along the lines of what I said on another sub-thread, I think at least
>>>> alignment wants to be optional here. Perhaps, with max going first, min
>>>> could also be optional.
>>>
>>> Interesting idea.
>>>
>>>> As indicated in different context by Roger, the values being uniformly
>>>> 64-bit ones also is questionable.
>>>>
>>>>> + * The presence of this note indicates the kernel is relocatable.
>>>>
>>>> I think it wants making explicit here that the act of relocating is still
>>>> left to the kernel.
>>>
>>> Ok.
>>>
>>> How is this for a new description?
>>>
>>> """
>>> Physical loading constraints for PVH kernels
>>>
>>> Used to place constraints on the guest physical loading addresses and
>>> alignment for a PVH kernel.
>>>
>>> The presence of this note indicates the kernel supports relocating itself.
>>>
>>> The note may include up to three 32bit values.
>>
>> I'm as unsure about always 32-bit as I am on it being uniformly 64-bit.
>> One question here is whether this note is intended to be x86-specific.
>>
>>>    - a maximum address for the entire image to be loaded below (default
>>> 0xfffffff)
>>
>> One f too few?
> 
> Whoops - yes.
> 
>>>    - a minimum address for the start of the image (default 0)
>>>    - a required start alignment (default 1)
> 
> Jan, in the discussion of patch 1, you wrote "Hmm, shouldn't the order 
> of attempts to figure the alignment be ELF note, ELF header, and then 
> 2Mb?"  My latest revision initializes phys_alignment to 1 and updates 
> that if PHYS32_RELOC specifies an alignment.  Do you still want these 
> other locations checked for alignment values?

I think it would be prudent to do so, yet at the same time I guess I won't
insist. Defaulting to 1, though, looks overly lax. In order for the
alignment value to be sensible to omit, the default needs to be sensible
(no lower than 4k, and quite likely better 2M).

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-18 21:21         ` Jason Andryuk
@ 2024-03-19  8:15           ` Jan Beulich
  2024-03-19 13:50             ` Jason Andryuk
  0 siblings, 1 reply; 32+ messages in thread
From: Jan Beulich @ 2024-03-19  8:15 UTC (permalink / raw)
  To: Jason Andryuk
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, xen-devel

On 18.03.2024 22:21, Jason Andryuk wrote:
> On 2024-03-15 05:48, Jan Beulich wrote:
>> On 14.03.2024 20:19, Jason Andryuk wrote:
>>> On 2024-03-14 09:31, Jan Beulich wrote:
>>>> On 13.03.2024 20:30, Jason Andryuk wrote:
>>>>> --- a/xen/arch/x86/hvm/dom0_build.c
>>>>> +++ b/xen/arch/x86/hvm/dom0_build.c
>>>>> @@ -537,6 +537,108 @@ static paddr_t __init find_memory(
>>>>>        return INVALID_PADDR;
>>>>>    }
>>>>>    
>>>>> +static bool __init check_load_address(
>>>>> +    const struct domain *d, const struct elf_binary *elf)
>>>>> +{
>>>>> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
>>>>> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
>>>>
>>>> Both casts act on a pointer value. Such cannot legitimately be converted
>>>> to paddr_t; it only so happens that paddr_t is effectively the same as
>>>> uintptr_t right now. (Same issue again further down.) That said, I notice
>>>> we have pre-existing examples of this ...
>>>
>>> Yes, I followed existing code.  Do you want dest_base to be switched to
>>> a uintptr_t?
>>
>> I think it was deliberately switched to being a pointer at some point,
>> maybe even in a security fix.
> 
> commit 65808a8ed41cc7c044f588bd6cab5af0fdc0e029 "libelf: check all 
> pointer accesses", part of XSA-55, switched from a single dest pointer 
> to dest_base & dest_size.
> 
> For libxenguest, it's a pointer to guest-mapped memory to copy in the 
> kernel.  For PV dom0 creation, it's a pointer - Xen switches to the dom0 
> page tables and performs the copy with it as-is.  For PVH dom0, it's a 
> guest physical address.
> 
> Are you looking for this to be addressed in this series?

I'm sorry to ask, but what is "this" here? I'd like your change to leave
all types alone. I'd further like your change to preferably avoid cloning
questionable code (i.e. casts using the wrong type in particular). Plus,
where technically possible, adhere to the general principle of avoiding
casts (for typically being at least somewhat risky towards hiding
potential bugs).

Jan


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
  2024-03-19  8:15           ` Jan Beulich
@ 2024-03-19 13:50             ` Jason Andryuk
  0 siblings, 0 replies; 32+ messages in thread
From: Jason Andryuk @ 2024-03-19 13:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap,
	Julien Grall, Stefano Stabellini, xen-devel

On 2024-03-19 04:15, Jan Beulich wrote:
> On 18.03.2024 22:21, Jason Andryuk wrote:
>> On 2024-03-15 05:48, Jan Beulich wrote:
>>> On 14.03.2024 20:19, Jason Andryuk wrote:
>>>> On 2024-03-14 09:31, Jan Beulich wrote:
>>>>> On 13.03.2024 20:30, Jason Andryuk wrote:
>>>>>> --- a/xen/arch/x86/hvm/dom0_build.c
>>>>>> +++ b/xen/arch/x86/hvm/dom0_build.c
>>>>>> @@ -537,6 +537,108 @@ static paddr_t __init find_memory(
>>>>>>         return INVALID_PADDR;
>>>>>>     }
>>>>>>     
>>>>>> +static bool __init check_load_address(
>>>>>> +    const struct domain *d, const struct elf_binary *elf)
>>>>>> +{
>>>>>> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
>>>>>> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
>>>>>
>>>>> Both casts act on a pointer value. Such cannot legitimately be converted
>>>>> to paddr_t; it only so happens that paddr_t is effectively the same as
>>>>> uintptr_t right now. (Same issue again further down.) That said, I notice
>>>>> we have pre-existing examples of this ...
>>>>
>>>> Yes, I followed existing code.  Do you want dest_base to be switched to
>>>> a uintptr_t?
>>>
>>> I think it was deliberately switched to being a pointer at some point,
>>> maybe even in a security fix.
>>
>> commit 65808a8ed41cc7c044f588bd6cab5af0fdc0e029 "libelf: check all
>> pointer accesses", part of XSA-55, switched from a single dest pointer
>> to dest_base & dest_size.
>>
>> For libxenguest, it's a pointer to guest-mapped memory to copy in the
>> kernel.  For PV dom0 creation, it's a pointer - Xen switches to the dom0
>> page tables and performs the copy with it as-is.  For PVH dom0, it's a
>> guest physical address.
>>
>> Are you looking for this to be addressed in this series?
> 
> I'm sorry to ask, but what is "this" here? I'd like your change to leave
> all types alone. I'd further like your change to preferably avoid cloning
> questionable code (i.e. casts using the wrong type in particular). Plus,
> where technically possible, adhere to the general principle of avoiding
> casts (for typically being at least somewhat risky towards hiding
> potential bugs).

I intended "this" to refer to changing dest_base's type.  I won't do 
that.  With some of your other suggestions, most castings are gone.

Thanks,
Jason


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2024-03-19 13:51 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-13 19:30 [PATCH v2 0/3] x86/pvh: Support relocating dom0 kernel Jason Andryuk
2024-03-13 19:30 ` [PATCH v2 1/3] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
2024-03-14  7:11   ` Jan Beulich
2024-03-14 13:01     ` Jason Andryuk
2024-03-14 13:33       ` Jan Beulich
2024-03-13 19:30 ` [PATCH v2 2/3] libelf: Expand ELF note printing Jason Andryuk
2024-03-14 13:16   ` Jan Beulich
2024-03-14 20:36     ` Jason Andryuk
2024-03-13 19:30 ` [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
2024-03-13 21:02   ` Jason Andryuk
2024-03-14  7:12     ` Jan Beulich
2024-03-14 12:46       ` Jason Andryuk
2024-03-14  9:48   ` Roger Pau Monné
2024-03-14 13:51     ` Jason Andryuk
2024-03-14 14:33       ` Roger Pau Monné
2024-03-14 15:30         ` Jan Beulich
2024-03-14 16:48           ` Roger Pau Monné
2024-03-14 16:59           ` Jason Andryuk
2024-03-14 17:02             ` Jan Beulich
2024-03-15  8:45             ` Roger Pau Monné
2024-03-14 13:21   ` Jan Beulich
2024-03-14 14:13     ` Jason Andryuk
2024-03-14 14:19       ` Jan Beulich
2024-03-18 21:19         ` Jason Andryuk
2024-03-19  8:11           ` Jan Beulich
2024-03-14 13:31   ` Jan Beulich
2024-03-14 19:19     ` Jason Andryuk
2024-03-15  9:48       ` Jan Beulich
2024-03-18 21:21         ` Jason Andryuk
2024-03-19  8:15           ` Jan Beulich
2024-03-19 13:50             ` Jason Andryuk
2024-03-13 19:46 ` [PATCH] RFC: x86/pvh: Make Xen PVH entrypoint PIC for x86-64 Jason Andryuk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.