* [PATCH 0/3] x86/pvh: Support relocating dom0 kernel
@ 2024-03-06 18:50 Jason Andryuk
2024-03-06 18:50 ` [PATCH 1/3] features.h: Replace hard tabs Jason Andryuk
` (4 more replies)
0 siblings, 5 replies; 21+ messages in thread
From: Jason Andryuk @ 2024-03-06 18:50 UTC (permalink / raw)
To: xen-devel
Cc: Jason Andryuk, Andrew Cooper, George Dunlap, Jan Beulich,
Julien Grall, Stefano Stabellini, Wei Liu, Roger Pau Monné
Xen tries to load a PVH dom0 kernel at the fixed guest physical address
from the elf headers. For Linux, this defaults to 0x1000000 (16MB), but
it can be configured.
Unfortunately there exist firmwares that have reserved regions at this
address, so Xen fails to load the dom0 kernel since it's not RAM.
The other issue is that the Linux PVH entry point is not
position-independent. It expects to run at the compiled
CONFIG_PHYSICAL_ADDRESS.
This patch set expands the PVH dom0 builder to try to relocate the
kernel if needed and possible. XENFEAT_pvh_relocatable is added for
kernels to indicate they are relocatable. However, we may want to
switch to an additional ELF note with the kernel alignment. Linux
specifies a kernel alignment in the bzImage boot_params.setup_header,
but that is not present the ELF vmlinux file.
The first patch fixes some whitespace in features.h
The second patch enhances bzimage_parse to pull the kernel_alignment into
an optional align pointer.
The third patch expands the pvh dom0 kernel placement code.
I'll post an additional patch showing the Linux changes to make PVH
relocatable.
Jason Andryuk (3):
features.h: Replace hard tabs
xen/x86: bzImage parse kernel_alignment
x86/PVH: Support relocatable dom0 kernels
xen/arch/x86/bzimage.c | 4 +-
xen/arch/x86/hvm/dom0_build.c | 113 ++++++++++++++++++++++++++++-
xen/arch/x86/include/asm/bzimage.h | 3 +-
xen/arch/x86/pv/dom0_build.c | 2 +-
xen/include/public/features.h | 9 ++-
5 files changed, 124 insertions(+), 7 deletions(-)
--
2.44.0
^ permalink raw reply [flat|nested] 21+ messages in thread* [PATCH 1/3] features.h: Replace hard tabs 2024-03-06 18:50 [PATCH 0/3] x86/pvh: Support relocating dom0 kernel Jason Andryuk @ 2024-03-06 18:50 ` Jason Andryuk 2024-03-06 20:41 ` Stefano Stabellini 2024-03-06 18:50 ` [PATCH 2/3] xen/x86: bzImage parse kernel_alignment Jason Andryuk ` (3 subsequent siblings) 4 siblings, 1 reply; 21+ messages in thread From: Jason Andryuk @ 2024-03-06 18:50 UTC (permalink / raw) To: xen-devel Cc: Jason Andryuk, Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu Replace hard tabs to match the rest of the file. Fixes: 48a3fd1432 ("domain: expose newly introduced hypercalls as XENFEAT") Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> --- xen/include/public/features.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/xen/include/public/features.h b/xen/include/public/features.h index 36936f6a4e..4437f25d25 100644 --- a/xen/include/public/features.h +++ b/xen/include/public/features.h @@ -117,8 +117,8 @@ * VCPUOP_register_runstate_phys_area * VCPUOP_register_vcpu_time_phys_area */ -#define XENFEAT_runstate_phys_area 18 -#define XENFEAT_vcpu_time_phys_area 19 +#define XENFEAT_runstate_phys_area 18 +#define XENFEAT_vcpu_time_phys_area 19 #define XENFEAT_NR_SUBMAPS 1 -- 2.44.0 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 1/3] features.h: Replace hard tabs 2024-03-06 18:50 ` [PATCH 1/3] features.h: Replace hard tabs Jason Andryuk @ 2024-03-06 20:41 ` Stefano Stabellini 0 siblings, 0 replies; 21+ messages in thread From: Stefano Stabellini @ 2024-03-06 20:41 UTC (permalink / raw) To: Jason Andryuk Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu On Wed, 6 Mar 2024, Jason Andryuk wrote: > Replace hard tabs to match the rest of the file. > > Fixes: 48a3fd1432 ("domain: expose newly introduced hypercalls as XENFEAT") > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 2/3] xen/x86: bzImage parse kernel_alignment 2024-03-06 18:50 [PATCH 0/3] x86/pvh: Support relocating dom0 kernel Jason Andryuk 2024-03-06 18:50 ` [PATCH 1/3] features.h: Replace hard tabs Jason Andryuk @ 2024-03-06 18:50 ` Jason Andryuk 2024-03-07 2:09 ` Stefano Stabellini 2024-03-06 18:50 ` [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels Jason Andryuk ` (2 subsequent siblings) 4 siblings, 1 reply; 21+ messages in thread From: Jason Andryuk @ 2024-03-06 18:50 UTC (permalink / raw) To: xen-devel Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu Expand bzimage_parse() to return kernel_alignment from the setup_header. This will be needed if loading a PVH kernel at a physical offset to compensate for a reserved E820 region. Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> --- xen/arch/x86/bzimage.c | 4 +++- xen/arch/x86/hvm/dom0_build.c | 4 +++- xen/arch/x86/include/asm/bzimage.h | 3 +-- xen/arch/x86/pv/dom0_build.c | 2 +- 4 files changed, 8 insertions(+), 5 deletions(-) diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c index ac4fd428be..0f4cfc49f7 100644 --- a/xen/arch/x86/bzimage.c +++ b/xen/arch/x86/bzimage.c @@ -105,7 +105,7 @@ unsigned long __init bzimage_headroom(void *image_start, } int __init bzimage_parse(void *image_base, void **image_start, - unsigned long *image_len) + unsigned long *image_len, unsigned int *align) { struct setup_header *hdr = (struct setup_header *)(*image_start); int err = bzimage_check(hdr, *image_len); @@ -118,6 +118,8 @@ int __init bzimage_parse(void *image_base, void **image_start, { *image_start += (hdr->setup_sects + 1) * 512 + hdr->payload_offset; *image_len = hdr->payload_length; + if ( align ) + *align = hdr->kernel_alignment; } if ( elf_is_elfbinary(*image_start, *image_len) ) diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c index 0ceda4140b..bbae8a5645 100644 --- a/xen/arch/x86/hvm/dom0_build.c +++ b/xen/arch/x86/hvm/dom0_build.c @@ -548,12 +548,14 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image, struct elf_binary elf; struct elf_dom_parms parms; paddr_t last_addr; + unsigned int align = 0; struct hvm_start_info start_info = { 0 }; struct hvm_modlist_entry mod = { 0 }; struct vcpu *v = d->vcpu[0]; int rc; - if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 ) + rc = bzimage_parse(image_base, &image_start, &image_len, &align); + if ( rc != 0 ) { printk("Error trying to detect bz compressed kernel\n"); return rc; diff --git a/xen/arch/x86/include/asm/bzimage.h b/xen/arch/x86/include/asm/bzimage.h index 7ed69d3910..de4e9a446f 100644 --- a/xen/arch/x86/include/asm/bzimage.h +++ b/xen/arch/x86/include/asm/bzimage.h @@ -4,8 +4,7 @@ #include <xen/init.h> unsigned long bzimage_headroom(void *image_start, unsigned long image_length); - int bzimage_parse(void *image_base, void **image_start, - unsigned long *image_len); + unsigned long *image_len, unsigned int *align); #endif /* __X86_BZIMAGE_H__ */ diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c index d8043fa58a..e9fa8a9a82 100644 --- a/xen/arch/x86/pv/dom0_build.c +++ b/xen/arch/x86/pv/dom0_build.c @@ -416,7 +416,7 @@ int __init dom0_construct_pv(struct domain *d, d->max_pages = ~0U; - if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 ) + if ( (rc = bzimage_parse(image_base, &image_start, &image_len, NULL)) != 0 ) return rc; if ( (rc = elf_init(&elf, image_start, image_len)) != 0 ) -- 2.44.0 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] xen/x86: bzImage parse kernel_alignment 2024-03-06 18:50 ` [PATCH 2/3] xen/x86: bzImage parse kernel_alignment Jason Andryuk @ 2024-03-07 2:09 ` Stefano Stabellini 2024-03-07 8:26 ` Jan Beulich 0 siblings, 1 reply; 21+ messages in thread From: Stefano Stabellini @ 2024-03-07 2:09 UTC (permalink / raw) To: Jason Andryuk Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu On Wed, 6 Mar 2024, Jason Andryuk wrote: > Expand bzimage_parse() to return kernel_alignment from the setup_header. > This will be needed if loading a PVH kernel at a physical offset to > compensate for a reserved E820 region. > > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> > --- > xen/arch/x86/bzimage.c | 4 +++- > xen/arch/x86/hvm/dom0_build.c | 4 +++- > xen/arch/x86/include/asm/bzimage.h | 3 +-- > xen/arch/x86/pv/dom0_build.c | 2 +- > 4 files changed, 8 insertions(+), 5 deletions(-) > > diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c > index ac4fd428be..0f4cfc49f7 100644 > --- a/xen/arch/x86/bzimage.c > +++ b/xen/arch/x86/bzimage.c > @@ -105,7 +105,7 @@ unsigned long __init bzimage_headroom(void *image_start, > } > > int __init bzimage_parse(void *image_base, void **image_start, > - unsigned long *image_len) > + unsigned long *image_len, unsigned int *align) > { > struct setup_header *hdr = (struct setup_header *)(*image_start); > int err = bzimage_check(hdr, *image_len); > @@ -118,6 +118,8 @@ int __init bzimage_parse(void *image_base, void **image_start, > { > *image_start += (hdr->setup_sects + 1) * 512 + hdr->payload_offset; > *image_len = hdr->payload_length; > + if ( align ) > + *align = hdr->kernel_alignment; > } > > if ( elf_is_elfbinary(*image_start, *image_len) ) > diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c > index 0ceda4140b..bbae8a5645 100644 > --- a/xen/arch/x86/hvm/dom0_build.c > +++ b/xen/arch/x86/hvm/dom0_build.c > @@ -548,12 +548,14 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image, > struct elf_binary elf; > struct elf_dom_parms parms; > paddr_t last_addr; > + unsigned int align = 0; > struct hvm_start_info start_info = { 0 }; > struct hvm_modlist_entry mod = { 0 }; > struct vcpu *v = d->vcpu[0]; > int rc; > > - if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 ) > + rc = bzimage_parse(image_base, &image_start, &image_len, &align); > + if ( rc != 0 ) > { > printk("Error trying to detect bz compressed kernel\n"); > return rc; > diff --git a/xen/arch/x86/include/asm/bzimage.h b/xen/arch/x86/include/asm/bzimage.h > index 7ed69d3910..de4e9a446f 100644 > --- a/xen/arch/x86/include/asm/bzimage.h > +++ b/xen/arch/x86/include/asm/bzimage.h > @@ -4,8 +4,7 @@ > #include <xen/init.h> > > unsigned long bzimage_headroom(void *image_start, unsigned long image_length); > - > int bzimage_parse(void *image_base, void **image_start, > - unsigned long *image_len); > + unsigned long *image_len, unsigned int *align); > > #endif /* __X86_BZIMAGE_H__ */ > diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c > index d8043fa58a..e9fa8a9a82 100644 > --- a/xen/arch/x86/pv/dom0_build.c > +++ b/xen/arch/x86/pv/dom0_build.c > @@ -416,7 +416,7 @@ int __init dom0_construct_pv(struct domain *d, > > d->max_pages = ~0U; > > - if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 ) > + if ( (rc = bzimage_parse(image_base, &image_start, &image_len, NULL)) != 0 ) > return rc; > > if ( (rc = elf_init(&elf, image_start, image_len)) != 0 ) > -- > 2.44.0 > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] xen/x86: bzImage parse kernel_alignment 2024-03-07 2:09 ` Stefano Stabellini @ 2024-03-07 8:26 ` Jan Beulich 2024-03-07 15:06 ` Jason Andryuk 0 siblings, 1 reply; 21+ messages in thread From: Jan Beulich @ 2024-03-07 8:26 UTC (permalink / raw) To: Stefano Stabellini Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu, Jason Andryuk On 07.03.2024 03:09, Stefano Stabellini wrote: > On Wed, 6 Mar 2024, Jason Andryuk wrote: >> Expand bzimage_parse() to return kernel_alignment from the setup_header. >> This will be needed if loading a PVH kernel at a physical offset to >> compensate for a reserved E820 region. >> >> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> > > Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Jan Beulich <jbeulich@suse.com> with two remarks: >> --- a/xen/arch/x86/hvm/dom0_build.c >> +++ b/xen/arch/x86/hvm/dom0_build.c >> @@ -548,12 +548,14 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image, >> struct elf_binary elf; >> struct elf_dom_parms parms; >> paddr_t last_addr; >> + unsigned int align = 0; Strictly speaking this isn't needed here, yet, and would suffice when added in the next patch. But I'm okay with keeping it. >> --- a/xen/arch/x86/include/asm/bzimage.h >> +++ b/xen/arch/x86/include/asm/bzimage.h >> @@ -4,8 +4,7 @@ >> #include <xen/init.h> >> >> unsigned long bzimage_headroom(void *image_start, unsigned long image_length); >> - >> int bzimage_parse(void *image_base, void **image_start, >> - unsigned long *image_len); >> + unsigned long *image_len, unsigned int *align); Any particular reason for dropping the blank line? I'd prefer if it was kept, and I may take the liberty to respectively adjust the patch while committing. Jan ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 2/3] xen/x86: bzImage parse kernel_alignment 2024-03-07 8:26 ` Jan Beulich @ 2024-03-07 15:06 ` Jason Andryuk 0 siblings, 0 replies; 21+ messages in thread From: Jason Andryuk @ 2024-03-07 15:06 UTC (permalink / raw) To: Jan Beulich, Stefano Stabellini Cc: xen-devel, Andrew Cooper, Roger Pau Monné, Wei Liu On 2024-03-07 03:26, Jan Beulich wrote: > On 07.03.2024 03:09, Stefano Stabellini wrote: >> On Wed, 6 Mar 2024, Jason Andryuk wrote: >>> Expand bzimage_parse() to return kernel_alignment from the setup_header. >>> This will be needed if loading a PVH kernel at a physical offset to >>> compensate for a reserved E820 region. >>> >>> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> >> >> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> > > Acked-by: Jan Beulich <jbeulich@suse.com> > with two remarks: > >>> --- a/xen/arch/x86/hvm/dom0_build.c >>> +++ b/xen/arch/x86/hvm/dom0_build.c >>> @@ -548,12 +548,14 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image, >>> struct elf_binary elf; >>> struct elf_dom_parms parms; >>> paddr_t last_addr; >>> + unsigned int align = 0; > > Strictly speaking this isn't needed here, yet, and would suffice when added > in the next patch. But I'm okay with keeping it. > >>> --- a/xen/arch/x86/include/asm/bzimage.h >>> +++ b/xen/arch/x86/include/asm/bzimage.h >>> @@ -4,8 +4,7 @@ >>> #include <xen/init.h> >>> >>> unsigned long bzimage_headroom(void *image_start, unsigned long image_length); >>> - >>> int bzimage_parse(void *image_base, void **image_start, >>> - unsigned long *image_len); >>> + unsigned long *image_len, unsigned int *align); > > Any particular reason for dropping the blank line? I'd prefer if it was kept, > and I may take the liberty to respectively adjust the patch while committing. No, no particular reason. The blank line can be retained. Thanks, Jason ^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels 2024-03-06 18:50 [PATCH 0/3] x86/pvh: Support relocating dom0 kernel Jason Andryuk 2024-03-06 18:50 ` [PATCH 1/3] features.h: Replace hard tabs Jason Andryuk 2024-03-06 18:50 ` [PATCH 2/3] xen/x86: bzImage parse kernel_alignment Jason Andryuk @ 2024-03-06 18:50 ` Jason Andryuk 2024-03-07 2:09 ` Stefano Stabellini ` (2 more replies) 2024-03-06 19:31 ` [LINUX PATCH] RFC: x86/pvh: Make Xen PVH entrypoint PIC Jason Andryuk 2024-03-07 10:00 ` [PATCH 0/3] x86/pvh: Support relocating dom0 kernel Roger Pau Monné 4 siblings, 3 replies; 21+ messages in thread From: Jason Andryuk @ 2024-03-06 18:50 UTC (permalink / raw) To: xen-devel Cc: Jason Andryuk, Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini Xen tries to load a PVH dom0 kernel at the fixed guest physical address from the elf headers. For Linux, this defaults to 0x1000000 (16MB), but it can be configured. Unfortunately there exist firmwares that have reserved regions at this address, so Xen fails to load the dom0 kernel since it's not RAM. The PVH entry code is not relocatable - it loads from absolute addresses, which fail when the kernel is loaded at a different address. With a suitably modified kernel, a reloctable entry point is possible. Add the XENFEAT_pvh_relocatable flag to let a kernel indicate that it supports a relocatable entry path. Change the loading to check for an acceptable load address. If the kernel is relocatable, support finding an alternate load address. Linux cares about its physical alignment. This can be pulled out of the bzImage header, but not from the vmlinux ELF file. If an alignment can't be found, use 2MB. Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> --- Put alignment as a new ELF note? Presence of that note would indicate relocation support without a new XENFEAT flag. Default alignment to a multiple of 2MB to make more cases work? It has to be a power of two, and a multiple might allow loading a customized kernel. A larger alignment would limit the number of possible load locations. --- xen/arch/x86/hvm/dom0_build.c | 109 ++++++++++++++++++++++++++++++++++ xen/include/public/features.h | 5 ++ 2 files changed, 114 insertions(+) diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c index bbae8a5645..34d68ee8fb 100644 --- a/xen/arch/x86/hvm/dom0_build.c +++ b/xen/arch/x86/hvm/dom0_build.c @@ -537,6 +537,109 @@ static paddr_t __init find_memory( return INVALID_PADDR; } +static bool __init check_load_address( + const struct domain *d, const struct elf_binary *elf) +{ + paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK; + paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size, + PAGE_SIZE); + unsigned int i; + + /* + * The memory map is sorted and all RAM regions starts and sizes are + * aligned to page boundaries. + */ + for ( i = 0; i < d->arch.nr_e820; i++ ) + { + paddr_t start = d->arch.e820[i].addr; + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; + + if ( start <= kernel_start && + end >= kernel_end && + d->arch.e820[i].type == E820_RAM ) + return true; + } + + return false; +} + +/* + * Find an e820 RAM region that fits the kernel at a suitable alignment. + */ +static paddr_t find_kernel_memory( + const struct domain *d, struct elf_binary *elf, paddr_t align) +{ + paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK; + paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size, + PAGE_SIZE); + unsigned int i; + + /* + * The memory map is sorted and all RAM regions starts and sizes are + * aligned to page boundaries. + */ + for ( i = 0; i < d->arch.nr_e820; i++ ) + { + paddr_t start = d->arch.e820[i].addr; + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; + paddr_t kstart, kend, offset; + + if ( d->arch.e820[i].type != E820_RAM ) + continue; + + if ( d->arch.e820[i].size < elf->dest_size ) + continue; + + if ( end < kernel_end ) + continue; + + kstart = ROUNDUP(start, align); + offset = kstart - kernel_start; + kend = kernel_end + offset; + + if ( kend <= end ) + return offset; + } + + return 0; +} + +/* + * Check the kernel load address, and adjust if necessary and possible. + */ +static bool __init adjust_load_address( + const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms, + paddr_t align) +{ + paddr_t offset; + + /* Check load address */ + if ( check_load_address(d, elf) ) + return true; + + if ( !test_bit(XENFEAT_pvh_relocatable, parms->f_supported) ) + { + printk("Address conflict and %pd kernel is not relocatable\n", d); + return false; + } + + if ( align == 0 ) + align = MB(2); + + offset = find_kernel_memory(d, elf, align); + if ( offset == 0 ) + { + printk("Failed find a load offset for the kernel\n"); + return false; + } + + printk("Adjusting load address by %#lx\n", offset); + elf->dest_base += offset; + parms->phys_entry += offset; + + return true; +} + static int __init pvh_load_kernel(struct domain *d, const module_t *image, unsigned long image_headroom, module_t *initrd, void *image_base, @@ -587,6 +690,12 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image, elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base); elf.dest_size = parms.virt_kend - parms.virt_kstart; + if ( !adjust_load_address(d, &elf, &parms, align) ) + { + printk("Unable to load kernel\n"); + return -ENOMEM; + } + elf_set_vcpu(&elf, v); rc = elf_load_binary(&elf); if ( rc < 0 ) diff --git a/xen/include/public/features.h b/xen/include/public/features.h index 4437f25d25..300480cb22 100644 --- a/xen/include/public/features.h +++ b/xen/include/public/features.h @@ -120,6 +120,11 @@ #define XENFEAT_runstate_phys_area 18 #define XENFEAT_vcpu_time_phys_area 19 +/* + * PVH: If set, the guest supports relocation in load address. + */ +#define XENFEAT_pvh_relocatable 20 + #define XENFEAT_NR_SUBMAPS 1 #endif /* __XEN_PUBLIC_FEATURES_H__ */ -- 2.44.0 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels 2024-03-06 18:50 ` [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels Jason Andryuk @ 2024-03-07 2:09 ` Stefano Stabellini 2024-03-07 16:07 ` Jason Andryuk 2024-03-07 9:30 ` Roger Pau Monné 2024-03-11 16:53 ` Jan Beulich 2 siblings, 1 reply; 21+ messages in thread From: Stefano Stabellini @ 2024-03-07 2:09 UTC (permalink / raw) To: Jason Andryuk Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini On Wed, 6 Mar 2024, Jason Andryuk wrote: > Xen tries to load a PVH dom0 kernel at the fixed guest physical address > from the elf headers. For Linux, this defaults to 0x1000000 (16MB), but > it can be configured. > > Unfortunately there exist firmwares that have reserved regions at this > address, so Xen fails to load the dom0 kernel since it's not RAM. > > The PVH entry code is not relocatable - it loads from absolute > addresses, which fail when the kernel is loaded at a different address. > With a suitably modified kernel, a reloctable entry point is possible. > > Add the XENFEAT_pvh_relocatable flag to let a kernel indicate that it > supports a relocatable entry path. > > Change the loading to check for an acceptable load address. If the > kernel is relocatable, support finding an alternate load address. > > Linux cares about its physical alignment. This can be pulled out of the > bzImage header, but not from the vmlinux ELF file. If an alignment > can't be found, use 2MB. > > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> > --- > Put alignment as a new ELF note? Presence of that note would indicate > relocation support without a new XENFEAT flag. > > Default alignment to a multiple of 2MB to make more cases work? It has > to be a power of two, and a multiple might allow loading a customized > kernel. A larger alignment would limit the number of possible load > locations. > --- > xen/arch/x86/hvm/dom0_build.c | 109 ++++++++++++++++++++++++++++++++++ > xen/include/public/features.h | 5 ++ > 2 files changed, 114 insertions(+) > > diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c > index bbae8a5645..34d68ee8fb 100644 > --- a/xen/arch/x86/hvm/dom0_build.c > +++ b/xen/arch/x86/hvm/dom0_build.c > @@ -537,6 +537,109 @@ static paddr_t __init find_memory( > return INVALID_PADDR; > } > > +static bool __init check_load_address( > + const struct domain *d, const struct elf_binary *elf) > +{ > + paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK; > + paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size, > + PAGE_SIZE); > + unsigned int i; > + > + /* > + * The memory map is sorted and all RAM regions starts and sizes are > + * aligned to page boundaries. > + */ > + for ( i = 0; i < d->arch.nr_e820; i++ ) > + { > + paddr_t start = d->arch.e820[i].addr; > + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; > + > + if ( start <= kernel_start && > + end >= kernel_end && > + d->arch.e820[i].type == E820_RAM ) > + return true; > + } > + > + return false; > +} > + > +/* > + * Find an e820 RAM region that fits the kernel at a suitable alignment. > + */ > +static paddr_t find_kernel_memory( > + const struct domain *d, struct elf_binary *elf, paddr_t align) > +{ > + paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK; > + paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size, > + PAGE_SIZE); > + unsigned int i; > + > + /* > + * The memory map is sorted and all RAM regions starts and sizes are > + * aligned to page boundaries. > + */ > + for ( i = 0; i < d->arch.nr_e820; i++ ) > + { > + paddr_t start = d->arch.e820[i].addr; > + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; > + paddr_t kstart, kend, offset; > + > + if ( d->arch.e820[i].type != E820_RAM ) > + continue; > + > + if ( d->arch.e820[i].size < elf->dest_size ) > + continue; > + > + if ( end < kernel_end ) > + continue; Why this check? Is it to make sure we look for e820 regions that are higher in terms of addresses? If so, couldn't we start from d->arch.nr_e820 and go down instead of starting from 0 and going up? The PVH entry point is a 32-bit entry point if I remember right? Do we need a 32-bit check? If so then it might not be a good idea to start from arch.nr_e820 and go down. > + kstart = ROUNDUP(start, align); > + offset = kstart - kernel_start; > + kend = kernel_end + offset; > + > + if ( kend <= end ) > + return offset; > + } > + > + return 0; > +} > + > +/* > + * Check the kernel load address, and adjust if necessary and possible. > + */ > +static bool __init adjust_load_address( > + const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms, > + paddr_t align) > +{ > + paddr_t offset; > + > + /* Check load address */ > + if ( check_load_address(d, elf) ) > + return true; > + > + if ( !test_bit(XENFEAT_pvh_relocatable, parms->f_supported) ) > + { > + printk("Address conflict and %pd kernel is not relocatable\n", d); > + return false; > + } > + > + if ( align == 0 ) > + align = MB(2); > + > + offset = find_kernel_memory(d, elf, align); > + if ( offset == 0 ) > + { > + printk("Failed find a load offset for the kernel\n"); > + return false; > + } > + > + printk("Adjusting load address by %#lx\n", offset); > + elf->dest_base += offset; > + parms->phys_entry += offset; > + > + return true; > +} > + > static int __init pvh_load_kernel(struct domain *d, const module_t *image, > unsigned long image_headroom, > module_t *initrd, void *image_base, > @@ -587,6 +690,12 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image, > elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base); > elf.dest_size = parms.virt_kend - parms.virt_kstart; > > + if ( !adjust_load_address(d, &elf, &parms, align) ) > + { > + printk("Unable to load kernel\n"); > + return -ENOMEM; > + } > + > elf_set_vcpu(&elf, v); > rc = elf_load_binary(&elf); > if ( rc < 0 ) > diff --git a/xen/include/public/features.h b/xen/include/public/features.h > index 4437f25d25..300480cb22 100644 > --- a/xen/include/public/features.h > +++ b/xen/include/public/features.h > @@ -120,6 +120,11 @@ > #define XENFEAT_runstate_phys_area 18 > #define XENFEAT_vcpu_time_phys_area 19 > > +/* > + * PVH: If set, the guest supports relocation in load address. > + */ > +#define XENFEAT_pvh_relocatable 20 > + > #define XENFEAT_NR_SUBMAPS 1 > > #endif /* __XEN_PUBLIC_FEATURES_H__ */ > -- > 2.44.0 > > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels 2024-03-07 2:09 ` Stefano Stabellini @ 2024-03-07 16:07 ` Jason Andryuk 0 siblings, 0 replies; 21+ messages in thread From: Jason Andryuk @ 2024-03-07 16:07 UTC (permalink / raw) To: Stefano Stabellini Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap, Julien Grall On 2024-03-06 21:09, Stefano Stabellini wrote: > On Wed, 6 Mar 2024, Jason Andryuk wrote: >> Xen tries to load a PVH dom0 kernel at the fixed guest physical address >> from the elf headers. For Linux, this defaults to 0x1000000 (16MB), but >> it can be configured. >> >> Unfortunately there exist firmwares that have reserved regions at this >> address, so Xen fails to load the dom0 kernel since it's not RAM. >> >> The PVH entry code is not relocatable - it loads from absolute >> addresses, which fail when the kernel is loaded at a different address. >> With a suitably modified kernel, a reloctable entry point is possible. >> >> Add the XENFEAT_pvh_relocatable flag to let a kernel indicate that it >> supports a relocatable entry path. >> >> Change the loading to check for an acceptable load address. If the >> kernel is relocatable, support finding an alternate load address. >> >> Linux cares about its physical alignment. This can be pulled out of the >> bzImage header, but not from the vmlinux ELF file. If an alignment >> can't be found, use 2MB. >> >> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> >> --- >> Put alignment as a new ELF note? Presence of that note would indicate >> relocation support without a new XENFEAT flag. >> >> Default alignment to a multiple of 2MB to make more cases work? It has >> to be a power of two, and a multiple might allow loading a customized >> kernel. A larger alignment would limit the number of possible load >> locations. >> --- >> xen/arch/x86/hvm/dom0_build.c | 109 ++++++++++++++++++++++++++++++++++ >> xen/include/public/features.h | 5 ++ >> 2 files changed, 114 insertions(+) >> >> diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c >> index bbae8a5645..34d68ee8fb 100644 >> --- a/xen/arch/x86/hvm/dom0_build.c >> +++ b/xen/arch/x86/hvm/dom0_build.c >> @@ -537,6 +537,109 @@ static paddr_t __init find_memory( >> return INVALID_PADDR; >> } >> >> +static bool __init check_load_address( >> + const struct domain *d, const struct elf_binary *elf) >> +{ >> + paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK; >> + paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size, >> + PAGE_SIZE); >> + unsigned int i; >> + >> + /* >> + * The memory map is sorted and all RAM regions starts and sizes are >> + * aligned to page boundaries. >> + */ >> + for ( i = 0; i < d->arch.nr_e820; i++ ) >> + { >> + paddr_t start = d->arch.e820[i].addr; >> + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; >> + >> + if ( start <= kernel_start && >> + end >= kernel_end && >> + d->arch.e820[i].type == E820_RAM ) >> + return true; >> + } >> + >> + return false; >> +} >> + >> +/* >> + * Find an e820 RAM region that fits the kernel at a suitable alignment. >> + */ >> +static paddr_t find_kernel_memory( >> + const struct domain *d, struct elf_binary *elf, paddr_t align) >> +{ >> + paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK; >> + paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size, >> + PAGE_SIZE); >> + unsigned int i; >> + >> + /* >> + * The memory map is sorted and all RAM regions starts and sizes are >> + * aligned to page boundaries. >> + */ >> + for ( i = 0; i < d->arch.nr_e820; i++ ) >> + { >> + paddr_t start = d->arch.e820[i].addr; >> + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; >> + paddr_t kstart, kend, offset; >> + >> + if ( d->arch.e820[i].type != E820_RAM ) >> + continue; >> + >> + if ( d->arch.e820[i].size < elf->dest_size ) >> + continue; >> + >> + if ( end < kernel_end ) >> + continue; > > Why this check? Is it to make sure we look for e820 regions that are > higher in terms of addresses? If so, couldn't we start from > d->arch.nr_e820 and go down instead of starting from 0 and going up? Yes, I thought we only wanted a higher address. The Linux bzImage entry code uses the LOAD_PHYSICAL_ADDR (CONFIG_PHYSICAL_START/elf->dest_base) as a minimum when extracting vmlinux for a relocatable kernel. I'm not sure if that is strictly required though. I also thought a smaller adjustment would be better, so starting from the lower e820 entries would find the first acceptable one. But that may not matter. > The PVH entry point is a 32-bit entry point if I remember right? Do we > need a 32-bit check? If so then it might not be a good idea to start > from arch.nr_e820 and go down. Yes, the entry point is 32-bit, so you are correct that that the range should to be limited at 4GB. Regards, Jason ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels 2024-03-06 18:50 ` [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels Jason Andryuk 2024-03-07 2:09 ` Stefano Stabellini @ 2024-03-07 9:30 ` Roger Pau Monné 2024-03-07 17:01 ` Jason Andryuk 2024-03-11 16:53 ` Jan Beulich 2 siblings, 1 reply; 21+ messages in thread From: Roger Pau Monné @ 2024-03-07 9:30 UTC (permalink / raw) To: Jason Andryuk Cc: xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini On Wed, Mar 06, 2024 at 01:50:32PM -0500, Jason Andryuk wrote: > Xen tries to load a PVH dom0 kernel at the fixed guest physical address > from the elf headers. For Linux, this defaults to 0x1000000 (16MB), but > it can be configured. > > Unfortunately there exist firmwares that have reserved regions at this > address, so Xen fails to load the dom0 kernel since it's not RAM. > > The PVH entry code is not relocatable - it loads from absolute > addresses, which fail when the kernel is loaded at a different address. > With a suitably modified kernel, a reloctable entry point is possible. > > Add the XENFEAT_pvh_relocatable flag to let a kernel indicate that it > supports a relocatable entry path. > > Change the loading to check for an acceptable load address. If the > kernel is relocatable, support finding an alternate load address. > > Linux cares about its physical alignment. This can be pulled out of the > bzImage header, but not from the vmlinux ELF file. If an alignment > can't be found, use 2MB. While I'm fine with having a Linux specific way, there needs to be a generic way of passing the alignment for non-bzImage kernels. ELF program headers have an align field, would that be suitable to use? In elf_parse_binary() where the p{start,end} is calculated, you could also fetch the p_offset from the lower found program header and use it as the required alignment. Would that be OK for Linux and maybe avoid having to fiddle with the bzImage header? FWIW it is likely fine for FreeBSD. > > Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> I created a gitlab ticket for this: https://gitlab.com/xen-project/xen/-/issues/180 If you want to reference it. > --- > Put alignment as a new ELF note? Presence of that note would indicate > relocation support without a new XENFEAT flag. > > Default alignment to a multiple of 2MB to make more cases work? It has > to be a power of two, and a multiple might allow loading a customized > kernel. A larger alignment would limit the number of possible load > locations. > --- > xen/arch/x86/hvm/dom0_build.c | 109 ++++++++++++++++++++++++++++++++++ > xen/include/public/features.h | 5 ++ > 2 files changed, 114 insertions(+) > > diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c > index bbae8a5645..34d68ee8fb 100644 > --- a/xen/arch/x86/hvm/dom0_build.c > +++ b/xen/arch/x86/hvm/dom0_build.c > @@ -537,6 +537,109 @@ static paddr_t __init find_memory( > return INVALID_PADDR; > } > > +static bool __init check_load_address( > + const struct domain *d, const struct elf_binary *elf) > +{ > + paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK; > + paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size, > + PAGE_SIZE); You can use PAGE_ALIGN() here (and below) for simplicity. > + unsigned int i; > + > + /* > + * The memory map is sorted and all RAM regions starts and sizes are > + * aligned to page boundaries. > + */ > + for ( i = 0; i < d->arch.nr_e820; i++ ) > + { > + paddr_t start = d->arch.e820[i].addr; > + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; Since the memory map is sorted you can end the loop once end start >= kernel_end? As further regions are past the kernel destination. > + > + if ( start <= kernel_start && > + end >= kernel_end && > + d->arch.e820[i].type == E820_RAM ) > + return true; > + } > + > + return false; > +} > + > +/* > + * Find an e820 RAM region that fits the kernel at a suitable alignment. > + */ This (and other) comment seems to fit in a single line: /* ... */. > +static paddr_t find_kernel_memory( > + const struct domain *d, struct elf_binary *elf, paddr_t align) elf can be const AFAICT. > +{ > + paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK; > + paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size, > + PAGE_SIZE); > + unsigned int i; > + > + /* > + * The memory map is sorted and all RAM regions starts and sizes are > + * aligned to page boundaries. > + */ > + for ( i = 0; i < d->arch.nr_e820; i++ ) > + { > + paddr_t start = d->arch.e820[i].addr; > + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; > + paddr_t kstart, kend, offset; > + > + if ( d->arch.e820[i].type != E820_RAM ) > + continue; > + > + if ( d->arch.e820[i].size < elf->dest_size ) > + continue; > + > + if ( end < kernel_end ) > + continue; I'm not sure I understand this check, why would we refuse regions below the fixed kernel end? Those should be equally suitable if big enough and meeting the alignment requirements. > + > + kstart = ROUNDUP(start, align); > + offset = kstart - kernel_start; > + kend = kernel_end + offset; > + > + if ( kend <= end ) > + return offset; Why not return this as an address to use to load the kernel instead of an offset from dest_base? That would also make the calculations easier IMO. > + } This should be limited to a range below 4GB. > + return 0; > +} > + > +/* > + * Check the kernel load address, and adjust if necessary and possible. > + */ > +static bool __init adjust_load_address( > + const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms, > + paddr_t align) > +{ > + paddr_t offset; > + > + /* Check load address */ > + if ( check_load_address(d, elf) ) > + return true; > + > + if ( !test_bit(XENFEAT_pvh_relocatable, parms->f_supported) ) > + { > + printk("Address conflict and %pd kernel is not relocatable\n", d); > + return false; > + } > + > + if ( align == 0 ) > + align = MB(2); > + > + offset = find_kernel_memory(d, elf, align); > + if ( offset == 0 ) > + { > + printk("Failed find a load offset for the kernel\n"); > + return false; > + } > + > + printk("Adjusting load address by %#lx\n", offset); I think this would be more helpful if the previous and the new ranges are printed, as I'm not sure the previous dest_base is printed, in which case the offset doesn't help much. I would do: if ( opt_dom0_verbose ) printk("relocating kernel from [%lx, %lx] -> [%lx, %lx]\n", ...); > + elf->dest_base += offset; > + parms->phys_entry += offset; As noted above, I think it would be better if find_kernel_memory() find an absolute address which is then adjusted here. > + > + return true; > +} > + > static int __init pvh_load_kernel(struct domain *d, const module_t *image, > unsigned long image_headroom, > module_t *initrd, void *image_base, > @@ -587,6 +690,12 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image, > elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base); > elf.dest_size = parms.virt_kend - parms.virt_kstart; > > + if ( !adjust_load_address(d, &elf, &parms, align) ) check_and_adjust_? As the address is not unconditionally adjusted. Thanks, Roger. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels 2024-03-07 9:30 ` Roger Pau Monné @ 2024-03-07 17:01 ` Jason Andryuk 2024-03-08 6:34 ` Juergen Gross 0 siblings, 1 reply; 21+ messages in thread From: Jason Andryuk @ 2024-03-07 17:01 UTC (permalink / raw) To: Roger Pau Monné Cc: xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini On 2024-03-07 04:30, Roger Pau Monné wrote: > On Wed, Mar 06, 2024 at 01:50:32PM -0500, Jason Andryuk wrote: >> Xen tries to load a PVH dom0 kernel at the fixed guest physical address >> from the elf headers. For Linux, this defaults to 0x1000000 (16MB), but >> it can be configured. >> >> Unfortunately there exist firmwares that have reserved regions at this >> address, so Xen fails to load the dom0 kernel since it's not RAM. >> >> The PVH entry code is not relocatable - it loads from absolute >> addresses, which fail when the kernel is loaded at a different address. >> With a suitably modified kernel, a reloctable entry point is possible. >> >> Add the XENFEAT_pvh_relocatable flag to let a kernel indicate that it >> supports a relocatable entry path. >> >> Change the loading to check for an acceptable load address. If the >> kernel is relocatable, support finding an alternate load address. >> >> Linux cares about its physical alignment. This can be pulled out of the >> bzImage header, but not from the vmlinux ELF file. If an alignment >> can't be found, use 2MB. > > While I'm fine with having a Linux specific way, there needs to be a > generic way of passing the alignment for non-bzImage kernels. > > ELF program headers have an align field, would that be suitable to > use? Unfortunately, it doesn't seem correct. Linux has CONFIG_PHYSICAL_ALIGN, and it doesn't seem to be used in the elf headers. As a quick test, I set CONFIG_PHYSICAL_ALIGN=0x800000, but the elf align values are still 0x200000. > In elf_parse_binary() where the p{start,end} is calculated, you could > also fetch the p_offset from the lower found program header and use it > as the required alignment. Would that be OK for Linux and maybe avoid > having to fiddle with the bzImage header? FWIW it is likely fine for > FreeBSD. Adding an explicit value in an elf note would avoid any ambiguity. >> >> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> > > I created a gitlab ticket for this: > > https://gitlab.com/xen-project/xen/-/issues/180 > > If you want to reference it. Will do >> --- >> Put alignment as a new ELF note? Presence of that note would indicate >> relocation support without a new XENFEAT flag. >> >> Default alignment to a multiple of 2MB to make more cases work? It has >> to be a power of two, and a multiple might allow loading a customized >> kernel. A larger alignment would limit the number of possible load >> locations. >> --- >> xen/arch/x86/hvm/dom0_build.c | 109 ++++++++++++++++++++++++++++++++++ >> xen/include/public/features.h | 5 ++ >> 2 files changed, 114 insertions(+) >> >> diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c >> index bbae8a5645..34d68ee8fb 100644 >> --- a/xen/arch/x86/hvm/dom0_build.c >> +++ b/xen/arch/x86/hvm/dom0_build.c >> @@ -537,6 +537,109 @@ static paddr_t __init find_memory( >> return INVALID_PADDR; >> } >> >> +static bool __init check_load_address( >> + const struct domain *d, const struct elf_binary *elf) >> +{ >> + paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK; >> + paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size, >> + PAGE_SIZE); > > You can use PAGE_ALIGN() here (and below) for simplicity. Ok >> + unsigned int i; >> + >> + /* >> + * The memory map is sorted and all RAM regions starts and sizes are >> + * aligned to page boundaries. >> + */ >> + for ( i = 0; i < d->arch.nr_e820; i++ ) >> + { >> + paddr_t start = d->arch.e820[i].addr; >> + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; > > Since the memory map is sorted you can end the loop once end start >= > kernel_end? As further regions are past the kernel destination. Yes, thanks. >> + >> + if ( start <= kernel_start && >> + end >= kernel_end && >> + d->arch.e820[i].type == E820_RAM ) >> + return true; >> + } >> + >> + return false; >> +} >> + >> +/* >> + * Find an e820 RAM region that fits the kernel at a suitable alignment. >> + */ > > This (and other) comment seems to fit in a single line: /* ... */. Ok >> +static paddr_t find_kernel_memory( >> + const struct domain *d, struct elf_binary *elf, paddr_t align) > > elf can be const AFAICT. Ok >> +{ >> + paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK; >> + paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size, >> + PAGE_SIZE); >> + unsigned int i; >> + >> + /* >> + * The memory map is sorted and all RAM regions starts and sizes are >> + * aligned to page boundaries. >> + */ >> + for ( i = 0; i < d->arch.nr_e820; i++ ) >> + { >> + paddr_t start = d->arch.e820[i].addr; >> + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; >> + paddr_t kstart, kend, offset; >> + >> + if ( d->arch.e820[i].type != E820_RAM ) >> + continue; >> + >> + if ( d->arch.e820[i].size < elf->dest_size ) >> + continue; >> + >> + if ( end < kernel_end ) >> + continue; > > I'm not sure I understand this check, why would we refuse regions > below the fixed kernel end? Those should be equally suitable if big > enough and meeting the alignment requirements. I thought (Linux at least) wouldn't want to be moved lower - only higher in ram. There was some Linux boot code that gave me that impression, but it may not be strictly true. >> + >> + kstart = ROUNDUP(start, align); >> + offset = kstart - kernel_start; >> + kend = kernel_end + offset; >> + >> + if ( kend <= end ) >> + return offset; > > Why not return this as an address to use to load the kernel instead of > an offset from dest_base? That would also make the calculations > easier IMO. phys_entry needs to be updated as well as dest_base, so returning the offset seemed more useful. >> + } > > This should be limited to a range below 4GB. Yes. >> + return 0; >> +} >> + >> +/* >> + * Check the kernel load address, and adjust if necessary and possible. >> + */ >> +static bool __init adjust_load_address( >> + const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms, >> + paddr_t align) >> +{ >> + paddr_t offset; >> + >> + /* Check load address */ >> + if ( check_load_address(d, elf) ) >> + return true; >> + >> + if ( !test_bit(XENFEAT_pvh_relocatable, parms->f_supported) ) >> + { >> + printk("Address conflict and %pd kernel is not relocatable\n", d); >> + return false; >> + } >> + >> + if ( align == 0 ) >> + align = MB(2); >> + >> + offset = find_kernel_memory(d, elf, align); >> + if ( offset == 0 ) >> + { >> + printk("Failed find a load offset for the kernel\n"); >> + return false; >> + } >> + >> + printk("Adjusting load address by %#lx\n", offset); > > I think this would be more helpful if the previous and the new ranges > are printed, as I'm not sure the previous dest_base is printed, in > which case the offset doesn't help much. I would do: > > if ( opt_dom0_verbose ) > printk("relocating kernel from [%lx, %lx] -> [%lx, %lx]\n", ...); dest_base was printed (maybe with extra verbosity), but this message is clearer since it shows the end result. >> + elf->dest_base += offset; >> + parms->phys_entry += offset; > > As noted above, I think it would be better if find_kernel_memory() > find an absolute address which is then adjusted here. I thought the modification with just offset was clearer compared to: params->phys_entry += (reloc_base - elf->dest_base); elf->dest_base = reloc_base; But I'm fine making the change. >> + >> + return true; >> +} >> + >> static int __init pvh_load_kernel(struct domain *d, const module_t *image, >> unsigned long image_headroom, >> module_t *initrd, void *image_base, >> @@ -587,6 +690,12 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image, >> elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base); >> elf.dest_size = parms.virt_kend - parms.virt_kstart; >> >> + if ( !adjust_load_address(d, &elf, &parms, align) ) > > check_and_adjust_? As the address is not unconditionally adjusted. I thought it was a little too wordy, but it is more accurate. I'll rename it. Thanks for taking a look. -Jason ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels 2024-03-07 17:01 ` Jason Andryuk @ 2024-03-08 6:34 ` Juergen Gross 0 siblings, 0 replies; 21+ messages in thread From: Juergen Gross @ 2024-03-08 6:34 UTC (permalink / raw) To: Jason Andryuk, Roger Pau Monné Cc: xen-devel, Jan Beulich, Andrew Cooper, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini [-- Attachment #1.1.1: Type: text/plain, Size: 1967 bytes --] On 07.03.24 18:01, Jason Andryuk wrote: > On 2024-03-07 04:30, Roger Pau Monné wrote: >> On Wed, Mar 06, 2024 at 01:50:32PM -0500, Jason Andryuk wrote: >>> Xen tries to load a PVH dom0 kernel at the fixed guest physical address >>> from the elf headers. For Linux, this defaults to 0x1000000 (16MB), but >>> it can be configured. >>> >>> Unfortunately there exist firmwares that have reserved regions at this >>> address, so Xen fails to load the dom0 kernel since it's not RAM. >>> >>> The PVH entry code is not relocatable - it loads from absolute >>> addresses, which fail when the kernel is loaded at a different address. >>> With a suitably modified kernel, a reloctable entry point is possible. >>> >>> Add the XENFEAT_pvh_relocatable flag to let a kernel indicate that it >>> supports a relocatable entry path. >>> >>> Change the loading to check for an acceptable load address. If the >>> kernel is relocatable, support finding an alternate load address. >>> >>> Linux cares about its physical alignment. This can be pulled out of the >>> bzImage header, but not from the vmlinux ELF file. If an alignment >>> can't be found, use 2MB. >> >> While I'm fine with having a Linux specific way, there needs to be a >> generic way of passing the alignment for non-bzImage kernels. >> >> ELF program headers have an align field, would that be suitable to >> use? > > Unfortunately, it doesn't seem correct. Linux has CONFIG_PHYSICAL_ALIGN, and it > doesn't seem to be used in the elf headers. As a quick test, I set > CONFIG_PHYSICAL_ALIGN=0x800000, but the elf align values are still 0x200000. An excerpt from the kernel's arch/x86/Makefile: # # The 64-bit kernel must be aligned to 2MB. Pass -z max-page-size=0x200000 to # the linker to force 2MB page size regardless of the default page size used # by the linker. # ifdef CONFIG_X86_64 LDFLAGS_vmlinux += -z max-page-size=0x200000 endif Juergen [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 3743 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 495 bytes --] ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels 2024-03-06 18:50 ` [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels Jason Andryuk 2024-03-07 2:09 ` Stefano Stabellini 2024-03-07 9:30 ` Roger Pau Monné @ 2024-03-11 16:53 ` Jan Beulich 2024-03-11 19:53 ` Jason Andryuk 2 siblings, 1 reply; 21+ messages in thread From: Jan Beulich @ 2024-03-11 16:53 UTC (permalink / raw) To: Jason Andryuk Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini, xen-devel On 06.03.2024 19:50, Jason Andryuk wrote: > --- a/xen/arch/x86/hvm/dom0_build.c > +++ b/xen/arch/x86/hvm/dom0_build.c > @@ -537,6 +537,109 @@ static paddr_t __init find_memory( > return INVALID_PADDR; > } > > +static bool __init check_load_address( > + const struct domain *d, const struct elf_binary *elf) > +{ > + paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK; > + paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size, > + PAGE_SIZE); > + unsigned int i; > + > + /* > + * The memory map is sorted and all RAM regions starts and sizes are > + * aligned to page boundaries. > + */ > + for ( i = 0; i < d->arch.nr_e820; i++ ) > + { > + paddr_t start = d->arch.e820[i].addr; > + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; > + > + if ( start <= kernel_start && > + end >= kernel_end && > + d->arch.e820[i].type == E820_RAM ) > + return true; > + } > + > + return false; > +} > + > +/* > + * Find an e820 RAM region that fits the kernel at a suitable alignment. > + */ > +static paddr_t find_kernel_memory( __init ? Jan ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels 2024-03-11 16:53 ` Jan Beulich @ 2024-03-11 19:53 ` Jason Andryuk 0 siblings, 0 replies; 21+ messages in thread From: Jason Andryuk @ 2024-03-11 19:53 UTC (permalink / raw) To: Jan Beulich Cc: Andrew Cooper, Roger Pau Monné, Wei Liu, George Dunlap, Julien Grall, Stefano Stabellini, xen-devel On 2024-03-11 12:53, Jan Beulich wrote: > On 06.03.2024 19:50, Jason Andryuk wrote: >> --- a/xen/arch/x86/hvm/dom0_build.c >> +++ b/xen/arch/x86/hvm/dom0_build.c >> @@ -537,6 +537,109 @@ static paddr_t __init find_memory( >> return INVALID_PADDR; >> } >> >> +static bool __init check_load_address( >> + const struct domain *d, const struct elf_binary *elf) >> +{ >> + paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK; >> + paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size, >> + PAGE_SIZE); >> + unsigned int i; >> + >> + /* >> + * The memory map is sorted and all RAM regions starts and sizes are >> + * aligned to page boundaries. >> + */ >> + for ( i = 0; i < d->arch.nr_e820; i++ ) >> + { >> + paddr_t start = d->arch.e820[i].addr; >> + paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size; >> + >> + if ( start <= kernel_start && >> + end >= kernel_end && >> + d->arch.e820[i].type == E820_RAM ) >> + return true; >> + } >> + >> + return false; >> +} >> + >> +/* >> + * Find an e820 RAM region that fits the kernel at a suitable alignment. >> + */ >> +static paddr_t find_kernel_memory( > > __init ? Yes, thanks for catching that. Regards, Jason ^ permalink raw reply [flat|nested] 21+ messages in thread
* [LINUX PATCH] RFC: x86/pvh: Make Xen PVH entrypoint PIC 2024-03-06 18:50 [PATCH 0/3] x86/pvh: Support relocating dom0 kernel Jason Andryuk ` (2 preceding siblings ...) 2024-03-06 18:50 ` [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels Jason Andryuk @ 2024-03-06 19:31 ` Jason Andryuk 2024-03-07 10:00 ` [PATCH 0/3] x86/pvh: Support relocating dom0 kernel Roger Pau Monné 4 siblings, 0 replies; 21+ messages in thread From: Jason Andryuk @ 2024-03-06 19:31 UTC (permalink / raw) To: xen-devel Cc: Jan Beulich, Roger Pau Monné, Andrew Cooper, Juergen Gross, Boris Ostrovsky, Jason Andryuk The Xen PVH entrypoint is 32bit non-PIC code running at a default load address of 0x1000000 (16MB) (CONFIG_PHYSICAL_START). Xen loads the kernel at that physical address inside the PVH container. When running a PVH Dom0, the system reserved addresses are mapped 1-1 into the PVH container. There exist system firmwares (Coreboot/EDK2) with reserved memory at 16MB. This creates a conflict where the PVH kernel cannot be loaded at that address. Modify the PVH entrypoint to be position-indepedent to allow flexibility in load address. Initial PVH entry runs at the physical addresses and then transitions to the identity mapped address. While executing xen_prepare_pvh() calls through pv_ops function pointers transition to the high mapped addresses. Additionally, __va() is called on some hvm_start_info physical addresses, we need the directmap address range is used. So we need to run page tables with all of those ranges mapped. Modifying init_top_pgt tables ran into issue since startup_64/__startup_64() will modify those page tables again. Use a dedicated set of page tables - pvh_init_top_pgt - for the PVH entry to avoid unwanted interactions. In xen_pvh_init(), __pa() is called to find the physical address of the hypercall page. Set phys_base temporarily before calling into xen_prepare_pvh(), which calls xen_pvh_init(), and clear it afterwards. __startup_64() assumes phys_base is zero and adds load_delta to it. If phys_base is already set, the calculation results in an incorrect cr3. TODO: The 32bit entry path needs additional work to have relocatable pagetables. TODO: Sync features.h from xen.git commit xxxxxxxxxx when it is commited. Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> --- Put this out as an example for the Xen modifications. Will split for actual submission. Instead of setting and clearing phys_base, add a dedicated variable? Make __startup_64() exit if phys_base is already set to allow calling multiple times, and use that and init_top_pgt instead of adding additional page tables? --- arch/x86/platform/pvh/head.S | 195 ++++++++++++++++++++++++++++--- arch/x86/xen/xen-head.S | 7 +- include/xen/interface/features.h | 5 + 3 files changed, 190 insertions(+), 17 deletions(-) diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S index f7235ef87bc3..bab857db55ef 100644 --- a/arch/x86/platform/pvh/head.S +++ b/arch/x86/platform/pvh/head.S @@ -50,11 +50,32 @@ #define PVH_CS_SEL (PVH_GDT_ENTRY_CS * 8) #define PVH_DS_SEL (PVH_GDT_ENTRY_DS * 8) +#define rva(x) ((x) - pvh_start_xen) + SYM_CODE_START_LOCAL(pvh_start_xen) UNWIND_HINT_END_OF_STACK cld - lgdt (_pa(gdt)) + /* + * See the comment for startup_32 for more details. We need to + * execute a call to get the execution address to be position + * independent, but we don't have a stack. Save and restore the + * magic field of start_info in ebx, and use that as the stack. + */ + mov (%ebx), %eax + leal 4(%ebx), %esp + ANNOTATE_INTRA_FUNCTION_CALL + call 1f +1: popl %ebp + mov %eax, (%ebx) + subl $ rva(1b), %ebp + movl $0, %esp + + leal rva(gdt)(%ebp), %eax + movl %eax, %ecx + leal rva(gdt_start)(%ebp), %ecx + movl %ecx, 2(%eax) + lgdt (%eax) mov $PVH_DS_SEL,%eax mov %eax,%ds @@ -62,14 +83,14 @@ SYM_CODE_START_LOCAL(pvh_start_xen) mov %eax,%ss /* Stash hvm_start_info. */ - mov $_pa(pvh_start_info), %edi + leal rva(pvh_start_info)(%ebp), %edi mov %ebx, %esi - mov _pa(pvh_start_info_sz), %ecx + movl rva(pvh_start_info_sz)(%ebp), %ecx shr $2,%ecx rep movsl - mov $_pa(early_stack_end), %esp + leal rva(early_stack_end)(%ebp), %esp /* Enable PAE mode. */ mov %cr4, %eax @@ -83,29 +104,83 @@ SYM_CODE_START_LOCAL(pvh_start_xen) btsl $_EFER_LME, %eax wrmsr + mov %ebp, %ebx + subl $LOAD_PHYSICAL_ADDR, %ebx /* offset */ + jz .Lpagetable_done + + /* Fixup page-tables for relocation. */ + leal rva(pvh_init_top_pgt)(%ebp), %edi + movl $512, %ecx +2: + movl 0x00(%edi), %eax + addl 0x04(%edi), %eax + jz 1f + addl %ebx, 0x00(%edi) +1: + addl $8, %edi + decl %ecx + jnz 2b + + /* L3 ident has a single entry. */ + leal rva(pvh_level3_ident_pgt)(%ebp), %edi + addl %ebx, 0x00(%edi) + + leal rva(pvh_level3_kernel_pgt)(%ebp), %edi + addl %ebx, (4096 - 16)(%edi) + addl %ebx, (4096 - 8)(%edi) + + /* pvh_level2_ident_pgt is fine - large pages */ + + /* pvh_level2_kernel_pgt needs adjustment - large pages */ + leal rva(pvh_level2_kernel_pgt)(%ebp), %edi + movl $512, %ecx +2: + movl 0x00(%edi), %eax + addl 0x04(%edi), %eax + jz 1f + addl %ebx, 0x00(%edi) +1: + addl $8, %edi + decl %ecx + jnz 2b + +.Lpagetable_done: /* Enable pre-constructed page tables. */ - mov $_pa(init_top_pgt), %eax + leal rva(pvh_init_top_pgt)(%ebp), %eax mov %eax, %cr3 mov $(X86_CR0_PG | X86_CR0_PE), %eax mov %eax, %cr0 /* Jump to 64-bit mode. */ - ljmp $PVH_CS_SEL, $_pa(1f) + pushl $PVH_CS_SEL + leal rva(1f)(%ebp), %eax + pushl %eax + lretl /* 64-bit entry point. */ .code64 1: /* Set base address in stack canary descriptor. */ mov $MSR_GS_BASE,%ecx - mov $_pa(canary), %eax + leal rva(canary)(%ebp), %eax xor %edx, %edx wrmsr + /* Calculate load offset from LOAD_PHYSICAL_ADDR and store in + * phys_base. __pa() needs phys_base set to calculate the the + * hypercall page in xen_pvh_init(). */ + movq %rbp, %rbx + subq $LOAD_PHYSICAL_ADDR, %rbx + movq %rbx, phys_base(%rip) call xen_prepare_pvh + /* Clear phys_base. startup_64/__startup_64 will *add* to its value, + so start from 0. */ + xor %rbx, %rbx + movq %rbx, phys_base(%rip) /* startup_64 expects boot_params in %rsi. */ - mov $_pa(pvh_bootparams), %rsi - mov $_pa(startup_64), %rax + lea rva(pvh_bootparams)(%ebp), %rsi + lea rva(startup_64)(%ebp), %rax ANNOTATE_RETPOLINE_SAFE jmp *%rax @@ -113,20 +188,27 @@ SYM_CODE_START_LOCAL(pvh_start_xen) call mk_early_pgtbl_32 - mov $_pa(initial_page_table), %eax + leal rva(initial_page_table)(%ebp), %eax mov %eax, %cr3 mov %cr0, %eax or $(X86_CR0_PG | X86_CR0_PE), %eax mov %eax, %cr0 - ljmp $PVH_CS_SEL, $1f + pushl $PVH_CS_SEL + leal rva(1f)(%ebp), %eax + pushl %eax + lretl + 1: call xen_prepare_pvh - mov $_pa(pvh_bootparams), %esi + leal rva(pvh_bootparams)(%ebp), %esi /* startup_32 doesn't expect paging and PAE to be on. */ - ljmp $PVH_CS_SEL, $_pa(2f) + pushl $PVH_CS_SEL + leal rva(2f)(%ebp), %eax + pushl %eax + lretl 2: mov %cr0, %eax and $~X86_CR0_PG, %eax @@ -135,15 +217,19 @@ SYM_CODE_START_LOCAL(pvh_start_xen) and $~X86_CR4_PAE, %eax mov %eax, %cr4 - ljmp $PVH_CS_SEL, $_pa(startup_32) + pushl $PVH_CS_SEL + leal rva(startup_32)(%ebp), %eax + pushl %eax + lretl #endif + SYM_CODE_END(pvh_start_xen) .section ".init.data","aw" .balign 8 SYM_DATA_START_LOCAL(gdt) - .word gdt_end - gdt_start - .long _pa(gdt_start) + .word gdt_end - gdt_start - 1 + .long 0 .word 0 SYM_DATA_END(gdt) SYM_DATA_START_LOCAL(gdt_start) @@ -163,5 +249,82 @@ SYM_DATA_START_LOCAL(early_stack) .fill BOOT_STACK_SIZE, 1, 0 SYM_DATA_END_LABEL(early_stack, SYM_L_LOCAL, early_stack_end) +/* + * We are not able to switch in one step to the final KERNEL ADDRESS SPACE + * because we need identity-mapped pages. + */ +#define l4_index(x) (((x) >> 39) & 511) +#define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1)) + +L4_PAGE_OFFSET = l4_index(__PAGE_OFFSET_BASE_L4) +L4_START_KERNEL = l4_index(__START_KERNEL_map) +L3_START_KERNEL = pud_index(__START_KERNEL_map) + +#define SYM_DATA_START_PAGE_ALIGNED(name) \ + SYM_START(name, SYM_L_GLOBAL, .balign PAGE_SIZE) + +/* Automate the creation of 1 to 1 mapping pmd entries */ +#define PMDS(START, PERM, COUNT) \ + i = 0 ; \ + .rept (COUNT) ; \ + .quad (START) + (i << PMD_SHIFT) + (PERM) ; \ + i = i + 1 ; \ + .endr + +/* + * Xen PVH needs a set of identity mapped and kernel high mapping + * page tables. pvh_start_xen starts running on the identity mapped + * page tables, but xen_prepare_pvh calls into the high mapping. + * These page tables are need to be relocatable and are only used until + * startup_64 transitions to init_top_pgt. + */ +SYM_DATA_START_PAGE_ALIGNED(pvh_init_top_pgt) + .quad pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC + .org pvh_init_top_pgt + L4_PAGE_OFFSET*8, 0 + .quad pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC + .org pvh_init_top_pgt + L4_START_KERNEL*8, 0 + /* (2^48-(2*1024*1024*1024))/(2^39) = 511 */ + .quad pvh_level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC +SYM_DATA_END(pvh_init_top_pgt) + +SYM_DATA_START_PAGE_ALIGNED(pvh_level3_ident_pgt) + .quad pvh_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC + .fill 511, 8, 0 +SYM_DATA_END(pvh_level3_ident_pgt) +SYM_DATA_START_PAGE_ALIGNED(pvh_level2_ident_pgt) + /* + * Since I easily can, map the first 1G. + * Don't set NX because code runs from these pages. + * + * Note: This sets _PAGE_GLOBAL despite whether + * the CPU supports it or it is enabled. But, + * the CPU should ignore the bit. + */ + PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD) +SYM_DATA_END(pvh_level2_ident_pgt) +SYM_DATA_START_PAGE_ALIGNED(pvh_level3_kernel_pgt) + .fill L3_START_KERNEL,8,0 + /* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */ + .quad pvh_level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC + .quad 0 /* no fixmap */ +SYM_DATA_END(pvh_level3_kernel_pgt) + +SYM_DATA_START_PAGE_ALIGNED(pvh_level2_kernel_pgt) + /* + * Kernel high mapping. + * + * The kernel code+data+bss must be located below KERNEL_IMAGE_SIZE in + * virtual address space, which is 1 GiB if RANDOMIZE_BASE is enabled, + * 512 MiB otherwise. + * + * (NOTE: after that starts the module area, see MODULES_VADDR.) + * + * This table is eventually used by the kernel during normal runtime. + * Care must be taken to clear out undesired bits later, like _PAGE_RW + * or _PAGE_GLOBAL in some cases. + */ + PMDS(0, __PAGE_KERNEL_LARGE_EXEC, KERNEL_IMAGE_SIZE/PMD_SIZE) +SYM_DATA_END(pvh_level2_kernel_pgt) + ELFNOTE(Xen, XEN_ELFNOTE_PHYS32_ENTRY, _ASM_PTR (pvh_start_xen - __START_KERNEL_map)) diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S index a0ea285878db..e6994aaa8cc3 100644 --- a/arch/x86/xen/xen-head.S +++ b/arch/x86/xen/xen-head.S @@ -106,7 +106,12 @@ SYM_CODE_END(xen_cpu_bringup_again) # define FEATURES_PV 0 #endif #ifdef CONFIG_XEN_PVH -# define FEATURES_PVH (1 << XENFEAT_linux_rsdp_unrestricted) +# ifdef CONFIG_RELOCATABLE +# define FEATURES_PVH (1 << XENFEAT_linux_rsdp_unrestricted) | \ + (1 << XENFEAT_pvh_relocatable) +# else +# define FEATURES_PVH (1 << XENFEAT_linux_rsdp_unrestricted) +# endif #else # define FEATURES_PVH 0 #endif diff --git a/include/xen/interface/features.h b/include/xen/interface/features.h index 53f760378e39..92e7da9194c8 100644 --- a/include/xen/interface/features.h +++ b/include/xen/interface/features.h @@ -97,6 +97,11 @@ #define XENFEAT_not_direct_mapped 16 #define XENFEAT_direct_mapped 17 +/* + * PVH: If set, the guest supports being relocated in physical memory on entry. + */ +#define XENFEAT_pvh_relocatable 20 + #define XENFEAT_NR_SUBMAPS 1 #endif /* __XEN_PUBLIC_FEATURES_H__ */ -- 2.44.0 g ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH 0/3] x86/pvh: Support relocating dom0 kernel 2024-03-06 18:50 [PATCH 0/3] x86/pvh: Support relocating dom0 kernel Jason Andryuk ` (3 preceding siblings ...) 2024-03-06 19:31 ` [LINUX PATCH] RFC: x86/pvh: Make Xen PVH entrypoint PIC Jason Andryuk @ 2024-03-07 10:00 ` Roger Pau Monné 2024-03-07 10:08 ` Jan Beulich 4 siblings, 1 reply; 21+ messages in thread From: Roger Pau Monné @ 2024-03-07 10:00 UTC (permalink / raw) To: Jason Andryuk Cc: xen-devel, Andrew Cooper, George Dunlap, Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu On Wed, Mar 06, 2024 at 01:50:29PM -0500, Jason Andryuk wrote: > Xen tries to load a PVH dom0 kernel at the fixed guest physical address > from the elf headers. For Linux, this defaults to 0x1000000 (16MB), but > it can be configured. > > Unfortunately there exist firmwares that have reserved regions at this > address, so Xen fails to load the dom0 kernel since it's not RAM. > > The other issue is that the Linux PVH entry point is not > position-independent. It expects to run at the compiled > CONFIG_PHYSICAL_ADDRESS. > > This patch set expands the PVH dom0 builder to try to relocate the > kernel if needed and possible. XENFEAT_pvh_relocatable is added for > kernels to indicate they are relocatable. However, we may want to > switch to an additional ELF note with the kernel alignment. Linux > specifies a kernel alignment in the bzImage boot_params.setup_header, > but that is not present the ELF vmlinux file. I wonder whether we need a pair of notes, to signal the min/max addresses the kernel supports being relocated to. Thanks, Roger. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/3] x86/pvh: Support relocating dom0 kernel 2024-03-07 10:00 ` [PATCH 0/3] x86/pvh: Support relocating dom0 kernel Roger Pau Monné @ 2024-03-07 10:08 ` Jan Beulich 2024-03-07 10:20 ` Roger Pau Monné 0 siblings, 1 reply; 21+ messages in thread From: Jan Beulich @ 2024-03-07 10:08 UTC (permalink / raw) To: Roger Pau Monné, Jason Andryuk Cc: xen-devel, Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu On 07.03.2024 11:00, Roger Pau Monné wrote: > On Wed, Mar 06, 2024 at 01:50:29PM -0500, Jason Andryuk wrote: >> Xen tries to load a PVH dom0 kernel at the fixed guest physical address >> from the elf headers. For Linux, this defaults to 0x1000000 (16MB), but >> it can be configured. >> >> Unfortunately there exist firmwares that have reserved regions at this >> address, so Xen fails to load the dom0 kernel since it's not RAM. >> >> The other issue is that the Linux PVH entry point is not >> position-independent. It expects to run at the compiled >> CONFIG_PHYSICAL_ADDRESS. >> >> This patch set expands the PVH dom0 builder to try to relocate the >> kernel if needed and possible. XENFEAT_pvh_relocatable is added for >> kernels to indicate they are relocatable. However, we may want to >> switch to an additional ELF note with the kernel alignment. Linux >> specifies a kernel alignment in the bzImage boot_params.setup_header, >> but that is not present the ELF vmlinux file. > > I wonder whether we need a pair of notes, to signal the min/max > addresses the kernel supports being relocated to. Plus, as per your other reply, a 3rd one to specify alignment needs. Then again - a single note can have multiple values. So perhaps it doesn't need to be more than one new notes (except if dealing with multi-value ones is deemed too complicated). Jan ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/3] x86/pvh: Support relocating dom0 kernel 2024-03-07 10:08 ` Jan Beulich @ 2024-03-07 10:20 ` Roger Pau Monné 2024-03-07 17:33 ` Jason Andryuk 0 siblings, 1 reply; 21+ messages in thread From: Roger Pau Monné @ 2024-03-07 10:20 UTC (permalink / raw) To: Jan Beulich Cc: Jason Andryuk, xen-devel, Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu On Thu, Mar 07, 2024 at 11:08:37AM +0100, Jan Beulich wrote: > On 07.03.2024 11:00, Roger Pau Monné wrote: > > On Wed, Mar 06, 2024 at 01:50:29PM -0500, Jason Andryuk wrote: > >> Xen tries to load a PVH dom0 kernel at the fixed guest physical address > >> from the elf headers. For Linux, this defaults to 0x1000000 (16MB), but > >> it can be configured. > >> > >> Unfortunately there exist firmwares that have reserved regions at this > >> address, so Xen fails to load the dom0 kernel since it's not RAM. > >> > >> The other issue is that the Linux PVH entry point is not > >> position-independent. It expects to run at the compiled > >> CONFIG_PHYSICAL_ADDRESS. > >> > >> This patch set expands the PVH dom0 builder to try to relocate the > >> kernel if needed and possible. XENFEAT_pvh_relocatable is added for > >> kernels to indicate they are relocatable. However, we may want to > >> switch to an additional ELF note with the kernel alignment. Linux > >> specifies a kernel alignment in the bzImage boot_params.setup_header, > >> but that is not present the ELF vmlinux file. > > > > I wonder whether we need a pair of notes, to signal the min/max > > addresses the kernel supports being relocated to. > > Plus, as per your other reply, a 3rd one to specify alignment needs. Alignment we could in theory get from the ELF program header, if OSes fill those reliably. FreeBSD seems to do so, haven't checked Linux. > Then again - a single note can have multiple values. So perhaps it > doesn't need to be more than one new notes (except if dealing with > multi-value ones is deemed too complicated). I've never dealt with a multi-value note, if that's not overly complicated I would be fine with it, otherwise multiple notes are OK IMO. Thanks, Roger. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/3] x86/pvh: Support relocating dom0 kernel 2024-03-07 10:20 ` Roger Pau Monné @ 2024-03-07 17:33 ` Jason Andryuk 2024-03-08 7:03 ` Jan Beulich 0 siblings, 1 reply; 21+ messages in thread From: Jason Andryuk @ 2024-03-07 17:33 UTC (permalink / raw) To: Roger Pau Monné, Jan Beulich Cc: xen-devel, Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu On 2024-03-07 05:20, Roger Pau Monné wrote: > On Thu, Mar 07, 2024 at 11:08:37AM +0100, Jan Beulich wrote: >> On 07.03.2024 11:00, Roger Pau Monné wrote: >>> On Wed, Mar 06, 2024 at 01:50:29PM -0500, Jason Andryuk wrote: >>>> Xen tries to load a PVH dom0 kernel at the fixed guest physical address >>>> from the elf headers. For Linux, this defaults to 0x1000000 (16MB), but >>>> it can be configured. >>>> >>>> Unfortunately there exist firmwares that have reserved regions at this >>>> address, so Xen fails to load the dom0 kernel since it's not RAM. >>>> >>>> The other issue is that the Linux PVH entry point is not >>>> position-independent. It expects to run at the compiled >>>> CONFIG_PHYSICAL_ADDRESS. >>>> >>>> This patch set expands the PVH dom0 builder to try to relocate the >>>> kernel if needed and possible. XENFEAT_pvh_relocatable is added for >>>> kernels to indicate they are relocatable. However, we may want to >>>> switch to an additional ELF note with the kernel alignment. Linux >>>> specifies a kernel alignment in the bzImage boot_params.setup_header, >>>> but that is not present the ELF vmlinux file. >>> >>> I wonder whether we need a pair of notes, to signal the min/max >>> addresses the kernel supports being relocated to. >> >> Plus, as per your other reply, a 3rd one to specify alignment needs. > > Alignment we could in theory get from the ELF program header, if OSes > fill those reliably. FreeBSD seems to do so, haven't checked Linux. I will look into this more, but at first glance, I don't see a connection between Linux's CONFIG_PHYSICAL_ALIGN and the values in the elf headers. As a quick test, I set CONFIG_PHYSICAL_ALIGN=0x800000, but the elf align values are still 0x200000. The elf header values may be a suitable fallback though? >> Then again - a single note can have multiple values. So perhaps it >> doesn't need to be more than one new notes (except if dealing with >> multi-value ones is deemed too complicated). > > I've never dealt with a multi-value note, if that's not overly > complicated I would be fine with it, otherwise multiple notes are OK > IMO. It looks like a single note can be followed by arbitrary data, so putting three values in should be fine. PVH is defined with a 32bit entry point, so are 32bit min, max & align values acceptable, or should 64bit be specified? Linux uses _ASM_PTR for PHYS32_ENTRY so its size matches the bitness of the target (32 on 32/ 64 on 64). I guess I'll follow that unless I hear a preference otherwise. Regards, Jason ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 0/3] x86/pvh: Support relocating dom0 kernel 2024-03-07 17:33 ` Jason Andryuk @ 2024-03-08 7:03 ` Jan Beulich 0 siblings, 0 replies; 21+ messages in thread From: Jan Beulich @ 2024-03-08 7:03 UTC (permalink / raw) To: Jason Andryuk Cc: xen-devel, Andrew Cooper, George Dunlap, Julien Grall, Stefano Stabellini, Wei Liu, Roger Pau Monné On 07.03.2024 18:33, Jason Andryuk wrote: > On 2024-03-07 05:20, Roger Pau Monné wrote: >> On Thu, Mar 07, 2024 at 11:08:37AM +0100, Jan Beulich wrote: >>> On 07.03.2024 11:00, Roger Pau Monné wrote: >>>> On Wed, Mar 06, 2024 at 01:50:29PM -0500, Jason Andryuk wrote: >>>>> Xen tries to load a PVH dom0 kernel at the fixed guest physical address >>>>> from the elf headers. For Linux, this defaults to 0x1000000 (16MB), but >>>>> it can be configured. >>>>> >>>>> Unfortunately there exist firmwares that have reserved regions at this >>>>> address, so Xen fails to load the dom0 kernel since it's not RAM. >>>>> >>>>> The other issue is that the Linux PVH entry point is not >>>>> position-independent. It expects to run at the compiled >>>>> CONFIG_PHYSICAL_ADDRESS. >>>>> >>>>> This patch set expands the PVH dom0 builder to try to relocate the >>>>> kernel if needed and possible. XENFEAT_pvh_relocatable is added for >>>>> kernels to indicate they are relocatable. However, we may want to >>>>> switch to an additional ELF note with the kernel alignment. Linux >>>>> specifies a kernel alignment in the bzImage boot_params.setup_header, >>>>> but that is not present the ELF vmlinux file. >>>> >>>> I wonder whether we need a pair of notes, to signal the min/max >>>> addresses the kernel supports being relocated to. >>> >>> Plus, as per your other reply, a 3rd one to specify alignment needs. >> >> Alignment we could in theory get from the ELF program header, if OSes >> fill those reliably. FreeBSD seems to do so, haven't checked Linux. > > I will look into this more, but at first glance, I don't see a > connection between Linux's CONFIG_PHYSICAL_ALIGN and the values in the > elf headers. As a quick test, I set CONFIG_PHYSICAL_ALIGN=0x800000, but > the elf align values are still 0x200000. > > The elf header values may be a suitable fallback though? Imo, given the above, explicit values should be required. Better not load a kernel than doing so and then getting hard to debug crashes. Jan ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2024-03-11 19:57 UTC | newest] Thread overview: 21+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-03-06 18:50 [PATCH 0/3] x86/pvh: Support relocating dom0 kernel Jason Andryuk 2024-03-06 18:50 ` [PATCH 1/3] features.h: Replace hard tabs Jason Andryuk 2024-03-06 20:41 ` Stefano Stabellini 2024-03-06 18:50 ` [PATCH 2/3] xen/x86: bzImage parse kernel_alignment Jason Andryuk 2024-03-07 2:09 ` Stefano Stabellini 2024-03-07 8:26 ` Jan Beulich 2024-03-07 15:06 ` Jason Andryuk 2024-03-06 18:50 ` [PATCH 3/3] x86/PVH: Support relocatable dom0 kernels Jason Andryuk 2024-03-07 2:09 ` Stefano Stabellini 2024-03-07 16:07 ` Jason Andryuk 2024-03-07 9:30 ` Roger Pau Monné 2024-03-07 17:01 ` Jason Andryuk 2024-03-08 6:34 ` Juergen Gross 2024-03-11 16:53 ` Jan Beulich 2024-03-11 19:53 ` Jason Andryuk 2024-03-06 19:31 ` [LINUX PATCH] RFC: x86/pvh: Make Xen PVH entrypoint PIC Jason Andryuk 2024-03-07 10:00 ` [PATCH 0/3] x86/pvh: Support relocating dom0 kernel Roger Pau Monné 2024-03-07 10:08 ` Jan Beulich 2024-03-07 10:20 ` Roger Pau Monné 2024-03-07 17:33 ` Jason Andryuk 2024-03-08 7:03 ` Jan Beulich
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.