Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jason Andryuk <jason.andryuk@amd.com>
Cc: xen-devel@lists.xenproject.org, Jan Beulich <jbeulich@suse.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>, Wei Liu <wl@xen.org>,
	George Dunlap <george.dunlap@citrix.com>,
	Julien Grall <julien@xen.org>,
	Stefano Stabellini <sstabellini@kernel.org>
Subject: Re: [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels
Date: Thu, 14 Mar 2024 10:48:52 +0100	[thread overview]
Message-ID: <ZfLIBHTbcbGqFAhY@macbook> (raw)
In-Reply-To: <20240313193021.241764-4-jason.andryuk@amd.com>

On Wed, Mar 13, 2024 at 03:30:21PM -0400, Jason Andryuk wrote:
> Xen tries to load a PVH dom0 kernel at the fixed guest physical address
> from the elf headers.  For Linux, this defaults to 0x1000000 (16MB), but
> it can be configured.
> 
> Unfortunately there exist firmwares that have reserved regions at this
> address, so Xen fails to load the dom0 kernel since it's not RAM.
> 
> The PVH entry code is not relocatable - it loads from absolute
> addresses, which fail when the kernel is loaded at a different address.
> With a suitably modified kernel, a reloctable entry point is possible.
> 
> Add XEN_ELFNOTE_PVH_RELOCATION which specifies the minimum, maximum and
> alignment needed for the kernel.  The presence of the NOTE indicates the
> kernel supports a relocatable entry path.
> 
> Change the loading to check for an acceptable load address.  If the
> kernel is relocatable, support finding an alternate load address.
> 
> Link: https://gitlab.com/xen-project/xen/-/issues/180
> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
> ---
> ELF Note printing looks like:
> (XEN) ELF: note: PVH_RELOCATION = min: 0x1000000 max: 0xffffffff align: 0x200000
> 
> v2:
> Use elfnote for min, max & align - use 64bit values.
> Print original and relocated memory addresses
> Use check_and_adjust_load_address() name
> Return relocated base instead of offset
> Use PAGE_ALIGN
> Don't load above max_phys (expected to be 4GB in kernel elf note)
> Use single line comments
> Exit check_load_address loop earlier
> Add __init to find_kernel_memory()
> ---
>  xen/arch/x86/hvm/dom0_build.c      | 108 +++++++++++++++++++++++++++++
>  xen/common/libelf/libelf-dominfo.c |  13 ++++
>  xen/include/public/elfnote.h       |  11 +++
>  xen/include/xen/libelf.h           |   3 +
>  4 files changed, 135 insertions(+)
> 
> diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
> index 0ceda4140b..5c6c0d2db3 100644
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -537,6 +537,108 @@ static paddr_t __init find_memory(
>      return INVALID_PADDR;
>  }
>  
> +static bool __init check_load_address(
> +    const struct domain *d, const struct elf_binary *elf)
> +{
> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;

Are you sure this is correct?  If a program header specifies a non-4K
aligned load address we should still try to honor it.  I think this is
very unlikely, but still we shouldn't apply non-requested alignments
to addresses coming from the ELF headers.

> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
> +    unsigned int i;
> +
> +    /*
> +     * The memory map is sorted and all RAM regions starts and sizes are
> +     * aligned to page boundaries.

Relying on sizes to be page aligned seems fragile: it might work now
because of the order in which pvh_setup_vmx_realmode_helpers() first
reserves memory for the TSS and afterwards for the identity page
tables, but it's not a property this code should assume.

> +     */
> +    for ( i = 0; i < d->arch.nr_e820; i++ )
> +    {
> +        paddr_t start = d->arch.e820[i].addr;
> +        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
> +
> +        if ( start >= kernel_end )
> +            return false;
> +
> +        if ( start <= kernel_start &&
> +             end >= kernel_end &&
> +             d->arch.e820[i].type == E820_RAM )

Nit: I would maybe do the type check before the boundary ones, as it's
pointless to do boundary checking on a region of a non-RAM type.  But
I guess you could also see it the other way around.

> +            return true;
> +    }
> +
> +    return false;
> +}
> +
> +/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
> +static paddr_t __init find_kernel_memory(
> +    const struct domain *d, struct elf_binary *elf,

const for elf also.

> +    const struct elf_dom_parms *parms)
> +{
> +    paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
> +    paddr_t kernel_end = PAGE_ALIGN((paddr_t)elf->dest_base + elf->dest_size);
> +    paddr_t kernel_size = kernel_end - kernel_start;

Hm, I'm again unsure about the alignments applied here.

I think if anything we want to assert that dest_base is aligned to phys_align.

> +    unsigned int i;
> +
> +    /*
> +     * The memory map is sorted and all RAM regions starts and sizes are
> +     * aligned to page boundaries.
> +     */
> +    for ( i = 0; i < d->arch.nr_e820; i++ )
> +    {
> +        paddr_t start = d->arch.e820[i].addr;
> +        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
> +        paddr_t kstart, kend;
> +
> +        if ( d->arch.e820[i].type != E820_RAM )
> +            continue;
> +
> +        if ( d->arch.e820[i].size < kernel_size )
> +            continue;

You can unify both checks in a single condition.

> +
> +        kstart = ROUNDUP(start, parms->phys_align);
> +        kstart = kstart < parms->phys_min ? parms->phys_min : kstart;
> +        kend = kstart + kernel_size;
> +
> +        if ( kend > parms->phys_max )
> +            return 0;
> +
> +        if ( kend <= end )
> +            return kstart;
> +    }
> +
> +    return 0;
> +}
> +
> +/* Check the kernel load address, and adjust if necessary and possible. */
> +static bool __init check_and_adjust_load_address(
> +    const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms)
> +{
> +    paddr_t reloc_base;
> +
> +    if ( check_load_address(d, elf) )
> +        return true;
> +
> +    if ( parms->phys_align == UNSET_ADDR )
> +    {
> +        printk("Address conflict and %pd kernel is not relocatable\n", d);
> +        return false;
> +    }
> +
> +    reloc_base = find_kernel_memory(d, elf, parms);
> +    if ( reloc_base == 0 )
> +    {
> +        printk("Failed find a load address for the kernel\n");

Since you print the domain in the error message prior to this one, I
would consider also printing it here (or not printing it in both
cases).

> +        return false;
> +    }
> +
> +    if ( opt_dom0_verbose )
> +        printk("Relocating kernel from [%lx, %lx] -> [%lx, %lx]\n",
> +               (paddr_t)elf->dest_base,
> +               (paddr_t)elf->dest_base + elf->dest_size,
> +               reloc_base, reloc_base + elf->dest_size);

I think you need `- 1` for the end calculation if you use inclusive
ranges [, ].  Otherwise [, ) should be used.

> +
> +    parms->phys_entry += (reloc_base - (paddr_t)elf->dest_base);

This seems to assume that the image is always relocated at a higher
address that the original one?

parms->phys_entry = reloc_base + (parms->phys_entry - elf->dest_base);

> +    elf->dest_base = (char *)reloc_base;
> +
> +    return true;
> +}
> +
>  static int __init pvh_load_kernel(struct domain *d, const module_t *image,
>                                    unsigned long image_headroom,
>                                    module_t *initrd, void *image_base,
> @@ -585,6 +687,12 @@ static int __init pvh_load_kernel(struct domain *d, const module_t *image,
>      elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base);
>      elf.dest_size = parms.virt_kend - parms.virt_kstart;
>  
> +    if ( !check_and_adjust_load_address(d, &elf, &parms) )
> +    {
> +        printk("Unable to load kernel\n");

check_and_adjust_load_address() already prints an error message,
probably no need to print another message.

> +        return -ENOMEM;
> +    }
> +
>      elf_set_vcpu(&elf, v);
>      rc = elf_load_binary(&elf);
>      if ( rc < 0 )
> diff --git a/xen/common/libelf/libelf-dominfo.c b/xen/common/libelf/libelf-dominfo.c
> index 7cc7b18a51..837a1b0f21 100644
> --- a/xen/common/libelf/libelf-dominfo.c
> +++ b/xen/common/libelf/libelf-dominfo.c
> @@ -125,6 +125,7 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>          [XEN_ELFNOTE_SUSPEND_CANCEL] = { "SUSPEND_CANCEL", ELFNOTE_INT },
>          [XEN_ELFNOTE_MOD_START_PFN] = { "MOD_START_PFN", ELFNOTE_INT },
>          [XEN_ELFNOTE_PHYS32_ENTRY] = { "PHYS32_ENTRY", ELFNOTE_INT },
> +        [XEN_ELFNOTE_PVH_RELOCATION] = { "PVH_RELOCATION", ELFNOTE_OTHER },
>      };
>  /* *INDENT-ON* */
>  
> @@ -234,6 +235,17 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>                  elf_note_numeric_array(elf, note, 8, 0),
>                  elf_note_numeric_array(elf, note, 8, 1));
>          break;
> +
> +    case XEN_ELFNOTE_PVH_RELOCATION:
> +        if ( elf_uval(elf, note, descsz) != 3 * sizeof(uint64_t) )
> +            return -1;
> +
> +        parms->phys_min = elf_note_numeric_array(elf, note, 8, 0);
> +        parms->phys_max = elf_note_numeric_array(elf, note, 8, 1);
> +        parms->phys_align = elf_note_numeric_array(elf, note, 8, 2);

Size for those needs to be 4 (32bits) as the entry point is in 32bit
mode?  I don't see how we can start past the 4GB boundary.

> +        elf_msg(elf, "min: %#"PRIx64" max: %#"PRIx64" align: %#"PRIx64"\n",
> +                parms->phys_min, parms->phys_max, parms->phys_align);
> +        break;
>      }
>      return 0;
>  }
> @@ -545,6 +557,7 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,
>      parms->p2m_base = UNSET_ADDR;
>      parms->elf_paddr_offset = UNSET_ADDR;
>      parms->phys_entry = UNSET_ADDR32;
> +    parms->phys_align = UNSET_ADDR;

For correctness I would also init phys_{min,max}.

Thanks, Roger.

next prev parent reply	other threads:[~2024-03-14  9:49 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-13 19:30 [PATCH v2 0/3] x86/pvh: Support relocating dom0 kernel Jason Andryuk
2024-03-13 19:30 ` [PATCH v2 1/3] Revert "xen/x86: bzImage parse kernel_alignment" Jason Andryuk
2024-03-14  7:11   ` Jan Beulich
2024-03-14 13:01     ` Jason Andryuk
2024-03-14 13:33       ` Jan Beulich
2024-03-13 19:30 ` [PATCH v2 2/3] libelf: Expand ELF note printing Jason Andryuk
2024-03-14 13:16   ` Jan Beulich
2024-03-14 20:36     ` Jason Andryuk
2024-03-13 19:30 ` [PATCH v2 3/3] x86/PVH: Support relocatable dom0 kernels Jason Andryuk
2024-03-13 21:02   ` Jason Andryuk
2024-03-14  7:12     ` Jan Beulich
2024-03-14 12:46       ` Jason Andryuk
2024-03-14  9:48   ` Roger Pau Monné [this message]
2024-03-14 13:51     ` Jason Andryuk
2024-03-14 14:33       ` Roger Pau Monné
2024-03-14 15:30         ` Jan Beulich
2024-03-14 16:48           ` Roger Pau Monné
2024-03-14 16:59           ` Jason Andryuk
2024-03-14 17:02             ` Jan Beulich
2024-03-15  8:45             ` Roger Pau Monné
2024-03-14 13:21   ` Jan Beulich
2024-03-14 14:13     ` Jason Andryuk
2024-03-14 14:19       ` Jan Beulich
2024-03-18 21:19         ` Jason Andryuk
2024-03-19  8:11           ` Jan Beulich
2024-03-14 13:31   ` Jan Beulich
2024-03-14 19:19     ` Jason Andryuk
2024-03-15  9:48       ` Jan Beulich
2024-03-18 21:21         ` Jason Andryuk
2024-03-19  8:15           ` Jan Beulich
2024-03-19 13:50             ` Jason Andryuk
2024-03-13 19:46 ` [PATCH] RFC: x86/pvh: Make Xen PVH entrypoint PIC for x86-64 Jason Andryuk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZfLIBHTbcbGqFAhY@macbook \
    --to=roger.pau@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=george.dunlap@citrix.com \
    --cc=jason.andryuk@amd.com \
    --cc=jbeulich@suse.com \
    --cc=julien@xen.org \
    --cc=sstabellini@kernel.org \
    --cc=wl@xen.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.