Re: [PATCH qemu] spapr: Use address from elf parser for kernel address

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Fabiano Rosas <farosas@linux.ibm.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>, qemu-ppc@nongnu.org
Cc: qemu-devel@nongnu.org, David Gibson <david@gibson.dropbear.id.au>,
	Daniel Henrique Barboza <danielhb413@gmail.com>
Subject: Re: [PATCH qemu] spapr: Use address from elf parser for kernel address
Date: Thu, 05 May 2022 12:50:53 -0300	[thread overview]
Message-ID: <87bkwc9dwi.fsf@linux.ibm.com> (raw)
In-Reply-To: <365ad8a7-bfa4-45d5-0fb4-46e295ea75b8@ozlabs.ru>

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> On 5/5/22 05:16, Fabiano Rosas wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> 
>>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.
>>>
>>> QEMU loads the kernel at 0x400000 by default which works most of
>>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
>>> (position independent code). This works for a little endian zImage too.
>>>
>>> However a big endian zImage is compiled without -pie, is 32bit, linked to
>>> 0x4000000 so current QEMU ends up loading it at
>>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.
>>>
>>> This uses the kernel address returned from load_elf().
>>> If the default kernel_addr is used, there is no change in behavior (as
>>> translate_kernel_address() takes care of this), which is:
>>> LE/BE vmlinux and LE zImage boot, BE zImage does not.
>>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
>>> prints a warning and BE zImage boots.
>> 
>> I think we can fix this without needing a different command line for BE
>> zImage (apart from x-vof, which is a separate matter).
>> 
>> If you look at translate_kernel_address, it cannot really work when the
>> ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue,
>> so if we fix that function like this...
>> 
>> static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
>> {
>>      SpaprMachineState *spapr = opaque;
>> 
>>      return addr ? addr : spapr->kernel_addr;
>> }
>
>
> The qemu elf loader is supposed to handle relocations which should be 
> calling this hook more than once, now I wonder why it is not doing so.

For relocations, it seems to only call translate_fn for s390x.

>> ...then we could always use the ELF PhysAddr if it is different from 0
>> and only use the default load addr if the ELF PhysAddr is 0. If the user
>> gives kernel_addr on the cmdline, we honor that, even if puts the kernel
>> over the firmware (we have code to detect that).
>
>
> ELF address is 0 for LE zImage only, vmlinux BE/LE uses 
> 0xc000000000000000. And we are already chopping all these tops bits off 
> in translate_kernel_address() and I do not really know why _exactly_ it 
> is 0x0fffffff and not let's say 0x7fffffff.

Oh, I am not talking about the ELF entry point. And that is not what
load_elf passes to translate_kernel_address either. It is the ELF
PhysAddr:

$ powerpc64le-buildroot-linux-gnu-readelf -We vmlinux | tail

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  LOAD           0x010000 0xc000000000000000 0x0000000000000000 0x28d84d4 0x2a54ea8 RWE 0x10000

So it is 0x400000 for BE zImage and 0 for vmlinux.

>> 
>>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine)
>>>               exit(1);
>>>           }
>>>   
>>> +        if (spapr->kernel_addr != loaded_addr) {
>> 
>> This could be:
>> 
>>          if (spapr->kernel_addr == KERNEL_LOAD_ADDR &&
>> 	    spapr->kernel_addr != loaded_addr) {
>> 
>> So the precedence would be:
>> 
>> 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage
>>     falls here;
>>      
>> 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall
>>     here;
>> 
>> 3- kernel_addr. The user is probably hacking something, just use what
>>     they gave us. QEMU will yell if they load the kernel over the fw.
>
>
> imho too complicated.
>
> What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is 
> KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)?

Good point. It should always be 3. It is a bad user interface to have a
command line option and arbitrarily ignore it. Be it 0x0 or 0x400000.

> I am basically fixing a bug when we ignore where load_elf() loaded the 
> ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the 
> ELF was loaded where we want it to be.

That bug is already accounted for, that is why we have
translate_kernel_address:

  /* address_offset is hack for kernel images that are
     linked at the wrong physical address.  */
  if (translate_fn) {
      addr = translate_fn(translate_opaque, ph->p_paddr);

So we're actively telling load_elf to load the kernel at the wrong place
when we do:

(ph->p_addr & 0x0fffffff) + spapr->kernel_addr

It just happened to work so far because the vmlinux PhysAddr is 0.

When you set kernel-addr=0 you are simply working around this other bug
because:

(ph->p_addr & 0x0fffffff) + 0 == ph->p_addr

I just want to remove this indirection and use the ELF PhysAddr
directly.

> Everything else can be done but on top of this.

If you want to merge this I could send another patch on top of it later,
no worries. I realise that this a larger change that will need more
testing.

>>> +            warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx",
>>> +                        spapr->kernel_addr, loaded_addr);
>>> +            spapr->kernel_addr = loaded_addr;
>>> +        }
>>> +
>>>           /* load initrd */
>>>           if (initrd_filename) {
>>>               /* Try to locate the initrd in the gap between the kernel

next prev parent reply	other threads:[~2022-05-05 15:59 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-04  6:55 [PATCH qemu] spapr: Use address from elf parser for kernel address Alexey Kardashevskiy
2022-05-04 19:16 ` Fabiano Rosas
2022-05-05  3:29   ` Alexey Kardashevskiy
2022-05-05  4:16     ` Joel Stanley
2022-05-05  5:07       ` Alexey Kardashevskiy
2022-05-05 15:50     ` Fabiano Rosas [this message]
2022-05-06  4:49       ` Alexey Kardashevskiy
2022-05-11 17:47         ` Fabiano Rosas
2022-05-12 17:47 ` Daniel Henrique Barboza
2022-05-17 18:58 ` Daniel Henrique Barboza
2022-05-18  2:51   ` Alexey Kardashevskiy
2022-05-18 10:07     ` Daniel Henrique Barboza

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bkwc9dwi.fsf@linux.ibm.com \
    --to=farosas@linux.ibm.com \
    --cc=aik@ozlabs.ru \
    --cc=danielhb413@gmail.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).