All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@linux.ibm.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>, qemu-ppc@nongnu.org
Cc: qemu-devel@nongnu.org, David Gibson <david@gibson.dropbear.id.au>,
	Daniel Henrique Barboza <danielhb413@gmail.com>
Subject: Re: [PATCH qemu] spapr: Use address from elf parser for kernel address
Date: Wed, 11 May 2022 14:47:40 -0300	[thread overview]
Message-ID: <87zgjom05f.fsf@linux.ibm.com> (raw)
In-Reply-To: <1681b697-1ce6-b53c-068a-8728238d3272@ozlabs.ru>

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> On 06/05/2022 01:50, Fabiano Rosas wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> 
>>> On 5/5/22 05:16, Fabiano Rosas wrote:
>>>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>>>>
>>>>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.
>>>>>
>>>>> QEMU loads the kernel at 0x400000 by default which works most of
>>>>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
>>>>> (position independent code). This works for a little endian zImage too.
>>>>>
>>>>> However a big endian zImage is compiled without -pie, is 32bit, linked to
>>>>> 0x4000000 so current QEMU ends up loading it at
>>>>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.
>>>>>
>>>>> This uses the kernel address returned from load_elf().
>>>>> If the default kernel_addr is used, there is no change in behavior (as
>>>>> translate_kernel_address() takes care of this), which is:
>>>>> LE/BE vmlinux and LE zImage boot, BE zImage does not.
>>>>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
>>>>> prints a warning and BE zImage boots.
>>>>
>>>> I think we can fix this without needing a different command line for BE
>>>> zImage (apart from x-vof, which is a separate matter).
>>>>
>>>> If you look at translate_kernel_address, it cannot really work when the
>>>> ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue,
>>>> so if we fix that function like this...
>>>>
>>>> static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
>>>> {
>>>>       SpaprMachineState *spapr = opaque;
>>>>
>>>>       return addr ? addr : spapr->kernel_addr;
>>>> }
>>>
>>>
>>> The qemu elf loader is supposed to handle relocations which should be
>>> calling this hook more than once, now I wonder why it is not doing so.
>> 
>> For relocations, it seems to only call translate_fn for s390x.
>
>
> Agrh. You made me read this :) It is quite weird code and yeah 390-only :-/
>
>
>>>> ...then we could always use the ELF PhysAddr if it is different from 0
>>>> and only use the default load addr if the ELF PhysAddr is 0. If the user
>>>> gives kernel_addr on the cmdline, we honor that, even if puts the kernel
>>>> over the firmware (we have code to detect that).
>>>
>>>
>>> ELF address is 0 for LE zImage only, vmlinux BE/LE uses
>>> 0xc000000000000000. And we are already chopping all these tops bits off
>>> in translate_kernel_address() and I do not really know why _exactly_ it
>>> is 0x0fffffff and not let's say 0x7fffffff.
>> 
>> Oh, I am not talking about the ELF entry point. And that is not what
>> load_elf passes to translate_kernel_address either. It is the ELF
>> PhysAddr:
>> 
>> $ powerpc64le-buildroot-linux-gnu-readelf -We vmlinux | tail
>> 
>> Program Headers:
>>    Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
>>    LOAD           0x010000 0xc000000000000000 0x0000000000000000 0x28d84d4 0x2a54ea8 RWE 0x10000
>> 
>> So it is 0x400000 for BE zImage and 0 for vmlinux.
>
> Ah right. Me wrong.
>
> btw potentially there can be more program sections.
>
> [fstn1-p1 ~]$ readelf -l /home/aik/p/slof/board-qemu/llfw/stage1.elf
>
> Elf file type is EXEC (Executable file)
> Entry point 0x100
> There are 2 program headers, starting at offset 64
>
> Program Headers:
>    Type           Offset             VirtAddr           PhysAddr
>                   FileSiz            MemSiz              Flags  Align
>    LOAD           0x0000000000000100 0x0000000000000100 0x0000000000000100
>                   0x0000000000008110 0x0000000000010290  RWE    0x4000
>    LOAD           0x0000000000008210 0x0000000000010400 0x0000000000010400
>                   0x0000000000000010 0x0000000000000010  RW     0x8
>
>
> grub might be similar.
>
>
>> 
>>>>
>>>>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine)
>>>>>                exit(1);
>>>>>            }
>>>>>    
>>>>> +        if (spapr->kernel_addr != loaded_addr) {
>>>>
>>>> This could be:
>>>>
>>>>           if (spapr->kernel_addr == KERNEL_LOAD_ADDR &&
>>>> 	    spapr->kernel_addr != loaded_addr) {
>>>>
>>>> So the precedence would be:
>>>>
>>>> 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage
>>>>      falls here;
>>>>       
>>>> 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall
>>>>      here;
>>>>
>>>> 3- kernel_addr. The user is probably hacking something, just use what
>>>>      they gave us. QEMU will yell if they load the kernel over the fw.
>>>
>>>
>>> imho too complicated.
>>>
>>> What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is
>>> KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)?
>> 
>> Good point. It should always be 3. It is a bad user interface to have a
>> command line option and arbitrarily ignore it. Be it 0x0 or 0x400000.
>> 
>>> I am basically fixing a bug when we ignore where load_elf() loaded the
>>> ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the
>>> ELF was loaded where we want it to be.
>> 
>> That bug is already accounted for, that is why we have
>> translate_kernel_address:
>> 
>>    /* address_offset is hack for kernel images that are
>>       linked at the wrong physical address.  */
>>    if (translate_fn) {
>>        addr = translate_fn(translate_opaque, ph->p_paddr);
>> 
>> So we're actively telling load_elf to load the kernel at the wrong place
>> when we do:
>> 
>> (ph->p_addr & 0x0fffffff) + spapr->kernel_addr
>> 
>> It just happened to work so far because the vmlinux PhysAddr is 0.
>> 
>> When you set kernel-addr=0 you are simply working around this other bug
>> because:
>> 
>> (ph->p_addr & 0x0fffffff) + 0 == ph->p_addr
>> 
>> I just want to remove this indirection and use the ELF PhysAddr
>> directly.
>
>
> That's alright but not in translate_kernel_address(). May be I should 
> rename kernel-addr to kernel-offset (which it really is)  or  hack 
> load_elf() so it would take the desired location and work out the offset 
> inside (and ditch the translate_fn hook) but either way we'll need 
> heuristics (or the user should know) to know if the image is 
> self-relocatable or not as otherwise it just won't boot.
>
>
>
>>> Everything else can be done but on top of this.
>> 
>> If you want to merge this I could send another patch on top of it later,
>> no worries. I realise that this a larger change that will need more
>> testing.
>
>
> I want to have some easy-to-explain way of booting BE zImage :)

Ok, I guess I can live with this kernel-addr=0 ugliness. We can deal
with translate_kernel_address another time.

Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>


  reply	other threads:[~2022-05-11 17:51 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-04  6:55 [PATCH qemu] spapr: Use address from elf parser for kernel address Alexey Kardashevskiy
2022-05-04 19:16 ` Fabiano Rosas
2022-05-05  3:29   ` Alexey Kardashevskiy
2022-05-05  4:16     ` Joel Stanley
2022-05-05  5:07       ` Alexey Kardashevskiy
2022-05-05 15:50     ` Fabiano Rosas
2022-05-06  4:49       ` Alexey Kardashevskiy
2022-05-11 17:47         ` Fabiano Rosas [this message]
2022-05-12 17:47 ` Daniel Henrique Barboza
2022-05-17 18:58 ` Daniel Henrique Barboza
2022-05-18  2:51   ` Alexey Kardashevskiy
2022-05-18 10:07     ` Daniel Henrique Barboza

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87zgjom05f.fsf@linux.ibm.com \
    --to=farosas@linux.ibm.com \
    --cc=aik@ozlabs.ru \
    --cc=danielhb413@gmail.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.