qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: BALATON Zoltan <balaton@eik.bme.hu>
Cc: David Gibson <david@gibson.dropbear.id.au>,
	qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
	Greg Kurz <groug@kaod.org>
Subject: Re: [PATCH qemu v21] spapr: Implement Open Firmware client interface
Date: Tue, 22 Jun 2021 17:49:57 +1000	[thread overview]
Message-ID: <025657b5-417b-7c06-dd7f-9536f156d705@ozlabs.ru> (raw)
In-Reply-To: <d673149d-39b-6ecd-875c-d2f1ff6020@eik.bme.hu>



On 6/18/21 20:13, BALATON Zoltan wrote:
> On Fri, 18 Jun 2021, Alexey Kardashevskiy wrote:
>> On 6/17/21 21:29, BALATON Zoltan wrote:
>>> On Thu, 17 Jun 2021, Alexey Kardashevskiy wrote:
>>>> On 17/06/2021 19:16, BALATON Zoltan wrote:
>>>>> On Thu, 17 Jun 2021, Alexey Kardashevskiy wrote:
>>>>>> On 16/06/2021 20:34, BALATON Zoltan wrote:
>>>>>>> On Wed, 16 Jun 2021, Alexey Kardashevskiy wrote:
>>>>>>>> On 6/15/21 20:29, BALATON Zoltan wrote:
>>>>>>>>> On Tue, 15 Jun 2021, Alexey Kardashevskiy wrote:
>>>>>>>>>> The PAPR platform describes an OS environment that's presented by
>>>>>>>>>> a combination of a hypervisor and firmware. The features it 
>>>>>>>>>> specifies
>>>>>>>>>> require collaboration between the firmware and the hypervisor.
>>>>>>>>>>
>>>>>>>>>> Since the beginning, the runtime component of the firmware 
>>>>>>>>>> (RTAS) has
>>>>>>>>>> been implemented as a 20 byte shim which simply forwards it to
>>>>>>>>>> a hypercall implemented in qemu. The boot time firmware 
>>>>>>>>>> component is
>>>>>>>>>> SLOF - but a build that's specific to qemu, and has always 
>>>>>>>>>> needed to be
>>>>>>>>>> updated in sync with it. Even though we've managed to limit 
>>>>>>>>>> the amount
>>>>>>>>>> of runtime communication we need between qemu and SLOF, 
>>>>>>>>>> there's some,
>>>>>>>>>> and it has become increasingly awkward to handle as we've 
>>>>>>>>>> implemented
>>>>>>>>>> new features.
>>>>>>>>>>
>>>>>>>>>> This implements a boot time OF client interface (CI) which is
>>>>>>>>>> enabled by a new "x-vof" pseries machine option (stands for 
>>>>>>>>>> "Virtual Open
>>>>>>>>>> Firmware). When enabled, QEMU implements the custom 
>>>>>>>>>> H_OF_CLIENT hcall
>>>>>>>>>> which implements Open Firmware Client Interface (OF CI). This 
>>>>>>>>>> allows
>>>>>>>>>> using a smaller stateless firmware which does not have to manage
>>>>>>>>>> the device tree.
>>>>>>>>>>
>>>>>>>>>> The new "vof.bin" firmware image is included with source code 
>>>>>>>>>> under
>>>>>>>>>> pc-bios/. It also includes RTAS blob.
>>>>>>>>>>
>>>>>>>>>> This implements a handful of CI methods just to get 
>>>>>>>>>> -kernel/-initrd
>>>>>>>>>> working. In particular, this implements the device tree 
>>>>>>>>>> fetching and
>>>>>>>>>> simple memory allocator - "claim" (an OF CI memory allocator) 
>>>>>>>>>> and updates
>>>>>>>>>> "/memory@0/available" to report the client about available 
>>>>>>>>>> memory.
>>>>>>>>>>
>>>>>>>>>> This implements changing some device tree properties which we 
>>>>>>>>>> know how
>>>>>>>>>> to deal with, the rest is ignored. To allow changes, this skips
>>>>>>>>>> fdt_pack() when x-vof=on as not packing the blob leaves some 
>>>>>>>>>> room for
>>>>>>>>>> appending.
>>>>>>>>>>
>>>>>>>>>> In absence of SLOF, this assigns phandles to device tree nodes 
>>>>>>>>>> to make
>>>>>>>>>> device tree traversing work.
>>>>>>>>>>
>>>>>>>>>> When x-vof=on, this adds "/chosen" every time QEMU (re)builds 
>>>>>>>>>> a tree.
>>>>>>>>>>
>>>>>>>>>> This adds basic instances support which are managed by a hash map
>>>>>>>>>> ihandle -> [phandle].
>>>>>>>>>>
>>>>>>>>>> Before the guest started, the used memory is:
>>>>>>>>>> 0..e60 - the initial firmware
>>>>>>>>>> 8000..10000 - stack
>>>>>>>>>> 400000.. - kernel
>>>>>>>>>> 3ea0000.. - initramdisk
>>>>>>>>>>
>>>>>>>>>> This OF CI does not implement "interpret".
>>>>>>>>>>
>>>>>>>>>> Unlike SLOF, this does not format uninitialized nvram. 
>>>>>>>>>> Instead, this
>>>>>>>>>> includes a disk image with pre-formatted nvram.
>>>>>>>>>>
>>>>>>>>>> With this basic support, this can only boot into kernel directly.
>>>>>>>>>> However this is just enough for the petitboot kernel and 
>>>>>>>>>> initradmdisk to
>>>>>>>>>> boot from any possible source. Note this requires reasonably 
>>>>>>>>>> recent guest
>>>>>>>>>> kernel with:
>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735 
>>>>>>>>>> The immediate benefit is much faster booting time which 
>>>>>>>>>> especially
>>>>>>>>>> crucial with fully emulated early CPU bring up environments. 
>>>>>>>>>> Also this
>>>>>>>>>> may come handy when/if GRUB-in-the-userspace sees light of the 
>>>>>>>>>> day.
>>>>>>>>>>
>>>>>>>>>> This separates VOF and sPAPR in a hope that VOF bits may be 
>>>>>>>>>> reused by
>>>>>>>>>> other POWERPC boards which do not support pSeries.
>>>>>>>>>>
>>>>>>>>>> This make VOF optional, it is disabled by default, add 
>>>>>>>>>> --enable-vof
>>>>>>>>>> to ./configure to enable it.
>>>>>>>>>>
>>>>>>>>>> This assumes potential support for booting from QEMU backends
>>>>>>>>>> such as blockdev or netdev without devices/drivers used.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> The example command line is:
>>>>>>>>>>
>>>>>>>>>> /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 
>>>>>>>>>> \
>>>>>>>>>> -nodefaults \
>>>>>>>>>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>>>>>>>>>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
>>>>>>>>>> -mon id=MON0,chardev=STDIO0,mode=readline \
>>>>>>>>>> -nographic \
>>>>>>>>>> -vga none \
>>>>>>>>>> -enable-kvm \
>>>>>>>>>> -m 8G \
>>>>>>>>>> -machine 
>>>>>>>>>> pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off 
>>>>>>>>>> \
>>>>>>>>>> -kernel pbuild/kernel-le-guest/vmlinux \
>>>>>>>>>> -initrd pb/rootfs.cpio.xz \
>>>>>>>>>> -drive 
>>>>>>>>>> id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw 
>>>>>>>>>> \
>>>>>>>>>> -global spapr-nvram.drive=DRIVE0 \
>>>>>>>>>> -snapshot \
>>>>>>>>>> -smp 8,threads=8 \
>>>>>>>>>> -L /home/aik/t/qemu-ppc64-bios/ \
>>>>>>>>>> -trace events=qemu_trace_events \
>>>>>>>>>> -d guest_errors \
>>>>>>>>>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
>>>>>>>>>> -mon chardev=SOCKET0,mode=control
>>>>>>>>>
>>>>>>>>> I haven't looked at it in detail yet, just some quick comments 
>>>>>>>>> I have on first skim through.
>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>> Changes:
>>>>>>>>>> v21:
>>>>>>>>>> * s/ld/ldz/ in entry.S
>>>>>>>>>
>>>>>>>>> Typo? Has this become lwz?
>>>>>>>>
>>>>>>>> Yup, lwz.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> * moved CONFIG_VOF from 
>>>>>>>>>> default-configs/devices/ppc64-softmmu.mak to Kconfig
>>>>>>>>>> * made CONFIG_VOF optional
>>>>>>>>>
>>>>>>>>> This won't work for pegasos2, see below.
>>>>>>>>>
>>>>>>>>>> * s/l.lds/vof.lds/
>>>>>>>>>> * force 32 BE in spapr_machine_reset() instead of the firmware
>>>>>>>>>> * added checks for non-null methods of VofMachineIfClass
>>>>>>>>>> * moved OF_STACK_SIZE to vof.h, renamed to VOF_..., added a 
>>>>>>>>>> better comment
>>>>>>>>>> * added  path_offset wrapper for handling mixed case for 
>>>>>>>>>> addresses
>>>>>>>>>> after "@" in node names
>>>>>>>>>> * changed getprop() to check for actual "name" property in the 
>>>>>>>>>> fdt
>>>>>>>>>> * moved VOF_MEM_READ/VOF_MEM_WRITE to vof.h for sharing as 
>>>>>>>>>> (unlike similar
>>>>>>>>>> rtas_ld/ldl_be_*) they return error codes
>>>>>>>>>> * VOF_MEM_READ uses now address_space_read (it was 
>>>>>>>>>> address_space_read_full
>>>>>>>>>> before, not sure why)
>>>>>>>>> [...]
>>>>>>>>>> ---
>>>>>>>>>> configure               |    9 +
>>>>>>>>>> pc-bios/vof/Makefile    |   23 +
>>>>>>>>>> include/hw/ppc/spapr.h  |   25 +-
>>>>>>>>>> include/hw/ppc/vof.h    |   55 ++
>>>>>>>>>> pc-bios/vof/vof.h       |   43 ++
>>>>>>>>>> hw/ppc/spapr.c          |   87 +++-
>>>>>>>>>> hw/ppc/spapr_hcall.c    |   29 +-
>>>>>>>>>> hw/ppc/spapr_vof.c      |  153 ++++++
>>>>>>>>>> hw/ppc/vof.c            | 1052 
>>>>>>>>>> +++++++++++++++++++++++++++++++++++++++
>>>>>>>>>> pc-bios/vof/bootmem.c   |   14 +
>>>>>>>>>> pc-bios/vof/ci.c        |   91 ++++
>>>>>>>>>> pc-bios/vof/libc.c      |   92 ++++
>>>>>>>>>> pc-bios/vof/main.c      |   21 +
>>>>>>>>>> tests/qtest/rtas-test.c |   17 +-
>>>>>>>>>> MAINTAINERS             |   12 +
>>>>>>>>>> hw/ppc/Kconfig          |    3 +
>>>>>>>>>> hw/ppc/meson.build      |    3 +
>>>>>>>>>> hw/ppc/trace-events     |   24 +
>>>>>>>>>> meson.build             |    1 +
>>>>>>>>>> pc-bios/README          |    2 +
>>>>>>>>>> pc-bios/vof-nvram.bin   |  Bin 0 -> 16384 bytes
>>>>>>>>>> pc-bios/vof.bin         |  Bin 0 -> 3784 bytes
>>>>>>>>>> pc-bios/vof/entry.S     |   49 ++
>>>>>>>>>> pc-bios/vof/vof.lds     |   48 ++
>>>>>>>>>> 24 files changed, 1840 insertions(+), 13 deletions(-)
>>>>>>>>>> create mode 100644 pc-bios/vof/Makefile
>>>>>>>>>> create mode 100644 include/hw/ppc/vof.h
>>>>>>>>>> create mode 100644 pc-bios/vof/vof.h
>>>>>>>>>> create mode 100644 hw/ppc/spapr_vof.c
>>>>>>>>>> create mode 100644 hw/ppc/vof.c
>>>>>>>>>> create mode 100644 pc-bios/vof/bootmem.c
>>>>>>>>>> create mode 100644 pc-bios/vof/ci.c
>>>>>>>>>> create mode 100644 pc-bios/vof/libc.c
>>>>>>>>>> create mode 100644 pc-bios/vof/main.c
>>>>>>>>>> create mode 100644 pc-bios/vof-nvram.bin
>>>>>>>>>> create mode 100755 pc-bios/vof.bin
>>>>>>>>>> create mode 100644 pc-bios/vof/entry.S
>>>>>>>>>> create mode 100644 pc-bios/vof/vof.lds
>>>>>>>>>>
>>>>> [...]
>>>>>>>>>> diff --git a/include/hw/ppc/vof.h b/include/hw/ppc/vof.h
>>>>>>>>>> new file mode 100644
>>>>>>>>>> index 000000000000..65ca2fed0d41
>>>>>>>>>> --- /dev/null
>>>>>>>>>> +++ b/include/hw/ppc/vof.h
>>>>>>>>>> @@ -0,0 +1,55 @@
>>>>>>>>>> +/*
>>>>>>>>>> + * Virtual Open Firmware
>>>>>>>>>> + *
>>>>>>>>>> + * SPDX-License-Identifier: GPL-2.0-or-later
>>>>>>>>>> + */
>>>>>>>>>> +#ifndef HW_VOF_H
>>>>>>>>>> +#define HW_VOF_H
>>>>>>>>>> +
>>>>>>>>>> +typedef struct Vof {
>>>>>>>>>> +    uint64_t top_addr; /* copied from rma_size */
>>>>>>>>>> +    GArray *claimed; /* array of SpaprOfClaimed */
>>>>>>>>>> +    uint64_t claimed_base;
>>>>>>>>>> +    GHashTable *of_instances; /* ihandle -> SpaprOfInstance */
>>>>>>>>>> +    uint32_t of_instance_last;
>>>>>>>>>> +    char *bootargs;
>>>>>>>>>> +    long fw_size;
>>>>>>>>>> +} Vof;
>>>>>>>>>> +
>>>>>>>>>> +int vof_client_call(MachineState *ms, Vof *vof, void *fdt,
>>>>>>>>>> +                    target_ulong args_real);
>>>>>>>>>> +uint64_t vof_claim(Vof *vof, uint64_t virt, uint64_t size, 
>>>>>>>>>> uint64_t align);
>>>>>>>>>> +void vof_init(Vof *vof, uint64_t top_addr, Error **errp);
>>>>>>>>>> +void vof_cleanup(Vof *vof);
>>>>>>>>>> +void vof_build_dt(void *fdt, Vof *vof);
>>>>>>>>>> +uint32_t vof_client_open_store(void *fdt, Vof *vof, const 
>>>>>>>>>> char *nodename,
>>>>>>>>>> +                               const char *prop, const char 
>>>>>>>>>> *path);
>>>>>>>>>> +
>>>>>>>>>> +#define TYPE_VOF_MACHINE_IF "vof-machine-if"
>>>>>>>>>> +
>>>>>>>>>> +typedef struct VofMachineIfClass VofMachineIfClass;
>>>>>>>>>> +DECLARE_CLASS_CHECKERS(VofMachineIfClass, VOF_MACHINE, 
>>>>>>>>>> TYPE_VOF_MACHINE_IF)
>>>>>>>>>> +
>>>>>>>>>> +struct VofMachineIfClass {
>>>>>>>>>> +    InterfaceClass parent;
>>>>>>>>>> +    target_ulong (*client_architecture_support)(MachineState 
>>>>>>>>>> *ms, CPUState *cs,
>>>>>>>>>> +                                                target_ulong 
>>>>>>>>>> vec);
>>>>>>>>>> +    void (*quiesce)(MachineState *ms);
>>>>>>>>>> +    bool (*setprop)(MachineState *ms, const char *path, const 
>>>>>>>>>> char *propname,
>>>>>>>>>> +                    void *val, int vallen);
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>>>>> +/*
>>>>>>>>>> + * Initial stack size is from
>>>>>>>>>> + * 
>>>>>>>>>> https://www.devicetree.org/open-firmware/bindings/ppc/release/ppc-2_1.html 
>>>>>>>>>> + */
>>>>>>>>>> +#define VOF_STACK_SIZE       0x8000
>>>>>>>>>
>>>>>>>>> Maybe also add a define for RTAS_SIZE here? We'll need to put 
>>>>>>>>> that in the device tree but it depends on the rtas shim size 
>>>>>>>>> that's part of VOF so it should be defined here instead of 
>>>>>>>>> hardcoding it in boards that use VOF so it can be updated later 
>>>>>>>>> at one place if needed.
>>>>>>>>
>>>>>>>> This is rtas-size for pseries:
>>>>>>>>
>>>>>>>> _FDT(fdt_setprop_cell(fdt, rtas, "rtas-size", RTAS_ERROR_LOG_MAX +
>>>>>>>>          ms->smp.max_cpus * sizeof(uint64_t)*2 + 
>>>>>>>> sizeof(uint64_t)));
>>>>>>>>
>>>>>>>> => depends on cpus => depends on the command line.
>>>>>>>>
>>>>>>>>
>>>>>>>> RTAS_SIZE is not used by anything in pseries anymore, I'll send 
>>>>>>>> a patch to ditch it.
>>>>>>>
>>>>>>> I mean you need to have at least the size of code in 
>>>>>>> pc-bios/vof/entry.S hv_rtas where also hv_rtas_size is defined 
>>>>>>> but that value is not available in QEMU where one needs to add it 
>>>>>>> to the device tree. So a define for that should be here in vof.h. 
>>>>>>> Currently I've counted instructions and have
>>>>>>>
>>>>>>>      qemu_fdt_setprop_cell(fdt, "/rtas", "rtas-size", 20);
>>>>>>>
>>>>>>> in pegasos2.c but that 20 should be some VOF_RTAS_SIZE instead 
>>>>>>> that you define corresponding to hv_rtas_size. You'll probably 
>>>>>>> need the same even after changing above rtas size calculation in 
>>>>>>> spapr because client has to allocate memory for instantiate-rtas.
>>>>>>
>>>>>>
>>>>>> Ah fair point. I do not like "20" here and I think the right thing 
>>>>>> will be adding whatever number of bytes to rtas-size in the 
>>>>>> firmware itself and update it in QEMU via "setprop" as we do for 
>>>>>> "linux,rtas-base". And then do the same in SLOF.
>>>>>
>>>>> This is not the base address but the size of the shim with the 
>>>>> hypercall that instantiate-rtas copies. Why does it need to be 
>>>>> updated?
>>>>
>>>> The vm kernel allocates the space for it.
>>>>
>>>>> And why does it need to be more bytes than necessary?
>>>>
>>>> What is necessary? It is definitely way more than 20 bytes.
>>>
>>> I thought instantiate-rtas only copies the hv_rtas routine as the 
>>> comment in qemu/pc-bios/vof/entry.S says
>>
>> It does only copy the code, correct.
>>
>>> and that routine is 20 bytes. 
>>
>>
>> There is no "#define XXX 20" anywhere though. QEMU does not know and 
>> does not need to know that it is 20, it does not manage the RTAS blob.
> 
> But it manages the rtas-size property which has to be at least the RTAS 
> blob size so that's why I thought VOF should share this define in vof.h.
> 
>>
>>> What else is needed? If that's not enough then we even more need a 
>>> define for it as boards using VOF have no idea otherwise.
>>>
>>>>> I don't know what you do for spapr and why do you need larger 
>>>>> rtas-size than this but for pegasos2 this /rtas/rtas-size property 
>>>>> is only used by guests to allocate memory for rtas so all I need is 
>>>>> how many bytes are needed for hv_rtas in pc-bios/vof/entry.S which 
>>>>> is what should be #defined in vof.h. I've found 20 is just enough 
>>>>> so you could add that to vof.h.
>>>>
>>>> I am thinking now that may be the property should be created by 
>>>> vof.bin and not QEMU, QEMU just has to tell how many bytes on top it 
>>>> needs.
>>>
>>> Maybe. If it's always in /rtas/rtas-size on every OF implementation 
>>> (if that path is kind of standard for rtas) then that could also work 
>>> or you could have an vof_init_rtas() function or similar that sets 
>>> this, maybe pass it "/rtas" as path argument or even the whole 
>>> property path ("/rtas/rtas-size") to avoid hard coding it and let the 
>>> board tell it where it expects this property, then the value can be 
>>> set by this function so that's within VOF then. But I think just 
>>> adding a define for it in vof.h is enough and simple. Then boards can 
>>> add whatever they need and put that in the property where they like.
>>
>>
>> My idea is that boards like pegasos put a zero in such property and 
>> VOF then adjusts it to whatever it is + 20.
> 
> That could work too if VOF knows how to find this property. If it's the 
> same on every board then it does not have to look through the whole tree 
> for it.


After some thinking, I guess you just have to use 20 or some safe future 
proof number (64 bytes or similar) as I cannot reliably define the RTAS 
blob size in QEMU. Hacking the firmware seems even worse as the firmware 
does not really care. Well, I can say in QEMU it is 20 but the same code 
in spapr_rtas.c is supposed to work with the RTAS blob provided by VOF 
and by SLOF and even though these are identical now, this is not 
enforced ahyhow and not checked either.



-- 
Alexey


  reply	other threads:[~2021-06-22  7:51 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-15  7:06 [PATCH qemu v21] spapr: Implement Open Firmware client interface Alexey Kardashevskiy
2021-06-15 10:29 ` BALATON Zoltan
2021-06-16  6:49   ` Alexey Kardashevskiy
2021-06-16 10:34     ` BALATON Zoltan
2021-06-17  2:23       ` Alexey Kardashevskiy
2021-06-17  9:16         ` BALATON Zoltan
2021-06-17 10:28           ` Alexey Kardashevskiy
2021-06-17 11:29             ` BALATON Zoltan
2021-06-18  3:19               ` Alexey Kardashevskiy
2021-06-18 10:13                 ` BALATON Zoltan
2021-06-22  7:49                   ` Alexey Kardashevskiy [this message]
2021-06-19  9:28           ` David Gibson
2021-06-15 21:09 ` BALATON Zoltan
2021-06-16  6:49   ` Alexey Kardashevskiy
2021-06-16 10:26     ` BALATON Zoltan
2021-06-17  2:40       ` Alexey Kardashevskiy
2021-06-17  9:21         ` BALATON Zoltan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=025657b5-417b-7c06-dd7f-9536f156d705@ozlabs.ru \
    --to=aik@ozlabs.ru \
    --cc=balaton@eik.bme.hu \
    --cc=david@gibson.dropbear.id.au \
    --cc=groug@kaod.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).