From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: BALATON Zoltan <balaton@eik.bme.hu>
Cc: David Gibson <david@gibson.dropbear.id.au>,
qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
Greg Kurz <groug@kaod.org>
Subject: Re: [PATCH qemu v21] spapr: Implement Open Firmware client interface
Date: Tue, 22 Jun 2021 17:49:57 +1000 [thread overview]
Message-ID: <025657b5-417b-7c06-dd7f-9536f156d705@ozlabs.ru> (raw)
In-Reply-To: <d673149d-39b-6ecd-875c-d2f1ff6020@eik.bme.hu>
On 6/18/21 20:13, BALATON Zoltan wrote:
> On Fri, 18 Jun 2021, Alexey Kardashevskiy wrote:
>> On 6/17/21 21:29, BALATON Zoltan wrote:
>>> On Thu, 17 Jun 2021, Alexey Kardashevskiy wrote:
>>>> On 17/06/2021 19:16, BALATON Zoltan wrote:
>>>>> On Thu, 17 Jun 2021, Alexey Kardashevskiy wrote:
>>>>>> On 16/06/2021 20:34, BALATON Zoltan wrote:
>>>>>>> On Wed, 16 Jun 2021, Alexey Kardashevskiy wrote:
>>>>>>>> On 6/15/21 20:29, BALATON Zoltan wrote:
>>>>>>>>> On Tue, 15 Jun 2021, Alexey Kardashevskiy wrote:
>>>>>>>>>> The PAPR platform describes an OS environment that's presented by
>>>>>>>>>> a combination of a hypervisor and firmware. The features it
>>>>>>>>>> specifies
>>>>>>>>>> require collaboration between the firmware and the hypervisor.
>>>>>>>>>>
>>>>>>>>>> Since the beginning, the runtime component of the firmware
>>>>>>>>>> (RTAS) has
>>>>>>>>>> been implemented as a 20 byte shim which simply forwards it to
>>>>>>>>>> a hypercall implemented in qemu. The boot time firmware
>>>>>>>>>> component is
>>>>>>>>>> SLOF - but a build that's specific to qemu, and has always
>>>>>>>>>> needed to be
>>>>>>>>>> updated in sync with it. Even though we've managed to limit
>>>>>>>>>> the amount
>>>>>>>>>> of runtime communication we need between qemu and SLOF,
>>>>>>>>>> there's some,
>>>>>>>>>> and it has become increasingly awkward to handle as we've
>>>>>>>>>> implemented
>>>>>>>>>> new features.
>>>>>>>>>>
>>>>>>>>>> This implements a boot time OF client interface (CI) which is
>>>>>>>>>> enabled by a new "x-vof" pseries machine option (stands for
>>>>>>>>>> "Virtual Open
>>>>>>>>>> Firmware). When enabled, QEMU implements the custom
>>>>>>>>>> H_OF_CLIENT hcall
>>>>>>>>>> which implements Open Firmware Client Interface (OF CI). This
>>>>>>>>>> allows
>>>>>>>>>> using a smaller stateless firmware which does not have to manage
>>>>>>>>>> the device tree.
>>>>>>>>>>
>>>>>>>>>> The new "vof.bin" firmware image is included with source code
>>>>>>>>>> under
>>>>>>>>>> pc-bios/. It also includes RTAS blob.
>>>>>>>>>>
>>>>>>>>>> This implements a handful of CI methods just to get
>>>>>>>>>> -kernel/-initrd
>>>>>>>>>> working. In particular, this implements the device tree
>>>>>>>>>> fetching and
>>>>>>>>>> simple memory allocator - "claim" (an OF CI memory allocator)
>>>>>>>>>> and updates
>>>>>>>>>> "/memory@0/available" to report the client about available
>>>>>>>>>> memory.
>>>>>>>>>>
>>>>>>>>>> This implements changing some device tree properties which we
>>>>>>>>>> know how
>>>>>>>>>> to deal with, the rest is ignored. To allow changes, this skips
>>>>>>>>>> fdt_pack() when x-vof=on as not packing the blob leaves some
>>>>>>>>>> room for
>>>>>>>>>> appending.
>>>>>>>>>>
>>>>>>>>>> In absence of SLOF, this assigns phandles to device tree nodes
>>>>>>>>>> to make
>>>>>>>>>> device tree traversing work.
>>>>>>>>>>
>>>>>>>>>> When x-vof=on, this adds "/chosen" every time QEMU (re)builds
>>>>>>>>>> a tree.
>>>>>>>>>>
>>>>>>>>>> This adds basic instances support which are managed by a hash map
>>>>>>>>>> ihandle -> [phandle].
>>>>>>>>>>
>>>>>>>>>> Before the guest started, the used memory is:
>>>>>>>>>> 0..e60 - the initial firmware
>>>>>>>>>> 8000..10000 - stack
>>>>>>>>>> 400000.. - kernel
>>>>>>>>>> 3ea0000.. - initramdisk
>>>>>>>>>>
>>>>>>>>>> This OF CI does not implement "interpret".
>>>>>>>>>>
>>>>>>>>>> Unlike SLOF, this does not format uninitialized nvram.
>>>>>>>>>> Instead, this
>>>>>>>>>> includes a disk image with pre-formatted nvram.
>>>>>>>>>>
>>>>>>>>>> With this basic support, this can only boot into kernel directly.
>>>>>>>>>> However this is just enough for the petitboot kernel and
>>>>>>>>>> initradmdisk to
>>>>>>>>>> boot from any possible source. Note this requires reasonably
>>>>>>>>>> recent guest
>>>>>>>>>> kernel with:
>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
>>>>>>>>>> The immediate benefit is much faster booting time which
>>>>>>>>>> especially
>>>>>>>>>> crucial with fully emulated early CPU bring up environments.
>>>>>>>>>> Also this
>>>>>>>>>> may come handy when/if GRUB-in-the-userspace sees light of the
>>>>>>>>>> day.
>>>>>>>>>>
>>>>>>>>>> This separates VOF and sPAPR in a hope that VOF bits may be
>>>>>>>>>> reused by
>>>>>>>>>> other POWERPC boards which do not support pSeries.
>>>>>>>>>>
>>>>>>>>>> This make VOF optional, it is disabled by default, add
>>>>>>>>>> --enable-vof
>>>>>>>>>> to ./configure to enable it.
>>>>>>>>>>
>>>>>>>>>> This assumes potential support for booting from QEMU backends
>>>>>>>>>> such as blockdev or netdev without devices/drivers used.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> The example command line is:
>>>>>>>>>>
>>>>>>>>>> /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64
>>>>>>>>>> \
>>>>>>>>>> -nodefaults \
>>>>>>>>>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>>>>>>>>>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
>>>>>>>>>> -mon id=MON0,chardev=STDIO0,mode=readline \
>>>>>>>>>> -nographic \
>>>>>>>>>> -vga none \
>>>>>>>>>> -enable-kvm \
>>>>>>>>>> -m 8G \
>>>>>>>>>> -machine
>>>>>>>>>> pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off
>>>>>>>>>> \
>>>>>>>>>> -kernel pbuild/kernel-le-guest/vmlinux \
>>>>>>>>>> -initrd pb/rootfs.cpio.xz \
>>>>>>>>>> -drive
>>>>>>>>>> id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw
>>>>>>>>>> \
>>>>>>>>>> -global spapr-nvram.drive=DRIVE0 \
>>>>>>>>>> -snapshot \
>>>>>>>>>> -smp 8,threads=8 \
>>>>>>>>>> -L /home/aik/t/qemu-ppc64-bios/ \
>>>>>>>>>> -trace events=qemu_trace_events \
>>>>>>>>>> -d guest_errors \
>>>>>>>>>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
>>>>>>>>>> -mon chardev=SOCKET0,mode=control
>>>>>>>>>
>>>>>>>>> I haven't looked at it in detail yet, just some quick comments
>>>>>>>>> I have on first skim through.
>>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>> Changes:
>>>>>>>>>> v21:
>>>>>>>>>> * s/ld/ldz/ in entry.S
>>>>>>>>>
>>>>>>>>> Typo? Has this become lwz?
>>>>>>>>
>>>>>>>> Yup, lwz.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> * moved CONFIG_VOF from
>>>>>>>>>> default-configs/devices/ppc64-softmmu.mak to Kconfig
>>>>>>>>>> * made CONFIG_VOF optional
>>>>>>>>>
>>>>>>>>> This won't work for pegasos2, see below.
>>>>>>>>>
>>>>>>>>>> * s/l.lds/vof.lds/
>>>>>>>>>> * force 32 BE in spapr_machine_reset() instead of the firmware
>>>>>>>>>> * added checks for non-null methods of VofMachineIfClass
>>>>>>>>>> * moved OF_STACK_SIZE to vof.h, renamed to VOF_..., added a
>>>>>>>>>> better comment
>>>>>>>>>> * added path_offset wrapper for handling mixed case for
>>>>>>>>>> addresses
>>>>>>>>>> after "@" in node names
>>>>>>>>>> * changed getprop() to check for actual "name" property in the
>>>>>>>>>> fdt
>>>>>>>>>> * moved VOF_MEM_READ/VOF_MEM_WRITE to vof.h for sharing as
>>>>>>>>>> (unlike similar
>>>>>>>>>> rtas_ld/ldl_be_*) they return error codes
>>>>>>>>>> * VOF_MEM_READ uses now address_space_read (it was
>>>>>>>>>> address_space_read_full
>>>>>>>>>> before, not sure why)
>>>>>>>>> [...]
>>>>>>>>>> ---
>>>>>>>>>> configure | 9 +
>>>>>>>>>> pc-bios/vof/Makefile | 23 +
>>>>>>>>>> include/hw/ppc/spapr.h | 25 +-
>>>>>>>>>> include/hw/ppc/vof.h | 55 ++
>>>>>>>>>> pc-bios/vof/vof.h | 43 ++
>>>>>>>>>> hw/ppc/spapr.c | 87 +++-
>>>>>>>>>> hw/ppc/spapr_hcall.c | 29 +-
>>>>>>>>>> hw/ppc/spapr_vof.c | 153 ++++++
>>>>>>>>>> hw/ppc/vof.c | 1052
>>>>>>>>>> +++++++++++++++++++++++++++++++++++++++
>>>>>>>>>> pc-bios/vof/bootmem.c | 14 +
>>>>>>>>>> pc-bios/vof/ci.c | 91 ++++
>>>>>>>>>> pc-bios/vof/libc.c | 92 ++++
>>>>>>>>>> pc-bios/vof/main.c | 21 +
>>>>>>>>>> tests/qtest/rtas-test.c | 17 +-
>>>>>>>>>> MAINTAINERS | 12 +
>>>>>>>>>> hw/ppc/Kconfig | 3 +
>>>>>>>>>> hw/ppc/meson.build | 3 +
>>>>>>>>>> hw/ppc/trace-events | 24 +
>>>>>>>>>> meson.build | 1 +
>>>>>>>>>> pc-bios/README | 2 +
>>>>>>>>>> pc-bios/vof-nvram.bin | Bin 0 -> 16384 bytes
>>>>>>>>>> pc-bios/vof.bin | Bin 0 -> 3784 bytes
>>>>>>>>>> pc-bios/vof/entry.S | 49 ++
>>>>>>>>>> pc-bios/vof/vof.lds | 48 ++
>>>>>>>>>> 24 files changed, 1840 insertions(+), 13 deletions(-)
>>>>>>>>>> create mode 100644 pc-bios/vof/Makefile
>>>>>>>>>> create mode 100644 include/hw/ppc/vof.h
>>>>>>>>>> create mode 100644 pc-bios/vof/vof.h
>>>>>>>>>> create mode 100644 hw/ppc/spapr_vof.c
>>>>>>>>>> create mode 100644 hw/ppc/vof.c
>>>>>>>>>> create mode 100644 pc-bios/vof/bootmem.c
>>>>>>>>>> create mode 100644 pc-bios/vof/ci.c
>>>>>>>>>> create mode 100644 pc-bios/vof/libc.c
>>>>>>>>>> create mode 100644 pc-bios/vof/main.c
>>>>>>>>>> create mode 100644 pc-bios/vof-nvram.bin
>>>>>>>>>> create mode 100755 pc-bios/vof.bin
>>>>>>>>>> create mode 100644 pc-bios/vof/entry.S
>>>>>>>>>> create mode 100644 pc-bios/vof/vof.lds
>>>>>>>>>>
>>>>> [...]
>>>>>>>>>> diff --git a/include/hw/ppc/vof.h b/include/hw/ppc/vof.h
>>>>>>>>>> new file mode 100644
>>>>>>>>>> index 000000000000..65ca2fed0d41
>>>>>>>>>> --- /dev/null
>>>>>>>>>> +++ b/include/hw/ppc/vof.h
>>>>>>>>>> @@ -0,0 +1,55 @@
>>>>>>>>>> +/*
>>>>>>>>>> + * Virtual Open Firmware
>>>>>>>>>> + *
>>>>>>>>>> + * SPDX-License-Identifier: GPL-2.0-or-later
>>>>>>>>>> + */
>>>>>>>>>> +#ifndef HW_VOF_H
>>>>>>>>>> +#define HW_VOF_H
>>>>>>>>>> +
>>>>>>>>>> +typedef struct Vof {
>>>>>>>>>> + uint64_t top_addr; /* copied from rma_size */
>>>>>>>>>> + GArray *claimed; /* array of SpaprOfClaimed */
>>>>>>>>>> + uint64_t claimed_base;
>>>>>>>>>> + GHashTable *of_instances; /* ihandle -> SpaprOfInstance */
>>>>>>>>>> + uint32_t of_instance_last;
>>>>>>>>>> + char *bootargs;
>>>>>>>>>> + long fw_size;
>>>>>>>>>> +} Vof;
>>>>>>>>>> +
>>>>>>>>>> +int vof_client_call(MachineState *ms, Vof *vof, void *fdt,
>>>>>>>>>> + target_ulong args_real);
>>>>>>>>>> +uint64_t vof_claim(Vof *vof, uint64_t virt, uint64_t size,
>>>>>>>>>> uint64_t align);
>>>>>>>>>> +void vof_init(Vof *vof, uint64_t top_addr, Error **errp);
>>>>>>>>>> +void vof_cleanup(Vof *vof);
>>>>>>>>>> +void vof_build_dt(void *fdt, Vof *vof);
>>>>>>>>>> +uint32_t vof_client_open_store(void *fdt, Vof *vof, const
>>>>>>>>>> char *nodename,
>>>>>>>>>> + const char *prop, const char
>>>>>>>>>> *path);
>>>>>>>>>> +
>>>>>>>>>> +#define TYPE_VOF_MACHINE_IF "vof-machine-if"
>>>>>>>>>> +
>>>>>>>>>> +typedef struct VofMachineIfClass VofMachineIfClass;
>>>>>>>>>> +DECLARE_CLASS_CHECKERS(VofMachineIfClass, VOF_MACHINE,
>>>>>>>>>> TYPE_VOF_MACHINE_IF)
>>>>>>>>>> +
>>>>>>>>>> +struct VofMachineIfClass {
>>>>>>>>>> + InterfaceClass parent;
>>>>>>>>>> + target_ulong (*client_architecture_support)(MachineState
>>>>>>>>>> *ms, CPUState *cs,
>>>>>>>>>> + target_ulong
>>>>>>>>>> vec);
>>>>>>>>>> + void (*quiesce)(MachineState *ms);
>>>>>>>>>> + bool (*setprop)(MachineState *ms, const char *path, const
>>>>>>>>>> char *propname,
>>>>>>>>>> + void *val, int vallen);
>>>>>>>>>> +};
>>>>>>>>>> +
>>>>>>>>>> +/*
>>>>>>>>>> + * Initial stack size is from
>>>>>>>>>> + *
>>>>>>>>>> https://www.devicetree.org/open-firmware/bindings/ppc/release/ppc-2_1.html
>>>>>>>>>> + */
>>>>>>>>>> +#define VOF_STACK_SIZE 0x8000
>>>>>>>>>
>>>>>>>>> Maybe also add a define for RTAS_SIZE here? We'll need to put
>>>>>>>>> that in the device tree but it depends on the rtas shim size
>>>>>>>>> that's part of VOF so it should be defined here instead of
>>>>>>>>> hardcoding it in boards that use VOF so it can be updated later
>>>>>>>>> at one place if needed.
>>>>>>>>
>>>>>>>> This is rtas-size for pseries:
>>>>>>>>
>>>>>>>> _FDT(fdt_setprop_cell(fdt, rtas, "rtas-size", RTAS_ERROR_LOG_MAX +
>>>>>>>> ms->smp.max_cpus * sizeof(uint64_t)*2 +
>>>>>>>> sizeof(uint64_t)));
>>>>>>>>
>>>>>>>> => depends on cpus => depends on the command line.
>>>>>>>>
>>>>>>>>
>>>>>>>> RTAS_SIZE is not used by anything in pseries anymore, I'll send
>>>>>>>> a patch to ditch it.
>>>>>>>
>>>>>>> I mean you need to have at least the size of code in
>>>>>>> pc-bios/vof/entry.S hv_rtas where also hv_rtas_size is defined
>>>>>>> but that value is not available in QEMU where one needs to add it
>>>>>>> to the device tree. So a define for that should be here in vof.h.
>>>>>>> Currently I've counted instructions and have
>>>>>>>
>>>>>>> qemu_fdt_setprop_cell(fdt, "/rtas", "rtas-size", 20);
>>>>>>>
>>>>>>> in pegasos2.c but that 20 should be some VOF_RTAS_SIZE instead
>>>>>>> that you define corresponding to hv_rtas_size. You'll probably
>>>>>>> need the same even after changing above rtas size calculation in
>>>>>>> spapr because client has to allocate memory for instantiate-rtas.
>>>>>>
>>>>>>
>>>>>> Ah fair point. I do not like "20" here and I think the right thing
>>>>>> will be adding whatever number of bytes to rtas-size in the
>>>>>> firmware itself and update it in QEMU via "setprop" as we do for
>>>>>> "linux,rtas-base". And then do the same in SLOF.
>>>>>
>>>>> This is not the base address but the size of the shim with the
>>>>> hypercall that instantiate-rtas copies. Why does it need to be
>>>>> updated?
>>>>
>>>> The vm kernel allocates the space for it.
>>>>
>>>>> And why does it need to be more bytes than necessary?
>>>>
>>>> What is necessary? It is definitely way more than 20 bytes.
>>>
>>> I thought instantiate-rtas only copies the hv_rtas routine as the
>>> comment in qemu/pc-bios/vof/entry.S says
>>
>> It does only copy the code, correct.
>>
>>> and that routine is 20 bytes.
>>
>>
>> There is no "#define XXX 20" anywhere though. QEMU does not know and
>> does not need to know that it is 20, it does not manage the RTAS blob.
>
> But it manages the rtas-size property which has to be at least the RTAS
> blob size so that's why I thought VOF should share this define in vof.h.
>
>>
>>> What else is needed? If that's not enough then we even more need a
>>> define for it as boards using VOF have no idea otherwise.
>>>
>>>>> I don't know what you do for spapr and why do you need larger
>>>>> rtas-size than this but for pegasos2 this /rtas/rtas-size property
>>>>> is only used by guests to allocate memory for rtas so all I need is
>>>>> how many bytes are needed for hv_rtas in pc-bios/vof/entry.S which
>>>>> is what should be #defined in vof.h. I've found 20 is just enough
>>>>> so you could add that to vof.h.
>>>>
>>>> I am thinking now that may be the property should be created by
>>>> vof.bin and not QEMU, QEMU just has to tell how many bytes on top it
>>>> needs.
>>>
>>> Maybe. If it's always in /rtas/rtas-size on every OF implementation
>>> (if that path is kind of standard for rtas) then that could also work
>>> or you could have an vof_init_rtas() function or similar that sets
>>> this, maybe pass it "/rtas" as path argument or even the whole
>>> property path ("/rtas/rtas-size") to avoid hard coding it and let the
>>> board tell it where it expects this property, then the value can be
>>> set by this function so that's within VOF then. But I think just
>>> adding a define for it in vof.h is enough and simple. Then boards can
>>> add whatever they need and put that in the property where they like.
>>
>>
>> My idea is that boards like pegasos put a zero in such property and
>> VOF then adjusts it to whatever it is + 20.
>
> That could work too if VOF knows how to find this property. If it's the
> same on every board then it does not have to look through the whole tree
> for it.
After some thinking, I guess you just have to use 20 or some safe future
proof number (64 bytes or similar) as I cannot reliably define the RTAS
blob size in QEMU. Hacking the firmware seems even worse as the firmware
does not really care. Well, I can say in QEMU it is 20 but the same code
in spapr_rtas.c is supposed to work with the RTAS blob provided by VOF
and by SLOF and even though these are identical now, this is not
enforced ahyhow and not checked either.
--
Alexey
next prev parent reply other threads:[~2021-06-22 7:51 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-15 7:06 [PATCH qemu v21] spapr: Implement Open Firmware client interface Alexey Kardashevskiy
2021-06-15 10:29 ` BALATON Zoltan
2021-06-16 6:49 ` Alexey Kardashevskiy
2021-06-16 10:34 ` BALATON Zoltan
2021-06-17 2:23 ` Alexey Kardashevskiy
2021-06-17 9:16 ` BALATON Zoltan
2021-06-17 10:28 ` Alexey Kardashevskiy
2021-06-17 11:29 ` BALATON Zoltan
2021-06-18 3:19 ` Alexey Kardashevskiy
2021-06-18 10:13 ` BALATON Zoltan
2021-06-22 7:49 ` Alexey Kardashevskiy [this message]
2021-06-19 9:28 ` David Gibson
2021-06-15 21:09 ` BALATON Zoltan
2021-06-16 6:49 ` Alexey Kardashevskiy
2021-06-16 10:26 ` BALATON Zoltan
2021-06-17 2:40 ` Alexey Kardashevskiy
2021-06-17 9:21 ` BALATON Zoltan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=025657b5-417b-7c06-dd7f-9536f156d705@ozlabs.ru \
--to=aik@ozlabs.ru \
--cc=balaton@eik.bme.hu \
--cc=david@gibson.dropbear.id.au \
--cc=groug@kaod.org \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).