From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: BALATON Zoltan <balaton@eik.bme.hu>
Cc: David Gibson <david@gibson.dropbear.id.au>,
qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
Greg Kurz <groug@kaod.org>
Subject: Re: [PATCH qemu v21] spapr: Implement Open Firmware client interface
Date: Thu, 17 Jun 2021 20:28:54 +1000 [thread overview]
Message-ID: <c680d69c-7dfc-fec5-031c-403a19e7050c@ozlabs.ru> (raw)
In-Reply-To: <33b662dc-6a97-6e61-c17-c21a4a44d8fb@eik.bme.hu>
On 17/06/2021 19:16, BALATON Zoltan wrote:
> On Thu, 17 Jun 2021, Alexey Kardashevskiy wrote:
>> On 16/06/2021 20:34, BALATON Zoltan wrote:
>>> On Wed, 16 Jun 2021, Alexey Kardashevskiy wrote:
>>>> On 6/15/21 20:29, BALATON Zoltan wrote:
>>>>> On Tue, 15 Jun 2021, Alexey Kardashevskiy wrote:
>>>>>> The PAPR platform describes an OS environment that's presented by
>>>>>> a combination of a hypervisor and firmware. The features it specifies
>>>>>> require collaboration between the firmware and the hypervisor.
>>>>>>
>>>>>> Since the beginning, the runtime component of the firmware (RTAS) has
>>>>>> been implemented as a 20 byte shim which simply forwards it to
>>>>>> a hypercall implemented in qemu. The boot time firmware component is
>>>>>> SLOF - but a build that's specific to qemu, and has always needed
>>>>>> to be
>>>>>> updated in sync with it. Even though we've managed to limit the
>>>>>> amount
>>>>>> of runtime communication we need between qemu and SLOF, there's some,
>>>>>> and it has become increasingly awkward to handle as we've implemented
>>>>>> new features.
>>>>>>
>>>>>> This implements a boot time OF client interface (CI) which is
>>>>>> enabled by a new "x-vof" pseries machine option (stands for
>>>>>> "Virtual Open
>>>>>> Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall
>>>>>> which implements Open Firmware Client Interface (OF CI). This allows
>>>>>> using a smaller stateless firmware which does not have to manage
>>>>>> the device tree.
>>>>>>
>>>>>> The new "vof.bin" firmware image is included with source code under
>>>>>> pc-bios/. It also includes RTAS blob.
>>>>>>
>>>>>> This implements a handful of CI methods just to get -kernel/-initrd
>>>>>> working. In particular, this implements the device tree fetching and
>>>>>> simple memory allocator - "claim" (an OF CI memory allocator) and
>>>>>> updates
>>>>>> "/memory@0/available" to report the client about available memory.
>>>>>>
>>>>>> This implements changing some device tree properties which we know
>>>>>> how
>>>>>> to deal with, the rest is ignored. To allow changes, this skips
>>>>>> fdt_pack() when x-vof=on as not packing the blob leaves some room for
>>>>>> appending.
>>>>>>
>>>>>> In absence of SLOF, this assigns phandles to device tree nodes to
>>>>>> make
>>>>>> device tree traversing work.
>>>>>>
>>>>>> When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.
>>>>>>
>>>>>> This adds basic instances support which are managed by a hash map
>>>>>> ihandle -> [phandle].
>>>>>>
>>>>>> Before the guest started, the used memory is:
>>>>>> 0..e60 - the initial firmware
>>>>>> 8000..10000 - stack
>>>>>> 400000.. - kernel
>>>>>> 3ea0000.. - initramdisk
>>>>>>
>>>>>> This OF CI does not implement "interpret".
>>>>>>
>>>>>> Unlike SLOF, this does not format uninitialized nvram. Instead, this
>>>>>> includes a disk image with pre-formatted nvram.
>>>>>>
>>>>>> With this basic support, this can only boot into kernel directly.
>>>>>> However this is just enough for the petitboot kernel and
>>>>>> initradmdisk to
>>>>>> boot from any possible source. Note this requires reasonably
>>>>>> recent guest
>>>>>> kernel with:
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
>>>>>> The immediate benefit is much faster booting time which especially
>>>>>> crucial with fully emulated early CPU bring up environments. Also
>>>>>> this
>>>>>> may come handy when/if GRUB-in-the-userspace sees light of the day.
>>>>>>
>>>>>> This separates VOF and sPAPR in a hope that VOF bits may be reused by
>>>>>> other POWERPC boards which do not support pSeries.
>>>>>>
>>>>>> This make VOF optional, it is disabled by default, add --enable-vof
>>>>>> to ./configure to enable it.
>>>>>>
>>>>>> This assumes potential support for booting from QEMU backends
>>>>>> such as blockdev or netdev without devices/drivers used.
>>>>>>
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> ---
>>>>>>
>>>>>> The example command line is:
>>>>>>
>>>>>> /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
>>>>>> -nodefaults \
>>>>>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>>>>>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
>>>>>> -mon id=MON0,chardev=STDIO0,mode=readline \
>>>>>> -nographic \
>>>>>> -vga none \
>>>>>> -enable-kvm \
>>>>>> -m 8G \
>>>>>> -machine
>>>>>> pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off
>>>>>> \
>>>>>> -kernel pbuild/kernel-le-guest/vmlinux \
>>>>>> -initrd pb/rootfs.cpio.xz \
>>>>>> -drive
>>>>>> id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw
>>>>>> \
>>>>>> -global spapr-nvram.drive=DRIVE0 \
>>>>>> -snapshot \
>>>>>> -smp 8,threads=8 \
>>>>>> -L /home/aik/t/qemu-ppc64-bios/ \
>>>>>> -trace events=qemu_trace_events \
>>>>>> -d guest_errors \
>>>>>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
>>>>>> -mon chardev=SOCKET0,mode=control
>>>>>
>>>>> I haven't looked at it in detail yet, just some quick comments I
>>>>> have on first skim through.
>>>>>
>>>>>> ---
>>>>>> Changes:
>>>>>> v21:
>>>>>> * s/ld/ldz/ in entry.S
>>>>>
>>>>> Typo? Has this become lwz?
>>>>
>>>> Yup, lwz.
>>>>
>>>>>
>>>>>> * moved CONFIG_VOF from default-configs/devices/ppc64-softmmu.mak
>>>>>> to Kconfig
>>>>>> * made CONFIG_VOF optional
>>>>>
>>>>> This won't work for pegasos2, see below.
>>>>>
>>>>>> * s/l.lds/vof.lds/
>>>>>> * force 32 BE in spapr_machine_reset() instead of the firmware
>>>>>> * added checks for non-null methods of VofMachineIfClass
>>>>>> * moved OF_STACK_SIZE to vof.h, renamed to VOF_..., added a better
>>>>>> comment
>>>>>> * added path_offset wrapper for handling mixed case for addresses
>>>>>> after "@" in node names
>>>>>> * changed getprop() to check for actual "name" property in the fdt
>>>>>> * moved VOF_MEM_READ/VOF_MEM_WRITE to vof.h for sharing as (unlike
>>>>>> similar
>>>>>> rtas_ld/ldl_be_*) they return error codes
>>>>>> * VOF_MEM_READ uses now address_space_read (it was
>>>>>> address_space_read_full
>>>>>> before, not sure why)
>>>>> [...]
>>>>>> ---
>>>>>> configure | 9 +
>>>>>> pc-bios/vof/Makefile | 23 +
>>>>>> include/hw/ppc/spapr.h | 25 +-
>>>>>> include/hw/ppc/vof.h | 55 ++
>>>>>> pc-bios/vof/vof.h | 43 ++
>>>>>> hw/ppc/spapr.c | 87 +++-
>>>>>> hw/ppc/spapr_hcall.c | 29 +-
>>>>>> hw/ppc/spapr_vof.c | 153 ++++++
>>>>>> hw/ppc/vof.c | 1052
>>>>>> +++++++++++++++++++++++++++++++++++++++
>>>>>> pc-bios/vof/bootmem.c | 14 +
>>>>>> pc-bios/vof/ci.c | 91 ++++
>>>>>> pc-bios/vof/libc.c | 92 ++++
>>>>>> pc-bios/vof/main.c | 21 +
>>>>>> tests/qtest/rtas-test.c | 17 +-
>>>>>> MAINTAINERS | 12 +
>>>>>> hw/ppc/Kconfig | 3 +
>>>>>> hw/ppc/meson.build | 3 +
>>>>>> hw/ppc/trace-events | 24 +
>>>>>> meson.build | 1 +
>>>>>> pc-bios/README | 2 +
>>>>>> pc-bios/vof-nvram.bin | Bin 0 -> 16384 bytes
>>>>>> pc-bios/vof.bin | Bin 0 -> 3784 bytes
>>>>>> pc-bios/vof/entry.S | 49 ++
>>>>>> pc-bios/vof/vof.lds | 48 ++
>>>>>> 24 files changed, 1840 insertions(+), 13 deletions(-)
>>>>>> create mode 100644 pc-bios/vof/Makefile
>>>>>> create mode 100644 include/hw/ppc/vof.h
>>>>>> create mode 100644 pc-bios/vof/vof.h
>>>>>> create mode 100644 hw/ppc/spapr_vof.c
>>>>>> create mode 100644 hw/ppc/vof.c
>>>>>> create mode 100644 pc-bios/vof/bootmem.c
>>>>>> create mode 100644 pc-bios/vof/ci.c
>>>>>> create mode 100644 pc-bios/vof/libc.c
>>>>>> create mode 100644 pc-bios/vof/main.c
>>>>>> create mode 100644 pc-bios/vof-nvram.bin
>>>>>> create mode 100755 pc-bios/vof.bin
>>>>>> create mode 100644 pc-bios/vof/entry.S
>>>>>> create mode 100644 pc-bios/vof/vof.lds
>>>>>>
> [...]
>>>>>> diff --git a/include/hw/ppc/vof.h b/include/hw/ppc/vof.h
>>>>>> new file mode 100644
>>>>>> index 000000000000..65ca2fed0d41
>>>>>> --- /dev/null
>>>>>> +++ b/include/hw/ppc/vof.h
>>>>>> @@ -0,0 +1,55 @@
>>>>>> +/*
>>>>>> + * Virtual Open Firmware
>>>>>> + *
>>>>>> + * SPDX-License-Identifier: GPL-2.0-or-later
>>>>>> + */
>>>>>> +#ifndef HW_VOF_H
>>>>>> +#define HW_VOF_H
>>>>>> +
>>>>>> +typedef struct Vof {
>>>>>> + uint64_t top_addr; /* copied from rma_size */
>>>>>> + GArray *claimed; /* array of SpaprOfClaimed */
>>>>>> + uint64_t claimed_base;
>>>>>> + GHashTable *of_instances; /* ihandle -> SpaprOfInstance */
>>>>>> + uint32_t of_instance_last;
>>>>>> + char *bootargs;
>>>>>> + long fw_size;
>>>>>> +} Vof;
>>>>>> +
>>>>>> +int vof_client_call(MachineState *ms, Vof *vof, void *fdt,
>>>>>> + target_ulong args_real);
>>>>>> +uint64_t vof_claim(Vof *vof, uint64_t virt, uint64_t size,
>>>>>> uint64_t align);
>>>>>> +void vof_init(Vof *vof, uint64_t top_addr, Error **errp);
>>>>>> +void vof_cleanup(Vof *vof);
>>>>>> +void vof_build_dt(void *fdt, Vof *vof);
>>>>>> +uint32_t vof_client_open_store(void *fdt, Vof *vof, const char
>>>>>> *nodename,
>>>>>> + const char *prop, const char *path);
>>>>>> +
>>>>>> +#define TYPE_VOF_MACHINE_IF "vof-machine-if"
>>>>>> +
>>>>>> +typedef struct VofMachineIfClass VofMachineIfClass;
>>>>>> +DECLARE_CLASS_CHECKERS(VofMachineIfClass, VOF_MACHINE,
>>>>>> TYPE_VOF_MACHINE_IF)
>>>>>> +
>>>>>> +struct VofMachineIfClass {
>>>>>> + InterfaceClass parent;
>>>>>> + target_ulong (*client_architecture_support)(MachineState *ms,
>>>>>> CPUState *cs,
>>>>>> + target_ulong vec);
>>>>>> + void (*quiesce)(MachineState *ms);
>>>>>> + bool (*setprop)(MachineState *ms, const char *path, const
>>>>>> char *propname,
>>>>>> + void *val, int vallen);
>>>>>> +};
>>>>>> +
>>>>>> +/*
>>>>>> + * Initial stack size is from
>>>>>> + *
>>>>>> https://www.devicetree.org/open-firmware/bindings/ppc/release/ppc-2_1.html
>>>>>> + */
>>>>>> +#define VOF_STACK_SIZE 0x8000
>>>>>
>>>>> Maybe also add a define for RTAS_SIZE here? We'll need to put that
>>>>> in the device tree but it depends on the rtas shim size that's part
>>>>> of VOF so it should be defined here instead of hardcoding it in
>>>>> boards that use VOF so it can be updated later at one place if needed.
>>>>
>>>> This is rtas-size for pseries:
>>>>
>>>> _FDT(fdt_setprop_cell(fdt, rtas, "rtas-size", RTAS_ERROR_LOG_MAX +
>>>> ms->smp.max_cpus * sizeof(uint64_t)*2 + sizeof(uint64_t)));
>>>>
>>>> => depends on cpus => depends on the command line.
>>>>
>>>>
>>>> RTAS_SIZE is not used by anything in pseries anymore, I'll send a
>>>> patch to ditch it.
>>>
>>> I mean you need to have at least the size of code in
>>> pc-bios/vof/entry.S hv_rtas where also hv_rtas_size is defined but
>>> that value is not available in QEMU where one needs to add it to the
>>> device tree. So a define for that should be here in vof.h. Currently
>>> I've counted instructions and have
>>>
>>> qemu_fdt_setprop_cell(fdt, "/rtas", "rtas-size", 20);
>>>
>>> in pegasos2.c but that 20 should be some VOF_RTAS_SIZE instead that
>>> you define corresponding to hv_rtas_size. You'll probably need the
>>> same even after changing above rtas size calculation in spapr because
>>> client has to allocate memory for instantiate-rtas.
>>
>>
>> Ah fair point. I do not like "20" here and I think the right thing
>> will be adding whatever number of bytes to rtas-size in the firmware
>> itself and update it in QEMU via "setprop" as we do for
>> "linux,rtas-base". And then do the same in SLOF.
>
> This is not the base address but the size of the shim with the hypercall
> that instantiate-rtas copies. Why does it need to be updated?
The vm kernel allocates the space for it.
> And why
> does it need to be more bytes than necessary?
What is necessary? It is definitely way more than 20 bytes.
> I don't know what you do
> for spapr and why do you need larger rtas-size than this but for
> pegasos2 this /rtas/rtas-size property is only used by guests to
> allocate memory for rtas so all I need is how many bytes are needed for
> hv_rtas in pc-bios/vof/entry.S which is what should be #defined in
> vof.h. I've found 20 is just enough so you could add that to vof.h.
I am thinking now that may be the property should be created by vof.bin
and not QEMU, QEMU just has to tell how many bytes on top it needs.
--
Alexey
next prev parent reply other threads:[~2021-06-17 10:35 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-15 7:06 [PATCH qemu v21] spapr: Implement Open Firmware client interface Alexey Kardashevskiy
2021-06-15 10:29 ` BALATON Zoltan
2021-06-16 6:49 ` Alexey Kardashevskiy
2021-06-16 10:34 ` BALATON Zoltan
2021-06-17 2:23 ` Alexey Kardashevskiy
2021-06-17 9:16 ` BALATON Zoltan
2021-06-17 10:28 ` Alexey Kardashevskiy [this message]
2021-06-17 11:29 ` BALATON Zoltan
2021-06-18 3:19 ` Alexey Kardashevskiy
2021-06-18 10:13 ` BALATON Zoltan
2021-06-22 7:49 ` Alexey Kardashevskiy
2021-06-19 9:28 ` David Gibson
2021-06-15 21:09 ` BALATON Zoltan
2021-06-16 6:49 ` Alexey Kardashevskiy
2021-06-16 10:26 ` BALATON Zoltan
2021-06-17 2:40 ` Alexey Kardashevskiy
2021-06-17 9:21 ` BALATON Zoltan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c680d69c-7dfc-fec5-031c-403a19e7050c@ozlabs.ru \
--to=aik@ozlabs.ru \
--cc=balaton@eik.bme.hu \
--cc=david@gibson.dropbear.id.au \
--cc=groug@kaod.org \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).