From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: BALATON Zoltan <balaton@eik.bme.hu>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH qemu v20] spapr: Implement Open Firmware client interface
Date: Sun, 23 May 2021 13:20:11 +1000
Message-ID: <4f6ceca3-5f18-fe70-18f9-4efde8feb1ed@ozlabs.ru>
In-Reply-To: <babe39af-fd34-8c5-de99-a0f485bfbce@eik.bme.hu>
On 22/05/2021 23:01, BALATON Zoltan wrote:
> On Sat, 22 May 2021, Alexey Kardashevskiy wrote:
>> On 21/05/2021 19:05, BALATON Zoltan wrote:
>>> On Fri, 21 May 2021, Alexey Kardashevskiy wrote:
>>>> On 21/05/2021 07:59, BALATON Zoltan wrote:
>>>>> On Thu, 20 May 2021, Alexey Kardashevskiy wrote:
>>>>>> The PAPR platform describes an OS environment that's presented by
>>>>>> a combination of a hypervisor and firmware. The features it specifies
>>>>>> require collaboration between the firmware and the hypervisor.
>>>>>>
>>>>>> Since the beginning, the runtime component of the firmware (RTAS) has
>>>>>> been implemented as a 20 byte shim which simply forwards it to
>>>>>> a hypercall implemented in qemu. The boot time firmware component is
>>>>>> SLOF - but a build that's specific to qemu, and has always needed
>>>>>> to be updated in sync with it. Even though we've managed to limit
>>>>>> the amount of runtime communication we need between qemu and SLOF,
>>>>>> there's some, and it has become increasingly awkward to handle as
>>>>>> we've implemented new features.
>>>>>>
>>>>>> This implements a boot time OF client interface (CI) which is
>>>>>> enabled by a new "x-vof" pseries machine option (stands for
>>>>>> "Virtual Open Firmware"). When enabled, QEMU implements the custom
>>>>>> H_OF_CLIENT hcall which implements Open Firmware Client Interface
>>>>>> (OF CI). This allows using a smaller stateless firmware which does
>>>>>> not have to manage the device tree.
>>>>>>
>>>>>> The new "vof.bin" firmware image is included with source code under
>>>>>> pc-bios/. It also includes RTAS blob.
>>>>>>
>>>>>> This implements a handful of CI methods just to get -kernel/-initrd
>>>>>> working. In particular, this implements the device tree fetching and
>>>>>> simple memory allocator - "claim" (an OF CI memory allocator) and
>>>>>> updates "/memory@0/available" to tell the client about available
>>>>>> memory.
>>>>>>
>>>>>> This implements changing some device tree properties which we know
>>>>>> how to deal with, the rest is ignored. To allow changes, this skips
>>>>>> fdt_pack() when x-vof=on as not packing the blob leaves some room for
>>>>>> appending.
>>>>>>
>>>>>> In absence of SLOF, this assigns phandles to device tree nodes to
>>>>>> make device tree traversing work.
>>>>>>
>>>>>> When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree.
>>>>>>
>>>>>> This adds basic support for instances, which are managed by a hash
>>>>>> map ihandle -> [phandle].
>>>>>>
>>>>>> Before the guest starts, the memory in use is:
>>>>>> 0..e60 - the initial firmware
>>>>>> 8000..10000 - stack
>>>>>> 400000.. - kernel
>>>>>> 3ea0000.. - initramdisk
>>>>>>
>>>>>> This OF CI does not implement "interpret".
>>>>>>
>>>>>> Unlike SLOF, this does not format uninitialized nvram. Instead, this
>>>>>> includes a disk image with pre-formatted nvram.
>>>>>>
>>>>>> With this basic support, this can only boot into kernel directly.
>>>>>> However this is just enough for the petitboot kernel and
>>>>>> initramdisk to boot from any possible source. Note this requires
>>>>>> a reasonably recent guest kernel with:
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735
>>>>>> The immediate benefit is much faster booting time which is especially
>>>>>> crucial with fully emulated early CPU bring up environments. Also
>>>>>> this may come in handy when/if GRUB-in-the-userspace sees light of
>>>>>> the day.
>>>>>>
>>>>>> This separates VOF and sPAPR in the hope that VOF bits may be reused
>>>>>> by other POWERPC boards which do not support pSeries.
>>>>>>
>>>>>> This is coded on the assumption that later on we might be adding
>>>>>> support for booting from QEMU backends (blockdev is the first
>>>>>> candidate) without devices/drivers in between as OF1275 does not
>>>>>> require that and it is quite easy to do so.
>>>>>>
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> ---
>>>>>>
>>>>>> The example command line is:
>>>>>>
>>>>>> /home/aik/pbuild/qemu-killslof-localhost-ppc64/qemu-system-ppc64 \
>>>>>> -nodefaults \
>>>>>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>>>>>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
>>>>>> -mon id=MON0,chardev=STDIO0,mode=readline \
>>>>>> -nographic \
>>>>>> -vga none \
>>>>>> -enable-kvm \
>>>>>> -m 8G \
>>>>>> -machine pseries,x-vof=on,cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off \
>>>>>> -kernel pbuild/kernel-le-guest/vmlinux \
>>>>>> -initrd pb/rootfs.cpio.xz \
>>>>>> -drive id=DRIVE0,if=none,file=./p/qemu-killslof/pc-bios/vof-nvram.bin,format=raw \
>>>>>> -global spapr-nvram.drive=DRIVE0 \
>>>>>> -snapshot \
>>>>>> -smp 8,threads=8 \
>>>>>> -L /home/aik/t/qemu-ppc64-bios/ \
>>>>>> -trace events=qemu_trace_events \
>>>>>> -d guest_errors \
>>>>>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.tmux26 \
>>>>>> -mon chardev=SOCKET0,mode=control
>>>>>>
>>>>>> ---
>>>>>> Changes:
>>>>>> v20:
>>>>>> * compile vof.bin with -mcpu=power4 for better compatibility
>>>>>> * s/std/stw/ in entry.S to make it work on ppc32
>>>>>> * fixed dt_available property to support both 32 and 64bit
>>>>>> * shuffled prom_args handling code
>>>>>> * do not enforce 32bit in MSR (again, to support 32bit platforms)
>>>>>>
>>>>>
>>>>> [...]
>>>>>
>>>>>> diff --git a/default-configs/devices/ppc64-softmmu.mak b/default-configs/devices/ppc64-softmmu.mak
>>>>>> index ae0841fa3a18..9fb201dfacfa 100644
>>>>>> --- a/default-configs/devices/ppc64-softmmu.mak
>>>>>> +++ b/default-configs/devices/ppc64-softmmu.mak
>>>>>> @@ -9,3 +9,4 @@ CONFIG_POWERNV=y
>>>>>> # For pSeries
>>>>>> CONFIG_PSERIES=y
>>>>>> CONFIG_NVDIMM=y
>>>>>> +CONFIG_VOF=y
>>>>>> diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
>>>>>> index e51e0e5e5ac6..964510dfc73d 100644
>>>>>> --- a/hw/ppc/Kconfig
>>>>>> +++ b/hw/ppc/Kconfig
>>>>>> @@ -143,3 +143,6 @@ config FW_CFG_PPC
>>>>>>
>>>>>> config FDT_PPC
>>>>>> bool
>>>>>> +
>>>>>> +config VOF
>>>>>> + bool
>>>>>
>>>>> I think you should just add "select VOF" to config PSERIES section
>>>>> in Kconfig instead of adding it to
>>>>> default-configs/devices/ppc64-softmmu.mak.
>>>>
>>>> oh well, can do that too.
>>>
>>> I think most config options should be selected by Kconfig and the
>>> default config should only include machines, otherwise VOF would be
>>> added also when you don't compile PSERIES or PEGASOS2. With select in
>>> Kconfig it will be added when needed. That's why it's better to use
>>> select in this case.
>>>
>>>>> That should do it, it works in my updated pegasos2 patch:
>>>>>
>>>>> https://osdn.net/projects/qmiga/scm/git/qemu/commits/3c1fad08469b4d3c04def22044e52b2d27774a61
>>>>> [...]
>>>>>> diff --git a/pc-bios/vof/entry.S b/pc-bios/vof/entry.S
>>>>>> new file mode 100644
>>>>>> index 000000000000..569688714c91
>>>>>> --- /dev/null
>>>>>> +++ b/pc-bios/vof/entry.S
>>>>>> @@ -0,0 +1,51 @@
>>>>>> +#define LOAD32(rn, name) \
>>>>>> + lis rn,name##@h; \
>>>>>> + ori rn,rn,name##@l
>>>>>> +
>>>>>> +#define ENTRY(func_name) \
>>>>>> + .text; \
>>>>>> + .align 2; \
>>>>>> + .globl .func_name; \
>>>>>> + .func_name: \
>>>>>> + .globl func_name; \
>>>>>> + func_name:
>>>>>> +
>>>>>> +#define KVMPPC_HCALL_BASE 0xf000
>>>>>> +#define KVMPPC_H_RTAS (KVMPPC_HCALL_BASE + 0x0)
>>>>>> +#define KVMPPC_H_VOF_CLIENT (KVMPPC_HCALL_BASE + 0x5)
>>>>>> +
>>>>>> + . = 0x100 /* Do exactly as SLOF does */
>>>>>> +
>>>>>> +ENTRY(_start)
>>>>>> +# LOAD32(%r31, 0) /* Go 32bit mode */
>>>>>> +# mtmsrd %r31,0
>>>>>> + LOAD32(2, __toc_start)
>>>>>> + b entry_c
>>>>>> +
>>>>>> +ENTRY(_prom_entry)
>>>>>> + LOAD32(2, __toc_start)
>>>>>> + stwu %r1,-112(%r1)
>>>>>> + stw %r31,104(%r1)
>>>>>> + mflr %r31
>>>>>> + bl prom_entry
>>>>>> + nop
>>>>>> + mtlr %r31
>>>>>> + ld %r31,104(%r1)
>>>>>
>>>>> It's getting there, now I see the first client call from the guest
>>>>> boot code but then it crashes on this ld opcode which apparently is
>>>>> 64 bit only:
>>>>
>>>> Oh right.
>>>>
>>>>
>>>>> Hopefully this is the last such opcode left before I can really
>>>>> test this.
>>>>
>>>> Make it lwz, and test it?
>>>
>>> Yes, figured that out too after sending this message. Replacing with
>>> lwz works but I wonder, now that you have stwu and lwz, do the stack
>>> offsets need adjusting too or do you just waste 4 bytes now?
>>
>> Well, this assumes the 64bit client and that ABI. I think ideally the
>> firmware is supposed to use its own stack but I did not bother here. I
>> do not know the 32bit ABI at all so I cannot say whether the existing
>> code should just work or not :-/
>
> It seems to work so that's OK, just thought if the firmware is 32 bit it
> does not need 64 bit values on stack but if that's also potentially used
> by a 64 bit kernel then it may be better to keep it that way to avoid
> confusion. With the 64 bit opcodes replaced it seems to work on pegasos2
> and the guest can call CI functions and get a reply so maybe it's just a
> few wasted bytes that's not a big deal.
>
>>> With lwz here I found no further 64 bit opcodes and the guest boot
>>> code could walk the device tree. It failed later but I think that's
>>> because I'll need to fill more info about the machine in the device
>>> tree. I'll experiment with that but it looks like it could work at
>>> least for MorphOS. I'll have to try Linux too.
>>
>> There are plenty of tracepoints, enable them all.
>
> I'm running with -trace enable="vof*" but it does not give me too much
> info as a lot of calls (such as peer, child, etc.) don't log anything
> other than there was a hypercall so only get info about opening paths
> and querying some props. The MorphOS boot.img just walks the device tree
> gathering some data about the machine then calls quiesce and boot into
> the OS that later tries to use the gathered info at which point it
> crashes without any logs if some info is not as expected. This does not
> make it easy to debug but I think once I fill the device tree enough
> with all needed info it should work. Currently I'm missing info about
> PCI devices that it may need.
One thing to note about PCI is that normally I think the client expects
the firmware to do PCI probing, and SLOF does it. But VOF does not, and
Linux scans PCI bus(es) itself. This might be a problem for your kernel.
>
>>>>> Do you have some info on how the stdout works in VOF? I think I'll
>>>>> need that to test with Linux and get output but I'm not sure what's
>>>>> needed on the machine side.
>>>>
>>>> VOF opens stdout and stores the ihandle (in fdt) which the client
>>>> (==kernel) uses for writing. To make it work properly, you need to
>>>> hook up that instance to a device backend similar to what I have for
>>>> spapr-vty:
>>>>
>>>> https://github.com/aik/qemu/commit/a381a5b50c23c74013e2bd39cc5dad5b6385965d
>>>>
>>>> This is not a part of this patch as I'm trying to keep things
>>>> simpler and accessing backends from VOF is still unsettled. But
>>>> there is a workaround which is trace_vof_write, I use this. Thanks,
>>>
>>> The above patch is about stdin but stdout seems to be added by the
>>> current vof patch. What is spapr-vty?
>>
>> It is pseries' paravirtual serial device, pegasos does not have it.
>>
>>> I don't think I have something similar in pegasos2 where I just have
>>> a normal serial port created by ISASuperIO in the vt8231 model.
>>
>> Correct.
>>
>>> Can I use that backend somehow or have to create some other serial
>>> device to connect to stdout?
>>> Does trace_vof_write work for stuff output by the guest?
>>> I guess that's only for things printed by VOF itself
>>
>> VOF itself does not print anything in this patch.
>
> However it seems to be needed for linux as the first thing it does seems
> to be getting /chosen/stdout and calls exit if it returns nothing. So
Right, Linux does but VOF (==vof.bin) does not.
> I'll need this at least for linux. (I think MorphOS may also query it to
> print a banner or some messages but not sure it needs it, at least it
> does not abort right away if not found.)
Tracepoints print this :)
>>> but to see Linux output do I need a stdout in VOF or it will just
>>> open the serial with its own driver and use that?
>>> So I'm not sure what the stdout parts in the current vof patch do
>>> and if I need that for anything. I'll try to experiment with it some
>>> more but fixing the ld and Kconfig seems to be enough to get it to work
>>> for me.
>>
>> So for the client to print something, /chosen/stdout needs to have a
>> valid ihandle.
>> The only way to get a valid ihandle is having a valid phandle which
>> vof_client_open() can open.
>> A valid phandle is a phandle of any node in the device tree. On spapr
>> we pick some spapr-vty, open it and store in /chosen/stdout.
>>
>> From this point output from the client can be seen via a tracepoint.
>>
>> Now if we want proper output without tracepoints - we need to hook it
>> up with some chardev backend (not a device such as vt8231 or spapr-vty
>> but a backend).
>
> I don't know much about it but devices are also connected to some
> backend so is it possible to use the same backend for VOF as used for
> the normal serial port?
Yes but with this initial patch there is no backend support, you only
get tracepoints.
> But I need a way to find that and connect it to
> VOF and I'm not sure how to do that yet.
Pick some device in the machine reset code (or you can open the root -
"/"), resolve its FW (==FDT) path and call vof_client_open_store() on it;
that stores the ihandle in the FDT. This enables stdout, and the output
can then be seen via the tracepoint.
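
A rough, self-contained sketch of that flow, in case it helps. This is a
toy model and not the QEMU code: ToyNode, ToyVof, toy_open(),
toy_open_store_stdout() and toy_write() are all made-up names, the fixed
array only stands in for the ihandle -> phandle map, and chosen_stdout
stands in for the /chosen/stdout property. It only shows the order of
things: open a node by its FDT path, store the resulting ihandle as
stdout, and from then on a client write on that ihandle gets through.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: none of these names exist in QEMU. */
typedef struct {
    const char *path;    /* FDT path of the node */
    uint32_t phandle;    /* phandle assigned to the node */
} ToyNode;

#define TOY_MAX_INSTANCES 8

typedef struct {
    uint32_t phandle_of[TOY_MAX_INSTANCES]; /* ihandle N lives in slot N - 1 */
    uint32_t count;                         /* number of open instances */
    uint32_t chosen_stdout;                 /* stands in for /chosen/stdout */
} ToyVof;

static void toy_vof_init(ToyVof *vof)
{
    assert(vof != NULL);
    memset(vof, 0, sizeof(*vof));
}

/* Open the node at @path; returns a new ihandle, or 0 on failure. */
static uint32_t toy_open(ToyVof *vof, const ToyNode *nodes, size_t n,
                         const char *path)
{
    if (vof->count >= TOY_MAX_INSTANCES) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        if (strcmp(nodes[i].path, path) == 0) {
            vof->phandle_of[vof->count] = nodes[i].phandle;
            return ++vof->count;
        }
    }
    return 0;
}

/* Open @path and, on success, record the ihandle as the client's stdout. */
static uint32_t toy_open_store_stdout(ToyVof *vof, const ToyNode *nodes,
                                      size_t n, const char *path)
{
    uint32_t ihandle = toy_open(vof, nodes, n, path);

    if (ihandle) {
        vof->chosen_stdout = ihandle;
    }
    return ihandle;
}

/* Client "write": needs a valid ihandle; prints the way a tracepoint would. */
static int toy_write(const ToyVof *vof, uint32_t ihandle,
                     const char *buf, size_t len)
{
    if (ihandle == 0 || ihandle > vof->count) {
        return -1;
    }
    fprintf(stderr, "toy_vof_write ih=0x%x: %.*s\n",
            (unsigned)ihandle, (int)len, buf);
    return (int)len;
}
```

Before anything is opened chosen_stdout is 0, so toy_write() on it fails,
which is roughly the situation where Linux finds no stdout and exits.
After toy_open_store_stdout() succeeds on some node (the root "/" would
do), toy_write() on chosen_stdout prints the buffer to stderr.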
> Or do I need to create a
> separate serial backend and connect that to VOF? I'll try to look at
> spapr-vty to see what it does.
No additional devices needed.
>
>> https://github.com/aik/qemu/commit/a381a5b50c23c74013e2bd3 does this:
>> 1. when a phandle is open, QEMU will search for DeviceState* for the
>> specific FDT node and get a chardev from the device.
>> 2. when write() is called, QEMU calls qemu_chr_fe_write_all() on
>> chardev from 1.
>>
>> From this point you do not need a tracepoint and the output will
>> appear in the console you set up for stdout.
>>
>> Now if you want input from this console, things get tricky. First, on
>> powernv/pseries we only need this for grub as otherwise the kernel has
>> all the drivers needed and will not use the client interface. For the
>> grub, we need to provide a valid ihandle for /chosen/stdin which is
>> easy but implementing read() on this is not as there is no simple
>> device-type-independent way of reading from chardev. I hacked it for
>> spapr-vty but other serial devices will need special handling, or
>> we'll have to introduce some VOF_SERIAL_READ interface for those which
>> will face opposition :)
>>
>> Makes sense?
>
> It explains things a bit but it is still not entirely clear how I can
> get something to add as a stdout. With the pegasos2 firmware it puts the
> serial device there normally that it inits and opens. Without that
> firmware we have to somehow do that from QEMU so find the serial backend
> used by the serial device within the vt8231 model (or use a different
> backend just for this?) then open it and put it in the device tree. If
> that's correct or how to do it is not clear yet.
spapr looks through all spapr-vty and picks one with the lowest @reg.
You can do a similar thing. Or add a machine option with a serial device
id which you want to be the default console. So many options :)
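
A minimal sketch of both selection rules, assuming the candidates have
already been collected into an array. ToyVty and toy_pick_console() are
hypothetical names for illustration and not what spapr actually calls:
with no id given, the lowest reg wins; with an id given (the machine
option case), only an exact match is accepted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: a candidate console device with its id and @reg. */
typedef struct {
    const char *id;
    uint32_t reg;
} ToyVty;

/*
 * If @wanted_id is set, return the device with that id (NULL if absent).
 * Otherwise return the device with the lowest reg (NULL if the list is empty).
 */
static const ToyVty *toy_pick_console(const ToyVty *v, size_t n,
                                      const char *wanted_id)
{
    const ToyVty *best = NULL;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (wanted_id) {
            if (strcmp(v[i].id, wanted_id) == 0) {
                return &v[i];
            }
        } else if (!best || v[i].reg < best->reg) {
            best = &v[i];
        }
    }
    return best;
}
```

Returning NULL for an unknown id, rather than falling back to the lowest
reg, is a choice made for this sketch so that a typo in the option is
noticed instead of silently picking another console.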
--
Alexey