qemu-devel.nongnu.org archive mirror
From: David Gibson <david@gibson.dropbear.id.au>
To: Greg Kurz <groug@kaod.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
	qemu-devel@nongnu.org, qemu-ppc@nongnu.org
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH qemu] ppc/spapr: Receive and store device tree blob from SLOF
Date: Wed, 12 Dec 2018 11:29:55 +1100	[thread overview]
Message-ID: <20181212002955.GD2719@umbus.fritz.box> (raw)
In-Reply-To: <20181211105559.76040a70@bahia.lab.toulouse-stg.fr.ibm.com>

On Tue, Dec 11, 2018 at 10:55:59AM +0100, Greg Kurz wrote:
> On Tue, 11 Dec 2018 14:53:32 +1100
> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
> > On 10/12/2018 20:30, Greg Kurz wrote:
> > > On Mon, 10 Dec 2018 17:20:43 +1100
> > > David Gibson <david@gibson.dropbear.id.au> wrote:
> > >   
> > >> On Mon, Nov 12, 2018 at 03:12:26PM +1100, Alexey Kardashevskiy wrote:  
> > >>>
> > >>>
> > >>> On 12/11/2018 05:10, Greg Kurz wrote:    
> > >>>> Hi Alexey,
> > >>>>
> > >>>> Just a few remarks. See below.
> > >>>>
> > >>>> On Thu,  8 Nov 2018 12:44:06 +1100
> > >>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> > >>>>     
> > >>>>> SLOF receives a device tree and updates it with various properties
> > >>>>> before switching to the guest kernel and QEMU is not aware of any changes
> > >>>>> made by SLOF. Since there is no real RTAS (QEMU implements it), it makes
> > >>>>> sense to pass the SLOF final device tree to QEMU to let it implement
> > >>>>> RTAS related tasks better, such as PCI host bus adapter hotplug.
> > >>>>>
> > >>>>> Specifically, now QEMU can find out the actual XICS phandle (for PHB
> > >>>>> hotplug) and the RTAS linux,rtas-entry/base properties (for firmware
> > >>>>> assisted NMI - FWNMI).
> > >>>>>
> > >>>>> This stores the initial DT blob in the sPAPR machine and replaces it
> > >>>>> in the KVMPPC_H_UPDATE_DT (new private hypercall) handler.
> > >>>>>
> > >>>>> This adds an @update_dt_enabled machine property to allow backward
> > >>>>> migration.
> > >>>>>
> > >>>>> SLOF already has a hypercall since
> > >>>>> https://github.com/aik/SLOF/commit/e6fc84652c9c0073f9183
> > >>>>>
> > >>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > >>>>> ---
> > >>>>>  include/hw/ppc/spapr.h |  7 ++++++-
> > >>>>>  hw/ppc/spapr.c         | 29 ++++++++++++++++++++++++++++-
> > >>>>>  hw/ppc/spapr_hcall.c   | 32 ++++++++++++++++++++++++++++++++
> > >>>>>  hw/ppc/trace-events    |  2 ++
> > >>>>>  4 files changed, 68 insertions(+), 2 deletions(-)
> > >>>>>
> > >>>>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > >>>>> index ad4d7cfd97..f5dcaf44cb 100644
> > >>>>> --- a/include/hw/ppc/spapr.h
> > >>>>> +++ b/include/hw/ppc/spapr.h
> > >>>>> @@ -100,6 +100,7 @@ struct sPAPRMachineClass {
> > >>>>>  
> > >>>>>      /*< public >*/
> > >>>>>      bool dr_lmb_enabled;       /* enable dynamic-reconfig/hotplug of LMBs */
> > >>>>> +    bool update_dt_enabled;    /* enable KVMPPC_H_UPDATE_DT */
> > >>>>>      bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
> > >>>>>      bool pre_2_10_has_unused_icps;
> > >>>>>      bool legacy_irq_allocation;
> > >>>>> @@ -136,6 +137,9 @@ struct sPAPRMachineState {
> > >>>>>      int vrma_adjust;
> > >>>>>      ssize_t rtas_size;
> > >>>>>      void *rtas_blob;
> > >>>>> +    uint32_t fdt_size;
> > >>>>> +    uint32_t fdt_initial_size;    
> > >>>>
> > >>>> I don't quite see the purpose of fdt_initial_size... it seems to be only
> > >>>> used to print a trace.    
> > >>>
> > >>>
> > >>> Ah, lost in rebase. The purpose was to test if the new device tree has
> > >>> not grown too much.
> > >>>
> > >>>
> > >>>     
> > >>>>     
> > >>>>> +    void *fdt_blob;
> > >>>>>      long kernel_size;
> > >>>>>      bool kernel_le;
> > >>>>>      uint32_t initrd_base;
> > >>>>> @@ -462,7 +466,8 @@ struct sPAPRMachineState {
> > >>>>>  #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
> > >>>>>  /* Client Architecture support */
> > >>>>>  #define KVMPPC_H_CAS            (KVMPPC_HCALL_BASE + 0x2)
> > >>>>> -#define KVMPPC_HCALL_MAX        KVMPPC_H_CAS
> > >>>>> +#define KVMPPC_H_UPDATE_DT      (KVMPPC_HCALL_BASE + 0x3)
> > >>>>> +#define KVMPPC_HCALL_MAX        KVMPPC_H_UPDATE_DT
> > >>>>>  
> > >>>>>  typedef struct sPAPRDeviceTreeUpdateHeader {
> > >>>>>      uint32_t version_id;
> > >>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > >>>>> index c08130facb..5e2d4d211c 100644
> > >>>>> --- a/hw/ppc/spapr.c
> > >>>>> +++ b/hw/ppc/spapr.c
> > >>>>> @@ -1633,7 +1633,10 @@ static void spapr_machine_reset(void)
> > >>>>>      /* Load the fdt */
> > >>>>>      qemu_fdt_dumpdtb(fdt, fdt_totalsize(fdt));
> > >>>>>      cpu_physical_memory_write(fdt_addr, fdt, fdt_totalsize(fdt));
> > >>>>> -    g_free(fdt);
> > >>>>> +    g_free(spapr->fdt_blob);
> > >>>>> +    spapr->fdt_size = fdt_totalsize(fdt);
> > >>>>> +    spapr->fdt_initial_size = spapr->fdt_size;
> > >>>>> +    spapr->fdt_blob = fdt;    
> > >>>>
> > >>>> Hmm... It looks weird to store state in a reset handler. I'd rather zero
> > >>>> both fdt_blob and fdt_size here.    
> > >>>
> > >>> The device tree is built from the reset handler and the idea is that we
> > >>> want to always have some tree in the machine.    
> > >>
> > >> Yes, I think the approach here is fine.  Otherwise when we want to
> > >> look up the current fdt state in RTAS calls or whatever we'd always
> > >> have to do
> > >> 	if (fdt_blob)
> > >> 		look up that
> > >> 	else
> > >> 		look up qemu created fdt.
> > >>  
> > > 
> > > No. We only have one fdt blob: the initial one, I'd rather
> > > call reset time one, or the updated one.  
> > 
> > There is one fdt in the machine, always. Either initial or from cas.
> 
> Yeah, reset time fdt is either the initial one or cas... and I'm now
> wondering what happens if migration occurs between cas that sets cas_reboot
> and the corresponding reset. With the current code base, I have the impression
> that the destination will redo the full cas+cas_reboot cycle after restart or
> am I missing something ?

Yes, I believe that's correct.  It's kind of an edge case and that CAS
cycle should still complete ok, it'll just take a little longer to
boot, so I thought that was preferable to the complexity of migrating
the CAS state.

> > >> Incidentally 'fdt' and 'fdt_blob' names do a terrible job of
> > >> distinguishing what the difference is.  Renaming fdt to fdt_initial
> > >> (to match fdt_initial_size) and fdt_blob to fdt should make that
> > >> clearer.
> > >>  
> > > 
> > > As mentioned earlier in this thread, spapr->fdt_initial_size is only used
> > > for tracing if the received fdt blob fails fdt_check_full()...
> > > 
> > > $ git grep -H fdt_initial_size
> > > hw/ppc/spapr.c:    spapr->fdt_initial_size = spapr->fdt_size;
> > > hw/ppc/spapr.c:        VMSTATE_UINT32(fdt_initial_size, sPAPRMachineState),
> > > hw/ppc/spapr_hcall.c:        trace_spapr_update_dt_failed(spapr->fdt_initial_size, cb,
> > > include/hw/ppc/spapr.h:    uint32_t fdt_initial_size;
> > > 
> > > Not sure it is helpful, and anyway, it is expected to be the same in source
> > > and destination, so why put it in the migration stream ?  
> > 
> > 
> > Well, we do build the fdt anyway even when receive migration but we do
> > not have to and yes we can expect the fdt on the destination to be of
> > the same size since it is the same command line, it is just guessing and
> > expecting vs. knowing and I prefer the latter as the reset time fdt and
> > migration source fdt might have different size because of
> > host-model/host-serial/slot-label/similar properties.
> 
> Right but I still don't see the usefulness of fdt_initial_size...

So, it's there to address exactly the problem you pointed out elsewhere
in the thread: the idea was to disallow the guest resubmitting an fdt
which is "too much" bigger than the original one, thereby consuming a
bunch of qemu memory.  The thought was that this is a bit more robust
than just checking against a fixed max size, especially if we need to
increase that fixed size in future to handle really big partitions.
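
To make that concrete, here is a rough sketch of what a relative check
could look like.  It is not the code from the patch: the function name
and the growth factor of 2 are assumptions picked purely for
illustration, and the right factor would need discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration only: allow the tree to at most double. */
#define SKETCH_FDT_GROWTH_FACTOR 2u

/*
 * Return true if a guest-supplied tree of new_size bytes is acceptable,
 * given the size of the tree that was built at reset time.
 * Hypothetical helper, not part of QEMU.
 */
bool sketch_fdt_growth_ok(uint32_t initial_size, uint32_t new_size)
{
    /* Do the multiply in 64 bits so a large initial size cannot wrap. */
    uint64_t limit = (uint64_t)initial_size * SKETCH_FDT_GROWTH_FACTOR;

    /* An empty blob is never a valid tree; anything above the limit is refused. */
    return new_size != 0 && new_size <= limit;
}
```

With a bound like this the limit scales with whatever the reset-time
tree happens to be, so a bigger partition automatically gets a bigger
allowance without touching a constant.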

> > > The only case where we want to migrate something is when h_update_dt() has
> > > succeeded, ie, the guest passed a valid DT blob. This implies that its
> > > size isn't 0, otherwise fdt_check_full() would return -FDT_ERR_TRUNCATED.
> > > 
> > > I would suggest rather to:
> > > 
> > > - completely drop spapr->fdt_initial_size
> > > - clear spapr->fdt_size at machine reset
> > > - migrate if spapr->fdt_size is not zero
> > > 
> > > Also, I've just realized another problem... nothing prevents a malicious
> > > guest to pass an insanely great size to h_update_dt, which would cause
> > > g_malloc0() to abort... The passed size should be checked against
> > > FDT_MAX_SIZE.  
> > 
> > Good point. Just noticed - as posted, the checker actually checks the
> > reset time tree, not the updated one, my bad :)
> > 
> > 
> > 
> 
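
For comparison, the alternative suggested above (drop
fdt_initial_size, clear fdt_size at reset, migrate only when fdt_size
is non-zero, and refuse anything above a fixed maximum) might look
roughly like the sketch below.  The struct, the function names and the
1 MiB cap are stand-ins for illustration, not the real
sPAPRMachineState or the real FDT_MAX_SIZE value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cap for illustration; the real FDT_MAX_SIZE may differ. */
#define SKETCH_FDT_MAX_SIZE 0x100000u

/* Stand-in for the relevant part of the machine state. */
struct sketch_machine {
    uint32_t fdt_size;   /* 0 means "no guest-updated tree yet" */
};

/* At reset, forget any tree the guest submitted earlier. */
void sketch_reset(struct sketch_machine *m)
{
    m->fdt_size = 0;
}

/* Hypercall side: check the size before anything gets allocated. */
bool sketch_accept_update(struct sketch_machine *m, uint32_t size)
{
    if (size == 0 || size > SKETCH_FDT_MAX_SIZE) {
        return false;    /* refuse instead of letting an allocation abort */
    }
    m->fdt_size = size;
    return true;
}

/* Migration side: only send the tree if the guest actually updated it. */
bool sketch_fdt_needed(const struct sketch_machine *m)
{
    return m->fdt_size != 0;
}
```

The point of checking the size first is that the bound is enforced
before any memory is allocated on the guest's behalf.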

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


Thread overview: 13+ messages
2018-11-08  1:44 [Qemu-devel] [PATCH qemu] ppc/spapr: Receive and store device tree blob from SLOF Alexey Kardashevskiy
2018-11-11 18:10 ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2018-11-12  4:12   ` Alexey Kardashevskiy
2018-11-12  9:05     ` Greg Kurz
2018-11-13  5:31       ` Alexey Kardashevskiy
2018-12-10  6:20     ` David Gibson
2018-12-10  9:30       ` Greg Kurz
2018-12-11  3:53         ` Alexey Kardashevskiy
2018-12-11  9:55           ` Greg Kurz
2018-12-12  0:29             ` David Gibson [this message]
2018-12-12 16:54               ` Greg Kurz
2018-12-11  3:36       ` Alexey Kardashevskiy
2018-12-12  0:20         ` David Gibson
