From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Michael Roth <mdroth@linux.vnet.ibm.com>
Cc: Laurent Vivier <lvivier@redhat.com>,
david@gibson.dropbear.id.au,
Scott Cheloha <cheloha@linux.vnet.ibm.com>,
qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH v2 2/2] migration: savevm_state_handler_insert: constant-time element insertion
Date: Fri, 18 Oct 2019 18:26:38 +0100
Message-ID: <20191018172638.GD2990@work-vm>
In-Reply-To: <157141671749.15348.15966144834012002565@sif>
* Michael Roth (mdroth@linux.vnet.ibm.com) wrote:
> Quoting Dr. David Alan Gilbert (2019-10-18 04:43:52)
> > * Laurent Vivier (lvivier@redhat.com) wrote:
> > > On 18/10/2019 10:16, Dr. David Alan Gilbert wrote:
> > > > * Scott Cheloha (cheloha@linux.vnet.ibm.com) wrote:
> > > >> savevm_state's SaveStateEntry TAILQ is a priority queue. Priority
> > > >> sorting is maintained by searching from head to tail for a suitable
> > > >> insertion spot. Insertion is thus an O(n) operation.
> > > >>
> > > >> If we instead keep track of the head of each priority's subqueue
> > > >> within that larger queue we can reduce this operation to O(1) time.
> > > >>
> > > >> savevm_state_handler_remove() becomes slightly more complex to
> > > >> accommodate these gains: we need to replace the head of a priority's
> > > >> subqueue when removing it.
> > > >>
> > > >> With O(1) insertion, booting VMs with many SaveStateEntry objects is
> > > >> more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such
> > > >> objects to insert.
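
(For readers following along, here is a standalone sketch of the idea -- made-up
names and plain <sys/queue.h> TAILQ macros rather than Scott's actual patch;
QEMU's QTAILQ macros are equivalent.)

  /* The queue stays a single TAILQ sorted by descending priority, but we
   * also remember the tail of each priority's subqueue so insertion never
   * has to walk the whole list. */
  #include <sys/queue.h>

  #define PRI_MAX 8                      /* number of priority levels */

  typedef struct Entry {
      int priority;
      TAILQ_ENTRY(Entry) link;
  } Entry;

  typedef struct Handlers {
      TAILQ_HEAD(, Entry) list;
      Entry *pri_tail[PRI_MAX + 1];      /* last entry of each priority */
  } Handlers;

  static void handler_insert(Handlers *h, Entry *nse)
  {
      int p = nse->priority;

      if (h->pri_tail[p]) {
          /* Common case: append after the current tail of this priority. */
          TAILQ_INSERT_AFTER(&h->list, h->pri_tail[p], nse, link);
      } else {
          /* First entry of this priority: place it after the tail of the
           * next higher non-empty priority.  This scan is bounded by the
           * number of priority levels, not by the number of entries. */
          int q;
          for (q = p + 1; q <= PRI_MAX && !h->pri_tail[q]; q++) {
              ;
          }
          if (q <= PRI_MAX) {
              TAILQ_INSERT_AFTER(&h->list, h->pri_tail[q], nse, link);
          } else {
              TAILQ_INSERT_HEAD(&h->list, nse, link);
          }
      }
      h->pri_tail[p] = nse;
      /* Removal must do the matching fix-up: if the removed entry is
       * pri_tail[p], repoint it at the previous same-priority entry (or
       * clear it) -- the extra complexity the commit message mentions for
       * savevm_state_handler_remove(). */
  }
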
> > > >
> > > > Separate from reviewing this patch, I'd like to understand why you've
> > > > got 40000 objects. This feels very very wrong and is likely to cause
> > > > problems to random other bits of qemu as well.
> > >
> > > I think the 40000 objects are the "dr-connectors" that are used to hotplug
> > > peripherals (memory, PCI cards, CPUs, ...).
> >
> > Yes, Scott confirmed that in the reply to the previous version.
> > IMHO nothing in qemu is designed to deal with that many devices/objects
> > - I'm sure that something other than the migration code is going to get upset.
>
> The device/object management aspect seems to handle things *mostly* okay, at
> least ever since QOM child properties started being tracked by a hash table
> instead of a linked list. It's worth noting that that change (b604a854) was
> done to better handle IRQ pins for ARM guests with lots of CPUs. I think it is
> inevitable that certain machine types/configurations will call for large
> numbers of objects, and it is fair to improve things to allow for this
> sort of scalability.
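
(Aside, just to illustrate the effect of the b604a854 change at this scale --
this is not the QOM code itself, only the general pattern of keying children by
name in a GHashTable so lookup is O(1) on average instead of an O(n) list walk:)

  #include <glib.h>

  /* Toy model of per-parent child tracking, keyed by property name. */
  static GHashTable *children;           /* char *name -> void *child */

  static void child_add(const char *name, void *child)
  {
      if (!children) {
          children = g_hash_table_new_full(g_str_hash, g_str_equal,
                                           g_free, NULL);
      }
      g_hash_table_insert(children, g_strdup(name), child);
  }

  static void *child_lookup(const char *name)
  {
      return children ? g_hash_table_lookup(children, name) : NULL;
  }
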
>
> But I agree it shouldn't be abused, and you're right that there are some
> problem areas that arise. Trying to outline them:
>
> a) introspection commands like 'info qom-tree' become pretty unwieldy,
> and with large enough numbers of objects might even break things (QMP
> response size limits maybe?)
> b) various related lists like reset handlers, vmstate/savevm handlers might
> grow quite large
>
> I think we could work around a) by flagging certain "internal-only"
> objects as 'hidden'. Introspection routines could then filter these out,
> and routines like qom-set/qom-get could report something similar to
> EACCES so they are never used by, or useful to, management tools.
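
(A purely hypothetical sketch of that 'hidden' idea -- none of these names or
flags exist in QEMU today:)

  #include <errno.h>
  #include <stdbool.h>

  typedef struct Obj {
      const char *name;
      bool hidden;                       /* set for internal-only objects */
      struct Obj *next;
  } Obj;

  /* 'info qom-tree' style listing: skip hidden objects entirely. */
  static void list_visible(const Obj *head, void (*emit)(const char *))
  {
      for (const Obj *o = head; o; o = o->next) {
          if (!o->hidden) {
              emit(o->name);
          }
      }
  }

  /* qom-get/qom-set style access: refuse with an EACCES-like error. */
  static int access_property(const Obj *o)
  {
      return o->hidden ? -EACCES : 0;
  }
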
>
> In cases like b) we can optimize things where it makes sense, as with
> Scott's patch here. In most cases these lists need to be walked one way
> or another, whether it's done internally by the object or through common
> interfaces provided by QEMU. It's really just the O(n^2) type handling
> where relying on common interfaces becomes drastically less efficient,
> but I think we should avoid implementing things in that way anyway, or
> improve them as needed.
>
> >
> > Is perhaps the structure wrong somewhere - should there be a single DRC
> > device that knows about all DRCs?
>
> That's an interesting proposition, I think it's worth exploring further,
> but from a high level:
>
> - each SpaprDrc has migration state, and some sub-classes of SpaprDrc (e.g.
> SpaprDrcPhysical) have additional migration state. These are sent
> as-needed as separate VMState entries in the migration stream.
> Moving to a single DRC means we're either sending them as a flat
> array or a sparse list, which would put just as much load on the
> migration code (at least, with Scott's changes in place). It would
> also be difficult to do all this in a way which maintains migration
> compatibility with older machine types.
Having sparse arrays etc within a vmstate isn't as bad; none of
them actually need to be 'objects' as such - even if you have
separate chunks of VMState.
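
Roughly what that could look like -- invented struct and field names, not
actual spapr code, and assuming the destination allocates the array before
loading -- is one VMState section carrying per-DRC state as a single
variable-length array:

  #include "migration/vmstate.h"

  typedef struct DrcBulkState {
      uint32_t nr_drcs;
      uint32_t *drc_state;               /* one word of state per DRC index */
  } DrcBulkState;

  /* One section for all DRCs instead of one SaveStateEntry per DRC.
   * VMSTATE_VARRAY_UINT32 sends/loads nr_drcs elements of drc_state. */
  static const VMStateDescription vmstate_drc_bulk = {
      .name = "spapr_drc_bulk",          /* hypothetical section name */
      .version_id = 1,
      .minimum_version_id = 1,
      .fields = (VMStateField[]) {
          VMSTATE_UINT32(nr_drcs, DrcBulkState),
          VMSTATE_VARRAY_UINT32(drc_state, DrcBulkState, nr_drcs, 0,
                                vmstate_info_uint32, uint32_t),
          VMSTATE_END_OF_LIST()
      }
  };
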
> - other aspects of modeling these as QOM objects, such as look-ups,
> reset-handling, and memory allocations, wouldn't be dramatically
> improved upon by handling it all internally within the object
>
> AFAICT the biggest issue with modeling the DRCs as individual objects
> is actually how we deal with introspection, and we should try to
> improve that. What do you think of the alternative suggestion above of
> marking certain objects as 'hidden' from various introspection
> interfaces?
That's one for someone who knows/cares about QOM more than me;
Paolo, Dan Berrange, or Eduardo Habkost are QOM people.
Dave
> >
> > Dave
> >
> >
> > > https://github.com/qemu/qemu/blob/master/hw/ppc/spapr_drc.c
> > >
> > > They are part of SPAPR specification.
> > >
> > > https://raw.githubusercontent.com/qemu/qemu/master/docs/specs/ppc-spapr-hotplug.txt
> > >
> > > CC Michael Roth
> > >
> > > Thanks,
> > > Laurent
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> >
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK