From: Peter Xu <peterx@redhat.com>
To: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: "Fabiano Rosas" <farosas@suse.de>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Cédric Le Goater" <clg@redhat.com>,
"Eric Blake" <eblake@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Avihai Horon" <avihaih@nvidia.com>,
"Joao Martins" <joao.m.martins@oracle.com>,
qemu-devel@nongnu.org
Subject: Re: [PATCH v2 08/17] migration: Add load_finish handler and associated functions
Date: Fri, 20 Sep 2024 12:45:50 -0400
Message-ID: <Zu2mvrKOvmD1WtvD@x1n>
In-Reply-To: <bbed8165-de5c-4ebe-a6cc-ff33f9ea363a@maciej.szmigiero.name>
On Fri, Sep 20, 2024 at 05:23:08PM +0200, Maciej S. Szmigiero wrote:
> On 19.09.2024 23:11, Peter Xu wrote:
> > On Thu, Sep 19, 2024 at 09:49:10PM +0200, Maciej S. Szmigiero wrote:
> > > On 9.09.2024 22:03, Peter Xu wrote:
> > > > On Tue, Aug 27, 2024 at 07:54:27PM +0200, Maciej S. Szmigiero wrote:
> > > > > From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> > > > >
> > > > > load_finish SaveVMHandler allows migration code to poll whether
> > > > > a device-specific asynchronous device state loading operation had finished.
> > > > >
> > > > > In order to avoid calling this handler needlessly the device is supposed
> > > > > to notify the migration code of its possible readiness via a call to
> > > > > qemu_loadvm_load_finish_ready_broadcast() while holding
> > > > > qemu_loadvm_load_finish_ready_lock.
> > > > >
> > > > > Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> > > > > ---
> > > > > include/migration/register.h | 21 +++++++++++++++
> > > > > migration/migration.c | 6 +++++
> > > > > migration/migration.h | 3 +++
> > > > > migration/savevm.c | 52 ++++++++++++++++++++++++++++++++++++
> > > > > migration/savevm.h | 4 +++
> > > > > 5 files changed, 86 insertions(+)
> > > > >
> > > > > diff --git a/include/migration/register.h b/include/migration/register.h
> > > > > index 4a578f140713..44d8cf5192ae 100644
> > > > > --- a/include/migration/register.h
> > > > > +++ b/include/migration/register.h
> > > > > @@ -278,6 +278,27 @@ typedef struct SaveVMHandlers {
> > > > > int (*load_state_buffer)(void *opaque, char *data, size_t data_size,
> > > > > Error **errp);
> > > > > + /**
> > > > > + * @load_finish
> > > > > + *
> > > > > + * Poll whether all asynchronous device state loading had finished.
> > > > > + * Not called on the load failure path.
> > > > > + *
> > > > > + * Called while holding the qemu_loadvm_load_finish_ready_lock.
> > > > > + *
> > > > > + * If this method signals "not ready" then it might not be called
> > > > > + * again until qemu_loadvm_load_finish_ready_broadcast() is invoked
> > > > > + * while holding qemu_loadvm_load_finish_ready_lock.
> > > >
> > > > [1]
> > > >
> > > > > + *
> > > > > + * @opaque: data pointer passed to register_savevm_live()
> > > > > + * @is_finished: whether the loading had finished (output parameter)
> > > > > + * @errp: pointer to Error*, to store an error if it happens.
> > > > > + *
> > > > > + * Returns zero to indicate success and negative for error
> > > > > + * It's not an error that the loading still hasn't finished.
> > > > > + */
> > > > > + int (*load_finish)(void *opaque, bool *is_finished, Error **errp);
> > > >
> > > > The load_finish() semantics is a bit weird, especially above [1] on "only
> > > > allowed to be called once if ..." and also on the locks.
> > >
> > > The point of this remark is that a driver needs to call
> > > qemu_loadvm_load_finish_ready_broadcast() if it wants the migration
> > > core to call its load_finish handler again.
> > >
> > > > It looks to me vfio_load_finish() also does the final load of the device.
> > > >
> > > > I wonder whether that final load can be done in the threads,
> > >
> > > Here, the problem is that current VFIO VMState has to be loaded from the main
> > > migration thread as it internally calls QEMU core address space modification
> > > methods which explode if called from another thread(s).
> >
> > Ahh, I see. I'm trying to make dest qemu loadvm in a thread too and yield
> > BQL if possible, when that's ready then in your case here IIUC you can
> > simply take BQL in whichever thread that loads it.. but yeah it's not ready
> > at least..
>
> Yeah, long term we might want to work on making these QEMU core address space
> modification methods somehow callable from multiple threads but that's
> definitely not something for the initial patch set.
>
> > Would it be possible vfio_save_complete_precopy_async_thread_config_state()
> > be done in VFIO's save_live_complete_precopy() through the main channel
> > somehow? IOW, does it rely on iterative data to be fetched first from
> > kernel, or completely separate states?
>
> The device state data needs to be fully loaded first before "activating"
> the device by loading its config state.
>
> > And just curious: how large is it
> > normally (and I suppose this decides whether it's applicable to be sent via
> > the main channel at all..)?
>
> Config data is *much* smaller than device state data - as far as I remember
> it was on order of kilobytes.
>
> > >
> > > > then after
> > > > everything loaded the device post a semaphore telling the main thread to
> > > > continue. See e.g.:
> > > >
> > > > if (migrate_switchover_ack()) {
> > > > qemu_loadvm_state_switchover_ack_needed(mis);
> > > > }
> > > >
> > > > IIUC, VFIO can register load_complete_ack similarly so it only sem_post()
> > > > when all things are loaded? We can then get rid of this slightly awkward
> > > > interface. I had a feeling that things can be simplified (e.g., if the
> > > > thread will take care of loading the final vmstate then the mutex is also
> > > > not needed? etc.).
> > >
> > > With just a single call to switchover_ack_needed per VFIO device it would
> > > need to do a blocking wait for the device buffers and config state load
> > > to finish, therefore blocking other VFIO devices from potentially loading
> > > their config state if they are ready to begin this operation earlier.
> >
> > I am not sure I get you here, loading VFIO device states (I mean, the
> > non-iterable part) will need to be done sequentially IIUC due to what you
> > said and should rely on BQL, so I don't know how that could happen
> > concurrently for now. But I think indeed BQL is a problem.
>
> Consider that we have two VFIO devices (A and B), with the following order
> of switchover_ack_needed handler calls for them: first A gets this call,
> once the call for A finishes then B gets this call.
>
> Now consider what happens if B had loaded all its buffers (in the loading
> thread) and it is ready for its config load before A finished loading its
> buffers.
>
> B has to wait idle in this situation (even though it could have been already
> loading its config) since the switchover_ack_needed handler for A won't
> return until A is fully done.

This sounds like a performance concern, and I wonder how much it impacts a
real workload (i.e. run a test and measure with and without such
concurrency), given that we can save two devices in parallel anyway. I would
expect the real difference to be small since, as I mentioned, we already
save >1 VFIO devices concurrently via multifd.
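
To make the ordering argument concrete, here is a toy timing model. It is
only a sketch: the struct, the function names and the numbers are all
hypothetical, none of it is QEMU code or measured data. It assumes buffers
load in parallel in their own threads while config loads are serialized on
one thread.

```c
#include <assert.h>

/* Toy timing model, NOT QEMU code. All names here are hypothetical. */
typedef struct {
    int buf_done;   /* time at which the device's buffers finish loading */
    int cfg_cost;   /* time needed to load its config state (serialized) */
} DevModel;

static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Config loads happen in a fixed order: device a first, then device b. */
int finish_time_in_order(DevModel a, DevModel b)
{
    assert(a.cfg_cost >= 0 && b.cfg_cost >= 0);

    /* a's config load can only start once a's buffers are loaded. */
    int a_end = a.buf_done + a.cfg_cost;

    /* b has to wait for both its own buffers and a's config load. */
    return max_int(b.buf_done, a_end) + b.cfg_cost;
}

/* Whichever device has its buffers ready first loads its config first. */
int finish_time_ready_first(DevModel a, DevModel b)
{
    if (b.buf_done < a.buf_done) {
        DevModel tmp = a;
        a = b;
        b = tmp;
    }
    return finish_time_in_order(a, b);
}
```

With A = {100, 5} and B = {40, 5} the fixed order finishes at 110 and the
ready-first order at 105. In this model the gap can never exceed the config
load cost of the device that had to wait, which is why measuring a real
workload matters before adding complexity for it.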

Do you think we can start with a simpler approach?

What I think could be very clean is this: we just discussed
MIG_CMD_SWITCHOVER, and it looks like you also think it's an OK approach.
With that in place, why not move one step further and add
MIG_CMD_SEND_NON_ITERABLE, just to mark "iterable devices all done, ready to
send non-iterable"? It can be controlled by the same migration property, so
we only send these two commands in 9.2+ machine types.

Then IIUC VFIO can send its config data through the main channel (just like
most other PCI devices, which IMHO is a good fit), and on the destination
VFIO holds off loading it until the MIG_CMD_SEND_NON_ITERABLE phase has
passed.
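
A rough sketch of that destination-side gating, assuming the device simply
parks config data that arrives early and applies it once the proposed
non-iterable marker command has been seen. Everything below (DevLoadState,
on_config_data(), on_non_iterable_marker()) is hypothetical and only
illustrates the idea; it is not an existing QEMU interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device destination state; NOT an existing QEMU type. */
typedef struct {
    bool non_iterable_phase;  /* marker command seen on the main channel */
    const void *pending_cfg;  /* borrowed pointer to early config data */
    size_t pending_len;
    int cfg_loads;            /* how many times config has been applied */
} DevLoadState;

/* Stand-in for the real config load; only legal after the marker. */
static void apply_config(DevLoadState *s, const void *data, size_t len)
{
    assert(s->non_iterable_phase);
    (void)data;
    (void)len;
    s->cfg_loads++;
}

/* Config data arrived on the main channel for this device. */
void on_config_data(DevLoadState *s, const void *data, size_t len)
{
    if (!s->non_iterable_phase) {
        /* Too early: park it. The caller must keep 'data' alive. */
        s->pending_cfg = data;
        s->pending_len = len;
        return;
    }
    apply_config(s, data, len);
}

/* The "iterable devices all done" marker was received. */
void on_non_iterable_marker(DevLoadState *s)
{
    s->non_iterable_phase = true;
    if (s->pending_cfg) {
        apply_config(s, s->pending_cfg, s->pending_len);
        s->pending_cfg = NULL;
        s->pending_len = 0;
    }
}
```

In this sketch config data that arrives before the marker is only recorded,
and the actual load happens exactly once when the marker is processed.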

Side note: looking at this again, I really think we should clean up some of
the migration switchover phase functions. E.g. the parameters of
qemu_savevm_state_complete_precopy() are pretty confusing, especially
iterable_only, and inside it there are also some implicit postcopy checks,
urgh.. But this is not relevant to our discussion, and I won't draft that
before your series lands; that could complicate things.
>
> > So IMHO this recv side interface so far is the major pain that I really
> > want to avoid (comparing to the rest) in the series. Let's see whether we
> > can come up with something better..
> >
> > One other (probably not pretty..) idea is when waiting here in the main
> > thread it yields BQL, then other threads can take it and load the VFIO
> > final chunk of data. But I could miss something else.
> >
>
> I think temporarily dropping the BQL deep inside migration code is similar
> to running the QEMU event loop deep inside migration code (about which
> people complained in my generic thread pool implementation): it's easy
> to miss some subtle dependency/race somewhere and accidentally cause a rare,
> hard-to-debug deadlock.
>
> That's why I think that it's ultimately probably better to make QEMU core
> address space modification methods thread safe / re-entrant instead.

Right, let me know what you think about the above.

Thanks,

--
Peter Xu