qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: "Dr. David Alan Gilbert" <dave@treblig.org>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Cédric Le Goater" <clg@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Avihai Horon" <avihaih@nvidia.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	qemu-devel@nongnu.org
Subject: Re: [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls
Date: Mon, 3 Feb 2025 15:36:19 -0500	[thread overview]
Message-ID: <Z6Eow-Ei3CvLy1vG@x1.local> (raw)
In-Reply-To: <afb27de1-d20a-4b0d-b271-ef6eef0e06ed@maciej.szmigiero.name>

On Mon, Feb 03, 2025 at 09:15:52PM +0100, Maciej S. Szmigiero wrote:
> On 3.02.2025 20:58, Peter Xu wrote:
> > On Mon, Feb 03, 2025 at 02:57:36PM +0100, Maciej S. Szmigiero wrote:
> > > On 2.02.2025 13:45, Dr. David Alan Gilbert wrote:
> > > > * Maciej S. Szmigiero (mail@maciej.szmigiero.name) wrote:
> > > > > On 2.02.2025 03:06, Dr. David Alan Gilbert wrote:
> > > > > > * Maciej S. Szmigiero (mail@maciej.szmigiero.name) wrote:
> > > > > > > From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> > > > > > > 
> > > > > > > postcopy_ram_listen_thread() is a free running thread, so it needs to
> > > > > > > take BQL around function calls to migration methods requiring BQL.
> > > > > > > 
> > > > > > > qemu_loadvm_state_main() needs BQL held since it ultimately calls
> > > > > > > "load_state" SaveVMHandlers.
> > > > > > > 
> > > > > > > migration_incoming_state_destroy() needs BQL held since it ultimately calls
> > > > > > > "load_cleanup" SaveVMHandlers.
> > > > > > > 
> > > > > > > Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> > > > > > > ---
> > > > > > >     migration/savevm.c | 4 ++++
> > > > > > >     1 file changed, 4 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > > > > > index b0b74140daea..0ceea9638cc1 100644
> > > > > > > --- a/migration/savevm.c
> > > > > > > +++ b/migration/savevm.c
> > > > > > > @@ -2013,7 +2013,9 @@ static void *postcopy_ram_listen_thread(void *opaque)
> > > > > > >          * in qemu_file, and thus we must be blocking now.
> > > > > > >          */
> > > > > > >         qemu_file_set_blocking(f, true);
> > > > > > > +    bql_lock();
> > > > > > >         load_res = qemu_loadvm_state_main(f, mis);
> > > > > > > +    bql_unlock();
> > > > > > 
> > > > > > Doesn't that leave that held for a heck of a long time?
> > > > > 
> > > > > Yes, and it effectively broke "postcopy recover" test but I
> > > > > think the reason for that is qemu_loadvm_state_main() and
> > > > > its children don't drop BQL while waiting for I/O.
> > > > > 
> > > > > I've described this case in more detail in my reply to Fabiano here:
> > > > > https://lore.kernel.org/qemu-devel/0a09e627-955e-4f26-8d08-0192ecd250a8@maciej.szmigiero.name/
> > > > 
> > > > While it might be the cause in this case, my feeling is it's more fundamental
> > > > here - it's the whole reason that postcopy has a separate ram listen
> > > > thread.  As the destination is running, after it loads it's devices
> > > > and as it starts up the destination will be still loading RAM
> > > > (and other postcopiable devices) potentially for quite a while.
> > > > Holding the bql around the ram listen thread means that the
> > > > execution of the destination won't be able to take that lock
> > > > until the postcopy load has finished; so while that might apparently
> > > > complete, it'll lead to the destination stalling until that's finished
> > > > which defeats the whole point of postcopy.
> > > > That last one probably won't fail a test but it will lead to a long stall
> > > > if you give it a nice big guest with lots of RAM that it's rapidly
> > > > changing.
> > > 
> > > Okay, I understand the postcopy case/flow now.
> > > Thanks for explaining it clearly.
> > > 
> > > > > I still think that "load_state" SaveVMHandlers need to be called
> > > > > with BQL held since implementations apparently expect it that way:
> > > > > for example, I think PCI device configuration restore calls
> > > > > address space manipulation methods which abort() if called
> > > > > without BQL held.
> > > > 
> > > > However, the only devices that *should* be arriving on the channel
> > > > that the postcopy_ram_listen_thread is reading from are those
> > > > that are postcopiable (i.e. RAM and hmm block's dirty_bitmap).
> > > > Those load handlers are safe to be run while the other devices
> > > > are being changed.   Note the *should* - you could add a check
> > > > to fail if any other device arrives on that channel.
> > > 
> > > I think ultimately there should be either an explicit check, or,
> > > as you suggest in the paragraph below, a separate SaveVMHandler
> > > that runs without BQL held.
> > 
> > To me those are bugs happening during postcopy, so those abort()s in
> > memory.c are indeed for catching these issues too.
> > 
> > > Since the current state of just running these SaveVMHandlers
> > > without BQL in this case and hoping that nothing breaks is
> > > clearly sub-optimal.
> > > 
> > > > > I have previously even submitted a patch to explicitly document
> > > > > "load_state" SaveVMHandler as requiring BQL (which was also
> > > > > included in the previous version of this patch set) and it
> > > > > received a "Reviewed-by:" tag:
> > > > > https://lore.kernel.org/qemu-devel/6976f129df610c8207da4e531c8c0475ec204fa4.1730203967.git.maciej.szmigiero@oracle.com/
> > > > > https://lore.kernel.org/qemu-devel/e1949839932efaa531e2fe63ac13324e5787439c.1731773021.git.maciej.szmigiero@oracle.com/
> > > > > https://lore.kernel.org/qemu-devel/87o732bti7.fsf@suse.de/
> > > > 
> > > > It happens!
> > > > You could make this safer by having a load_state and a load_state_postcopy
> > > > member, and only mark the load_state as requiring the lock.
> > > 
> > > To not digress too much from the subject of this patch set
> > > (multifd VFIO device state transfer) for now I've just updated the
> > > TODO comment around that qemu_loadvm_state_main(), so hopefully this
> > > discussion won't get forgotten:
> > > https://gitlab.com/maciejsszmigiero/qemu/-/commit/046e3deac5b1dbc406b3e9571f62468bd6743e79
> > 
> > The commit message may still need some touch ups, e.g.:
> > 
> >    postcopy_ram_listen_thread() is a free running thread, so it needs to
> >    take BQL around function calls to migration methods requiring BQL.
> > 
> > 
> > This sentence is still not correct, IMHO. As Dave explained, the ram load
> > thread is designed to run without BQL at least for the major workloads it
> > runs.
> 
> So what's your proposed wording of this commit then?

Perhaps dropping it? As either it implies qemu_loadvm_state_main() needs to
take bql (which could be wrong in case of postcopy at least from
design.. not sanity check pov), or it provides no real meaning to suggest
where to take it..

Personally I would put the comment as easy as possible - the large portion
isn't helping me to understand the code but only made it slightly more
confusing..

    /*
     * TODO: qemu_loadvm_state_main() could call "load_state" SaveVMHandlers
     * that are expecting BQL to be held, which isn't in this case.
     *
     * In principle, the only devices that should be arriving on this channel
     * now are those that are postcopiable and whose load handlers are safe
     * to be called without BQL being held.
     *
     * But nothing currently prevents the source from sending data for "unsafe"
     * devices which would cause trouble here.
     */

IMHO we could put it very simple if you think we need such sanity check
later:

    /* TODO: sanity check that only postcopiable data will be loaded here */

> 
> > I don't worry on src sending something that crashes the dest: if that
> > happens, that's a bug, we need to fix it..  In that case abort() either in
> > memory.c or migration/ would be the same.
> 
> Yeah, but it would be a bug in the source (or just bit stream corruption for
> any reason), yet it's the destination which would abort() or crash.
> 
> I think cases like that in principle should be handled more gracefully,
> like exiting the destination QEMU with an error.
> But that's something outside of the scope of this patch set.

Yes I agree.  It's just that postcopy normally cannot gracefully quits on
dest anyway.. as src QEMU cannot continue with a dead dest QEMU. For
obvious programming errors, I think abort() is still ok in this case, on
either src or dest if postcopy already started.

For this series, we could always stick with precopy, it could help converge
the series.

> 
> > We could add some explicit check
> > in migration code, but I don't expect it to catch anything real, at least
> > such never happened since postcopy introduced.. so it's roughly 10 years
> > without anything like that happens.
> > 
> > Taking BQL for migration_incoming_state_destroy() looks all safe.  There's
> > one qemu_ram_block_writeback() which made me a bit nervous initially, but
> > then looks like RAM backends should be almost noop (for shmem and
> > hugetlbfs) but except pmem.
> 
> That's the only part where taking BQL is actually necessary for the
> functionality of this patch set to work properly, so it's fine to leave
> that call to qemu_loadvm_state_main() as-is (without BQL) for time being.
> 
> > 
> > The other alternative is we define load_cleanup() to not rely on BQL (which
> > I believe is true before this series?), then take it only when VFIO's path
> > needs it.
> 
> I think other paths always call load_cleanup() with BQL so it's probably
> safer to have consistent semantics here.

IMHO we don't necessarily need to make it the default that vmstate handler
hooks will need BQL by default - we can always properly define them to best
suite our need.

For this case I think it's ok either way. But I'm assuming: (1) no serious
users run QEMU RAMs on normal file systems (or RAM's cleanup() can do
msync() on those, which can flush page caches for a long time to disks),
and (2) pmem isn't important.

Thanks,

-- 
Peter Xu



  reply	other threads:[~2025-02-03 20:37 UTC|newest]

Thread overview: 137+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-30 10:08 [PATCH v4 00/33] Multifd 🔀 device state transfer support with VFIO consumer Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 01/33] migration: Clarify that {load, save}_cleanup handlers can run without setup Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 02/33] thread-pool: Remove thread_pool_submit() function Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 03/33] thread-pool: Rename AIO pool functions to *_aio() and data types to *Aio Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 04/33] thread-pool: Implement generic (non-AIO) pool support Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 05/33] migration: Add MIG_CMD_SWITCHOVER_START and its load handler Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 06/33] migration: Add qemu_loadvm_load_state_buffer() and its handler Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 07/33] io: tls: Allow terminating the TLS session gracefully with EOF Maciej S. Szmigiero
2025-02-04 15:15   ` Daniel P. Berrangé
2025-02-04 16:02     ` Maciej S. Szmigiero
2025-02-04 16:14       ` Daniel P. Berrangé
2025-02-04 18:25         ` Maciej S. Szmigiero
2025-02-06 21:53           ` Peter Xu
2025-01-30 10:08 ` [PATCH v4 08/33] migration/multifd: Allow premature EOF on TLS incoming channels Maciej S. Szmigiero
2025-02-03 18:20   ` Peter Xu
2025-02-03 18:53     ` Maciej S. Szmigiero
2025-02-03 20:20       ` Peter Xu
2025-02-03 21:41         ` Maciej S. Szmigiero
2025-02-03 22:56           ` Peter Xu
2025-02-04 13:51             ` Fabiano Rosas
2025-02-04 14:39             ` Maciej S. Szmigiero
2025-02-04 15:00               ` Fabiano Rosas
2025-02-04 15:10                 ` Maciej S. Szmigiero
2025-02-04 15:31               ` Peter Xu
2025-02-04 15:39                 ` Daniel P. Berrangé
2025-02-05 19:09                   ` Fabiano Rosas
2025-02-05 20:42                     ` Fabiano Rosas
2025-02-05 20:55                       ` Maciej S. Szmigiero
2025-02-06 14:13                         ` Fabiano Rosas
2025-02-06 14:53                           ` Maciej S. Szmigiero
2025-02-06 15:20                             ` Fabiano Rosas
2025-02-06 16:01                               ` Maciej S. Szmigiero
2025-02-06 17:32                                 ` Fabiano Rosas
2025-02-06 17:55                                   ` Maciej S. Szmigiero
2025-02-06 21:51                                   ` Peter Xu
2025-02-07 13:17                                     ` Fabiano Rosas
2025-02-07 14:04                                       ` Peter Xu
2025-02-07 14:16                                         ` Fabiano Rosas
2025-02-05 21:13                       ` Peter Xu
2025-02-06 14:19                         ` Fabiano Rosas
2025-02-04 15:10         ` Daniel P. Berrangé
2025-02-04 15:08     ` Daniel P. Berrangé
2025-02-04 16:02       ` Peter Xu
2025-02-04 16:12         ` Daniel P. Berrangé
2025-02-04 16:29           ` Peter Xu
2025-02-04 18:25         ` Fabiano Rosas
2025-02-04 19:34           ` Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 09/33] migration: postcopy_ram_listen_thread() needs to take BQL for some calls Maciej S. Szmigiero
2025-02-02  2:06   ` Dr. David Alan Gilbert
2025-02-02 11:55     ` Maciej S. Szmigiero
2025-02-02 12:45       ` Dr. David Alan Gilbert
2025-02-03 13:57         ` Maciej S. Szmigiero
2025-02-03 19:58           ` Peter Xu
2025-02-03 20:15             ` Maciej S. Szmigiero
2025-02-03 20:36               ` Peter Xu [this message]
2025-02-03 21:41                 ` Maciej S. Szmigiero
2025-02-03 23:02                   ` Peter Xu
2025-02-04 14:57                     ` Maciej S. Szmigiero
2025-02-04 15:39                       ` Peter Xu
2025-02-04 19:32                         ` Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 10/33] error: define g_autoptr() cleanup function for the Error type Maciej S. Szmigiero
2025-02-03 20:53   ` Peter Xu
2025-02-03 21:13   ` Daniel P. Berrangé
2025-02-03 21:51     ` Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 11/33] migration: Add thread pool of optional load threads Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 12/33] migration/multifd: Split packet into header and RAM data Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 13/33] migration/multifd: Device state transfer support - receive side Maciej S. Szmigiero
2025-02-03 21:27   ` Peter Xu
2025-02-03 22:18     ` Maciej S. Szmigiero
2025-02-03 22:59       ` Peter Xu
2025-02-04 14:40         ` Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 14/33] migration/multifd: Make multifd_send() thread safe Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 15/33] migration/multifd: Add an explicit MultiFDSendData destructor Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 16/33] migration/multifd: Device state transfer support - send side Maciej S. Szmigiero
2025-02-03 21:47   ` Peter Xu
2025-01-30 10:08 ` [PATCH v4 17/33] migration/multifd: Make MultiFDSendData a struct Maciej S. Szmigiero
2025-02-07 14:36   ` Fabiano Rosas
2025-02-07 19:43     ` Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 18/33] migration/multifd: Add multifd_device_state_supported() Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 19/33] migration: Add save_live_complete_precopy_thread handler Maciej S. Szmigiero
2025-02-04 17:54   ` Peter Xu
2025-02-04 19:32     ` Maciej S. Szmigiero
2025-02-04 20:34       ` Peter Xu
2025-02-05 11:53         ` Maciej S. Szmigiero
2025-02-05 15:55           ` Peter Xu
2025-02-06 11:41             ` Maciej S. Szmigiero
2025-02-06 22:16               ` Peter Xu
2025-01-30 10:08 ` [PATCH v4 20/33] vfio/migration: Add x-migration-load-config-after-iter VFIO property Maciej S. Szmigiero
2025-02-10 17:24   ` Cédric Le Goater
2025-02-11 14:37     ` Maciej S. Szmigiero
2025-02-11 15:00       ` Cédric Le Goater
2025-02-11 15:57         ` Maciej S. Szmigiero
2025-02-11 16:28           ` Cédric Le Goater
2025-01-30 10:08 ` [PATCH v4 21/33] vfio/migration: Add load_device_config_state_start trace event Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 22/33] vfio/migration: Convert bytes_transferred counter to atomic Maciej S. Szmigiero
2025-01-30 21:35   ` Cédric Le Goater
2025-01-31  9:47     ` Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 23/33] vfio/migration: Multifd device state transfer support - basic types Maciej S. Szmigiero
2025-02-10 17:17   ` Cédric Le Goater
2025-01-30 10:08 ` [PATCH v4 24/33] vfio/migration: Multifd device state transfer support - VFIOStateBuffer(s) Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 25/33] vfio/migration: Multifd device state transfer - add support checking function Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 26/33] vfio/migration: Multifd device state transfer support - receive init/cleanup Maciej S. Szmigiero
2025-02-12 10:55   ` Cédric Le Goater
2025-02-14 20:55     ` Maciej S. Szmigiero
2025-02-17  9:38       ` Cédric Le Goater
2025-02-17 22:13         ` Maciej S. Szmigiero
2025-02-18  7:54           ` Cédric Le Goater
2025-01-30 10:08 ` [PATCH v4 27/33] vfio/migration: Multifd device state transfer support - received buffers queuing Maciej S. Szmigiero
2025-02-12 13:47   ` Cédric Le Goater
2025-02-14 20:58     ` Maciej S. Szmigiero
2025-02-17 13:48       ` Cédric Le Goater
2025-02-17 22:15         ` Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 28/33] vfio/migration: Multifd device state transfer support - load thread Maciej S. Szmigiero
2025-02-12 15:48   ` Cédric Le Goater
2025-02-12 16:19     ` Cédric Le Goater
2025-02-17 22:09       ` Maciej S. Szmigiero
2025-02-17 22:09     ` Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 29/33] vfio/migration: Multifd device state transfer support - config loading support Maciej S. Szmigiero
2025-02-12 16:21   ` Cédric Le Goater
2025-02-17 22:09     ` Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 30/33] migration/qemu-file: Define g_autoptr() cleanup function for QEMUFile Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 31/33] vfio/migration: Multifd device state transfer support - send side Maciej S. Szmigiero
2025-02-12 17:03   ` Cédric Le Goater
2025-02-17 22:12     ` Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 32/33] vfio/migration: Add x-migration-multifd-transfer VFIO property Maciej S. Szmigiero
2025-02-12 17:10   ` Cédric Le Goater
2025-02-14 20:56     ` Maciej S. Szmigiero
2025-02-17 13:57       ` Cédric Le Goater
2025-02-17 14:16         ` Maciej S. Szmigiero
2025-01-30 10:08 ` [PATCH v4 33/33] hw/core/machine: Add compat for " Maciej S. Szmigiero
2025-01-30 20:19 ` [PATCH v4 00/33] Multifd 🔀 device state transfer support with VFIO consumer Fabiano Rosas
2025-01-30 20:27   ` Maciej S. Szmigiero
2025-01-30 20:46     ` Fabiano Rosas
2025-01-31 18:16     ` Maciej S. Szmigiero
2025-02-03 14:19 ` Cédric Le Goater
2025-02-21  6:57   ` Yanghang Liu
2025-02-22  9:51     ` Maciej S. Szmigiero

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6Eow-Ei3CvLy1vG@x1.local \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=armbru@redhat.com \
    --cc=avihaih@nvidia.com \
    --cc=berrange@redhat.com \
    --cc=clg@redhat.com \
    --cc=dave@treblig.org \
    --cc=eblake@redhat.com \
    --cc=farosas@suse.de \
    --cc=joao.m.martins@oracle.com \
    --cc=mail@maciej.szmigiero.name \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).