From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org,
"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
"Cédric Le Goater" <clg@redhat.com>,
"Avihai Horon" <avihaih@nvidia.com>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Prasad Pandit" <ppandit@redhat.com>
Subject: Re: [PATCH v2 1/7] migration/multifd: Further remove the SYNC on complete
Date: Fri, 6 Dec 2024 09:40:56 -0500 [thread overview]
Message-ID: <Z1MM-LqLTYWv7ztO@x1n> (raw)
In-Reply-To: <87a5d9c5nh.fsf@suse.de>
On Fri, Dec 06, 2024 at 10:17:22AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in
> > complete()") removed the FLUSH operation on complete() which should avoid
> > one global sync on destination side, because it's not needed.
> >
> > However that commit overlooked multifd_ram_flush_and_sync() part of things,
> > as that's always used together with the FLUSH message on the main channel.
> >
> > For very old binaries (multifd_flush_after_each_section==true), that's
> > still needed because each EOS received on destination will enforce
> > all-channel sync once.
> >
> > For new binaries (multifd_flush_after_each_section==false), the flush and
> > sync shouldn't be needed just like the FLUSH, with the same reasoning.
> >
> > Currently, on new binaries we'll still send SYNC messages on multifd
> > channels, even if FLUSH is omitted at the end. It'll make all recv threads
> > hang at SYNC message.
> >
> > Multifd is still all working fine because luckily recv side cleanup
> > code (mostly multifd_recv_sync_main()) is smart enough to make sure even if
> > recv threads are stuck at SYNC it'll get kicked out. And since this is the
> > completion phase of migration, nothing else will be sent after the SYNCs.
> >
> > This may be needed for VFIO in the future because VFIO can have data to
> > push after ram_save_complete(), hence we don't want the recv thread to be
> > stuck in SYNC message. Remove this limitation will make src even slightly
> > faster too to avoid some more code.
> >
> > Stable branches do not need this patch, as no real bug I can think of that
> > will go wrong there.. so not attaching Fixes to be clear on the backport
> > not needed.
>
> You forgot my comments on the commit message, so here's a brand-new one
Yes I did.. :(
> with some things spelt out more explicitly:
>
> --
> Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in
> complete()") stopped sending the RAM_SAVE_FLAG_MULTIFD_FLUSH flag at
> ram_save_complete(), because the sync on the destination side is not
> needed due to the last iteration of find_dirty_block() having already
> done it.
>
> However, that commit overlooked that multifd_ram_flush_and_sync() on the
> source side is also not needed at ram_save_complete(), for the same
> reason.
>
> Moreover, removing the RAM_SAVE_FLAG_MULTIFD_FLUSH but keeping the
> multifd_ram_flush_and_sync() means that currently the recv threads will
> hang when receiving the MULTIFD_FLAG_SYNC message, waiting for the
> destination sync which only happens when RAM_SAVE_FLAG_MULTIFD_FLUSH is
> received.
>
> Luckily, multifd is still all working fine because recv side cleanup
> code (mostly multifd_recv_sync_main()) is smart enough to make sure even
> if recv threads are stuck at SYNC it'll get kicked out. And since this
> is the completion phase of migration, nothing else will be sent after
> the SYNCs.
>
> This needs to be fixed because in the future VFIO will have data to push
> after ram_save_complete() and we don't want the recv thread to be stuck
> in the MULTIFD_FLAG_SYNC message.
>
> Remove the unnecessary (and buggy) invocation of
> multifd_ram_flush_and_sync().
>
> For very old binaries (multifd_flush_after_each_section==true), the
> flush_and_sync is still needed because each EOS received on destination
> will enforce all-channel sync once.
>
> Stable branches do not need this patch, as no real bug I can think of
> that will go wrong there.. so not attaching Fixes to be clear on the
> backport not needed.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>
Sure, I used it, thanks.
>
> > ---
> > migration/ram.c | 13 ++++++++++---
> > 1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 05ff9eb328..7284c34bd8 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -3283,9 +3283,16 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
> > }
> > }
> >
> > - ret = multifd_ram_flush_and_sync();
> > - if (ret < 0) {
> > - return ret;
> > + if (migrate_multifd() &&
> > + migrate_multifd_flush_after_each_section()) {
> > + /*
> > + * Only the old dest QEMU will need this sync, because each EOS
> > + * will require one SYNC message on each channel.
> > + */
> > + ret = multifd_ram_flush_and_sync();
> > + if (ret < 0) {
> > + return ret;
> > + }
> > }
> >
> > if (migrate_mapped_ram()) {
>
--
Peter Xu
next prev parent reply other threads:[~2024-12-06 14:41 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-06 0:58 [PATCH v2 0/7] migration/multifd: Some VFIO / postcopy preparations on flush Peter Xu
2024-12-06 0:58 ` [PATCH v2 1/7] migration/multifd: Further remove the SYNC on complete Peter Xu
2024-12-06 13:17 ` Fabiano Rosas
2024-12-06 14:40 ` Peter Xu [this message]
2024-12-06 0:58 ` [PATCH v2 2/7] migration/multifd: Allow to sync with sender threads only Peter Xu
2024-12-06 13:26 ` Fabiano Rosas
2024-12-06 14:50 ` Peter Xu
2024-12-06 15:00 ` Fabiano Rosas
2024-12-06 0:58 ` [PATCH v2 3/7] migration/ram: Move RAM_SAVE_FLAG* into ram.h Peter Xu
2024-12-06 13:43 ` Fabiano Rosas
2024-12-06 15:03 ` Peter Xu
2024-12-06 15:10 ` Fabiano Rosas
2024-12-06 15:46 ` Peter Xu
2024-12-06 16:58 ` Fabiano Rosas
2024-12-06 0:58 ` [PATCH v2 4/7] migration/multifd: Unify RAM_SAVE_FLAG_MULTIFD_FLUSH messages Peter Xu
2024-12-06 14:12 ` Fabiano Rosas
2024-12-06 0:58 ` [PATCH v2 5/7] migration/multifd: Remove sync processing on postcopy Peter Xu
2024-12-06 14:19 ` Fabiano Rosas
2024-12-06 0:58 ` [PATCH v2 6/7] migration/multifd: Cleanup src flushes on condition check Peter Xu
2024-12-06 14:18 ` Fabiano Rosas
2024-12-06 15:13 ` Peter Xu
2024-12-06 0:58 ` [PATCH v2 7/7] migration/multifd: Document the reason to sync for save_setup() Peter Xu
2024-12-06 14:40 ` Fabiano Rosas
2024-12-06 15:36 ` Peter Xu
2024-12-06 17:01 ` Fabiano Rosas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z1MM-LqLTYWv7ztO@x1n \
--to=peterx@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=avihaih@nvidia.com \
--cc=clg@redhat.com \
--cc=farosas@suse.de \
--cc=mail@maciej.szmigiero.name \
--cc=ppandit@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).