qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Fabiano Rosas <farosas@suse.de>
Cc: qemu-devel@nongnu.org,
	"Maciej S . Szmigiero" <mail@maciej.szmigiero.name>,
	"Cédric Le Goater" <clg@redhat.com>,
	"Avihai Horon" <avihaih@nvidia.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Prasad Pandit" <ppandit@redhat.com>
Subject: Re: [PATCH v2 1/7] migration/multifd: Further remove the SYNC on complete
Date: Fri, 6 Dec 2024 09:40:56 -0500	[thread overview]
Message-ID: <Z1MM-LqLTYWv7ztO@x1n> (raw)
In-Reply-To: <87a5d9c5nh.fsf@suse.de>

On Fri, Dec 06, 2024 at 10:17:22AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in
> > complete()") removed the FLUSH operation on complete() which should avoid
> > one global sync on destination side, because it's not needed.
> >
> > However that commit overlooked multifd_ram_flush_and_sync() part of things,
> > as that's always used together with the FLUSH message on the main channel.
> >
> > For very old binaries (multifd_flush_after_each_section==true), that's
> > still needed because each EOS received on destination will enforce
> > all-channel sync once.
> >
> > For new binaries (multifd_flush_after_each_section==false), the flush and
> > sync shouldn't be needed just like the FLUSH, with the same reasoning.
> >
> > Currently, on new binaries we'll still send SYNC messages on multifd
> > channels, even if FLUSH is omitted at the end.  It'll make all recv threads
> > hang at SYNC message.
> >
> > Multifd is still all working fine because luckily recv side cleanup
> > code (mostly multifd_recv_sync_main()) is smart enough to make sure even if
> > recv threads are stuck at SYNC it'll get kicked out.  And since this is the
> > completion phase of migration, nothing else will be sent after the SYNCs.
> >
> > This may be needed for VFIO in the future because VFIO can have data to
> > push after ram_save_complete(), hence we don't want the recv thread to be
> > stuck in SYNC message. Remove this limitation will make src even slightly
> > faster too to avoid some more code.
> >
> > Stable branches do not need this patch, as no real bug I can think of that
> > will go wrong there.. so not attaching Fixes to be clear on the backport
> > not needed.
> 
> You forgot my comments on the commit message, so here's a brand-new one

Yes I did.. :(

> with some things spelt out more explicitly:
> 
> --
> Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in
> complete()") stopped sending the RAM_SAVE_FLAG_MULTIFD_FLUSH flag at
> ram_save_complete(), because the sync on the destination side is not
> needed due to the last iteration of find_dirty_block() having already
> done it.
> 
> However, that commit overlooked that multifd_ram_flush_and_sync() on the
> source side is also not needed at ram_save_complete(), for the same
> reason.
> 
> Moreover, removing the RAM_SAVE_FLAG_MULTIFD_FLUSH but keeping the
> multifd_ram_flush_and_sync() means that currently the recv threads will
> hang when receiving the MULTIFD_FLAG_SYNC message, waiting for the
> destination sync which only happens when RAM_SAVE_FLAG_MULTIFD_FLUSH is
> received.
> 
> Luckily, multifd is still all working fine because recv side cleanup
> code (mostly multifd_recv_sync_main()) is smart enough to make sure even
> if recv threads are stuck at SYNC it'll get kicked out. And since this
> is the completion phase of migration, nothing else will be sent after
> the SYNCs.
> 
> This needs to be fixed because in the future VFIO will have data to push
> after ram_save_complete() and we don't want the recv thread to be stuck
> in the MULTIFD_FLAG_SYNC message.
> 
> Remove the unnecessary (and buggy) invocation of
> multifd_ram_flush_and_sync().
> 
> For very old binaries (multifd_flush_after_each_section==true), the
> flush_and_sync is still needed because each EOS received on destination
> will enforce all-channel sync once.
> 
> Stable branches do not need this patch, as no real bug I can think of
> that will go wrong there.. so not attaching Fixes to be clear on the
> backport not needed.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Fabiano Rosas <farosas@suse.de>

Sure, I used it, thanks.

> 
> > ---
> >  migration/ram.c | 13 ++++++++++---
> >  1 file changed, 10 insertions(+), 3 deletions(-)
> >
> > diff --git a/migration/ram.c b/migration/ram.c
> > index 05ff9eb328..7284c34bd8 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -3283,9 +3283,16 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
> >          }
> >      }
> >  
> > -    ret = multifd_ram_flush_and_sync();
> > -    if (ret < 0) {
> > -        return ret;
> > +    if (migrate_multifd() &&
> > +        migrate_multifd_flush_after_each_section()) {
> > +        /*
> > +         * Only the old dest QEMU will need this sync, because each EOS
> > +         * will require one SYNC message on each channel.
> > +         */
> > +        ret = multifd_ram_flush_and_sync();
> > +        if (ret < 0) {
> > +            return ret;
> > +        }
> >      }
> >  
> >      if (migrate_mapped_ram()) {
> 

-- 
Peter Xu



  reply	other threads:[~2024-12-06 14:41 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-06  0:58 [PATCH v2 0/7] migration/multifd: Some VFIO / postcopy preparations on flush Peter Xu
2024-12-06  0:58 ` [PATCH v2 1/7] migration/multifd: Further remove the SYNC on complete Peter Xu
2024-12-06 13:17   ` Fabiano Rosas
2024-12-06 14:40     ` Peter Xu [this message]
2024-12-06  0:58 ` [PATCH v2 2/7] migration/multifd: Allow to sync with sender threads only Peter Xu
2024-12-06 13:26   ` Fabiano Rosas
2024-12-06 14:50     ` Peter Xu
2024-12-06 15:00       ` Fabiano Rosas
2024-12-06  0:58 ` [PATCH v2 3/7] migration/ram: Move RAM_SAVE_FLAG* into ram.h Peter Xu
2024-12-06 13:43   ` Fabiano Rosas
2024-12-06 15:03     ` Peter Xu
2024-12-06 15:10       ` Fabiano Rosas
2024-12-06 15:46         ` Peter Xu
2024-12-06 16:58           ` Fabiano Rosas
2024-12-06  0:58 ` [PATCH v2 4/7] migration/multifd: Unify RAM_SAVE_FLAG_MULTIFD_FLUSH messages Peter Xu
2024-12-06 14:12   ` Fabiano Rosas
2024-12-06  0:58 ` [PATCH v2 5/7] migration/multifd: Remove sync processing on postcopy Peter Xu
2024-12-06 14:19   ` Fabiano Rosas
2024-12-06  0:58 ` [PATCH v2 6/7] migration/multifd: Cleanup src flushes on condition check Peter Xu
2024-12-06 14:18   ` Fabiano Rosas
2024-12-06 15:13     ` Peter Xu
2024-12-06  0:58 ` [PATCH v2 7/7] migration/multifd: Document the reason to sync for save_setup() Peter Xu
2024-12-06 14:40   ` Fabiano Rosas
2024-12-06 15:36     ` Peter Xu
2024-12-06 17:01       ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z1MM-LqLTYWv7ztO@x1n \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=avihaih@nvidia.com \
    --cc=clg@redhat.com \
    --cc=farosas@suse.de \
    --cc=mail@maciej.szmigiero.name \
    --cc=ppandit@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).