All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Prasad Pandit <ppandit@redhat.com>
Cc: qemu-devel@nongnu.org, farosas@suse.de, berrange@redhat.com,
	Prasad Pandit <pjp@fedoraproject.org>
Subject: Re: [PATCH v5 3/5] migration: enable multifd and postcopy together
Date: Mon, 10 Feb 2025 11:59:14 -0500	[thread overview]
Message-ID: <Z6owYoktb5nk2yRw@x1.local> (raw)
In-Reply-To: <CAE8KmOwMTw-m0w+JbFBZ7mn-ZuSNfpk9xbq-_KbLXu7_kDhDVg@mail.gmail.com>

On Sat, Feb 08, 2025 at 04:06:56PM +0530, Prasad Pandit wrote:
> Hello Peter,
> 
> On Fri, 7 Feb 2025 at 21:16, Peter Xu <peterx@redhat.com> wrote:
> > This is not easy to follow neither with the current name, nor that you
> > "assumed this is main channel" and test it.
> 
> * It is not my doing, nor is there any assumption, but that is how
> current implementation works.
> ===
> static bool migration_should_start_incoming(bool main_channel)
> {
>     /* Multifd doesn't start unless all channels are established */
>     if (migrate_multifd()) {
>         return migration_has_all_channels();
>     }
> 
>     /* Preempt channel only starts when the main channel is created */
>     if (migrate_postcopy_preempt()) {
>         return main_channel;
>     }
> 
>     /*
>      * For all the rest types of migration, we should only reach here when
>      * it's the main channel that's being created, and we should always
>      * proceed with this channel.
>      */
>     assert(main_channel);
>     return true;
> }
> ===
> * Above code returns 'true' for 'multifd' and 'main_channel' cases.
> When migrate_postcopy_preempt() is true, main_channel is 'false', so
> it returns false. All I have done is reused the
> migration_should_start_incoming() function to simplify the 'if'
> conditional at the top.

Yes, and I suggest a rename or introduce a new helper, per previous reply.

> 
> > I think you may want to split
> > migration_has_all_channels() into migration_has_essential_channels() which
> > only covers main and multifd cases.  Then you can check if (!has_esential)
> > here.  You'd better also add a comment that all "essential channels" can be
> > peeked.
> >
> > You may also want to bypass a few things, e.g. "postcopy paused stage" here
> > rather than inside, because postcopy-recover only happens:
> >
> >   - First with a main channel, that is not peekable as no header when resume
> >   - Then with preempt channel, that is also not peekable
> >
> > [a]
> >
> > You may also need to keep the mapped-ram check.  They also don't support
> > peek.
> 
> * Instead of adding specific conditions and splitting functions, my
> request is, let's work towards consistent channel behaviour that will
> automatically simplify these conditions and channel handling. Maybe we
> can do that in a subsequent series.

I didn't follow, sorry - do you mean this patch is correct on dropping the
mapped-ram check? I don't yet understand how it can work if without.

> 
> > >
> > > > > +        } else if (mis->from_src_file
> > > > > +            && (!strcmp(ioc->name, "migration-tls-incoming")
> > > > > +                || !strcmp(ioc->name, "migration-file-incoming"))) {
> > > > > +            channel = CH_MULTIFD;
> > > >
> > > > Confused here too.  Why do we need to check ioc name? Shouldn't multifd has
> > > > the headers?
> > >
> > > * Because they are not 'multifd' channels, tls/file channels don't
> > > send magic values, but are still handled by
> >
> > It might be because you have a bug where you removed mapped-ram check at
> > [b] above.  I think we need to keep it.
> 
> * ie. Because I removed the mapped-ram check, so tls/file channels are
> handled by multifd_recv_new_channel()? No, that's not the case.
> Rather, that is how it works currently. I have not changed anything,
> only made it more explicit to see that when it is tls/file channel,
> handle it as a CH_MULTIFD type. Looking at the current code, one can
> not see clearly how tls/file channels are handled.
> 
> > Why TLS channels don't send magic?
> 
> * Probably because they do TLS hand-shake while establishing a connection?

I meant tls channels should have these magics too.  Do you mean they're
not?

> 
> > > because if multifd page is getting late, that network
> > > latency should affect 'postcopy' channel too, no? But still if it is
> >
> > I don't think so.  postcopy doesn't use any multifd channels.
> 
> * Yes, but it uses the same wire/network.
> 
> > > possible, do we want to call - multifd_ram_flush_and_sync() before
> > > postcopy_start()? Will that help?  I'll check if/how it works.
> >
> > Note that all things flushed may or may not be enough, because IIUC the
> > flush only makes sure all threads are synced.
> 
> * We are again using 'flush' and 'sync' interchangeably. What does -
> flush only makes sure all threads are synced - mean really? Is it not
> writing all socket data onto the wire/channel?
> 
> * Flush should write all socket data onto the network wire/channel.
> The _order_ in which threads flush/write their socket data onto the
> wire/channel is to synchronise them, maintaining/controlling that
> _order_ is not flush.
> 
> > It may not make sure the order of things to happen in multifd threads and postcopy thread.  The
> > latter is what we need - we need to make sure no page land in postcopy threads.
> 
> * Let's see:
>    1) When migration is in Postcopy mode, ram_save_multifd_page() is
> _not_ called on the source side. ie. no multifd data gets enqueued on
> the multifd queue.
>        1.1) multifd_queue_page() function also calls multifd_send() if
> the queue is full, before enqueueing new pages.
>    2) If a multifd page reaches the destination during Postcopy mode,
> it must have been sent/written on the multifd channel before Postcopy
> mode started, right?

Yes.

>    3) In this case, writing/flushing all multifd socket data onto the
> wire/channel, before calling postcopy_start() function should help
> IIUC.
>        3.1) ie. calling multifd_send() before postcopy_start() should
> help to clear the multifd queue, before Postcopy begins.
>        3.2) Same can be done by - multifd_ram_flush_and_sync() ->
> multifd_send() - sequence.

No I don't think so.

Flushing sending side makes sure send buffer is empty.  It doesn't
guarantee recv buffer is empty on the other side.

> 
> * If all multifd pages are sent/written/flushed onto the multifd
> channels before postcopy_start() is called, then multifd pages should
> not arrive at the destination after postcopy starts IIUC.  If that is
> happening, we need a reproducer for such a case. Do we have such a
> reproducer?

With or without a reproducer, we need to at least justify it in theory.  If
it doesn't even work in theory, it's a problem.

> 
> > That's why I was requesting to add an assert() in multifd recv thread to
> > make sure we will never receive a page during postcopy.
> 
> * ie. Add  assert(!migrate_in_postcopy())  in multifd_recv_thread()
> function?  Okay.

Yes, probably before multifd_recv_state->ops->recv().

Thanks,

-- 
Peter Xu



  reply	other threads:[~2025-02-10 16:59 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-05 12:27 [PATCH v5 0/5] Allow to enable multifd and postcopy migration together Prasad Pandit
2025-02-05 12:27 ` [PATCH v5 1/5] migration/multifd: move macros to multifd header Prasad Pandit
2025-02-06 22:41   ` Peter Xu
2025-02-05 12:27 ` [PATCH v5 2/5] migration: refactor ram_save_target_page functions Prasad Pandit
2025-02-06 22:43   ` Peter Xu
2025-02-07 12:19     ` Fabiano Rosas
2025-02-05 12:27 ` [PATCH v5 3/5] migration: enable multifd and postcopy together Prasad Pandit
2025-02-06 23:16   ` Peter Xu
2025-02-07 10:32     ` Prasad Pandit
2025-02-07 15:45       ` Peter Xu
2025-02-08 10:36         ` Prasad Pandit
2025-02-10 16:59           ` Peter Xu [this message]
2025-02-11  9:04             ` Prasad Pandit
2025-02-11 15:20               ` Peter Xu
2025-02-12 13:27                 ` Prasad Pandit
2025-02-12 14:37                   ` Peter Xu
2025-02-12 17:36                     ` Prasad Pandit
2025-02-12 17:52                       ` Peter Xu
2025-02-05 12:27 ` [PATCH v5 4/5] tests/qtest/migration: add postcopy tests with multifd Prasad Pandit
2025-02-05 12:27 ` [PATCH v5 5/5] tests/qtest/migration: consolidate set capabilities Prasad Pandit
2025-02-06 22:44   ` Peter Xu
2025-02-07  8:59     ` Prasad Pandit
2025-02-07 22:01   ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z6owYoktb5nk2yRw@x1.local \
    --to=peterx@redhat.com \
    --cc=berrange@redhat.com \
    --cc=farosas@suse.de \
    --cc=pjp@fedoraproject.org \
    --cc=ppandit@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.