From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>,
qemu-devel@nongnu.org,
Leonardo Bras Soares Passos <lsoaresp@redhat.com>
Subject: Re: [PATCH RFC 14/15] migration: Postcopy preemption on separate channel
Date: Tue, 8 Feb 2022 11:24:14 +0000
Message-ID: <YgJS3qUuyopB+JFZ@work-vm>
In-Reply-To: <YgHv/4Ep4JUhfLB4@xz-m1.local>

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Feb 03, 2022 at 05:45:32PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > This patch enables postcopy-preempt feature.
> > >
> > > It contains two major changes to the migration logic:
> > >
> > > (1) Postcopy requests are now sent via a different socket from precopy
> > > background migration stream, so as to be isolated from very high page
> > > request delays
> > >
> > > (2) For huge page enabled hosts: when there's postcopy requests, they can now
> > > intercept a partial sending of huge host pages on src QEMU.
> > >
> > > After this patch, we'll have two "channels" (or say, sockets, because it's only
> > > supported on socket-based channels) for postcopy: (1) PRECOPY channel (which is
> > > the default channel that transfers background pages), and (2) POSTCOPY
> > > channel (which only transfers requested pages).
> > >
> > > On the source QEMU, when we found a postcopy request, we'll interrupt the
> > > PRECOPY channel sending process and quickly switch to the POSTCOPY channel.
> > > After we serviced all the high priority postcopy pages, we'll switch back to
> > > PRECOPY channel so that we'll continue to send the interrupted huge page again.
> > > There's no new thread introduced.
> > >
> > > On the destination QEMU, one new thread is introduced to receive page data from
> > > the postcopy specific socket.
> > >
> > > This patch has a side effect. After sending postcopy pages, previously we'll
> > > assume the guest will access follow up pages so we'll keep sending from there.
> > > Now it's changed. Instead of going on with a postcopy requested page, we'll go
> > > back and continue sending the precopy huge page (which can be intercepted by a
> > > postcopy request so the huge page can be sent partially before).
> > >
> > > Whether that's a problem is debatable, because "assuming the guest will
> > > continue to access the next page" doesn't really suit when huge pages are
> > > used, especially if the huge page is large (e.g. 1GB pages). So that locality
> > > hint is largely meaningless if huge pages are used.
> > >
> > > If postcopy preempt is enabled, a separate channel is created for it so that it
> > > can be used later for postcopy specific page requests. On dst node, a
> > > standalone thread is used to receive postcopy requested pages. The thread is
> > > created along with the ram listen thread during POSTCOPY_LISTEN phase.
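
To make the source-side flow in the description above concrete, here is a minimal sketch of the preemption idea: a huge page is sent on the precopy channel one small page at a time, and between small pages any pending postcopy request is serviced on the postcopy channel before the huge page resumes. This is an illustration only, not the code in the patch. Every name in it (send_page, request_queue, send_huge_page, arrives_after, the 8-subpage huge page) is hypothetical, and sockets are replaced by an in-memory log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; none are taken from the QEMU sources. */
enum channel { CH_PRECOPY, CH_POSTCOPY };

#define SUBPAGES_PER_HUGE_PAGE 8   /* illustrative size, not a real ratio */
#define MAX_LOG 64

struct sent { enum channel ch; int page; };

static struct sent send_log[MAX_LOG];
static size_t send_log_len;

/* Stand-in for writing one small page to a socket: record it in a log. */
static void send_page(enum channel ch, int page)
{
    assert(send_log_len < MAX_LOG);
    send_log[send_log_len].ch = ch;
    send_log[send_log_len].page = page;
    send_log_len++;
}

/*
 * A page request from the destination. To simulate asynchronous arrival,
 * a request only becomes visible once `arrives_after` small pages of the
 * current huge page have been sent.
 */
struct request { int page; int arrives_after; };
struct request_queue { const struct request *reqs; size_t len; size_t head; };

static bool has_request(const struct request_queue *q, int sent_so_far)
{
    return q->head < q->len && q->reqs[q->head].arrives_after <= sent_so_far;
}

/*
 * Send one huge page on the precopy channel. Before each small page, drain
 * all visible requests on the postcopy channel, then continue the huge page
 * from where it was interrupted. No extra sender thread is involved.
 */
static void send_huge_page(int first_page, struct request_queue *q)
{
    for (int i = 0; i < SUBPAGES_PER_HUGE_PAGE; i++) {
        while (has_request(q, i)) {
            send_page(CH_POSTCOPY, q->reqs[q->head].page);
            q->head++;
        }
        send_page(CH_PRECOPY, first_page + i);
    }
}
```

With a single request for page 1000 that arrives after three small pages, the log shows pages 0 to 2 on the precopy channel, page 1000 on the postcopy channel, and then pages 3 to 7 on the precopy channel again.
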
> >
> > I think this patch could do with being split into two; the first one that
> > deals with closing/opening channels; and the second that handles the
> > data on the two channels and does the preemption.
>
> Sounds good, I'll give it a shot on the split.
>
> >
> > Another thought is whether, if in the future we allow multifd +
> > postcopy, the multifd code would change - I think it would end up closer
> > to using multiple channels taking different pages on each one.
>
> Right, so potentially the postcopy channels can be multi-threaded too itself.
>
> We've had a quick discussion on irc, just to recap: I didn't reuse multifd
> infra because IMO multifd is designed with below ideas in mind:
>
> (1) Every multifd thread is equal
> (2) Throughput oriented
>
> However I found that postcopy needs something different when they're mixed up
> together with multifd.
>
> Firstly, we will have some channels sending as much as we could where latency
> is not an issue (aka background pages). However it's not suitable for page
> requests, so we could also have channels that are servicing page faults from
> dst. In short, there're two types of channels/threads we want, and we may want
> to treat them differently.
>
> The current model is we only have 1 postcopy channel and 1 precopy channel, but
> it should be easier if we want to make it N post + 1 pre based on this series.

It's not clear to me whether we need to be able to do N post + M pre, or
whether we have a rule like "always at least 1 post, but if there are more
page faults in the queue then you can steal all of the pre channels".

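As a rough sketch of the second option, assuming a fixed pool of channels and a count of queued page faults, the rule could look like the following. The names (channel_split, split_channels) are made up for illustration, and "one channel per queued fault" is just one possible way to decide how many channels to steal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: how many channels each side gets. */
struct channel_split {
    size_t post;   /* channels servicing page faults */
    size_t pre;    /* channels sending background pages */
};

/*
 * Hypothetical policy: out of `total` channels (at least 1), always keep
 * one for postcopy requests, and let postcopy take one channel per queued
 * page fault, up to all of them. Whatever is left sends background pages.
 */
static struct channel_split split_channels(size_t total, size_t pending_faults)
{
    assert(total >= 1);

    size_t post = pending_faults > 1 ? pending_faults : 1;
    if (post > total) {
        post = total;
    }
    return (struct channel_split){ .post = post, .pre = total - post };
}
```

For example, with 4 channels an empty queue gives 1 post + 3 pre, while 10 queued faults give 4 post + 0 pre.
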
> So far all send() is still done in the migration thread so no new sender thread
> but 1 more receiver thread only. If we want to grow that 1->N for postcopy
> channels we may want to move that out too just like what we do with multifd.
> Not sure whether there can be something reused around. That's where I haven't
> yet explored, but this series should already share a common piece of code on
> refactoring of things like tmp huge page on dst node to be able to receive with
> multiple huge pages.

Right; it makes me think the multifd+postcopy should just use channels.

> This also reminded me that, instead of a new capability, should I simply expose
> a parameter "postcopy-channels=N" to CLI so that we can be prepared with multi
> postcopy channels?

I'm not sure we know enough yet about what configuration it would have;
I'd be tempted to just make it work for the user by enabling both
multifd and preemption and then using this new mechanism rather than
having to add yet another parameter.

Dave

> >
> >
> > Do we need to do anything in postcopy recovery?
>
> Yes. It's a todo (in the cover letter), if the whole thing looks sane I'll add
> that together in the non-rfc series.
>
> Thanks,
>
> --
> Peter Xu
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK