From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>,
qemu-devel@nongnu.org,
Leonardo Bras Soares Passos <lsoaresp@redhat.com>
Subject: Re: [PATCH RFC 14/15] migration: Postcopy preemption on separate channel
Date: Tue, 8 Feb 2022 11:24:14 +0000
Message-ID: <YgJS3qUuyopB+JFZ@work-vm>
In-Reply-To: <YgHv/4Ep4JUhfLB4@xz-m1.local>

* Peter Xu (peterx@redhat.com) wrote:
> On Thu, Feb 03, 2022 at 05:45:32PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > This patch enables postcopy-preempt feature.
> > >
> > > It contains two major changes to the migration logic:
> > >
> > > (1) Postcopy requests are now sent via a different socket from precopy
> > > background migration stream, so as to be isolated from very high page
> > > request delays
> > >
> > > (2) For huge page enabled hosts: when there's postcopy requests, they can now
> > > intercept a partial sending of huge host pages on src QEMU.
> > >
> > > After this patch, we'll have two "channels" (or say, sockets, because it's only
> > > supported on socket-based channels) for postcopy: (1) PRECOPY channel (which is
> > > the default channel that transfers background pages), and (2) POSTCOPY
> > > channel (which only transfers requested pages).
> > >
> > > On the source QEMU, when we found a postcopy request, we'll interrupt the
> > > PRECOPY channel sending process and quickly switch to the POSTCOPY channel.
> > > After we serviced all the high priority postcopy pages, we'll switch back to
> > > PRECOPY channel so that we'll continue to send the interrupted huge page again.
> > > There's no new thread introduced.
> > >
> > > On the destination QEMU, one new thread is introduced to receive page data from
> > > the postcopy specific socket.
> > >
> > > This patch has a side effect. After sending postcopy pages, previously we'll
> > > assume the guest will access follow up pages so we'll keep sending from there.
> > > Now it's changed. Instead of going on with a postcopy requested page, we'll go
> > > back and continue sending the precopy huge page (which can be intercepted by a
> > > postcopy request so the huge page can be sent partially before).
> > >
> > > Whether that's a problem is debatable, because "assuming the guest will
> > > continue to access the next page" doesn't really suit the case where huge
> > > pages are used, especially if the huge page is large (e.g. 1GB pages). So
> > > that locality hint is largely meaningless if huge pages are used.
> > >
> > > If postcopy preempt is enabled, a separate channel is created for it so that it
> > > can be used later for postcopy specific page requests. On dst node, a
> > > standalone thread is used to receive postcopy requested pages. The thread is
> > > created along with the ram listen thread during POSTCOPY_LISTEN phase.
> >
> > I think this patch could do with being split into two; the first one that
> > deals with closing/opening channels; and the second that handles the
> > data on the two channels and does the preemption.
>
> Sounds good, I'll give it a shot on the split.
>
> >
> > Another thought is whether, if in the future we allow multifd +
> > postcopy, the multifd code would change - I think it would end up closer
> > to using multiple channels taking different pages on each one.
>
> Right, so potentially the postcopy channels can be multi-threaded too itself.
>
> We've had a quick discussion on irc, just to recap: I didn't reuse multifd
> infra because IMO multifd is designed with below ideas in mind:
>
> (1) Every multifd thread is equal
> (2) Throughput oriented
>
> However I found that postcopy needs something different when they're mixed up
> together with multifd.
>
> Firstly, we will have some channels sending as much as they can where latency
> is not an issue (aka background pages). However that's not suitable for page
> requests, so we could also have channels that are servicing page faults from
> dst. In short, there're two types of channels/threads we want, and we may want
> to treat them differently.
>
> The current model is we only have 1 postcopy channel and 1 precopy channel, but
> it should be easier if we want to make it N post + 1 pre based on this series.

It's not clear to me if we need to be able to do N post + M pre, or
whether we have a rule like always at least 1 post, but if there's more
pagefaults in the queue then you can steal all of the pre channels.

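To make that second option concrete, here is a minimal sketch in C of the
"always at least 1 post, steal pre channels under load" rule. Everything in
it is hypothetical: ChannelSplit, split_channels() and the "one channel per
pending fault" heuristic are made up for illustration and are not part of
QEMU or of this series.

```c
#include <stddef.h>

/* Hypothetical result type: how a fixed pool of channels is divided. */
typedef struct {
    unsigned post;  /* channels servicing postcopy page requests */
    unsigned pre;   /* channels sending background (precopy) pages */
} ChannelSplit;

/*
 * Divide 'total' channels between postcopy and precopy use.
 *
 * Assumed policy (illustration only):
 *  - at least one channel is always reserved for postcopy requests;
 *  - each additional pending page fault may steal one precopy channel;
 *  - with enough pending faults, all precopy channels are stolen.
 */
static ChannelSplit split_channels(unsigned total, unsigned pending_faults)
{
    ChannelSplit s = { 0, 0 };

    if (total == 0) {
        return s;               /* nothing to divide */
    }

    /* Never drop below one postcopy channel, even with an empty queue. */
    unsigned want = pending_faults > 1 ? pending_faults : 1;

    /* Cap at the pool size; whatever is left stays with precopy. */
    s.post = want < total ? want : total;
    s.pre = total - s.post;
    return s;
}
```

With 4 channels this gives 1 post + 3 pre while the queue is idle, 2 + 2
with two faults pending, and 4 + 0 once the queue is deep enough. Whether a
real implementation would want hysteresis before handing channels back is
left open here.
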
> So far all send() is still done in the migration thread so no new sender thread
> but 1 more receiver thread only. If we want to grow that 1->N for postcopy
> channels we may want to move that out too just like what we do with multifd.
> Not sure whether there can be something reused around. That's where I haven't
> yet explored, but this series should already share a common piece of code on
> refactoring of things like tmp huge page on dst node to be able to receive with
> multiple huge pages.

Right; it makes me think the multifd+postcopy should just use channels.

> This also reminded me that, instead of a new capability, should I simply expose
> a parameter "postcopy-channels=N" to CLI so that we can be prepared with multi
> postcopy channels?

I'm not sure we know enough yet about what configuration it would have;
I'd be tempted to just make it work for the user by enabling both
multifd and preemption and then using this new mechanism rather than
having to add yet another parameter.

Dave

> >
> >
> > Do we need to do anything in postcopy recovery?
>
> Yes. It's a todo (in the cover letter), if the whole thing looks sane I'll add
> that together in the non-rfc series.
>
> Thanks,
>
> --
> Peter Xu
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK