From: Peter Xu <peterx@redhat.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: qemu-devel@nongnu.org,
Leonardo Bras Soares Passos <lsoaresp@redhat.com>,
Juan Quintela <quintela@redhat.com>,
Manish Mishra <manish.mishra@nutanix.com>,
"Daniel P . Berrange" <berrange@redhat.com>
Subject: Re: [PATCH v6 10/13] migration: Respect postcopy request order in preemption mode
Date: Tue, 24 May 2022 14:42:41 -0400
Message-ID: <Yo0nIYoTDFclTWmx@xz-m1.local>
In-Reply-To: <YotoTrRaTIaQdVR4@work-vm>
On Mon, May 23, 2022 at 11:56:14AM +0100, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > With preemption mode on, when we see a postcopy request for exactly the
> > page that we preempted earlier (so we've already partially sent the page
> > via the PRECOPY channel and it got preempted by another postcopy request),
> > we currently drop the request, so that only after all the other postcopy
> > requests are serviced do we go back to the precopy stream and handle that
> > page.
> >
> > We dropped the request because we can't send it via the postcopy channel:
> > the precopy channel already contains part of the data, and a huge page can
> > only be sent as a whole via one channel. We can't split a huge page
> > across two channels.
> >
> > That's very much a corner case and it works, but it changes the order in
> > which we handle postcopy requests, since we're postponing this (unlucky)
> > postcopy request until after the other queued postcopy requests. The
> > problem is that when the guest is very busy, the postcopy queue may never
> > be empty, which means this dropped request will not be handled until the
> > end of postcopy migration. So there's a chance that one dest QEMU vcpu
> > thread waits on a page fault for an extremely long time just because it
> > unluckily accessed the specific page that was preempted before.
> >
> > The worst case time it needs can be as long as the whole postcopy migration
> > procedure. It's extremely unlikely to happen, but when it happens it's not
> > good.
> >
> > The root cause of this problem is that we treat the pss->postcopy_requested
> > variable as having two meanings bound together:
> >
> > 1. Whether this page request is urgent, and,
> > 2. Which channel we should use for this page request.
> >
> > With the old code, when we set postcopy_requested it means either both (1)
> > and (2) are true, or both (1) and (2) are false. (1) and (2) can never
> > have different values.
> >
> > However, it doesn't necessarily need to be like that. It's perfectly legal
> > for one request to have (1) very high urgency while (2) we'd like to use
> > the precopy channel, just like the corner case we were discussing
> > above.
> >
> > To differentiate the two meanings better, introduce a new field called
> > postcopy_target_channel, showing which channel we should use for this page
> > request, so as to cover the old meaning (2) only. Then we leave the
> > postcopy_requested variable to stand only for meaning (1), which is the
> > urgency of this page request.
> >
> > With this change, we can easily boost priority of a preempted precopy page
> > as long as we know that page is also requested as a postcopy page. So with
> > the new approach in get_queued_page() instead of dropping that request, we
> > send it right away with the precopy channel so we get back the ordering of
> > the page faults just like how they're requested on dest.
> >
> > Alongside, I touched up find_dirty_block() to only set the postcopy fields
> > in the pss section if we're going through a postcopy migration. That's a
> > very light optimization and shouldn't affect much.
> >
> > Reported-by: manish.mishra@nutanix.com
> > Signed-off-by: Peter Xu <peterx@redhat.com>
>
> So I think this is OK; getting a bit complicated!
Yes it is. I added some more comments; hopefully they'll help a little bit.
>
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Thanks!
> >  static bool find_dirty_block(RAMState *rs, PageSearchStatus *pss, bool *again)
> >  {
> > -    /* This is not a postcopy requested page */
> > -    pss->postcopy_requested = false;
> > +    if (migration_in_postcopy()) {
> > +        /*
> > +         * This is not a postcopy requested page, mark it "not urgent", and
> > +         * use precopy channel to send it.
> > +         */
> > +        pss->postcopy_requested = false;
> > +        pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
> > +    }
>
> Do you need the 'if' here?

Hmm, good question. Precopy should always have these two fields cleared
anyway, so I wanted to avoid setting them every time, but I just noticed
that pss is not initialized at all when it is used:

  static int ram_find_and_save_block(RAMState *rs)
  {
      PageSearchStatus pss;
      ...
  }

So either we reset these pss fields explicitly, or, simpler, let me drop
the 'if'.

Thanks,
--
Peter Xu