Re: [RFC] Improving userfaultfd scalability for live migration

From: Sean Christopherson <seanjc@google.com>
To: James Houghton <jthoughton@google.com>
Cc: David Matlack <dmatlack@google.com>, Peter Xu <peterx@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Linux MM <linux-mm@kvack.org>, kvm <kvm@vger.kernel.org>,
	chao.p.peng@linux.intel.com
Subject: Re: [RFC] Improving userfaultfd scalability for live migration
Date: Tue, 6 Dec 2022 18:00:53 +0000
Message-ID: <Y4+DVdq1Pj3k4Nyz@google.com>
In-Reply-To: <CADrL8HVM1poR5EYCsghhMMoN2U+FYT6yZr_5hZ8pLZTXpLnu8Q@mail.gmail.com>

On Tue, Dec 06, 2022, James Houghton wrote:
> On Mon, Dec 5, 2022 at 8:06 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Mon, Dec 05, 2022, James Houghton wrote:
> > > On Mon, Dec 5, 2022 at 1:20 PM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Mon, Dec 05, 2022, David Matlack wrote:
> > > > > On Mon, Dec 5, 2022 at 7:30 AM Peter Xu <peterx@redhat.com> wrote:
> > > > > > ...
> > > > > > I'll have a closer read on the nested part, but note that this path already
> > > > > > has the mmap lock then it invalidates the goal if we want to avoid taking
> > > > > > it from the first place, or maybe we don't care?
> > >
> > > Not taking the mmap lock would be helpful, but we still have to take
> > > it in UFFDIO_CONTINUE, so it's ok if we have to still take it here.
> >
> > IIUC, Peter is suggesting that the kernel not even get to the point where UFFD
> > is involved.  The "fault" would get propagated to userspace by KVM, userspace
> > fixes the fault (gets the page from the source, does MADV_POPULATE_WRITE), and
> > resumes the vCPU.
> 
> If we haven't UFFDIO_CONTINUE'd some address range yet,
> MADV_POPULATE_WRITE for that range will drop into handle_userfault and
> go to sleep. Not good!

Ah, right, userspace would still need to register UFFD for the region to handle
non-KVM (or incompatible KVM) accesses, so its own MADV_POPULATE_WRITE could
fault back into UFFD and end up waiting on itself.

> So, going with the no-slow-GUP approach, resolving faults is done like this:
> - If we haven't UFFDIO_CONTINUE'd yet, do that now and restart
> KVM_RUN. The PTEs will be none/blank right now. This is the common
> case.
> - If we have UFFDIO_CONTINUE'd already, if we were to do it again, we
> would get EEXIST. (In this case, we probably have some type of swap
> entry in the page tables.) We have to change the page tables to make
> fast GUP succeed now *without* using UFFDIO_CONTINUE now.
> MADV_POPULATE_WRITE seems to be the right tool for the job. This case
> happens if the kernel has swapped the memory out, is migrating it, has
> poisoned it, etc. If MADV_POPULATE_WRITE fails, we probably need to
> crash or inject a memory error.
> 
> So with this approach, we never need to take the mmap_lock for reading
> in hva_to_pfn, but we still need to take it in UFFDIO_CONTINUE.
> Without removing the mmap_lock from *both*, we don't gain much.
> 
> So if we disregard this tiny mmap_lock benefit, the other approach
> (the PF_NO_UFFD_WAIT approach) seems better.

Can you elaborate on what makes it better?  Or maybe generate a list of pros and
cons?  I can think of (dis)advantages for both approaches, but I haven't identified
anything that would be a blocking issue for either approach.  Doesn't mean there
isn't one or more blocking issues, just that I haven't thought of any :-)

> When KVM_RUN exits:
> - If we haven't UFFDIO_CONTINUE'd yet, do that now and restart KVM_RUN.
> - If we have, then something bad has happened. Slow GUP already ran
> and failed, so we need to treat this in the same way we treat a
> MADV_POPULATE_WRITE failure above: userspace might just want to crash
> (or inject a memory error or something).
> 
> - James
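
To make the two flows above easier to compare side by side, here is a minimal
sketch of the userspace decision each one implies when KVM_RUN exits on a
missing page. It is only a model under stated assumptions: the enum, the
function names, and the boolean inputs are hypothetical and do not come from
the thread or from any kernel or VMM API. The actual UFFDIO_CONTINUE ioctl and
madvise(MADV_POPULATE_WRITE) calls are deliberately left out; the booleans
stand in for "has this range already been UFFDIO_CONTINUE'd?" and "did
MADV_POPULATE_WRITE succeed?".

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum fault_action {
	ACT_CONTINUE_AND_RERUN,	/* UFFDIO_CONTINUE the range, restart KVM_RUN */
	ACT_POPULATE_AND_RERUN,	/* MADV_POPULATE_WRITE worked, restart KVM_RUN */
	ACT_FATAL,		/* crash the VM or inject a memory error */
};

/*
 * No-slow-GUP approach: KVM only tries fast GUP, so userspace has to fix
 * the page tables itself before restarting KVM_RUN.
 */
enum fault_action resolve_no_slow_gup(bool already_continued, bool populate_ok)
{
	/* Common case: PTEs are still none, so UFFDIO_CONTINUE resolves it. */
	if (!already_continued)
		return ACT_CONTINUE_AND_RERUN;

	/*
	 * UFFDIO_CONTINUE would return EEXIST here (swapped out, migrating,
	 * poisoned, ...), so MADV_POPULATE_WRITE is the remaining tool.
	 */
	return populate_ok ? ACT_POPULATE_AND_RERUN : ACT_FATAL;
}

/*
 * PF_NO_UFFD_WAIT approach: slow GUP runs in the kernel but does not sleep
 * in handle_userfault, so an exit for an already-continued range means slow
 * GUP itself failed.
 */
enum fault_action resolve_no_uffd_wait(bool already_continued)
{
	if (!already_continued)
		return ACT_CONTINUE_AND_RERUN;

	/* Treated like a MADV_POPULATE_WRITE failure in the other approach. */
	return ACT_FATAL;
}
```

In this model the only difference is the second branch: the no-slow-GUP
approach has one more recovery step in userspace, while the PF_NO_UFFD_WAIT
approach leaves that recovery to slow GUP in the kernel.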

