From: David Matlack <dmatlack@google.com>
To: Axel Rasmussen <axelrasmussen@google.com>
Cc: Peter Xu <peterx@redhat.com>, Anish Moorthy <amoorthy@google.com>,
Sean Christopherson <seanjc@google.com>,
Nadav Amit <nadav.amit@gmail.com>,
Paolo Bonzini <pbonzini@redhat.com>,
maz@kernel.org, oliver.upton@linux.dev,
James Houghton <jthoughton@google.com>,
bgardon@google.com, ricarkol@google.com,
kvm <kvm@vger.kernel.org>,
kvmarm@lists.linux.dev
Subject: Re: [PATCH v3 00/22] Improve scalability of KVM + userfaultfd live migration via annotated memory faults.
Date: Thu, 11 May 2023 12:05:10 -0700
Message-ID: <ZF08Zg90emoDJJIp@google.com>
In-Reply-To: <CAJHvVci4VuQ_vdpRKczg4ic6x7eZRXE4+ZUvzO-xU_9VJ1Vqvg@mail.gmail.com>
On Thu, May 11, 2023 at 10:33:24AM -0700, Axel Rasmussen wrote:
> On Thu, May 11, 2023 at 10:18 AM David Matlack <dmatlack@google.com> wrote:
> >
> > On Wed, May 10, 2023 at 2:50 PM Peter Xu <peterx@redhat.com> wrote:
> > > On Tue, May 09, 2023 at 01:52:05PM -0700, Anish Moorthy wrote:
> > > > On Sun, May 7, 2023 at 6:23 PM Peter Xu <peterx@redhat.com> wrote:
> > >
> > > What I wanted to do is to understand whether there's still chance to
> > > provide a generic solution. I don't know why you have had a bunch of pmu
> > > stack showing in the graph, perhaps you forgot to disable some of the perf
> > > events when doing the test? Let me know if you figure out why it happened
> > > like that (so far I didn't see), but I feel guilty to keep overloading you
> > > with such questions.
> > >
> > > The major problem I had with this series is it's definitely not a clean
> > > approach. Say, even if you'll all rely on userapp you'll still need to
> > > rely on userfaultfd for kernel traps on corner cases or it just won't work.
> > > IIUC that's also the concern from Nadav.
> >
> > This is a long thread, so apologies if the following has already been discussed.
> >
> > Would per-tid userfaultfd support be a generic solution? i.e. Allow
> > userspace to create a userfaultfd that is tied to a specific task. Any
> > userfaults encountered by that task use that fd, rather than the
> > process-wide fd. I'm making the assumption here that each of these fds
> > would have independent signaling mechanisms/queues and so this would
> > solve the scaling problem.
> >
> > A VMM could use this to create 1 userfaultfd per vCPU and 1 thread per
> > vCPU for handling userfault requests. This seems like it'd have
> > roughly the same scalability characteristics as the KVM -EFAULT
> > approach.
>
> I think this would work in principle, but it's significantly different
> from what exists today.
>
> The splitting of userfaultfds Peter is describing is splitting up the
> HVA address space, not splitting per-thread.
>
> I think for this design, we'd need to change UFFD registration so
> multiple UFFDs can register the same VMA, but can be filtered so they
> only receive fault events caused by some particular tid(s).
>
> This might also incur some (small?) overhead, because in the fault
> path we now need to maintain some data structure so we can lookup
> which UFFD to notify based on a combination of the address and our
> tid. Today, since VMAs and UFFDs are 1:1 this lookup is trivial.

I was (perhaps naively) assuming the lookup would be as simple as:

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 44d1ee429eb0..e9856e2ba9ef 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -417,7 +417,10 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 	 */
 	mmap_assert_locked(mm);
 
-	ctx = vma->vm_userfaultfd_ctx.ctx;
+	if (current->userfaultfd_ctx)
+		ctx = current->userfaultfd_ctx;
+	else
+		ctx = vma->vm_userfaultfd_ctx.ctx;
 	if (!ctx)
 		goto out;
 
>
> I think it's worth keeping in mind that a selling point of Anish's
> approach is that it's a very small change. It's plausible we can come
> up with some alternative way to scale, but it seems to me everything
> suggested so far is likely to require a lot more code, complexity, and
> effort vs. Anish's approach.

Agreed.

Mostly I think the per-thread UFFD approach would add complexity on the
userspace side of things. With Anish's approach userspace is able to
trivially re-use the vCPU thread (and its associated pCPU if pinned) to
handle the request. That gets more complicated when juggling the extra
paired threads.

The per-thread approach would require a new userfaultfd UAPI change, which
I think is a higher bar than the KVM UAPI change proposed here.

The per-thread approach would also require KVM to call into slow GUP and
take the mmap_lock before contacting userspace. I'm not 100% convinced
that's a bad thing long term (e.g. it avoids the false-positive -EFAULT
exits in Anish's proposal), but it could have performance implications.

Lastly, inter-thread communication is likely slower than returning to
userspace from KVM_RUN. So the per-thread approach might increase the
end-to-end latency of demand fetches.